Director of Cloud, DevOps, and SRE: Emphasis on Execution
We are looking for a Director of Cloud, DevOps, and Site Reliability Engineering (SRE) who will be a hands-on, execution-focused leader responsible for driving the technical strategy, implementation, and continuous operation of our cloud infrastructure and services. This role demands a pragmatic leader capable of translating strategic vision into tangible, high-quality, and scalable results.
Key Responsibilities and Execution Focus
The primary responsibility of the Director is to execute on the cloud, DevOps, and SRE strategy, ensuring immediate and long-term operational excellence.
1. Delivery and Implementation (Execution)
- Lead the migration and deployment of core business applications and services to cloud platforms (e.g., AWS, Azure, GCP), ensuring projects are delivered on time and within budget and meet defined non-functional requirements (security, scalability, performance).
- Direct the implementation of Continuous Integration/Continuous Delivery (CI/CD) pipelines across all engineering teams, focusing on fully automated, reliable, and repeatable deployments.
- Drive Infrastructure as Code (IaC) adoption (e.g., Terraform, Ansible), establishing a 100% code-driven infrastructure environment with clear governance and review processes.
- Define Service Level Indicators (SLIs) and establish and enforce Service Level Objectives (SLOs) for all critical services, implementing monitoring and alerting from the outset to measure against these targets.
2. Operational Excellence and Reliability (SRE Execution)
- Direct the SRE function to minimize operational toil by developing and deploying automation tools and services for routine tasks, incident response, and capacity management.
- Lead major incident response and post-mortem processes, ensuring effective root cause analysis and implementing immediate, execution-driven solutions to prevent recurrence.
- Execute a robust cost management strategy for cloud resources, implementing FinOps practices to optimize spending without compromising reliability or performance.
- Own the security posture of the cloud environment, working hands-on with security teams to implement and automate compliance and security controls (DevSecOps).
3. Team Leadership and Mentorship (Pragmatic Leadership)
- Recruit, develop, and mentor a high-performing team of Cloud Engineers, DevOps Engineers, and SREs, setting clear, execution-focused goals and metrics.
- Foster a culture of ownership, accountability, and execution within the team, emphasizing rapid iteration, collaboration, and a bias for action.
- Act as a hands-on leader by actively participating in design reviews, critical deployments, and troubleshooting efforts.
Qualifications and Requirements
Required Skills & Experience (Execution-Driven)
- Minimum of 10 years of progressive experience in infrastructure, operations, or software engineering, with at least 3 years in a Director or Senior Management role overseeing Cloud, DevOps, or SRE teams.
- Deep, demonstrable expertise in at least one major cloud provider (AWS, Azure, or GCP), including advanced networking, security services, and serverless architectures. Certification at the Professional/Specialty level is a plus.
- Extensive experience implementing and scaling IaC and configuration management tools (e.g., Terraform, Ansible, SaltStack) in a production environment.
- Proven track record of establishing and running SRE practices (SLOs, error budgets, toil reduction) with tangible results in improving service reliability and availability.
- Proficiency in modern scripting/programming languages (e.g., Python, Go, Bash) for automation and tool development.
Education
- Bachelor’s degree in Computer Science, Engineering, or a related field; equivalent practical experience is accepted.
Join Exela Technologies and take your career to the next level!