About the Opportunity
Our client, a rapidly growing fintech innovator specializing in global B2B payment solutions, is seeking a Lead DevOps Engineer to drive architectural, operational, and team leadership across a high-impact DevOps function. The role combines hands-on technical engineering with strategic ownership of AWS cloud infrastructure, secure networking, observability, and cost-efficient operations.
Job Responsibilities
- Guide and mentor the DevOps team, including direct management of two engineers, fostering a culture of reliability, security, and continuous improvement
- Partner with executive leadership on DevOps strategy, roadmaps, and organizational alignment across engineering initiatives
- Architect, implement, and operate AWS infrastructure leveraging Terraform, CloudFormation, or CDK to ensure scalability, resilience, and cost efficiency
- Design and manage secure networking patterns, including VPCs, Transit Gateways, load balancers, DNS, API gateways, and hybrid connectivity
- Build and maintain end-to-end CI/CD pipelines utilizing GitHub and GitHub Actions for automated deployments across multiple teams
- Operate and optimize Kubernetes (EKS and upstream), including lifecycle management, networking, RBAC, autoscaling, and workload security
- Develop and manage containerized environments with Docker, creating standardized images, build pipelines, and enforcing runtime best practices
- Implement and oversee cloud security controls including IAM architecture, secrets management, encryption, and compliance guardrails
- Lead observability strategy using Datadog for comprehensive metrics, logs, traces, dashboards, and alerts across distributed systems
- Drive incident response, root-cause analysis, and long-term reliability improvements
- Develop internal tools and automation to streamline operational processes and enhance developer experience
- Lead efforts on cloud cost management, monitoring AWS spend, optimizing resource usage, and collaborating with finance for cost forecasting
- Maintain high-quality documentation for architecture, operational procedures, and infrastructure standards
- Create and manage runbooks for incident handling, on-call workflows, and operational tasks to ensure reliable production support
- Oversee disaster recovery planning and execution, including RTO/RPO targets and cross-team exercises
- Manage backup and restore processes for critical systems, ensuring integrity, compliance, and regular validation
- Promote and integrate AI-assisted engineering practices to enhance automation, code quality, and team efficiency
- Establish DevOps and SRE best practices at scale, influencing architectural decisions across the organization
Job Requirements
- Extensive professional experience in senior or lead DevOps/SRE roles, including direct mentorship or management of engineering staff
- Advanced expertise designing and operating AWS production environments (EC2, EKS, ECS, Lambda, RDS, DynamoDB, S3, IAM, VPC, CloudFront)
- Strong foundation in networking fundamentals: TCP/IP, DNS, routing, load balancing, VPNs, firewalls, and distributed systems communication
- Deep knowledge of cloud security: IAM, least-privilege access, encryption, secrets management, vulnerability response, and incident handling
- Hands-on experience with Datadog for monitoring, metrics, logs, tracing, and alerting
- Strong proficiency with the Docker container lifecycle, including build standards, runtime, and orchestration
- Demonstrated production experience integrating and managing Kubernetes (EKS or upstream clusters)
- Expertise in CI/CD automation using GitHub and GitHub Actions
- Strong scripting or programming skills in Python, Go, or Bash
- Experience implementing infrastructure as code, preferably with Terraform
- Familiarity with AI-assisted development tools and workflows to improve engineering productivity
- Proven ability to lead complex technical initiatives, influence architectural decisions, and drive cross-team collaboration
Preferred:
- Experience with service mesh technologies (e.g., Istio, Linkerd) or zero-trust networking models
- SRE practice knowledge: SLOs, error budgets, chaos engineering, capacity planning
- Prior exposure to distributed systems, high-throughput pipelines, or data platform engineering
- AWS professional or security certifications
- Track record of organizational adoption of AI-powered engineering tools
Join Atlas Search and take your career to the next level!

