DevOps Manager
Atlas Search · United States · Posted 3 days ago
Full-time · Information Technology

About the Opportunity

Our client, a rapidly growing fintech innovator specializing in global B2B payment solutions, is seeking a Lead DevOps Engineer to provide architectural, operational, and people leadership for a high-impact DevOps function. The role combines hands-on engineering with strategic ownership of AWS cloud infrastructure, secure networking, observability, and cost-efficient operations.


Job Responsibilities

  • Guide and mentor the DevOps team, including direct management of two engineers, fostering a culture of reliability, security, and continuous improvement
  • Partner with executive leadership on DevOps strategy, roadmaps, and organizational alignment across engineering initiatives
  • Architect, implement, and operate AWS infrastructure leveraging Terraform, CloudFormation, or CDK to ensure scalability, resilience, and cost efficiency
  • Design and manage secure networking patterns, including VPCs, Transit Gateways, load balancers, DNS, API Gateway, and hybrid connectivity
  • Build and maintain end-to-end CI/CD pipelines using GitHub and GitHub Actions for automated deployments across multiple teams
  • Operate and optimize Kubernetes (EKS and upstream), including lifecycle management, networking, RBAC, autoscaling, and workload security
  • Develop and manage containerized environments with Docker, creating standardized images and build pipelines and enforcing runtime best practices
  • Implement and oversee cloud security controls including IAM architecture, secrets management, encryption, and compliance guardrails
  • Lead observability strategy using Datadog for comprehensive metrics, logs, traces, dashboards, and alerts across distributed systems
  • Drive incident response, root-cause analysis, and long-term reliability improvements
  • Develop internal tools and automation to streamline operational processes and enhance developer experience
  • Lead cloud cost management: monitoring AWS spend, optimizing resource usage, and collaborating with finance on cost forecasting
  • Maintain high-quality documentation for architecture, operational procedures, and infrastructure standards
  • Create and manage runbooks for incident handling, on-call workflows, and operational tasks to ensure reliable production support
  • Oversee disaster recovery planning and execution, including RTO/RPO targets and cross-team exercises
  • Manage backup and restore processes for critical systems, ensuring integrity, compliance, and regular validation
  • Promote and integrate AI-assisted engineering practices to enhance automation, code quality, and team efficiency
  • Establish DevOps and SRE best practices at scale, influencing architectural decisions across the organization

Job Requirements

  • Extensive professional experience in senior or lead DevOps/SRE roles, including direct mentorship or management of engineering staff
  • Advanced expertise designing and operating AWS production environments (EC2, EKS, ECS, Lambda, RDS, DynamoDB, S3, IAM, VPC, CloudFront)
  • Strong grounding in networking: TCP/IP, DNS, routing, load balancing, VPNs, firewalls, and distributed systems communication
  • Deep knowledge of cloud security: IAM, least-privilege access, encryption, secrets management, vulnerability response, and incident handling
  • Hands-on experience with Datadog for monitoring, metrics, logs, tracing, and alerting
  • Strong proficiency with the Docker container lifecycle, including build standards, runtime, and orchestration
  • Demonstrated production experience integrating and managing Kubernetes (EKS or upstream clusters)
  • Expertise in CI/CD automation using GitHub and GitHub Actions
  • Strong scripting or programming skills in Python, Go, or Bash
  • Experience implementing infrastructure as code, preferably with Terraform
  • Familiarity with AI-assisted development tools and workflows to improve engineering productivity
  • Proven ability to lead complex technical initiatives, influence architectural decisions, and drive cross-team collaboration

Preferred:

  • Experience with service mesh technologies (e.g., Istio, Linkerd) or zero-trust networking models
  • Knowledge of SRE practices: SLOs, error budgets, chaos engineering, and capacity planning
  • Prior exposure to distributed systems, high-throughput pipelines, or data platform engineering
  • AWS professional or security certifications
  • Track record of driving organizational adoption of AI-powered engineering tools
