We are looking for a skilled DevOps/SRE professional to drive the stability, scalability, and reliability of our infrastructure across multiple business units. In this role, you will manage large-scale container platforms, build automation-driven infrastructure solutions, and ensure high availability of mission-critical services.
Responsibilities
Cluster Operations & Management
- Manage and maintain container platforms (Kubernetes, Docker) and open-source service clusters (Kafka, Redis, Elasticsearch).
- Ensure high performance, scalability, and reliability of distributed systems across multiple business domains.
Infrastructure Platform Engineering
- Design, build, and enhance internal infrastructure platforms supporting operations teams.
- Develop and maintain CI/CD pipelines, monitoring/alerting systems, and centralized logging platforms.
- Drive standardization, automation, and platform modernization initiatives.
High Availability & Reliability
- Ensure high uptime for production services through proactive monitoring, incident response, and root-cause analysis.
- Continuously optimize system architecture, deployment strategies, and operational workflows.
- Implement and maintain SLA/SLO frameworks and core reliability engineering practices.
Automation & Process Improvement
- Lead the development of automated tooling to reduce manual operations and improve system efficiency.
- Build self-service tools and workflows to empower engineering teams.
- Establish and promote best practices for Infrastructure-as-Code (IaC), configuration management, and environment consistency.
Skills and Qualifications
- 2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering.
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- Experience with major cloud providers (AWS, GCP, or Azure) is highly valued.
- Strong knowledge and hands-on experience with Kubernetes, Docker, and production-grade container environments.
- Experience managing CI/CD pipelines and modern deployment tooling.
- Familiarity with infrastructure components (Nginx, MySQL, Redis, Kafka, Elasticsearch).
Join NEXadept and take your career to the next level!

