Track This Job
Add this job to your tracking list to:
- Monitor application status and updates
- Change status (Applied, Interview, Offer, etc.)
- Add personal notes and comments
- Set reminders for follow-ups
- Track your entire application journey
Save This Job
Add this job to your saved collection to:
- Access easily from your saved jobs dashboard
- Review job details later without searching again
- Compare with other saved opportunities
- Keep a collection of interesting positions
- Receive notifications about saved jobs before they expire
AI-Powered Job Summary
Get a concise overview of key job requirements, responsibilities, and qualifications in seconds.
Pro Tip: Use this feature to quickly decide if a job matches your skills before reading the full description.
This is a high-impact role at the intersection of software engineering and cloud operations, focused on building and maintaining resilient, large-scale infrastructure for real-time communication systems. You will design, automate, and optimize cloud-native environments that support mission-critical connectivity under strict latency and reliability constraints. The position combines hands-on coding with deep operational ownership, empowering you to shape infrastructure strategy while improving developer productivity. Working in a remote-first, highly technical environment, you’ll collaborate across engineering teams to ensure scalability, security, and performance. If you thrive on solving distributed systems challenges and building production-grade reliability tooling, this role offers both ownership and influence.
Accountabilities
- Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
- Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
- Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
- Leading incident response processes, conducting root cause analyses, and creating automated runbooks to reduce MTTR.
- Architecting and maintaining CI/CD pipelines for backend services, mobile applications, and IoT firmware across cloud and on-prem environments.
- Implementing comprehensive observability using OpenTelemetry, distributed tracing, metrics exporters, and alerting systems.
- Managing data services such as PostgreSQL (RDS), Redis/ElastiCache, SQS, and networking components (ALB/NLB, VPC, IAM).
- Enforcing strong security standards, including IAM policies, encryption, secrets management, vulnerability management, and compliance auditing.
The ideal candidate is both a strong software engineer and an experienced platform reliability expert. Key qualifications include:
- 7+ years of experience in SRE, DevOps, or Platform Engineering roles with daily hands-on coding responsibilities.
- Proficiency in at least one backend language (TypeScript/ Node.js , Python, or Go) for developing automation tools, internal services, and reliability frameworks.
- Deep expertise in AWS services (ECS, EKS, RDS, ElastiCache, SQS, VPC, IAM, CloudWatch).
- Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, or Pulumi), including modular design and state management.
- Proven experience designing and maintaining CI/CD pipelines in both cloud and on-prem environments.
- Solid understanding of container orchestration (Docker, Kubernetes, Helm) and distributed systems patterns such as circuit breakers, retries, and graceful degradation.
- Experience operating production databases (PostgreSQL, Redis) and message queues.
- Strong security knowledge covering network segmentation, encryption, secrets management, and incident response.
- Preferred experience with real-time communication infrastructure (SIP, RTP, WebRTC), telecom systems, IoT pipelines, or satellite/low-bandwidth optimization environments.
- Competitive compensation package
- Flexible remote work environment with autonomy and ownership
- Opportunity to build and scale critical communication infrastructure
- Exposure to cutting-edge technologies across cloud, IoT, telecom, and distributed systems
- High-impact role with direct influence on reliability and platform architecture
- Collaborative, technically advanced engineering culture
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Why Apply Through Jobgether?
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Key Skills
Ranked by relevanceReady to apply?
Join Jobgether and take your career to the next level!
Application takes less than 5 minutes

