🚀 About the Role
We are looking for an experienced DevOps Engineer (Observability & Monitoring) to design, implement, and enhance observability capabilities across hybrid environments. You will play a key role in ensuring system reliability, performance visibility, and proactive incident management through modern monitoring and telemetry solutions.
🔧 Key Responsibilities
- Design, implement, and maintain observability platforms and telemetry pipelines across cloud and on-premises environments
- Instrument applications and systems using OpenTelemetry to collect logs, metrics, and traces
- Develop and optimize dashboards, alerts, and service maps to improve system visibility and performance monitoring
- Support incident analysis and root cause investigations using observability data
- Collaborate with infrastructure, DevOps, and application teams to enhance monitoring, automation, and operational insights
🎯 Required Skills & Experience
- Strong hands-on experience with observability tools such as OpenTelemetry, SigNoz, Prometheus, and Grafana
- Solid understanding of monitoring concepts: logs, metrics, traces, and alerting
- Experience working in hybrid and cloud environments (Kubernetes, containers, distributed systems)
- Scripting and automation skills (e.g., Python, Bash, PowerShell)
- Familiarity with ITSM or incident management tools (e.g., ServiceNow)
- Knowledge of SLO, SLA, and SLI frameworks
- Strong analytical mindset with a proactive and collaborative approach
- Excellent communication skills and the ability to work with cross-functional technical teams
🌟 Nice to Have
- Experience with large-scale distributed systems
- Exposure to DevOps and CI/CD practices
- Background in performance engineering or Site Reliability Engineering (SRE)
🤝 What You Bring
You are a problem-solver who thrives in complex environments and enjoys turning data into actionable insights. You are comfortable working across teams and proactively improving system reliability and observability maturity.
Join Qplox engineering and take your career to the next level!

