As a DataOps Engineer, you will be the guardian of our modern Lakehouse environment, ensuring that our complex data ecosystem remains reliable, transparent, and high-performing. Positioned at the vital intersection of Data Engineering and Operations, you will lead the charge in proactive monitoring and rapid incident response for mission-critical pipelines, including AWS Glue, dbt, and Kafka.
This is a high-visibility role where you won't just react to issues—you will own the end-to-end incident lifecycle, drive deep root-cause analysis (RCA), and collaborate with development teams to build permanent automated preventions. Whether you are managing real-time operational dashboards or leading daily reviews of platform health, your mission is to guarantee data freshness and platform integrity, turning operational stability into a competitive advantage for our data-driven decision-making.
Your responsibilities as a DataOps Engineer will include, but are not limited to:
- Monitoring & Alerting
  - Monitor and act on incidents related to:
    - AWS Glue job executions
    - Current DMS and dbt pipelines
    - Kafka lag and streaming health
    - Data freshness SLA breaches
    - Data quality issues
    - Platform health alerts
  - Perform L1 triage or distribute incidents to L2/L3 teams as needed
- Incident Management
  - Own the incident response process:
    - Initial triage and severity assessment
    - Coordinate with development teams for resolution
    - Create and assign JIRA tickets with full context
    - Track incident resolution and closure
  - Escalate high-priority or long-running incidents to management
- Root Cause Analysis & Prevention
  - Conduct post-incident root cause analysis
  - Maintain incident logs and post-mortem documentation
  - Implement preventive measures for recurring issues in collaboration with dev teams
- Operational Reporting
  - Manage the TV dashboard providing real-time status of critical flows (with color-coded indicators)
  - Deliver a daily 10-15 min operational review of previous day's executions, open incidents, and follow-ups
  - Share daily summary emails with the team
- Continuous Awareness
  - Stay up-to-date on the status of all critical flows and remediation efforts
  - Ensure proactive communication on risks and delays
While technical mastery is the foundation of what we do, the ability to bridge the gap between complex data science and actionable business value is what defines your success at DeepLight.
We're looking for individuals who are not only world-class in their fields of specialism, but also compelling communicators and persuasive advocates for their own skills.
You will be the face of our firm, tasked with building trust, articulating the "why" behind your technical decisions, and effectively "selling" your vision to high-level stakeholders.
If you thrive on the challenge of presenting cutting-edge solutions as much as you do on building them, you will fit right in.
Requirements
You will have experience in:
- DataOps, DevOps, or data engineering roles, with a minimum of 5 years
- AWS Glue, DMS, dbt, and Kafka monitoring
- data freshness SLAs, data quality frameworks, and platform health monitoring
- incident management tools (e.g., JIRA) and alerting systems
- identifying ways to automate repetitive tasks
- troubleshooting and triage processes
- managing multiple incidents and prioritising effectively
- root cause analysis and preventive action planning
- communication and coordination
- working under pressure and maintaining operational discipline
Benefits & Growth Opportunities:
- Competitive salary and performance bonuses
- Comprehensive health insurance
- Professional development and certification support
- Opportunity to work on cutting-edge AI projects
- Flexible working arrangements
- Career advancement opportunities in a rapidly growing AI company
This position offers a unique opportunity to shape the future of AI implementation while working with a talented team of professionals at the forefront of technological innovation. The successful candidate will play a crucial role in driving our company's success in delivering transformative AI solutions to our clients.
At DeepLight AI, we recognise that diversity drives innovation. We are committed to fostering an inclusive environment where individuals with different thinking styles can thrive and contribute their unique strengths to our specialised AI and data solutions.
Our goal is to ensure our application and interview process is accessible, predictable, and fair for all candidates.
If you require any specific adjustments to the application process, or any reasonable adjustments should you progress to the interview stage, please let us know. This information will be kept strictly confidential and will not impact hiring decisions.
By applying to DeepLight, you also agree that we may share your profile, where necessary, with external clients.

