Team Leadership & Operational Management
- Lead and manage the operations team, ensuring effective delivery of day-to-day support activities and operational excellence.
- Provide clear technical direction and mentorship to team members.
- Manage workload distribution, incident prioritization, and escalation handling.
- Ensure adherence to defined SLAs, KPIs, and operational standards.
- Conduct regular team performance reviews, knowledge sharing sessions, and continuous improvement initiatives.
- Minimum 3 years of experience managing or leading technical teams in an operations or engineering environment.
Elastic Stack Subject Matter Expertise (SME)
- Act as the SME for Elastic Observability solutions, including:
  - Elastic Agent & Fleet
  - Logstash (pipeline configuration and optimization)
  - Elasticsearch (cluster architecture, hot/warm/cold/frozen tiers)
  - Sharding strategies and index lifecycle management (ILM)
  - High Availability (HA) design and cluster resilience
  - Artifact Server management
  - Kibana and observability dashboards
- Design, implement, optimize, and maintain Elastic Stack environments.
- Ensure proper data ingestion architecture, performance tuning, and storage optimization.
- Lead capacity planning and scalability strategies across environments.
Day-to-Day Operations & Support
- Oversee and actively manage daily operational issues, including:
  - Monitoring cluster health and performance
  - Index management and shard balancing
  - Pipeline failures and ingestion bottlenecks
  - Agent connectivity issues
  - Storage capacity and node utilization
  - Backup, restore, and disaster recovery processes
- Lead troubleshooting efforts for critical incidents and provide root cause analysis (RCA).
- Establish preventive measures and long-term fixes to reduce recurring incidents.
- Ensure production stability across multiple environments (Dev, UAT, Production).
- Coordinate release management, patching, upgrades, and environment maintenance.
Incident & Problem Management
- Lead major incident response and provide technical direction during high-severity outages.
- Perform deep-dive troubleshooting across infrastructure, application, and Elastic components.
- Drive problem management processes and implement corrective and preventive actions.
- Ensure proper documentation of operational procedures, runbooks, and knowledge base articles.
Stakeholder & Client Engagement
- Liaise with stakeholders, clients, and cross-functional teams to understand operational requirements and business priorities.
- Provide regular operational reports, performance metrics, and improvement recommendations.
- Translate technical issues into business-impact language for non-technical stakeholders.
- Support service review meetings and continuous service improvement initiatives.
Architecture & Continuous Improvement
- Contribute to architecture design discussions and recommend best practices for Elastic deployments.
- Implement and maintain CI/CD pipelines for Elastic configurations and automation processes.
- Leverage automation tools and source code repositories for configuration management.
- Continuously improve monitoring frameworks, alerting mechanisms, and operational workflows.
- Contribute to internal methodology, tools, and process improvements.
Knowledge & Technical Competencies
- Certified in Elastic (Elastic Certified Engineer or relevant certification preferred).
- Strong expertise in Elastic Observability and distributed cluster architecture.
- Advanced knowledge of Elasticsearch cluster management (hot/warm/cold/frozen tiers, sharding, replication, HA setup).
- Strong experience in Logstash pipeline configuration and performance tuning.
- Proficiency in automation tools, CI/CD pipelines, and source control systems.
- Solid understanding of Linux systems, networking, storage, and cloud environments (AWS, Azure, or GCP).
- Strong analytical and problem-solving skills with attention to detail.
- Excellent communication and stakeholder management skills.
- Strong understanding of ITIL processes and operational best practices.
Academic Qualifications & Certifications
- Bachelor’s Degree in Information Technology, Computer Science, or related field.
- Elastic Certification (required or strongly preferred).
- Relevant Cloud Certification (AWS, Azure, GCP) preferred.
- DevOps or Agile certifications preferred.
Required Experience
- Minimum 3 years of experience managing or leading technical teams.
- Extensive hands-on experience managing Elastic Stack in production environments.
- Experience working in multi-team, multi-geography operational environments.
- Proven experience handling high-volume, mission-critical production systems.
- Strong background in Agile methodologies (Scrum/Kanban) and operational frameworks.
- Posted: Mar 12, 2026
- Type: Contract
- Level: Mid-Senior
- Location: Singapore
- Company: NTT DATA, Inc.