Team Leadership & Operational Management
- Lead and manage the operations team, ensuring effective delivery of day-to-day support activities and operational excellence.
- Provide clear technical direction and mentorship to team members.
- Manage workload distribution, incident prioritization, and escalation handling.
- Ensure adherence to defined SLAs, KPIs, and operational standards.
- Conduct regular team performance reviews, knowledge sharing sessions, and continuous improvement initiatives.
- Minimum 3 years of experience managing or leading technical teams in an operations or engineering environment.
Elastic Stack Subject Matter Expertise (SME)
- Act as the SME for Elastic Observability solutions, including:
  - Elastic Agent & Fleet
  - Logstash (pipeline configuration and optimization)
  - Elasticsearch (cluster architecture, hot/warm/cold/frozen tiers)
  - Sharding strategies and index lifecycle management (ILM)
  - High Availability (HA) design and cluster resilience
  - Artifact Server management
  - Kibana and observability dashboards
- Design, implement, optimize, and maintain Elastic Stack environments.
- Ensure proper data ingestion architecture, performance tuning, and storage optimization.
- Lead capacity planning and scalability strategies across environments.
Day-to-Day Operations & Support
- Oversee and actively manage daily operational issues, including:
  - Monitoring cluster health and performance
  - Index management and shard balancing
  - Pipeline failures and ingestion bottlenecks
  - Agent connectivity issues
  - Storage capacity and node utilization
  - Backup, restore, and disaster recovery processes
- Lead troubleshooting efforts for critical incidents and provide root cause analysis (RCA).
- Establish preventive measures and long-term fixes to reduce recurring incidents.
- Ensure production stability across multiple environments (Dev, UAT, Production).
- Coordinate release management, patching, upgrades, and environment maintenance.
Incident & Problem Management
- Lead major incident response and provide technical direction during high-severity outages.
- Perform deep-dive troubleshooting across infrastructure, application, and Elastic components.
- Drive problem management processes and implement corrective and preventive actions.
- Ensure proper documentation of operational procedures, runbooks, and knowledge base articles.
Stakeholder & Client Engagement
- Liaise with stakeholders, clients, and cross-functional teams to understand operational requirements and business priorities.
- Provide regular operational reports, performance metrics, and improvement recommendations.
- Translate technical issues into business-impact language for non-technical stakeholders.
- Support service review meetings and continuous service improvement initiatives.
Architecture & Continuous Improvement
- Contribute to architecture design discussions and recommend best practices for Elastic deployments.
- Implement and maintain CI/CD pipelines for Elastic configurations and automation processes.
- Leverage automation tools and source code repositories for configuration management.
- Continuously improve monitoring frameworks, alerting mechanisms, and operational workflows.
- Contribute to internal methodology, tools, and process improvements.
Knowledge & Technical Competencies
- Certified in Elastic (Elastic Certified Engineer or relevant certification preferred).
- Strong expertise in Elastic Observability and distributed cluster architecture.
- Advanced knowledge of Elasticsearch cluster management (hot, warm, cold, and frozen tiers; sharding; replication; HA setup).
- Strong experience in Logstash pipeline configuration and performance tuning.
- Proficiency in automation tools, CI/CD pipelines, and source control systems.
- Solid understanding of Linux systems, networking, storage, and cloud environments (AWS, Azure, or GCP).
- Strong analytical and problem-solving skills with attention to detail.
- Excellent communication and stakeholder management skills.
- Strong understanding of ITIL processes and operational best practices.
Academic Qualifications & Certifications
- Bachelor’s Degree in Information Technology, Computer Science, or related field.
- Elastic Certification (required or strongly preferred).
- Relevant Cloud Certification (AWS, Azure, GCP) preferred.
- DevOps or Agile certifications preferred.
Required Experience
- Minimum 3 years of experience managing or leading technical teams.
- Extensive hands-on experience managing Elastic Stack in production environments.
- Experience working in multi-team, multi-geography operational environments.
- Proven experience handling high-volume, mission-critical production systems.
- Strong background in Agile methodologies (Scrum/Kanban) and operational frameworks.
Join NTT DATA, Inc. and take your career to the next level!

