Site Reliability Engineer (SRE) / NOC Engineer
Company: Grid Dynamics
Location: Ukraine (Remote)
Language Level: English B2
Project Overview
Join a high-impact team supporting a top-tier global retail leader specializing in on-trend apparel and lifestyle brands. With over 1,000 stores worldwide and a massive e-commerce presence spanning 80+ countries, our client relies on seamless digital experiences.
Grid Dynamics has been a strategic partner for this client for over a decade, growing a dedicated account of 160+ specialized engineers. This is a mature, stable, and technologically rich environment where you will work alongside experts in Data, DevOps, and SRE to maintain a high-availability retail ecosystem.
Core Responsibilities
- System Observability: Monitor infrastructure, applications, and cloud services using Grafana dashboards and proactive alerting.
- Infrastructure Management: Use Terraform to provision infrastructure and maintain it as Infrastructure as Code (IaC).
- Incident Orchestration: Triage production incidents to identify severity, scope, and business impact; manage incident queues and execute failovers or restarts per runbooks.
- Tiered Escalation: Act as a critical link between operations and specialized teams (Application, Security, Database) to resolve complex issues.
- Operational Excellence: Analyze recurring incidents to suggest improvements for runbooks and operational procedures to reduce manual "toil."
- Global Communication: Maintain clear status updates during critical events via bridge calls, Slack, and handover notes.
Must-Have Requirements
- Grafana Mastery: Hands-on experience building monitoring dashboards and managing alerting logic.
- Terraform Expertise: Practical experience with infrastructure provisioning and IaC principles.
- Strong Communication: English level B2 (Upper-Intermediate).
Nice-to-Have Requirements
- Coding/Scripting: Experience with Java or Python to assist in automation or troubleshooting.
- Data Analysis: Working knowledge of SQL for querying and log investigation.
- Advanced Monitoring Stack: Experience with tools like New Relic (APM), Splunk (Log management), or Quantum Metric (Digital experience monitoring).
What We Offer
- Scale & Stability: A long-term project with a stable Fortune 500 partner.
- Tech Culture: Work in a "people-first" environment with access to a global community of senior engineering talent.
- Remote Flexibility: 100% remote work within Ukraine, supported by a robust IT infrastructure and dedicated delivery management.
- Growth: Opportunities for professional development and exposure to enterprise-level AI and cloud modernization initiatives.
- Posted: May 12, 2026
- Type: Full-time
- Level: Mid-Senior
- Location: Ukraine
- Company: Grid Dynamics