Starlink Qatar
Operations Architect
Qatar | Full-time | Remote Friendly | Information Technology

The Operations Architect defines and governs the operational model for enterprise platform capabilities delivered by multiple vendors, ensuring solutions are production-ready, observable, secure, and supportable at scale. The role designs end-to-end service management practices (SLOs/SLAs, monitoring, incident/change/problem management, DR, and capacity/cost controls) and ensures operational requirements are embedded from design through delivery.

Working with platform/cloud, security, and solution architects, as well as vendor and operations teams, the architect drives operational readiness reviews, creates runbooks and support processes, and establishes a consistent, efficient operating model across cloud-agnostic deployments.

Duties & Responsibilities

  • Define the operational architecture and service management model across capabilities (ITIL-aligned where applicable).
  • Establish observability standards: metrics/logs/traces/audits, OpenTelemetry instrumentation, dashboarding, alerting, and anomaly detection.
  • Define SLOs/SLAs/OLAs, error budgets, and operational KPIs; ensure vendors deliver evidence and meet acceptance gates.
  • Design incident management workflows (triage, escalation, RCA), integrate with ITSM, and standardize runbooks/playbooks.
  • Define change and release management practices (CAB inputs, deployment rings, canary/rollback, feature-flag coordination).
  • Establish resiliency and DR requirements: backup/restore patterns, RPO/RTO targets, DR testing cadence, and failover runbooks.
  • Define capacity, performance, and availability engineering processes (load testing, scaling policies, GPU/TPU capacity planning).
  • Implement security operations integration: SIEM/SOAR alignment, alert routing, vulnerability/patch management SLAs.
  • Define FinOps operational controls: tagging standards, showback/chargeback, budgets, anomaly detection, cost optimization playbooks.
  • Lead operational readiness and handover: L1/L2/L3 training, reverse-shadowing, SOPs, and post-go-live stabilization plans.

Skills & Abilities

  • Strong expertise in operating cloud-native platforms: SRE/ITIL practices, reliability engineering, and service management.
  • Ability to turn non-functional requirements (NFRs) into measurable SLOs, monitoring, and operational acceptance criteria.
  • Solid understanding of observability stacks and telemetry design (OTel, APM, SIEM integration).
  • Experience designing DR/BCP, backup strategies, and operational test plans in regulated environments.
  • Proven capability to drive operational standardization across multiple vendors and teams.

Education & Background

  • Bachelor’s degree in Computer Science, Information Technology, Cybersecurity, or related field; Master’s degree highly preferred.
  • 8+ years in operations architecture, SRE, DevOps leadership, or service management for enterprise platforms.
  • Experience running production systems on Azure plus exposure to at least one other cloud (GCP/AWS) and hybrid setups.
  • Experience with ITSM tooling and processes (incident/change/problem, CMDB), including KPI/SLA reporting.
  • Proven experience with monitoring/APM and security operations integration (SIEM, vulnerability management).
  • Desirable certifications: ITIL, SRE-related training, Azure/AWS/GCP operations certifications, Kubernetes CKA/CKS (optional).

Preferred Tools / Soft Skills

Preferred Tools

  • Observability/APM: OpenTelemetry, Dynatrace/Datadog, Prometheus/Grafana/Loki/Tempo (as applicable)
  • ITSM & operations: ServiceNow (or equivalent), CMDB, PagerDuty/Opsgenie-style on-call tooling
  • Security & cloud ops: Microsoft Sentinel, Defender for Cloud, Azure Monitor/Log Analytics, Kubernetes tooling

Soft Skills

  • Calm, structured leadership during incidents and high-pressure escalations
  • Strong facilitation skills for readiness reviews, RCAs, and cross-vendor alignment
  • Clear documentation and operational discipline (runbooks, SOPs, checklists)
  • Continuous improvement mindset and ability to drive measurable reliability gains
  • Strong collaboration and influencing skills across engineering, security, and vendor teams
