Presight
Senior Site Reliability Engineer
Presight | United Arab Emirates
Full-time | Engineering

Position Overview


We are seeking a meticulous Site Reliability Engineer to support the Presight delivery model, which empowers product and technology teams to develop and deliver high-quality products, improve platform infrastructure, and strengthen the reliability of products and solutions.

You will play a key role in defining and establishing the delivery model used to develop cutting-edge, next-generation analytics solutions and services at Presight.


Key Responsibilities


  • Manage the infrastructure required to run our solutions, whether deployed to public cloud or to private (air-gapped) cloud.
  • Analyze service performance, identify bottlenecks, and provide measurable improvement plans.
  • Maintain the environment's health by continuously monitoring technical and business metrics, configuring alerts for potential issues, and proactively addressing risks to prevent disruptions.
  • Deploy application updates with minimal disruption to services.
  • Identify, evaluate, and conduct proof-of-concepts for new technologies.
  • Contribute to the knowledge base.
  • Iteratively review and refine CI/CD practices and service maturity, striving for continuous improvement.


Requirements

  • 5+ years of experience managing Kubernetes clusters.
  • 5+ years of experience configuring and using monitoring/observability platforms.
  • Familiarity with at least one type of database.

Experience

  • 5+ years in an SRE/DevOps/Sysadmin/Platform Engineer role


Mandatory Skills:

  • Strong background in Linux/Unix administration
  • Solid hands-on experience deploying and operating Kubernetes or OpenShift clusters
  • Experience configuring and maintaining monitoring and observability solutions
  • Ability to troubleshoot and resolve complex production issues efficiently, including performing root cause analysis and restoring services quickly during high-pressure incidents or critical outages
  • Experience in backing up and restoring various systems
  • Working with project managers and solution architects while serving as a subject matter expert
  • Implementing basic network security (e.g., configuring VPCs and firewalls/security groups)
  • Understanding the dependencies of various GPU cards and upgrading container images as needed to ensure compatibility
  • Deploying and operating products supplied by third-party vendors
  • Creating releases together with the development team and deploying release packages to all required environments


Bonus Skills:

  • Good understanding of typical system architecture and interaction between its components
  • Experience automating tasks using infrastructure-as-code tools, e.g. Ansible, Terraform
  • Thorough understanding of a company's systems, including auxiliary components such as caching systems (e.g., Redis, Memcached) and message queues (e.g., RabbitMQ, Kafka)
  • Good understanding of databases, e.g., Postgres, Elasticsearch, ClickHouse
  • Basic scripting
  • Working knowledge of OAuth 2.0, OpenID Connect, SAML 2.0, Kerberos, and LDAP


Join us at Presight, where we offer a culture of innovation, outstanding career growth opportunities, and competitive rewards. If you're eager to conquer new frontiers in AI and thrive in a dynamic environment, we welcome you to our community.
