EPAM Systems
Senior DevOps Engineer (HPC)
EPAM Systems | Latvia | 16 days ago
Full-time | Remote Friendly | Business Development, Information Technology +1

We are seeking a Senior DevOps Engineer to enhance our high-performance computing services and collaborate closely with the scientific community to optimize research computing.

Join our team to build and operate cutting-edge HPC capabilities using automation and infrastructure-as-code. Apply now to contribute to innovative computational solutions in a dynamic environment.

Feel free to work remotely from anywhere across Latvia or connect with colleagues at our Riga office.

Responsibilities

  • Design, implement, and maintain robust platform infrastructure using Infrastructure as Code tools such as Terraform
  • Develop, deliver, and operate research computing services and applications
  • Apply Site Reliability Engineering principles to manage HPC service deployment, monitoring, and incident response
  • Solve complex technical problems related to HPC services and user applications
  • Manage large-scale HPC, HTC, or BC computing environments for optimal performance
  • Collaborate with scientific users to tailor HPC resources to research needs
  • Automate deployment processes to ensure consistency across HPC infrastructure
  • Maintain and administer large-scale cluster and server computing software such as Slurm, LSF, or Grid Engine
  • Develop and maintain monitoring dashboards using tools like Grafana and Prometheus
  • Work within a DevOps team environment following agile methodologies
  • Operate and utilize virtualized private cloud resources such as OpenStack
  • Administer large-scale parallel filesystems including Weka, GPFS, or Lustre
  • Use configuration management tools like Ansible, Salt, or Puppet to manage IT operations
  • Develop scripts and tools for HPC and DevOps platform operations using Bash and Python

Requirements

  • 3+ years of experience with DevOps processes and automation using Infrastructure as Code tools such as Terraform
  • Hands-on experience operating or engineering large-scale HPC or similar computing environments
  • Proven expertise in Linux system administration including TCP/IP networking and storage subsystems
  • Experience administering large-scale cluster management software such as Slurm, LSF, or Grid Engine
  • Knowledge of configuration management tools like Ansible, Salt, or Puppet
  • Experience working in agile DevOps teams
  • Ability to develop and maintain monitoring tools such as Grafana and Prometheus
  • Experience with scripting languages such as Bash and Python for automation and tool development
  • Strong experience managing virtualized private cloud environments like OpenStack
  • Scientific degree or equivalent experience in computationally intensive scientific data analysis
  • Proven ability to manage relationships with third-party suppliers
  • Upper-intermediate proficiency in English (B2+)

Nice to have

  • Experience with container technologies such as LXD, Singularity, Docker, or Kubernetes
  • Operation and configuration experience with public cloud platforms like AWS, Azure, or GCP
  • Experience with HashiCorp tools such as Vault, Consul, and Nomad
  • Development experience with programming languages such as Java, C++, Python, Ruby, or Perl
  • Experience with parallel filesystems like Weka, GPFS, or Lustre

We offer

  • Engineering Heritage: Best-in-class experts sharing a culture of engineering excellence and tackling complex engineering challenges for over 30 years.
  • Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI, and other emerging technologies.
  • World-Class Clients: Work closely with 295+ of the Forbes Global 2000 on creating disruptive solutions that make a global impact.
  • Professional Growth: Exceptional support for career development with comprehensive resources for upskilling or reskilling in pioneering practices.
  • GenAI Community: Strong AI competencies with 600+ experts across 55+ locations driving GenAI-enabled transformation journeys.
  • Entrepreneurial Culture: If you're passionate and dedicated to improving business transformation, we provide the support you need to bring your ideas to life.
  • Hybrid Setup: The flexibility to work from any location in Latvia, whether it's your home or our office in Riga.
  • Other Benefits: Additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more.

About EPAM

EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption. With offices in 55+ countries, EPAM has grown in Latvia to more than 150 talented innovators in 3 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us.

Salary range €3.8K-€5.1K gross, based on your experience and interview results.
