Avance Consulting
DevOps Engineer
Avance Consulting | Sweden | 3 days ago
Contract | Engineering, Consulting

Job Description:


Senior Operations Engineer HPC

Respond to and resolve operational incidents, identify root causes for critical issues, and implement strategies to prevent recurrence and improve platform resiliency.

 Proactively create and manage monitoring, logging, and alerting systems to ensure high availability, performance, and visibility across all services.

 Take a Site Reliability Engineering approach to our services, improving deployment, monitoring, and incident response end to end.

 Solve complex technical problems involving SCP applications, infrastructure, and end users' use of the services.

 Administer platform tools like Ansible, Vault, Consul, Prometheus, and Grafana to support core functions like configuration management, secrets management, monitoring, and observability.

 Mentor and coach junior engineers in the team, fostering a collaborative and high-performing culture.

 Drive automation for deployment and management processes using GitOps workflows as well as CI/CD pipelines.


Essential Knowledge, Skills, and Experience

 Experience administering, maintaining, and troubleshooting a Linux environment

 Competent in automation and Bash scripting

 Highly customer-focused; able to explain technical IT concepts in a way that non-IT experts can understand

 Hands-on experience working in a DevOps team and using agile methodologies

Plus some of the following areas of expertise:

 Hands-on knowledge of a range of scientific and HPC applications such as simulation software, bioinformatics tools or 3D data visualisation packages

 Experience administering and optimising SLURM

 Experience deploying and administering OpenStack

 Experience with configuration automation and infrastructure as code (e.g. Ansible, HashiCorp Terraform, AWS CloudFormation, AWS Cloud Development Kit)

 Experience deploying infrastructure and code to public cloud, especially AWS

 Experience with software distribution frameworks such as EasyBuild or Spack

 Familiarity with container runtimes such as Docker, Singularity or enroot

 Experience with regression testing and benchmarking frameworks for HPC applications, such as ReFrame
