iO Associates

HPC Engineer

Sweden · Full-time · Entry

High-Performance Computing (HPC) Engineer (Stockholm, Sweden)

Overview

Our high-growth technology client is seeking an experienced High-Performance Computing (HPC) Engineer to help design, build, and operate large-scale compute platforms that support demanding workloads (e.g., AI/ML, simulation, rendering, analytics, and research). You will work closely with infrastructure, platform, and research/engineering teams to deliver reliable, high-throughput systems with strong performance, automation, and observability.

Key Responsibilities

  • Design, deploy, and support HPC clusters (on-prem / colocation / cloud-connected) with a focus on performance, resilience, and scalability.
  • Administer and optimise Linux-based compute environments (provisioning, patching, kernel/driver tuning, user access, hardening).
  • Implement and maintain workload scheduling and cluster management (e.g., Slurm or equivalent), including partitions/queues, fair-share policies, and job efficiency improvements.
  • Support GPU-accelerated environments (where applicable): driver/toolkit management, performance profiling, stability troubleshooting.
  • Build and maintain automation for cluster lifecycle operations (IaC, config management, CI/CD-style ops).
  • Partner with networking and storage teams to ensure high-throughput, low-latency performance across the stack.
  • Own incident response and problem management for HPC services; lead root-cause analysis and preventative improvements.
  • Develop monitoring, logging, and capacity planning to meet throughput and availability targets.
  • Produce clear documentation (runbooks, architecture diagrams, operational standards) and contribute to continuous improvement.

Required Skills & Experience

  • Strong hands-on experience as an HPC Engineer / Linux Systems Engineer / Infrastructure Engineer in performance-critical environments.
  • Deep Linux administration skills (systemd, networking basics, storage, performance tuning, troubleshooting).
  • Experience operating HPC or large-scale compute platforms, including one or more of:
    • Schedulers / cluster managers (Slurm preferred; PBS, LSF, Kubernetes for batch, etc.)
    • GPU compute (NVIDIA drivers/CUDA, NCCL awareness, profiling tools)
    • MPI and distributed compute concepts (OpenMPI/MPICH understanding)
  • Solid scripting/automation skills (Bash, Python; plus Ansible/Terraform or similar).
  • Practical understanding of observability (metrics, logs, tracing) and experience using monitoring stacks to drive reliability.
  • Good knowledge of storage and data movement patterns used in HPC (parallel file systems and/or high-performance shared storage concepts).
  • Strong communication skills and the ability to work across platform, network, storage, and application teams.

Desirable / Nice-to-Have

  • Experience with high-speed interconnects (e.g., InfiniBand, RoCE) and low-latency network troubleshooting.
  • Experience with containerised HPC or hybrid HPC workloads (Apptainer/Singularity, Docker where appropriate).
  • Familiarity with security best practices in shared compute environments (least privilege, auditing, secrets handling).
  • Background supporting AI/ML infrastructure at scale (GPU fleet operations, job efficiency, capacity optimisation).

Location & Working Model

  • Stockholm, Sweden (based locally).
  • Working model: Hybrid/On-site depending on operational needs.

What Success Looks Like

  • Stable, high-performance clusters with measurable improvements in throughput, utilisation, and job success rates.
  • Strong automation and repeatability across provisioning, configuration, and operations.
  • Clear operational practices (monitoring, alerting, runbooks) that reduce MTTR and improve reliability.

Next Steps

  • Please send me your most recent CV, aligned with this job description, together with your contact information.

Key Skills

Ranked by relevance: storage, Linux, incident response, Kubernetes, simulation, Python, Docker, cloud, Bash, CI/CD
Posted: Feb 05, 2026
Type: Full-time
Level: Entry
Location: Stockholm

Industries

Technology, Information and Internet

Categories

Information Technology
