KBC Technologies Group
HPC Infrastructure Engineering
KBC Technologies Group | Sweden | 5 days ago
Contract | Analyst

Role Description and Key Deliverables

We are seeking a passionate HPC engineer. The ideal candidate will have extensive hands-on experience making an impact with HPC technology and delivering high-quality HPC services, and will be able to relate to the scientific community and work closely with users to help them make the best use of research computing services.

The HPC landscape is continually evolving. You will need the skills to help build and operate industry-leading capabilities, including application build frameworks, containerised applications and cloud software-as-a-service. Automated deployment is a key feature, and you will need to be comfortable with DevOps processes and delivering consistency through automation and infrastructure-as-code.

Key Responsibilities

  • Design, implement, and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform, ensuring secure and scalable environments in our private cloud ecosystem.
  • Develop, deliver and operate research computing services and applications.
  • Take a Site Reliability Engineering approach to HPC services, managing development, deployment, monitoring and incident response end-to-end.
  • Solve complex technical problems, both in SCP services and in how users work with them.

Essential Knowledge, Skills, and Experience

  • 10+ years of hands-on experience operating, crafting or engineering large-scale computing environments, such as HPC, HTC or BC
  • Ability to drive innovative computational solutions and exploit emerging technologies
  • Experience administering large-scale cluster and server computing and related software (e.g. Slurm, LSF, Grid Engine)
  • Hands-on experience working in a DevOps team and using agile methodologies
  • Operating and consuming virtualized private cloud resources (e.g. OpenStack)
  • Understanding of Linux system administration, the TCP/IP stack, and storage subsystems
  • Experience in implementing and administering large-scale parallel filesystems (e.g. Weka, GPFS, Lustre)
  • Proven experience using configuration management tools (e.g. Ansible, Salt, Puppet) and technology frameworks in IT operations
  • Experience developing and managing relationships with third-party suppliers
  • Scripting and tool development for HPC and DevOps-style platform operations using Bash and Python

Desirable Skills and Knowledge

  • Scientific degree, and/or experience in computationally intensive analysis of scientific data
  • Previous experience in high-performance computing (HPC) environments, especially at large scale (>10,000 cores)
  • Operation and configuration of public cloud computing infrastructure (e.g. AWS, Azure, GCP) is a plus
  • Managing a virtualized private cloud environment (e.g. OpenStack) is a plus
  • Container technology (e.g. LXD, Singularity, Docker, Kubernetes) is a plus
  • Demonstrated development experience with a variety of programming languages, tools, and technologies (Java/C++, Python/Ruby/Perl, SQL, AWS) is a plus
  • Experience with HashiCorp tools such as Terraform, Vault, Consul and Nomad is a plus
  • Working experience with high-speed networks (e.g. InfiniBand)
