Role Description and Key Deliverables
We are seeking a passionate HPC engineer. The ideal candidate will have extensive hands-on experience making an impact with HPC technology, a track record of delivering high-quality HPC services, and the ability to relate to the scientific community and work closely with users to help them make the best use of research computing services.
The HPC landscape is continually evolving. You will need the skills to help build and operate industry-leading capabilities, including application build frameworks, containerised applications and cloud software-as-a-service. Automated deployment is a key feature, and you will need to be comfortable with DevOps processes and delivering consistency through automation and infrastructure-as-code.
Key Responsibilities
- Design, implement, and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform, ensuring secure and scalable environments in our private cloud ecosystem.
- Develop, deliver and operate research computing services and applications.
- Take a Site Reliability Engineering approach to HPC services, managing development, deployment, monitoring, and incident response end-to-end.
- Solve complex technical problems, both within SCP services and in users' use of them.
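The Site Reliability Engineering approach above can be illustrated with a minimal sketch: a health check that flags cluster nodes needing incident response. The node names and state labels here are hypothetical examples for illustration, not the actual SCP services or tooling.

```python
# Minimal SRE-style health-check sketch for an HPC cluster.
# Node names and states below are hypothetical examples.

HEALTHY_STATES = {"idle", "alloc", "mix"}  # states considered operational

def flag_unhealthy(nodes: dict[str, str]) -> list[str]:
    """Return node names whose state warrants incident response."""
    return sorted(name for name, state in nodes.items()
                  if state not in HEALTHY_STATES)

if __name__ == "__main__":
    cluster = {
        "node001": "idle",
        "node002": "drain",   # drained for maintenance
        "node003": "alloc",
        "node004": "down",    # hardware fault
    }
    print(flag_unhealthy(cluster))  # ['node002', 'node004']
```

In practice a check like this would sit behind monitoring and alerting rather than a manual script, but it captures the end-to-end idea: observe state, decide, and escalate.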
Essential Knowledge, Skills, and Experience
- 10+ years of hands-on experience operating, building, or engineering large-scale computing environments such as HPC, HTC, or BC
- Ability to drive innovative computational solutions and exploit emerging technologies
- Experience administering large-scale cluster and server computing and related software (e.g. Slurm, LSF, Grid Engine)
- Hands-on experience working in a DevOps team and using agile methodologies
- Operating and consuming virtualized private cloud resources (e.g. OpenStack)
- Understanding of Linux system administration, the TCP/IP stack, and storage subsystems
- Experience in implementing and administering large-scale parallel filesystems (e.g. Weka, GPFS, Lustre)
- Proven experience using configuration management tools (e.g. Ansible, Salt, Puppet) and technology frameworks in IT operations
- Experience of developing and managing relationships with 3rd party suppliers
- Scripting and tool development for HPC and DevOps-style platform operations using Bash and Python
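The scripting work described above often means gluing together scheduler output and operational decisions. As a hedged sketch, here is a small Python tool that summarises idle capacity from `sinfo`-style text; the embedded sample mimics output like `sinfo -h -o "%P %a %D %t"`, and the field layout is an assumption for illustration rather than a guarantee about any particular Slurm configuration.

```python
# Sketch: summarise idle node counts per partition from sinfo-style text.
# SAMPLE mimics `sinfo -h -o "%P %a %D %t"` output; the four-field layout
# (partition, availability, node count, state) is an assumption for this example.

from collections import defaultdict

SAMPLE = """\
compute up 120 idle
compute up 8 drain
gpu up 16 alloc
gpu up 4 idle
"""

def idle_nodes_by_partition(text: str) -> dict[str, int]:
    """Count idle nodes per 'up' partition."""
    idle: dict[str, int] = defaultdict(int)
    for line in text.splitlines():
        partition, avail, count, state = line.split()
        if avail == "up" and state == "idle":
            idle[partition] += int(count)
    return dict(idle)

if __name__ == "__main__":
    print(idle_nodes_by_partition(SAMPLE))  # {'compute': 120, 'gpu': 4}
```

In real operations the text would come from `subprocess.run(["sinfo", ...])` rather than a literal, with error handling for the scheduler being unreachable.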
Desirable Skills and Knowledge
- Scientific degree, and/or experience in computationally intensive analysis of scientific data
- Previous experience in high performance computing (HPC) environments, especially at large scales (>10,000 cores)
- Operation and configuration of public cloud computing infrastructure (e.g. AWS, Azure, GCP) is a plus
- Managing a virtualized private cloud environment (e.g. OpenStack) is a plus
- Container technology (e.g. LXD, Singularity, Docker, Kubernetes) is a plus
- Demonstrated development experience with a variety of programming languages, tools, and technologies (Java/C++, Python/Ruby/Perl, SQL, AWS) is a plus
- Experience with HashiCorp tools such as Terraform, Vault, Consul, and Nomad is a plus
- Working experience with high-speed networks (e.g. InfiniBand)
Ready to apply?
Join KBC Technologies Group and take your career to the next level!