SiMa.ai
ML Software Engineer (AI2464)
SiMa.ai · Germany · 1 day ago
Full-time · Engineering, Information Technology

Brief Description: SiMa.ai™ is scaling Physical AI in the Robotics, Automotive, Industrial, Aerospace and Defense, and Medical markets. We have created the industry's best purpose-built, software-centric Physical AI HW/SW platform, leading the industry in ease of use, performance, and power efficiency. SiMa.ai is led by technologists and business veterans backed by top investors committed to helping customers bring ML to their platforms. Founded in 2018, SiMa.ai has raised $355M and is backed by Fidelity Management & Research Company, Maverick Capital, Point72, MSD Partners, VentureTech Alliance, and more. For more information, visit https://sima.ai/.


Job Title: ML Software Engineer

Job Location: Stuttgart, Germany (This position requires a full-time, on-site presence at our Stuttgart, Germany office.)

Language Requirement: German (Fluent/C1) & English (Very Good)


Job ID: AI2464

Position Summary:

As part of the ErUM-Data DEEP consortium (erumdata-deep.de), you will research and optimize real-time (Graph) Neural Network algorithms for the Belle II tracking detectors and implement them on the SiMa SoC.


We are seeking an AI/ML Engineer to bridge the gap between industrial Edge AI and high-energy physics research. In this role, you will focus on our contribution to the ErUM-Data DEEP consortium, connecting modern Neural Network architectures with our SiMa SoC silicon. You will select and optimize models for the SiMa SoC, leverage the SiMa toolchain to quantize, compile, and deploy them on our SoC, and ensure they run in real time while meeting the target KPIs.

Key Responsibilities:

  • Research and select state-of-the-art architectures.
  • Identify "hardware-unfriendly" operators in standard models and redesign them to maximize throughput and minimize latency.
  • Analyze model complexity using metrics beyond just accuracy: FLOPs, MACs, Memory Bandwidth, and SRAM usage.
  • Profile models on our silicon to identify bottlenecks (memory bound vs. compute bound layers).
  • Perform model compression techniques to reduce model size without significant accuracy loss.
  • Execute Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) to convert floating-point models (FP32) to fixed-point formats (e.g., INT8, INT4) for HW acceleration.
  • Debug graph-level optimizations, such as operator fusion, layout transformation (NCHW vs NHWC), and tiling strategies.
  • Deploy compiled artifacts onto the SiMa SoC embedded environment.
  • Write efficient inference code to integrate the AI model with the application logic.
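To illustrate the kind of quantization work involved, here is a minimal, hypothetical sketch of symmetric post-training quantization to INT8 in plain Python. This is a generic textbook example, not the SiMa toolchain API; all names are illustrative.

```python
# Hypothetical PTQ sketch: map FP32 weights to INT8 with one symmetric scale.
# Not SiMa toolchain code; a real flow would calibrate per-tensor or per-channel.

def quantize_int8(weights):
    """Compute a symmetric scale and quantize FP32 values to INT8 codes."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate FP32 values from INT8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 1.27]
q, scale = quantize_int8(weights)
deq = dequantize(q, scale)
# Quantization error is bounded by half the scale (rounding step).
max_err = max(abs(a - b) for a, b in zip(weights, deq))
print(q, round(max_err, 4))
```

In practice the toolchain also handles activation calibration, operator fusion, and layout choices; QAT goes further by simulating this rounding during training so the model learns to compensate for it.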


Requirements:

Must-Have Skills:

  • Education: M.Sc. or Ph.D. in Computer Science, Electrical Engineering, or Physics.
  • Languages: German (Fluent/C1) is mandatory for consortium coordination. Very good English is required.
  • Core ML: Deep understanding of CNNs, GNNs, and Transformers. Proficiency in PyTorch or TensorFlow.
  • Edge AI Flow: Familiarity with quantization, pruning, and hardware optimization techniques.
  • Programming: Strong Python for tooling/training and C++ for deployment/runtime.

Nice-to-Have (Bonus):

  • Experience with hardware-aware Neural Architecture Search (HW-NAS).
  • Experience with Apache TVM or MLIR compiler infrastructure.
  • Knowledge of computer architecture (SRAM vs DRAM, cache hierarchy, SIMD/VLIW).
  • Experience writing custom operators (kernels) in CUDA or OpenCL.
  • Familiarity with Embedded Linux (Yocto).

Personal attributes:

Can-do attitude. Strong team player. Curious, creative, and good at solving problems. Execution- and results-oriented. Self-driven, thinks big, and is highly accountable. Good communication skills.
