Artificial Intelligence Engineer

WHD Consulting Ltd., Germany, 20 hours ago

Contract, Engineering

Agentic & Generative Edge AI Optimization Engineer – Long Term Contract (Munich or Hamburg)

My client is looking to recruit an AI Engineer who is passionate about Generative AI and Agentic AI systems and who thrives on optimizing models for efficient on-device deployment. You will work on large language models (LLMs), large multimodal models (LMMs), and Vision-Language-Action (VLA) models, ensuring they run reliably and efficiently on NPU-based platforms.

Your mission will be to translate cutting-edge research into production-ready solutions, focusing on model compression, system optimizations, and agentic capabilities such as function calling and tool orchestration. Experience with designing secure and reliable agentic workflows, including guardrails and safe tool invocation, is considered a strong plus.

Role / Responsibilities:

Optimize LLMs and multimodal models for on-device deployment

Investigate, develop and apply advanced quantization (8-bit, 4-bit, mixed precision), pruning, and distillation techniques to derive optimized models for NPU targets.

Accelerate inference performance

Investigate, develop and implement system optimizations such as speculative decoding and other efficient decoding algorithms tailored for edge environments.

Engineer agentic AI capabilities towards tiny agents

Investigate methods for improving the performance of small language models so that they can power tiny agents at the edge, while ensuring these agents follow safety principles.

Work with inference engines and deployment frameworks

Deploy optimized models using Ollama, llama.cpp, ONNX Runtime, and TFLite for efficient NPU inference.

Benchmark LLMs and agentic systems

Design benchmarking pipelines for assessing the performance of Generative and Agentic AI systems on-device.

Develop demonstrators and proof-of-concepts

Build technology PoCs that showcase key technologies in relevant use cases such as industrial safety monitoring, in-cabin sensing, and other edge AI applications.

Move key technologies from research into product solutions

Translate advanced optimization techniques and agentic AI features into production-ready implementations, and collaborate with product teams to integrate these features into the SW/HW portfolio.

Your Profile:

5+ years of experience in software/AI engineering with deep exposure to LLMs, VLMs, and systems performance.
Experience with LLM quantization techniques (e.g., SmoothQuant, SpinQuant, QuaRot), pruning (Wanda, SparseGPT, etc.) and other system optimizations such as speculative decoding.
Proven track record of working with AI frameworks (PyTorch, TensorFlow, etc.) required.
Experience with Agentic AI technologies and familiarity with existing frameworks (e.g., LangChain, Google ADK, SmolAgents, etc.).
Understanding of safety and security considerations for agentic systems (e.g., guardrails, policy enforcement, secure function calling) is a plus.
Understanding of AI toolchains, deployment, portability and inference engines (CUDA, TensorRT, TFLite, ONNX, Ollama, etc.) preferred.
Affinity for and experience with embedded systems and NPU accelerators required.
Experience with embedded software architecture, build systems, and version control systems required.
Broad experience with operating systems (GNU/Linux), embedded systems, development boards, and processors, along with general SW competencies, required.
Familiarity with setting up and maintaining related MLOps development environments (MLflow, ClearML, etc.) required.
Knowledge of build systems (Yocto, OpenEmbedded, etc.) beneficial; experience with cross-compilation toolchains for ARM preferred.
Programming in C, C++, Python, and Bash on Linux systems required.

If this exciting opportunity could be of interest, please let me know ASAP. Interviews can be arranged at short notice.
