Critical Manufacturing
AI Engineer
Critical Manufacturing | Portugal | 7 hours ago
Full-time | Engineering
Critical Manufacturing is dedicated to empowering high-performance operations to make Industry 4.0 a reality with the most innovative, comprehensive, and modular MES software. We have a global presence, but our headquarters and main technical center are in Porto (Maia), Portugal, where we develop a state-of-the-art solution for the Semiconductor, Electronics, Medical Devices, and other Discrete industries.

Recognized for the third consecutive year as a Leader by Gartner, we are part of ASMPT, the world's largest supplier of best-in-class equipment and technological process partner for the electronics and semiconductor industries.

The role:

You will join an existing AI engineering team focused on building reliable AI infrastructure for manufacturing systems. This is hands-on work: developing MCP servers and creating tooling for model observability, telemetry, and retraining pipelines. No leadership is required, just solid execution within a collaborative team.

This role is based at our headquarters in Porto, Portugal, where collaboration, experimentation, and rigorous engineering standards are essential. You're expected to stay closely connected by actively participating in technical design reviews and architecture discussions, and by engaging with teams across Product, Data, and Platform Engineering. This is a role for someone who cares about building AI systems that are not just smart, but observable, debuggable, and continuously improving.

What you'll do:

Develop MCP Servers

  • Implement and maintain Model Context Protocol (MCP) servers that connect language models to manufacturing domain tools and data sources
  • Optimize server performance and define clear interfaces for tool integration, ensuring models have safe, reliable access to business logic
  • Collaborate with team leads to map complex manufacturing workflows into structured tools and prompts

Build Model Observability and Telemetry Infrastructure

  • Design and implement comprehensive telemetry systems to track model behavior, token usage, latency, and cost in production
  • Create dashboards and alerting systems that give real-time visibility into model performance and anomalies
  • Instrument models to capture structured traces: prompts/system context, tool invocations, inputs/outputs, intermediate artifacts, and decision metadata
  • Contribute to standards for logging, tracing, and distributed observability across all AI systems

Develop Retraining and Continuous Improvement Pipelines

  • Build data collection pipelines that capture production interactions, model failures, and edge cases for retraining
  • Implement automated systems for evaluating model improvements and managing safe rollouts
  • Contribute to feedback loops that allow the platform to learn from real-world usage without manual intervention

Support Team Deliverables

  • Write clean, testable code and contribute to team codebases, documentation, and CI/CD processes
  • Participate in code reviews, technical design reviews, and troubleshooting production issues
  • Experiment with new tools and techniques under team guidance to improve AI system reliability
  • Promote the adoption of agentic coding across teams to accelerate delivery and increase throughput while maintaining quality and security standards
  • Design repositories, CI, and developer tooling that make agent-driven changes safe (linting, typed APIs, contract tests, golden tests, eval gates)

Ensure Production Reliability

  • Implement robust error handling, fallback strategies, and graceful degradation for AI systems
  • Monitor and tune AI systems for performance, uptime, and safety in manufacturing environments
  • Gather feedback from operations and product teams to refine tooling and server implementations

What Success Looks Like

Within your first year, you will have:

  • Deployed production MCP servers handling real manufacturing workloads
  • Built and iterated on observability tools used daily by engineering and ops teams
  • Contributed to retraining pipelines that reduce model staleness and improve prediction accuracy
  • Established clear patterns and best practices that help the team scale AI systems reliably
  • Delivered robust tooling for debugging, monitoring, and managing AI systems in manufacturing environments

Why Join Us

  • Work on AI that powers real factories, solving problems with immediate industrial impact
  • Join a tight-knit engineering team building the backbone of trustworthy AI infrastructure for manufacturing
  • Contribute to systems that manufacturers depend on daily, with full observability and reliability
  • Enjoy the freedom to code, collaborate, and grow technically in a rigorous engineering environment

Requirements

What You Will Bring

  • At least 1 year of hands-on machine learning experience, including training and testing models, and a practical understanding of overfitting, generalization, and bias; plus a solid grasp of common model families (e.g., k-nearest neighbors, decision trees/random forests, support vector machines, linear/logistic regression, and basic neural networks)
  • At least 1 year of hands-on experience with LLMs in production or applied settings, including inference, prompt engineering, and evaluation; with a working understanding of how LLMs are configured and behave (e.g., temperature, top-p, max tokens, context windows, and tool/function calling)
  • Experience with agentic coding workflows or LLM-based code assistance, using tools that accelerate implementation, refactoring, and test generation while maintaining strong engineering rigor (reviews, testing, documentation, and CI discipline)
  • Familiarity with server development, APIs, and containerization (Docker/Kubernetes)
  • Strong problem-solving skills and comfort writing production code, including tests and docs
  • Excellent software engineering fundamentals: version control, testing, code review, documentation
  • Ability to collaborate effectively in a team and work well under technical leadership
  • Excellent spoken and written English communication skills

What we consider a plus (not mandatory):

  • Experience with manufacturing operations, MES systems, or Industry 4.0 concepts
  • Familiarity with MLOps tools, model monitoring platforms, or ML infrastructure
  • Basic knowledge of observability tools (Prometheus, Grafana, or similar) and data pipelines
  • Proficiency in Python and experience with AI frameworks like PyTorch, TensorFlow, or LangChain

Diversity, Equity and Inclusion are a source of commitment and innovation

At Critical Manufacturing, we welcome and encourage applications from individuals of all backgrounds, regardless of disabilities, diverse abilities, identities, or experiences. Our commitment is to create an inclusive environment where everyone has equal opportunities to succeed and thrive.

If you need accommodation during the recruitment process, please let us know—we're happy to support you.
