Atos
Machine Learning Engineer
Atos | Belgium | 11 hours ago
Contract | Remote Friendly | Engineering

MLOps Engineer

Yearly B2B Contract (Freelance)

Location: Brussels-Haren (on-site once a week)


Position Summary

We are looking for an experienced MLOps Engineer to architect and operationalize scalable machine learning infrastructure on Azure within a decentralized data platform environment. You will own the complete ML lifecycle, from development through production, on a hybrid Azure ML and Databricks ecosystem, using infrastructure-as-code practices and MLflow to deliver automated, reliable, and cost-effective ML operations. The role requires building MLOps capabilities that align with data mesh principles: data and models are treated as products with clear ownership and a domain-driven architecture.


Core Responsibilities

Infrastructure & Automation

  • Collaborate with cross-functional infrastructure and platform teams to design and deploy production-grade MLOps infrastructure on Azure using Terraform, adhering to data mesh principles of decentralized domain ownership
  • Work alongside DevOps and platform engineers to build reusable Infrastructure as Code (IaC) templates for ML environments, covering compute resources, storage, networking, and security
  • Partner with team members to ensure infrastructure is reproducible, version-controlled, and optimized for scalability across multiple domain-oriented data products
  • Contribute to team efforts in establishing infrastructure standards and best practices for ML workloads
  • Provision and manage Azure ML workspaces, compute clusters, and related resources alongside Databricks infrastructure

ML Lifecycle Management

  • Develop automated end-to-end ML pipelines covering training, validation, deployment, and monitoring within a federated data architecture
  • Implement ML workflows using both Azure ML and Databricks, selecting the appropriate platform based on use case requirements
  • Implement experiment tracking, model versioning, and artifact management using MLflow integrated with both Azure ML and Databricks environments
  • Leverage Azure ML's model registry and Databricks MLflow Model Registry for unified model governance across platforms
  • Manage model promotion workflows across development, staging, and production environments
  • Design and implement feature store solutions for centralized feature engineering, versioning, and serving across ML workloads
  • Enable feature reusability and discoverability to support consistent model development across domain teams

Data Mesh & Product Thinking

  • Build MLOps functionality within a development data platform that follows data mesh architecture principles
  • Apply a data-as-a-product mindset to ML models and features, ensuring they meet quality, discoverability, and usability standards
  • Establish domain-agnostic MLOps capabilities that can be consumed by autonomous domain teams
  • Implement self-serve ML infrastructure enabling domain teams to independently develop, deploy, and manage models
  • Define and enforce data product standards including SLAs, data contracts, and quality metrics for ML features and models

Platform Engineering

  • Configure and optimize both Azure ML compute instances and Azure Databricks clusters for performance and cost efficiency across federated domains
  • Integrate Azure ML pipelines and Databricks workflows with CI/CD systems to enable seamless, automated model deployments
  • Establish interoperability between the Azure ML and Databricks ecosystems, enabling data scientists to leverage the strengths of both platforms
  • Establish best practices for platform usage and ML workflow orchestration in a decentralized environment
  • Build feature store infrastructure (Azure ML Feature Store, Databricks Feature Store) that supports cross-domain feature sharing while maintaining domain autonomy

Monitoring & Operations

  • Build comprehensive monitoring systems to track model performance, data drift, feature quality, and infrastructure health
  • Implement monitoring solutions that span both Azure ML and Databricks deployments, providing unified observability
  • Design automated alerting and incident response processes for pipeline failures and degradation
  • Maintain operational visibility across the full ML stack using observability tools
  • Implement governance and observability frameworks that provide transparency across domain-owned ML products


Required Qualifications

Cloud & Infrastructure

  • Hands-on expertise with Azure services including compute, storage, networking, and security tailored for ML workloads
  • Advanced proficiency in Terraform with proven experience managing complex, multi-environment infrastructure
  • Demonstrated ability to collaborate effectively with infrastructure and DevOps teams on shared platform initiatives

ML Platform & Tools

  • Deep knowledge of Azure ML including workspace management, compute resources, pipeline orchestration, model deployment (managed endpoints, AKS), and MLOps capabilities
  • Deep knowledge of Azure Databricks, including cluster management, job orchestration, and Azure integrations
  • Experience integrating Azure ML and Databricks ecosystems to create unified ML workflows
  • Extensive experience with MLflow for experiment tracking, model registry, model serving, and production lifecycle management across both platforms
  • Proven experience designing and implementing feature stores (Azure ML Feature Store, Databricks Feature Store, or Feast) for online and offline feature serving

Data Mesh & Platform Architecture

  • Understanding of data mesh principles including domain ownership, data as a product, self-serve data infrastructure, and federated computational governance
  • Experience building platform capabilities that enable autonomous domain teams while maintaining organizational standards
  • Ability to design ML systems that support decentralized ownership with centralized governance

Development & Automation

  • Strong Python programming skills with familiarity in ML frameworks (scikit-learn, TensorFlow, PyTorch) and data processing libraries
  • Demonstrated ability to build CI/CD pipelines for ML systems using Azure DevOps, GitHub Actions, or similar platforms, including automated testing and deployment strategies
  • Experience with Azure ML SDK/CLI and Databricks APIs for workflow automation

Deployment & Monitoring

  • Solid understanding of containerization (Docker, Kubernetes) for ML model deployment and scaling
  • Experience with Azure ML model deployment options including managed endpoints, AKS, and Azure Container Instances
  • Experience with monitoring and observability platforms such as Azure Monitor, Application Insights, or equivalent tools for tracking model and infrastructure metrics
  • Experience implementing data quality monitoring and feature drift detection in production environments
