We are looking for an AI/ML Ops Engineer to support the deployment, monitoring, and operational reliability of AI-powered systems in production environments.
This role combines elements of DevOps, cloud engineering, and AI system support. The ideal candidate is comfortable working with cloud infrastructure, monitoring tools, and modern AI workflows, and collaborates closely with engineering and AI teams.
Key Responsibilities
- Support deployment and operational management of AI/ML applications and services
- Monitor AI systems using logs, metrics, tracing, and observability tools
- Troubleshoot and debug AI workflows, pipelines, and runtime failures
- Assist in maintaining scalable, secure, and reliable cloud infrastructure
- Support prompt experimentation, version tracking, and A/B testing activities
- Collaborate with engineering teams to improve system reliability, performance, and automation
- Maintain CI/CD workflows and deployment pipelines for AI services
- Participate in incident investigation, root cause analysis, and operational support activities
Preferred Skills & Experience
- Experience working with AWS cloud services
- Familiarity with monitoring and observability tools such as Amazon CloudWatch and Datadog
- Understanding of logging, metrics, tracing, and alerting concepts
- Familiarity with AI/ML workflows and LLM-based applications
- Experience with containers, APIs, and deployment pipelines
- Familiarity with scripting or programming languages such as Python, JavaScript, or Bash
- Understanding of DevOps, cloud infrastructure, and operational best practices
- Knowledge of infrastructure-as-code tools such as Terraform or CloudFormation
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field
- 2+ years of experience in DevOps, Platform Engineering, SRE, MLOps, or AI infrastructure-related roles
What We’re Looking For
- Strong problem-solving and debugging capabilities
- Interest in modern AI operational practices and production AI systems
- Good communication and collaboration skills
- Practical mindset focused on reliability, scalability, and continuous improvement
- Ability to work in a fast-paced and evolving technology environment
Related Jobs
3 roles aligned with this opportunity
Senior Software Engineer (Infrastructure)
2026-05-27
Network Engineer
2026-05-27
Generative AI Engineer
2026-06-01
- Posted: May 19, 2026
- Type: Full-time
- Level: Not Applicable
- Location: São Paulo
- Company: 99x