About The Company
Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, retrieval-augmented generation (RAG), and autonomous agents. We believe that our work is instrumental to the widespread adoption of artificial intelligence, enabling innovative solutions across various industries. Our commitment to advancing AI technology is reflected in our collaborative environment, cutting-edge research, and focus on impactful applications. We strive to foster a culture of inclusivity, diversity, and continuous learning, ensuring our team members are empowered to contribute their best and drive meaningful change in the AI landscape.
About The Role
Are you energized by building high-performance, scalable, and reliable machine learning systems? Do you want to help define and build the next generation of AI platforms powering advanced NLP applications? We are seeking Members of Technical Staff to join our Model Serving team at Cohere. In this role, you will be responsible for developing, deploying, and operating our AI platform that delivers Cohere’s large language models through user-friendly API endpoints. You will collaborate closely with cross-functional teams to deploy optimized NLP models into production environments characterized by low latency, high throughput, and high availability. Additionally, you will have the opportunity to interface directly with customers, creating customized deployments to meet their specific needs. This position offers a unique chance to work at the forefront of AI technology, solve complex technical challenges, and contribute to the evolution of scalable AI infrastructure.
Qualifications
- 5+ years of engineering experience managing production infrastructure at a large scale
- Proficiency in designing large, highly available distributed systems using Kubernetes
- Experience with GPU workloads on Kubernetes clusters
- Hands-on experience with Kubernetes development, deployment, and support in production environments
- Familiarity with cloud platforms such as GCP, Azure, AWS, and OCI, as well as multi-cloud, on-premises, and hybrid environments
- Strong background in designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
- Knowledge of compute, storage, and network resource management, and of cost optimization
- Excellent collaboration and troubleshooting skills for building mission-critical systems
- Ability to adapt to and solve evolving, complex technical challenges
- Understanding of the computational characteristics of accelerators such as GPUs, TPUs, or custom accelerators, and their impact on latency and throughput
- Working experience with distributed systems architecture and implementation
- Proficiency in high-performance programming languages such as Golang, C++, or similar
Responsibilities
- Develop, deploy, and maintain scalable AI infrastructure to support large language models
- Design and implement highly available, low-latency distributed systems using Kubernetes and cloud technologies
- Support GPU and accelerator workloads, optimizing performance and resource utilization
- Collaborate with cross-functional teams to integrate models into production environments and ensure operational excellence
- Troubleshoot and resolve complex system issues, ensuring high system reliability and uptime
- Optimize compute, storage, and network resources to balance performance and cost efficiency
- Interface with customers to understand their deployment needs and deliver customized solutions
- Contribute to the continuous improvement of deployment pipelines, automation, and system robustness
- Stay updated with the latest advancements in AI infrastructure and incorporate best practices
Benefits
- Inclusive and collaborative work environment fostering innovation
- Opportunity to work with cutting-edge AI research and technology
- Weekly lunch stipend, in-office lunches, and snacks
- Comprehensive health and dental benefits, including mental health support
- 100% parental leave top-up for up to six months
- Personal enrichment benefits covering arts, culture, fitness, well-being, and workspace improvements
- Remote-flexible work arrangements with offices in Toronto, New York, San Francisco, London, and Paris, plus co-working stipends
- Six weeks of vacation (30 working days) to promote work-life balance
We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. If you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.
- Posted: Feb 21, 2026
- Type: Full-time
- Level: Associate
- Location: Canada
- Company: Wiraa
Industries: Technology, Information, Internet
Categories: Information Technology