About The Company
Our mission is to scale intelligence to serve humanity. We are dedicated to training and deploying frontier models for developers and enterprises who are building AI systems aimed at powering innovative and magical experiences such as content generation, semantic search, retrieval-augmented generation (RAG), and intelligent agents. We believe that our work plays a crucial role in advancing the widespread adoption of artificial intelligence, making it accessible and beneficial across various industries. Our team is composed of passionate professionals committed to pushing the boundaries of AI technology and delivering impactful solutions that transform the way people interact with digital content and services.
About The Role
We are seeking a highly skilled and motivated Member of Technical Staff to join our Model Serving team at Cohere. In this role, you will be responsible for developing, deploying, and maintaining our AI platform that delivers Cohere's large language models through user-friendly API endpoints. You will work closely with cross-functional teams to optimize NLP models for production environments characterized by low latency, high throughput, and high availability. This position offers the opportunity to interface directly with customers, understand their unique deployment needs, and create customized solutions to meet those requirements. Your expertise will be instrumental in ensuring the reliability, scalability, and performance of our AI systems, enabling seamless integration of advanced NLP capabilities into real-world applications.
Qualifications
- 5+ years of engineering experience managing production infrastructure at a large scale
- Proficiency in designing large, highly available distributed systems with Kubernetes
- Experience working with GPU workloads within Kubernetes clusters
- Hands-on experience with Kubernetes development, deployment, and support
- Familiarity with cloud platforms such as GCP, Azure, AWS, and OCI, and with multi-cloud or hybrid environments
- Strong background in Linux-based computing environments, including deployment, support, and troubleshooting
- Experience with compute, storage, and network resource management, and with cost optimization
- Excellent collaboration, troubleshooting, and problem-solving skills for mission-critical systems
- Grit and adaptability to solve evolving technical challenges
- Knowledge of accelerators like GPUs, TPUs, or custom accelerators and their impact on latency and throughput
- Strong understanding of, or experience with, distributed systems architecture
- Proficiency in programming languages such as Golang, C++, or other high-performance server-side languages
Responsibilities
- Develop, deploy, and support scalable NLP models and AI platforms
- Design and implement distributed systems that ensure high availability and low latency
- Optimize GPU and accelerator workloads for inference performance
- Collaborate with cross-functional teams to create customized deployment solutions for clients
- Monitor system performance, troubleshoot issues, and implement improvements
- Support multi-cloud and hybrid deployment architectures
- Manage compute, storage, and network resources efficiently to control costs
- Contribute to the development of best practices for deployment, scaling, and maintenance of machine learning systems
- Engage with customers to understand their needs and deliver tailored AI solutions
- Stay updated with the latest advancements in AI infrastructure and incorporate innovative techniques into our platform
Benefits
- An open and inclusive culture fostering innovation and collaboration
- Opportunity to work alongside a team at the forefront of AI research and development
- Weekly lunch stipend, in-office lunches, and snacks
- Comprehensive health and dental benefits, including mental health support
- 100% parental leave top-up for up to six months
- Personal enrichment benefits covering arts, culture, fitness, well-being, and workspace improvements
- Remote-flexible work arrangements with offices in Toronto, New York, San Francisco, London, and Paris, plus co-working stipends
- Six weeks of vacation (30 working days) to promote work-life balance
We value and celebrate diversity and are committed to creating an inclusive work environment for all employees. We welcome applicants from all backgrounds and are dedicated to providing equal employment opportunities. If you require accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work with you to meet your needs.
- Posted: Feb 24, 2026
- Type: Full-time
- Level: Associate
- Location: Canada
- Company: Wiraa
- Industries: Technology, Information, Internet
- Categories: Information Technology