You will be responsible for creating the bridge between the R&D team, who train models, and the applications that consume them. This means developing robust APIs, deploying and optimising models on Triton Inference Server (or similar frameworks), and ensuring real-time, scalable inference.
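To make that bridge concrete, here is a minimal sketch of a FastAPI endpoint forwarding a request to a Triton server. The server URL, model name, and tensor names are assumptions for illustration only, not part of the actual stack:

```python
# A minimal sketch of the "bridge": a FastAPI endpoint forwarding a request
# to Triton. "example_model" and the tensor names "INPUT0"/"OUTPUT0" are
# hypothetical placeholders for illustration.
import numpy as np
import tritonclient.http as httpclient
from fastapi import FastAPI

app = FastAPI()
triton = httpclient.InferenceServerClient(url="localhost:8000")

@app.post("/v1/infer")
def infer(features: list[float]):
    # Pack the JSON payload into the tensor shape the model expects.
    arr = np.asarray(features, dtype=np.float32).reshape(1, -1)
    inp = httpclient.InferInput("INPUT0", list(arr.shape), "FP32")
    inp.set_data_from_numpy(arr)
    # One synchronous round-trip to the model server; return the output tensor.
    result = triton.infer(model_name="example_model", inputs=[inp])
    return {"output": result.as_numpy("OUTPUT0").tolist()}
```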
Responsibilities
API Development
- Design, build, and maintain production-ready APIs for speech, language, and other AI models.
- Provide SDKs and documentation to enable easy developer adoption.
Model Deployment & Optimisation
- Deploy models (ASR, LLM, and others) using Triton Inference Server or similar systems.
- Optimise inference pipelines for low-latency, high-throughput workloads (see the streaming sketch after this list).
Scalable Infrastructure
- Architect infrastructure to handle large-scale, concurrent inference requests.
- Implement monitoring, logging, and auto-scaling for deployed services.
Collaboration
- Work with research teams to productionise new models.
- Partner with application teams to deliver AI functionality seamlessly through APIs.
MLOps & Automation
- Automate CI/CD pipelines for models and APIs.
- Manage GPU-based infrastructure in cloud or hybrid environments.
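For the real-time side of these responsibilities, a minimal sketch of a WebSocket streaming endpoint; the `transcribe_chunk` helper is a hypothetical stand-in for the actual call into the model server:

```python
# A minimal streaming sketch. `transcribe_chunk` is a hypothetical stand-in
# for the real inference call (e.g. a streaming gRPC request to Triton).
import asyncio

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def transcribe_chunk(audio: bytes) -> str:
    # Placeholder: in a real service this would hit the model server.
    await asyncio.sleep(0)
    return "<partial transcript>"

@app.websocket("/v1/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            chunk = await ws.receive_bytes()       # client streams audio frames
            text = await transcribe_chunk(chunk)   # per-chunk inference
            await ws.send_json({"partial": text})  # push partial results back
    except WebSocketDisconnect:
        pass  # client closed the stream
```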
Core Skills
- Strong programming experience in Python (FastAPI, Flask) and/or Go/Node.js for API services.
- Hands-on experience with model deployment using Triton Inference Server, TorchServe, or similar.
- Familiarity with both ASR frameworks and LLM frameworks (Hugging Face Transformers, TensorRT-LLM, vLLM, etc.).
- Experience with Docker, Kubernetes, and managing GPU-accelerated workloads.
- Deep knowledge of real-time inference systems (REST, gRPC, WebSockets, streaming).
- Cloud experience (AWS, GCP, Azure).
- Experience with model optimisation (quantisation, distillation, TensorRT, ONNX); a minimal export sketch follows this list.
- Exposure to MLOps tools for deployment and monitoring.
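As one concrete instance of the optimisation techniques named above, a minimal sketch of exporting a PyTorch model to ONNX with a dynamic batch axis, so a server such as Triton or ONNX Runtime can batch requests; the toy model and file name are purely illustrative:

```python
# Minimal sketch: export a toy PyTorch model to ONNX with a dynamic batch
# dimension. The two-layer model and "model.onnx" path are illustrative.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4)
).eval()
dummy = torch.randn(1, 16)  # example input fixing all non-dynamic shapes

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```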
Ready to apply?
Join T-Pro and take your career to the next level!
The application takes less than 5 minutes.