BCforward is seeking a highly motivated MLOps Engineer to support its pharmaceutical client in San Diego, CA.
Expected Duration: Long-term
Location: San Diego, CA, or Indianapolis, IN (hybrid)
About the Role
We're seeking an experienced MLOps Engineer to build and scale the infrastructure powering next-generation in-silico protein design and engineering. In this role, you'll bridge the gap between cutting-edge AI research and production systems, working at the intersection of machine learning, computational biology, and high-performance computing. You'll collaborate closely with both computational scientists and platform engineers to accelerate the development and deployment of foundational models for protein engineering.
What You'll Do
- Build and maintain ML infrastructure: Design, implement, and optimize CI/CD pipelines using GitHub Actions for model training, evaluation, and deployment workflows
- Orchestrate compute resources: Manage and scale workloads across Kubernetes clusters and SLURM-based HPC environments, ensuring efficient resource utilization for large-scale model training
- Develop ML-ready data pipelines: Build robust, scalable data processing and loading pipelines for diverse biological data types, including protein structure files (PDBs), sequence databases, and experimental assay readouts, with a primary focus on optimizing ML-ready data delivery for model training, testing, and benchmarking (a minimal sketch follows this list)
- Enable research velocity: Create tools and infrastructure that empower researchers to iterate quickly on protein language models, diffusion models, and flow-based generative models
- Optimize for scale: Architect systems that can handle multi-modal datasets and train large foundational models efficiently across distributed computing environments
- Monitor and maintain: Implement monitoring, logging, and alerting systems to ensure reliability and performance of production ML systems
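To give a concrete flavor of the data-pipeline work described above, here is a minimal sketch of loading protein structure files into ML-ready tensors. It assumes Biopython and PyTorch are available; the `PDBCoordinateDataset` class name and the `data/pdb` directory are hypothetical and only illustrate the kind of pipeline involved.

```python
# Minimal sketch: loading protein structure files into ML-ready tensors.
# Assumes Biopython and PyTorch; class name and data directory are hypothetical.
from pathlib import Path

import numpy as np
import torch
from Bio.PDB import PDBParser
from torch.utils.data import Dataset


class PDBCoordinateDataset(Dataset):
    """Serves C-alpha coordinates from a folder of PDB files as float32 tensors."""

    def __init__(self, pdb_dir: str):
        self.paths = sorted(Path(pdb_dir).glob("*.pdb"))
        self.parser = PDBParser(QUIET=True)  # suppress per-file parser warnings

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> torch.Tensor:
        path = self.paths[idx]
        structure = self.parser.get_structure(path.stem, str(path))
        # One (x, y, z) point per residue: keep only C-alpha atoms.
        coords = [atom.get_coord() for atom in structure.get_atoms()
                  if atom.get_name() == "CA"]
        return torch.from_numpy(np.stack(coords)).float()  # shape: (n_residues, 3)


if __name__ == "__main__":
    dataset = PDBCoordinateDataset("data/pdb")  # hypothetical directory
    print(f"{len(dataset)} structures available")
```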
Required Qualifications
- Strong MLOps foundation: 3+ years of experience in MLOps, ML infrastructure, or related roles
- CI/CD expertise: Demonstrated experience building and maintaining CI/CD pipelines, particularly with GitHub Actions
- Container orchestration: Hands-on experience with Kubernetes for deploying and managing containerized applications, including familiarity with container registries
- HPC systems: Proficiency with SLURM or similar job scheduling systems for high-performance computing environments (see the sketch after this list)
- Programming skills: Strong Python skills; experience with ML frameworks (PyTorch, TensorFlow, JAX)
- Data engineering: Experience building scalable data pipelines and ETL processes
- DevOps practices: Familiarity with infrastructure-as-code, version control, and collaborative development workflows
- Cloud platforms: Experience with cloud infrastructure (AWS, GCP, Azure) for ML workloads
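As one illustration of the SLURM proficiency called out above, the sketch below submits a placeholder training function to a cluster from Python using the open-source submitit library. submitit, the partition name, and the resource settings are assumptions for the example, not tools or values specified in this posting.

```python
# Illustrative only: submitting a placeholder training run to a SLURM cluster
# from Python via the submitit library (an assumption, not named in the posting).
import submitit


def train(config: dict) -> float:
    # Stand-in for a real training entry point; returns a dummy metric.
    print("training with", config)
    return 0.0


def main() -> None:
    executor = submitit.AutoExecutor(folder="slurm_logs")  # log directory (placeholder)
    executor.update_parameters(
        timeout_min=240,        # wall-clock limit in minutes
        slurm_partition="gpu",  # placeholder partition name
        gpus_per_node=1,
        cpus_per_task=8,
    )
    job = executor.submit(train, {"lr": 1e-4, "batch_size": 32})
    print("submitted SLURM job", job.job_id)
    print("result:", job.result())  # blocks until the job finishes


if __name__ == "__main__":
    main()
```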
Preferred Qualifications
- Performance optimization: Experience with distributed training, mixed precision, gradient checkpointing, and other optimization techniques (a minimal sketch follows this list)
- Large model infrastructure: Experience training or deploying large-scale foundational models (billions of parameters)
- Biological data experience: Prior work with scientific data, particularly in computational biology, bioinformatics, or related fields
- Protein AI models: Understanding of protein language models (ESM, ProtGPT, etc.) and their training requirements
- Protein structure expertise: Experience working with protein structure data formats (PDB, mmCIF), structural bioinformatics tools, and 3D molecular data
- Generative models: Familiarity with diffusion models, flow-based models, or other generative approaches for molecular design
- Cross-functional collaboration: Experience working with experimental scientists and translating wet-lab requirements into computational solutions
- Agentic systems: Experience building and deploying MCP clients/servers, particularly for AI model-as-a-tool use cases, and integrating them with our internal LLMs
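For the optimization techniques mentioned in the performance bullet above, the following is a minimal PyTorch sketch of a single mixed-precision training step with gradient checkpointing. The model, data, and hyperparameters are toy placeholders, and this shows one common pattern rather than the team's actual training loop; it requires a CUDA device to run.

```python
# Minimal sketch: one mixed-precision training step with gradient checkpointing
# in PyTorch. Model, data, and hyperparameters are toy placeholders (CUDA required).
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid fp16 underflow

x = torch.randn(32, 512, device="cuda")
target = torch.randn(32, 512, device="cuda")

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast():
    # Recompute activations during backward instead of storing them,
    # trading extra compute for memory on large models.
    out = checkpoint(model, x, use_reentrant=False)
    loss = nn.functional.mse_loss(out, target)

scaler.scale(loss).backward()  # backward pass on the scaled loss
scaler.step(optimizer)         # unscales gradients, then applies the update
scaler.update()
```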
Bonus Points
- Publications or contributions to computational biology or protein engineering projects
- Experience with structure prediction tools (AlphaFold, ESMFold, RoseTTAFold)
- Familiarity with laboratory information management systems (LIMS) or assay data formats, particularly Benchling.
- Contributions to open-source ML or bioinformatics projects
- Experience with vector databases or embedding-based retrieval systems
- Experience deploying multi-agent tooling using MCP, LangGraph, and Langchain
What Success Looks Like
- Researchers can train and evaluate models with minimal friction
- Data pipelines reliably handle TB-scale datasets with diverse formats
- Infrastructure scales seamlessly from prototype to production
- Experimental data flows efficiently from various sources to model training
- Model deployment cycles are measured in hours, not weeks
Ready to apply?
Join BCforward and take your career to the next level!
Application takes less than 5 minutes

