TrekAI
DevOps Engineer
United States
Full-time
Engineering, Information Technology

Role Purpose

TrekAI is at the forefront of reinventing education through AI. We are a high-growth, mission-driven startup where your work directly impacts teachers, students, and entire school systems.


TrekAI is building the next generation of AI-driven education technology, and we need a DevOps Engineer to ensure our cloud platform is fast, reliable, and resilient. This is a hands-on role focused on operational excellence, developer experience, and customer responsiveness. You will automate deployments, harden infrastructure, and make sure TrekAI’s multi-agent learning platform scales securely and smoothly as adoption grows.


You’ll work closely with the Systems Architect to design scalable topologies, with the Engineering Leader to streamline CI/CD pipelines and developer workflows, and with the AI/Data Science Leader to deploy and monitor model-serving infrastructure. Your work will directly impact how quickly TrekAI can respond to schools, ship improvements, and recover from incidents — making you a critical enabler of customer trust and satisfaction.


Key Responsibilities

Platform Automation & CI/CD

  • Build and maintain CI/CD pipelines for microservices and AI models.
  • Automate infrastructure with Terraform, Helm, and Argo CD for reproducibility and speed.

Operations & Monitoring

  • Deploy and manage observability stacks: Prometheus, Grafana, Loki, Sentry, PostHog, Alloy.
  • Instrument systems for metrics, logging, tracing, and error detection to improve uptime and recovery.
  • Maintain service-level dashboards and alerting for production systems.

Resilience & BC/DR

  • Implement backup, failover, and disaster recovery strategies to ensure ≥99.9% uptime.
  • Run DR tests and incident simulations to validate recovery plans.

Developer Experience

  • Shorten lead time for changes and improve local-to-production consistency.
  • Provide self-service environments for developers and QA.

Customer Responsiveness

  • Support school pilots, rollouts, and live trials by ensuring platform readiness.
  • Rapidly address production issues to minimize impact on teachers and students.


Required Education & Experience

  • BS in Computer Science, Engineering, or a related discipline, and/or 5+ years of equivalent hands-on DevOps or systems engineering experience in SaaS or platform environments.
  • Strong cloud experience (AWS, GCP, or Azure), including virtualization, virtual machine environments, and container orchestration (Kubernetes/OpenShift).
  • Solid knowledge of and administrative experience with Linux distributions (e.g., Ubuntu, Debian, RHEL, NixOS), cloud networking administration, and client-side Windows and macOS.
  • Solid programming and database skills: Python, React, Node.js, Java, JSON, SQL, NoSQL.
  • Expertise with CI/CD tools (Azure DevOps, GitHub Actions, Jenkins, etc.).
  • Familiarity with observability stacks: Prometheus, Grafana, Loki, Sentry, PostHog, and log aggregation pipelines.
  • Experience implementing BC/DR procedures and failover strategies.
  • Knowledge of networking (routing, TLS certs), secrets management, and secure RBAC.
  • Educational technology (EdTech) or SaaS platform experience is a plus.
  • Startup experience is a plus.
