Tribus
Artificial Intelligence Engineer
Tribus | Australia | 12 hours ago
Full-time | Remote Friendly | Engineering, Information Technology

Software Engineer - AI


LLM | Python | AWS


We’re partnering with a fast-growing software company building AI-driven products used in high-stakes, real-world workflows.


The focus is on production-quality AI: systems that must be reliable, measurable, and safe at scale.


They’re looking for a Software Engineer with AI experience to join a team responsible for the core AI platform, with a particular emphasis on LLM evaluation, observability, and reliability.


This is a hands-on engineering role, sitting close to product and domain experts, where your work directly influences how AI quality is defined, measured, and enforced in production.


What you’ll work on


  • Building and operating LLM evaluation pipelines that assess model quality, robustness, and safety
  • Defining test sets, metrics, and evaluation workflows, including human-in-the-loop processes where required
  • Translating product and domain constraints into concrete, testable evaluation criteria
  • Running and orchestrating distributed evaluation workloads on AWS, including monitoring compute usage
  • Analysing evaluation results, identifying failure modes, and collaborating on mitigations (prompt changes, data updates, model selection or fine-tuning)
  • Integrating and assessing open-source and vendor evaluation frameworks, writing glue code where needed
  • Contributing to the evolution of the AI evaluation and platform architecture


What they’re looking for


  • Strong Python engineering skills
  • Experience monitoring and evaluating LLM-based applications
  • Hands-on exposure to LLM evaluation tools, benchmarks, and metrics
  • Understanding of common LLM failure modes (e.g. hallucination, bias, toxicity, prompt injection)
  • Experience with cloud ML infrastructure, ideally AWS
  • Familiarity with distributed workloads (e.g. Ray, AWS Lambda, or similar)
  • Comfort working with an evolving LLM observability and evaluation stack
  • Ability to work with non-ML stakeholders and convert qualitative requirements into quantitative tests


Working environment & benefits


  • Flexible hybrid setup, with twice-weekly collaboration in a modern CBD office
  • Strong learning and career development opportunities in a scaling business
  • Wellness focus, including additional leave and a gym membership
  • Collaborative team culture with regular social events
  • Pool table, snacks, and a genuinely supportive environment


This role is well suited to engineers who care about AI reliability and correctness, and who want to work on systems where evaluation and safeguards genuinely matter.


Candidates must be based in Sydney and hold full working rights. Remote working and sponsorship are not available for this role.
