Storyful

AI Architect

Ireland · Full-time · Not Applicable

Storyful is an equal opportunity employer

Job Description

Reporting to: Chief Product & Technology Officer (CPTO)

Location: Dublin (Hybrid, 3 days/week in office)

Team: Product & Engineering (foundational hire for Data Science / AI function)

Mission: Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring—from early prototypes to commercially successful, market-leading products.

This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost, while thriving in the ambiguity of zero-to-one product creation.

Why This Role Exists

We’re building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You’ll shape the technical direction early, including network science and explainability, across agent ecosystems, information retrieval (e.g. RAG and Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps, and turn these into reliable product experiences.

What You’ll Do (Responsibilities)

  • Architect and ship agentic GenAI systems
    • Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes (not demos).
    • Build specialized agents for workflows like adverse media / risk detection, entity investigation, source authenticity, classification, and summarization, and orchestrate them reliably.
    • Own the translation from research/prototypes into production-grade features (latency, reliability, observability, cost).
  • Build RAG + Graph RAG for multi-doc intelligence
    • Deliver RAG chatbots for investigation and exploration across large document sets.
    • Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
    • Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk/comms teams.
    • Apply deep agents or deep research, graph traversal strategies (network science), and agentic RAG.
  • Multi-document classification + scoring (risk-focused)
    • Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating data to fine-tune small models.
    • Create scoring methodologies (e.g., risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach.
    • Bonus: experience building “risk detection” classifiers and adverse media style pipelines.
  • Context engineering + automatic prompt improvement
    • Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
    • Implement prompt evolution techniques (e.g., automated prompt iteration / prompt improvement loops) where it makes commercial sense.
    • Understand how the wording of a prompt shifts the probability distribution of the LLM’s outputs, and manage context through graphs and information retrieval.
  • Evaluation: make quality measurable and repeatable
    • Build robust evaluation methodologies for prompts, RAG, summarization, and classification.
    • Apply multiple evaluation techniques, including:
      • offline metrics (precision/recall/F1 where appropriate)
      • retrieval metrics and ablations
      • LLM-as-a-judge style evaluations with rubrics, controls, and drift detection
    • Define quality gates that allow the team to move fast without breaking trust.
    • Understand an LLM as a neural network, not only as something that can be prompted and observed from the outside. For example, understand how entropy can serve as a signal for detecting hallucinations as they unfold through the layers of the model.
  • LLMOps + cost control
    • Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices.
    • Build monitoring for quality + safety + cost, and actively optimize infrastructure spend in cloud environments.
    • Deploy and maintain open-source models.
  • Lead by influence (and occasionally by direct leadership)
    • Bring “Senior/Lead Engineer” judgement: clean architecture, pragmatic decisions, mentoring, unblocking teams.
    • Partner tightly with Product, Design, Data Science, and Engineering, while also being able to execute independently.

What success looks like (first 6–12 months)

  • A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
  • A measurable evaluation framework where quality improves release over release.
  • A Graph RAG (or equivalent) capability that materially improves multi-doc summarization accuracy and defensibility.
  • Clear cost/performance tradeoffs and observability that make the system operable at scale.
  • A team around you that’s leveled up in GenAI engineering practices.

Required Experience (Must-have)

  • Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.
  • Demonstrated experience building agentic GenAI architecture for commercially successful product features (not only internal prototypes).
  • Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.
  • Hands-on experience with AWS and GCP (Azure experience welcome in addition).
  • Production experience with:
    • RAG chatbots
    • multi-document summarization (ideally Graph RAG)
    • multi-document classification
    • scoring methodologies (risk scoring is a strong bonus)
  • Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g., precision/recall) and LLM-as-a-judge approaches.
  • Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.

Nice-to-have (Strong bonuses)

  • Experience in risk/compliance domains (e.g., adverse media, AML, entity investigation workflows).
  • Knowledge graphs in production (e.g., Neo4j) and graph extraction pipelines.
  • Experience running annotation programs / building labeled datasets for NLP tasks.

Skills & tools (examples)

We don’t require exact matches, but we do expect you to be fluent in this class of tooling and able to choose pragmatically.

GenAI frameworks & LLMs

  • LangChain, LlamaIndex
  • OpenAI / Gemini / Claude
  • Vector RAG + Graph RAG patterns

LLMOps / experimentation / observability

  • MLflow (experiments, tracking)
  • Langfuse (prompt & trace observability)

Data & retrieval

  • Neo4j (graph), Elasticsearch
  • Vector stores (Pinecone-style capability), embeddings, semantic chunking

Cloud / infrastructure (examples)

  • AWS: Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex
  • GCP (plus Azure exposure helpful)

Languages

  • Python (primary), TypeScript, Java (Ruby on Rails experience welcome)

Job Category

Storyful - Product & Technology

Posted
May 14, 2026

Industries

Media Production

Categories

Design Art/Creative Information Technology
