INSUS - AI Solutions for Sustainable Transformation
DevOps / Platform Engineer
Spain · 15 hours ago
Full-time · Engineering, Information Technology

Platform Engineer Level II — SWENG

Europe · Full-Time · GCP Primary / AWS Secondary

The Mission

We are not looking for someone to “run scripts.” We are looking for a Platform Architect who understands that in our environment, a single configuration change propagates across 98 services and 55 products. Every decision has a blast radius. Every action must be preceded by assessment.

You will join a team of three Platform Engineers responsible for a massive, shared global estate across GCP and AWS. This role is about building the “paved road” for our software engineers — designing scalable, secure, and automated environments where safety is built-in, not bolted on.

The Operating Reality

SWENG operates a shared platform delivered by 5 engineers. Every engineer operates with full autonomy and full accountability from day one. There is no onboarding ramp that absorbs mistakes at this scale.

Scale

98 services · 22 environments · 55 products · 80+ edge locations across GCP and AWS

Team

5 engineers total (3 Platform Engineers). No supervisory capacity. No error correction buffer.

Autonomy

You assess context, analyze failure modes, and communicate structured decisions before touching the keyboard.

Philosophy

Process discipline is what allows us to move fast. You are a process-oriented engineer who treats infrastructure as a product.

Core Responsibilities

Architectural Ownership

Design and implement highly available, secure infrastructure on GCP (primary) and AWS. You are not just building it — you are ensuring it is cost-effective, scalable, and relevant to 55 products simultaneously.

Infrastructure as Code (IaC)

Treat the entire estate as software using Terraform. Manage complex state files and ensure modularity across all 22 environments. Every infrastructure change is code-reviewed, not clicked.

Guardrail Engineering

Build and maintain CI/CD pipelines (GitHub Actions / Jenkins) that do not just deploy code — they enforce security and governance automatically. The pipeline is the last line of defence before 98 services are affected.

Systems Thinking & Advisory

Act as a consultant to the Software Engineering team. Challenge decisions that are not scalable. Communicate tradeoffs using a structured Impact → Options → Recommendation framework. A well-reasoned advisory is as valuable as the implementation.

Observability

Build the Prometheus / Grafana / Stackdriver telemetry that predicts outages — not just reacts to them. Instrument proactively; alert meaningfully.

MLOps Scaling

Support the scaling of machine learning products (Kubeflow Pipelines) to meet global demand across all environments.

Who You Are — Requirements

Experience

 5+ years in DevOps / SRE with a proven track record in Platform Engineering — managing shared infrastructure for multiple teams simultaneously.

GCP Mastery

 Deep, production-level experience with Google Cloud Platform and Kubernetes (GKE).

 You have operated GCP at scale — not just provisioned resources.

The “Architect” Mindset

This is the most critical requirement. You must demonstrate:

 Structured communication: Problem → Impact → Options → Recommendation, without supervision.

 Blast radius awareness: You do not say “it might break.” You explain how it breaks, what is affected, and what the recovery path is.

 Context-first approach: Before any action, you assess what exists, what is affected, who needs to know, and what the downstream consequences are across the shared estate.

 Failure mode thinking: You anticipate failure scenarios and design for graceful degradation, not just happy-path operation.

Governance-First

 You understand that in a global environment with 55 products, following established processes is not overhead; it is a survival requirement.

 You operate within change management frameworks and onboard others into them effectively.

 You distinguish between urgency and risk — a CVSS 9.8 vulnerability requires contextual assessment (exposure, exploitability, blast radius), not a reflexive “drop everything.”

Automation Obsessed

 Expert-level Python scripting and a delete-manual-tasks mentality.

 You automate detection, not just remediation. Manual checking is a process gap, not a strategy.

Accountability Orientation

 You take ownership of outcomes, not just task completion.

 You surface risks proactively to your team lead, with structured status: what is done, what the risk is, what you need.

 You do not patch silently. You communicate clearly before, during, and after changes that affect shared infrastructure.

Location

 Based in Europe for time zone alignment with the team.

Nice to Have

 Hands-on experience with MLOps and Data Science tooling (Kubeflow, Vertex AI).

 Deep knowledge of AWS (EC2, S3, RDS, Lambda) to manage our secondary environment.

 Advanced Log Management (ELK / Splunk).


Compensation: 60,000 - 70,000 EUR (B2B contract)

Languages: Fluent English


PLEASE DO NOT APPLY IF YOU USE AI DURING THE JOB INTERVIEW OR ARE NOT A REAL PERSON.
