Stefanini Group is seeking a Senior Compute Engineer specialized in Red Hat OpenShift to strengthen our Compute Operations team and provide Level 3 (L3) expert support for enterprise customers running critical workloads on container and virtualization platforms.
This role is a key technical position focused on day-to-day operations, stability, and continuous improvement of OpenShift-based platforms. The engineer will act as the highest escalation point for complex incidents and problems, support platform lifecycle activities (upgrades, patching, performance tuning), and contribute to platform modernization initiatives - including VMware-to-OpenShift virtualization transformation programs.
The ideal candidate combines strong troubleshooting skills, deep infrastructure understanding, and hands-on OpenShift expertise with the ability to work in a structured operational environment (ITIL / managed services) while also supporting automation and standardization.
Job Responsibilities:
Level 3 Operations & Technical Escalation (Core Responsibility):
- Act as the L3 escalation point for complex technical issues related to:
  - Red Hat OpenShift clusters (control plane, worker nodes, networking, storage, authentication)
  - OpenShift Virtualization (KubeVirt) and VM-based workloads hosted on OpenShift
  - Linux OS-level issues impacting cluster stability or workloads.
- Own and drive resolution of:
  - Major Incidents (P1/P2) with deep technical investigation and rapid recovery focus
  - Recurring incidents through Problem Management (root cause analysis and permanent fixes).
- Lead deep troubleshooting activities:
  - cluster degradation, node failures, API instability, etcd performance issues
  - networking issues (ingress, routes, DNS, CNI, service connectivity)
  - storage issues (persistent volumes, performance bottlenecks, CSI failures)
  - workload failures (pods, operators, deployments, stateful applications).
- Provide clear technical updates during incidents, including impact assessment, recovery plan or workaround, risks, and next steps.
- Plan and execute OpenShift lifecycle activities, such as version upgrades (cluster and operator upgrades), patching and security hardening, and certificate management and renewal.
- Validate platform readiness before changes: capacity, compatibility, performance, known issues.
- Maintain high availability and resilience: backup/restore strategy support (including etcd backup practices), disaster recovery readiness and operational runbooks.
- Ensure operational compliance with defined maintenance windows and change governance.
- Support enterprise modernization initiatives involving migration from traditional virtualization platforms (VMware) to OpenShift Virtualization.
- Contribute to:
  - migration approach definition and technical design support
  - workload onboarding, validation, and stabilization on OpenShift
  - performance tuning and operational model definition for VM-based workloads on OpenShift.
- Ensure production-grade operational readiness: monitoring, alerting, backup, patching and support model aligned with managed services standards.
- Develop and maintain operational documentation, including troubleshooting guides, standard operating procedures (SOPs), build standards and reference architectures, and operational runbooks for recurring tasks.
- Support automation initiatives using tools such as Ansible / Automation Platform (preferred), GitOps practices (ArgoCD) where applicable, and scripting (Bash / Python) to reduce manual operations.
- Proactively identify improvements that increase platform stability, recovery speed (MTT), and repeatability, and that reduce human error.
- Support and improve observability across the platform, including:
  - OpenShift monitoring stack (Prometheus / Alertmanager / Grafana)
  - log management (e.g., EFK / Loki or enterprise logging platforms).
- Troubleshoot performance issues related to compute resource constraints, scheduling and resource requests/limits, and cluster scaling and capacity planning.
- Work with customer stakeholders and internal teams to define alert thresholds, reduce noise and false positives, and improve operational dashboards and health reporting.
- Ensure the platform is operated in a secure manner aligned with enterprise expectations:
  - RBAC best practices
  - integration with enterprise identity providers (LDAP / AD / SSO)
  - secure cluster configuration and segregation.
- Support vulnerability remediation and platform hardening initiatives.
- Collaborate with Security teams for audits, compliance requests, and evidence collection.
Mandatory Technical Skills:
- Strong hands-on experience with Red Hat OpenShift administration and operations.
- Strong Linux background (RHEL preferred), including troubleshooting OS performance, services, networking, and storage.
- Solid understanding of Kubernetes fundamentals: pods, deployments, services, ingress, namespaces, RBAC, operators.
- Experience troubleshooting infrastructure-related issues across compute, network, storage, and platform services.
- Experience working in production environments with uptime and SLA commitments.
- Proven ability to operate as Level 3 support, including deep troubleshooting, structured root cause analysis and ownership until resolution.
- Ability to communicate clearly with customers (technical and non-technical stakeholders) and internal teams (L1/L2/architects/project teams).
- Strong documentation discipline and operational mindset.
- Experience with OpenShift Virtualization (KubeVirt) and VM-based workloads.
- Experience supporting VMware environments and understanding virtualization concepts: vSphere architecture, clusters, HA/DRS, storage/datastores, VM lifecycle.
- Experience with automation tools:
  - Ansible / Red Hat Ansible Automation Platform
  - GitOps tools (ArgoCD)
  - Infrastructure as Code practices.
- Experience with enterprise storage and CSI integrations.
- Experience with enterprise networking topics (DNS, routing, firewall constraints, load balancing).
- Experience with public cloud OpenShift deployments (optional): ROSA / ARO / OCP on AWS/Azure/GCP.
- Red Hat Certified Specialist in OpenShift Administration (preferred)
- Red Hat Certified Engineer (RHCE) (strong advantage)
- Kubernetes certifications (CKA/CKAD) (nice to have).
- Work in an operational environment that follows ITIL practices (Incident / Problem / Change Management), a managed services delivery model, and SLA commitments.
- Participate in on-call rotation, planned maintenance windows, and technical escalation duty as required.
- Provide clear handovers and updates to ensure continuity across shifts/regions.
It's best to apply today, because job postings can be taken down and we wouldn't want you to miss this opportunity. In case you need further information, just send us a message at [email protected] and we'll be happy to assist!
The preceding job description has been designed to indicate the general nature and level of work performed by employees within this classification. It is not designed to contain or be interpreted as a comprehensive inventory of all duties and responsibilities required of employees assigned to this job.
Diversity & Inclusion
Here at the Stefanini Group, we value plurality and equity, regardless of race, sexual orientation, disability, age, ancestry, religion, gender, and nationality. We understand and encourage the importance of being you!
About Us
We are the Stefanini Group, a global tech consulting company of Brazilian origin that believes in the power of people to transform businesses through technology.
We are present in over 40 countries and operate with the purpose of co-creating solutions TOGETHER WITH OUR CLIENTS that accelerate results and improve the experience of people and organizations.
Here, we like to say that technology is not the end, but the means: what really matters are the people who drive it all.
Our mindset is AI First, meaning we invest in cutting-edge technology in everything we do, focusing on results for our clients.
We are a company, A GROUP, that breathes collaboration and offers a dynamic environment where you will learn by doing, grow alongside the team, and have space to contribute with ideas and projects.
More than just talking about digital transformation, we believe in real transformation that starts with people and impacts real businesses.
If you are looking for a place to develop, innovate, and be part of something bigger, the Stefanini Group is your place.
We want to inform you that there are currently scams targeting job seekers by falsely using our company's name, Stefanini. We sincerely apologize for any confusion or inconvenience this may have caused.
Please remember that legitimate job offers from Stefanini will always come through official channels, including direct communication with our trained recruiters. If you receive any unsolicited messages requesting payment or personal information, please disregard them.
If you suspect you've been targeted, please contact us immediately at [email protected] for verification.
Key Points to Remember:
- Legitimate job offers only follow interviews conducted with our hiring managers or clients.
- We will never ask for payment at any stage of the recruitment process.

