Project info:
You will join an SRE-aligned operations team responsible for keeping a mission-critical, global cloud platform reliable, performant, and secure. The project focuses on 24/7 cloud operations, proactive monitoring, incident response, and continuous improvement of observability coverage across multi-region GCP environments. You will work closely with SRE, Cloud Engineering, and development teams to maintain high availability, support business continuity, and drive operational excellence.
Responsibilities:
- Monitor cloud infrastructure across multiple regions using advanced observability and monitoring tools.
- Respond to alerts and incidents in real time; provide supporting data for root cause analysis and escalate issues when required.
- Troubleshoot issues related to cloud networking, containers, storage, and APIs.
- Maintain and continuously improve troubleshooting guides (TSGs), incident response procedures, and operational documentation.
- Collaborate with SRE, Cloud Engineering, and development teams to resolve infrastructure and reliability issues.
- Perform routine health checks across the cloud environment.
- Perform routine patching and upgrades of observability and monitoring agents across the platform.
- Ensure compliance with SLAs, security policies, and operational standards.
- Participate in a 24/7 on-call rotation and support disaster recovery and business continuity activities.
- Analyze performance metrics and provide recommendations for optimization and automation.
Job requirements:
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent professional experience).
- 1–2 years of experience in a NOC, operations, or cloud infrastructure support role.
- Strong understanding of cloud platforms and services (AWS, Azure, GCP).
- Familiarity with container orchestration technologies (Kubernetes, GKE) and CI/CD pipelines.
- Experience with monitoring and logging tools such as Datadog, Dynatrace, Prometheus, Grafana, ELK, CloudWatch, Splunk, Sumo Logic, New Relic, or similar solutions.
- Proficiency in Linux/Unix environments.
- Basic scripting or automation skills (Python, Bash, PowerShell) and/or Infrastructure as Code exposure (Terraform).
- Strong communication skills, with the ability to document incidents and collaborate effectively during
- Must possess a legal work permit in Poland.
Benefits:
General benefits (depending on the form of employment):
- Hybrid work model combining office & remote work
- Attractively located office with collaboration spaces
- Onsite parking space for employees
- Referral program with financial bonus
- Life Insurance
- Development budget (including language courses, among others) and a clear career path with the possibility of gaining experience in an international environment
- Access to an internal Learning Platform with multiple training courses oriented toward professional growth
Lifestyle benefits:
- Access to MyBenefit platform (Multisport included)
- Team Building activities
- Charity initiatives
- Working environment promoting diversity and inclusion
Health benefits:
- Private medical care - Platinum Package
Join Infogain Poland and take your career to the next level!

