What You Will Do
- Cloud infrastructure architecture and operations (Azure and AWS)
- Container orchestration at scale (Kubernetes)
- Data platform infrastructure (Spark/Databricks)
- Platform observability, performance optimization, and cost management
- Infrastructure automation and self-service capabilities
- Incident response and system resilience
Major Initiatives You'll Drive
- Observability: Build comprehensive monitoring, logging, and tracing systems that provide actionable insights into platform health and performance.
- Production Scalability & Reliability: Ensure our platform scales efficiently to meet customer demand while maintaining high availability and performance.
- Cost Optimization: Drive FinOps practices and cost-aware architecture decisions across our cloud infrastructure.
Qualifications
- 4-6+ years working with production systems at scale
- Hands-on experience with cloud infrastructure (Azure or AWS preferred)
- Programming ability (Python, Rust, Go, Bash, or others)
- Understanding of distributed systems, reliability, and observability
- Experience with infrastructure automation and configuration management
- Demonstrated ability to work effectively in remote environments
Required Skills
- Experience operating Kubernetes in production environments
- Familiarity with data platform infrastructure (Spark, Databricks, or similar)
- Multi-cloud experience (Azure and AWS)
- Hands-on experience with observability tools and practices
- Track record of improving system reliability and scalability
- Experience with cloud cost optimization and FinOps practices
- Infrastructure as Code tools (Terraform, Pulumi, or similar)
Preferred Skills
- Leadership experience or potential
- ML/AI infrastructure experience
- Security and compliance knowledge
- Incident management and on-call practices
- Performance tuning and capacity planning
Join TalenTown İnsan Kaynakları - İşe Alım Ajansı and take your career to the next level!

