Key Responsibilities:
1. Manage and maintain container clusters and other open-source component clusters across various business lines
2. Build and enhance infrastructure operation platforms, including infrastructure management, CI/CD, monitoring/alerting, and logging systems
3. Respond quickly to incidents and implement effective solutions to minimize service disruption
4. Ensure high availability of production services by continuously optimizing system architecture, deployment strategies, and operational efficiency
5. Lead automation initiatives to improve operational efficiency and reduce manual intervention
6. Collaborate with development teams to implement best practices for infrastructure as code and service reliability
7. Participate in on-call rotations to provide 24/7 support for critical systems
Qualifications:
1. Bachelor's degree in Computer Science or a related technical field preferred
2. At least 3 years of experience in systems operations or site reliability engineering
3. Proficiency with major public cloud platforms (AWS, Azure, GCP) and their respective services
4. Strong Linux system administration skills with extensive experience in day-to-day maintenance operations
5. Advanced scripting abilities using Shell/Python for automation and operational tasks
6. Deep understanding of internet technology architecture and performance optimization for common software including Nginx, MySQL, Redis, Kafka, Elasticsearch, and the JVM
7. Extensive experience with Kubernetes and Docker technologies, including production-level container cluster operations
8. Hands-on experience with CI/CD tools such as GitLab CI and ArgoCD
9. Excellent problem-solving skills with the ability to troubleshoot complex systems under pressure
10. Strong communication skills and ability to collaborate effectively in a remote environment
11. Self-motivated with the ability to work independently while maintaining alignment with team objectives
Licence No.: 25S2734
Registration ID: R22107458
Join Starry Recruitment and take your career to the next level!