Role: Site Reliability Engineer - Production Support
Maximum rate: $50/hr.
Position Overview
We are seeking a skilled and experienced Production Support Engineer, engaged through vendor staffing, to support our digital applications. This role combines hands-on production support with Site Reliability Engineering (SRE) principles, focusing on toil elimination, infrastructure automation, and high availability of critical digital applications and backend systems.
Primary Responsibilities
1. Toil Removal & Infrastructure Maintenance (15%)
· Execute SSL/TLS certificate updates and renewals across production environments
· Perform Windows and Linux server patching and security updates
· Manage NPID password updates and credential rotation protocols
· Implement security vulnerability remediation in production systems
· Identify, document, and eliminate repetitive manual operational tasks
2. Infrastructure & Database Cluster Management (20%)
· Manage and support Elasticsearch cluster operations (deployment, scaling, monitoring, troubleshooting, performance tuning)
· Administer MongoDB clusters including replication, sharding, backup, recovery, and maintenance
· Operate and maintain Redis instances for caching and session management
· Monitor cluster health and perform capacity planning and optimization
· Execute failover and disaster recovery procedures
· Ensure data integrity and backup compliance
3. Automation & SRE Activities (15%)
· Develop, maintain, and enhance Ansible playbooks for infrastructure automation
· Build infrastructure-as-code solutions to reduce manual intervention
· Create and maintain comprehensive runbooks and operational playbooks
· Design monitoring, alerting, and observability solutions
· Implement automated remediation for common operational issues
· Quantify and prioritize toil reduction opportunities
4. Production Application Support (50%)
· Troubleshoot and resolve production incidents affecting digital applications
· Collaborate with application development and support teams on issue diagnosis
· Participate in incident response, root cause analysis, and post-mortems
· Monitor and respond to application performance degradation
---
Technical Requirements
Required Expertise (Must-Have)
· Ansible: 2+ years hands-on experience writing playbooks, roles, and automation workflows
· Elasticsearch: 2+ years managing and troubleshooting Elasticsearch clusters in production
· MongoDB: 2+ years with replica sets, sharding, backup/recovery, and performance tuning
· Redis: Proficiency in deployment, configuration, and operational support
· OpenShift: Experience deploying and managing containerized applications on OpenShift
· Azure: Knowledge of Azure cloud services, resource management, and deployments
· Linux Administration: 3+ years with RHEL, CentOS, or Ubuntu in production environments
· Windows Server Administration: Experience with patching, certificate management, and maintenance
· Shell Scripting: Bash scripting for automation and operational tasks
· Incident Management: Experience responding to and resolving critical production incidents
Preferred Skills
· Kubernetes or container orchestration platforms
· Python or Go scripting for automation
· CI/CD pipeline experience (Jenkins, GitLab CI, Azure DevOps)
· Monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog)
· Infrastructure-as-Code tools (Terraform, CloudFormation)
· Security best practices and vulnerability management
· Relevant certifications (AZ-900, CKA, Elasticsearch, etc.)
---
Required Qualifications
· Minimum 5 years of production infrastructure support or SRE experience
· Minimum 3 years with at least 2 of the core technologies (Elasticsearch, MongoDB, Ansible, OpenShift)
· Experience working in a regulated financial services environment (preferred)
· Ability to work independently and in teams
· Strong troubleshooting and analytical capabilities
· Excellent documentation and communication skills
· Must be available for on-call support rotation (with reasonable notice)
---
Operational Expectations
· On-Call Rotation: Participates in the production support on-call schedule
· Incident Response: Available for critical incident resolution outside standard business hours as required
· Availability: Core business hours + flexibility for critical production issues
· Response Time: First response to critical incidents within 30 minutes
· Documentation: Maintains detailed runbooks, playbooks, and knowledge base articles
· Collaboration: Regular communication with infrastructure, development, and operations teams
Join Aarorn Technologies Inc and take your career to the next level!

