We are seeking a Senior Site Reliability Engineer to support a global execution platform and deliver high-quality solutions to trading desks and clients.
You will work closely with top specialists, developing your skills in system management, monitoring, and low-latency technology. Apply now to be part of a team driving innovation in financial technology.
Please note that working from the customer's office in Lisbon is required 2-3 days per week.
Responsibilities
- Develop and implement monitoring, alerting, and incident response strategies
- Automate routine tasks and processes to improve efficiency
- Collaborate with software engineering teams to design and deploy reliable, scalable systems
- Deploy production changes with precision to maintain platform integrity
- Manage incidents, including detailed analysis and reporting, to ensure high service levels
- Participate in on-call rotations to support critical systems and services
- Communicate effectively with team members to resolve issues promptly
- Maintain documentation for operational procedures and system configurations
- Continuously improve system reliability and performance through proactive measures
Requirements
- Strong knowledge of Unix/Linux systems and networking, with 3+ years of experience
- Proficiency in Unix/Linux shell scripting and programming languages such as Python, Perl, C, C++, or Java
- Experience with monitoring and observability tools like ITRS Geneos, Dynatrace, Prometheus, and Grafana
- Ability to troubleshoot complex systems and resolve issues efficiently
- Experience working in high-availability, high-traffic environments
- Bachelor’s or Master’s degree in IT engineering or a related field
- Ability to work effectively in a team and adapt to new environments
- Self-motivated with strong problem-solving and issue follow-up skills
- Excellent written and verbal communication skills, with English at level B2 or higher
Nice to have
- Experience with log management tools such as Splunk, ELK, Graylog, or Loki
- Knowledge of network monitoring tools like Corvil
- Familiarity with databases including Oracle, PostgreSQL, MySQL/MariaDB, or KDB/q
- Experience with messaging systems such as IBM MQ, Tibco, Solace, LBM, or Kafka
- Familiarity with Infrastructure as Code tools like Ansible or Terraform
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

