Site Reliability Engineer

Responsibilities:

  • Deploy, monitor, troubleshoot, and maintain Web3 wallet production environments.
  • Manage and maintain cloud infrastructure (e.g., AWS, GCP, Azure) to ensure high system availability.
  • Configure and manage containerized environments, utilizing Kubernetes for cluster management and automated deployment.
  • Monitor logs, performance, and system health to identify and resolve potential issues promptly.
  • Collaborate with the development team to ensure smooth execution of Continuous Integration and Continuous Deployment (CI/CD) processes.
  • Optimize system architecture to improve performance, scalability, and security.
  • Conduct regular disaster recovery drills and backups to ensure data security and integrity.


Requirements:

  • Proficiency in English communication.
  • Familiarity with Linux/Unix system administration, with at least 2 years of experience in operations and maintenance.
  • Experience with Apollo, Grafana, and TeamCity platforms.
  • Expertise in containerization technologies (Docker, Kubernetes) and virtualization.
  • Hands-on experience managing cloud platforms (AWS, GCP, Azure) and familiarity with load balancing and auto-scaling technologies.
  • Knowledge of CI/CD processes and experience with tools like Jenkins and GitLab CI.
  • Understanding of database management and performance tuning (MySQL, PostgreSQL, etc.).
  • Scripting skills in Python, Shell, or Go for automation.
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana) for tracking and analyzing system performance.
  • Strong problem-solving skills and ability to work effectively in a team environment.

Post Date: 2025-05-16
Job Type: -
Employment type: Full-time
Category: Engineering, Information Technology
Level: Mid-Senior
Country: United Arab Emirates
Industry: Software Development
Token 13 Software L.L.C