Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
SRE/DevOps Engineer
Canada, 1 day ago
Full-time, Information Technology
SRE/DevOps Engineer - Toronto (4 days onsite)

Seeking a Senior Site Reliability Engineer for the Application Maintenance and Transformation, Data Services and Integration team. As a Senior Site Reliability Engineer, you will bring an engineering mindset of bold ambition, curiosity, and outcome focus to ensuring the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment and works with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation.

What will you do?
• Set the vision for the SRE product base (monitoring, alerting, self-healing, reliability testing).
• Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
• Function as the portfolio SME (Subject Matter Expert): understand and document the common components, core functionality, and infrastructure of supported applications.
• Actively participate in deploying software applications, automation tools, and IT infrastructure.
• Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
• Drive transformation by continuously looking for ways to automate existing SRE processes and increase operational efficiency.
• Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
• Lead incident management and problem management for in-scope applications, and own the fulfillment of root cause analysis (RCA) action items.
• Debug production issues across services and levels of the stack and provide primary operational support.
• Perform occasional off-hours support.

Must-have:
• Bachelor’s degree in Computer Science, Electrical or Electronics Engineering or related field or equivalent experience.
• 3+ years of IT experience in software development and/or maintenance, SRE, or DevOps engineering.
• 1+ years of experience building Java Spring Boot applications and developing REST APIs.
• Experience working with relational databases (MS SQL Server, MySQL, MariaDB, or SingleStore) or in-memory distributed databases.
• Experience working with containerization platforms such as Docker and container orchestration tools such as Kubernetes (Azure Kubernetes or OpenShift Kubernetes Service preferred).
• Solid Git skills and experience working with popular CI tools such as Jenkins or UCD.
• Experience working on Windows and Linux based infrastructure.
• 1+ years of experience developing cloud-native applications using Java or Python.
• Experience writing, tuning, and optimizing SQL queries.
• Experience using centralized logging solutions (Splunk, ELK (preferred), etc.) and active monitoring systems (Dynatrace, etc.).
• Experience deploying and operating cloud-native applications in a private cloud (OpenShift) or a public cloud (Azure/AWS preferred).
• Thorough, proactive communication about the status of projects and production issues.
• Must be a self-starter who is motivated, resourceful, and driven to work with cross-functional teams in large enterprises with complex org structures to meet business delivery timelines.
• Financial Services domain knowledge, preferably in Capital Markets and Wealth Management.
Nice-to-have:
• Experience implementing dashboards to help teams visualize logs, instrumentation, and other data to ensure optimal performance of the platform services, infra, and deployed applications (Grafana preferred).
• Exposure to data warehouse platforms such as Informatica, Snowflake, or Databricks, and business intelligence tools such as SAP BO or similar.
• Experience creating runbooks, processes, and test plans around reliability, performance, etc. of infrastructure and applications.
• Exposure to PagerDuty, Postman, ServiceNow, SonarQube, NexusIQ and vault tools.
• Exposure to event brokers such as Kafka or IBM MQ, and to mainframe tools and environments.
• Exposure to industry disaster recovery test exercises.

