Site Reliability Engineer – Azure DevOps

EPAM Systems, Mexico

Posted 13 days ago

Job description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Join our team as a Site Reliability Engineer, where you will ensure system reliability, manage incident responses, and enable seamless collaboration between operations and development teams.

This role demands a background in Oil & Gas combined with expertise in automation and cloud technologies. Apply now to support critical infrastructure and drive operational excellence.

Responsibilities

Oversee and enhance the product monitoring system
Handle incidents, including troubleshooting, resolution, documentation, and analysis
Distribute knowledge and insights across teams
Facilitate collaboration between operations and development
Create automation for log analysis, testing production systems, and alerting
Track system health, performance, and SLIs / SLOs / SLAs
Maintain documentation for incident management procedures
Conduct incident analyses and implement corrective actions
Respond to on-call support requests during and after business hours
Collaborate with teams to enhance system efficiency and reliability
Leverage tools such as PagerDuty, ELK / Kibana, SEQ logging, Prometheus, and Grafana for system monitoring
Develop scripts and implement automation solutions using Python, C#, and Bash
Manage orchestration and infrastructure through SaltStack and Docker
Support project workflows using Azure DevOps and maintain a comprehensive Wiki
Maintain code repositories and implement version control systems using Git

Requirements

1+ years of experience in creating solutions, particularly in Site Reliability Engineering

Expertise in cloud services and automation scripting with Python and Bash

Background in Oil & Gas operations and incident handling

Skill in managing incident responses and providing on-call support

Familiarity with monitoring tools such as Prometheus and Grafana

Proficiency in logging tools like ELK / Kibana and SEQ logging

Knowledge of orchestration and infrastructure solutions including SaltStack and Docker

Understanding of fundamental networking concepts like inbound / outbound rules and firewalls

Proficiency in tools for project management and issue tracking like Azure DevOps

Capability to manage source code with Git

Strong skills in creating documentation and disseminating knowledge

Competency in conducting detailed post-incident reviews

Excellent troubleshooting abilities and problem-solving skills

Effective communication skills, with an English level of at least B2

Nice to have

Experience using PagerDuty for incident handling

Competency in C# programming

Understanding of SQL and MongoDB databases

Background in Zededa infrastructure

Experience in supporting Oil & Gas field operations

We offer

International projects with top brands

Work with global teams of highly skilled, diverse peers

Employee financial programs

Paid time off and sick leave

Upskilling, reskilling and certification courses

Unlimited access to the LinkedIn Learning library and 22,000+ courses

Global career opportunities

Volunteer and community involvement opportunities

EPAM Employee Groups

Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level

Associate

Employment type

Full-time

Job function

Engineering, Information Technology, and Business Development

Industries

Software Development, IT Services and IT Consulting, and Nanotechnology Research
