Talent.com
Site Reliability Engineer – Azure DevOps

EPAM Systems, Mexico
Posted 13 days ago
Job description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Join our team as a Site Reliability Engineer, where you will ensure system reliability, manage incident responses, and enable seamless collaboration between operations and development teams.

This role demands a background in Oil & Gas combined with expertise in automation and cloud technologies. Apply now to support critical infrastructure and drive operational excellence.

Responsibilities

  • Oversee and enhance the product monitoring system
  • Handle incidents, including troubleshooting, resolution, documentation, and analysis
  • Distribute knowledge and insights across teams
  • Facilitate collaboration between operations and development
  • Create automation for log analysis, testing production systems, and alerting
  • Track system health, performance, and SLIs / SLOs / SLAs
  • Maintain documentation for incident management procedures
  • Conduct incident analyses and implement corrective actions
  • Respond to on-call support requests during and after business hours
  • Collaborate with teams to enhance system efficiency and reliability
  • Leverage tools such as PagerDuty, ELK / Kibana, SEQ logging, Prometheus, and Grafana for system monitoring
  • Develop scripts and implement automation solutions using Python, C#, and Bash
  • Manage orchestration and infrastructure through SaltStack and Docker
  • Support project workflows using Azure DevOps and maintain a comprehensive Wiki
  • Maintain code repositories and implement version control systems using Git

Requirements

  • 1+ years of experience in creating solutions, particularly in Site Reliability Engineering
  • Expertise in cloud services and automation scripting with Python and Bash
  • Background in Oil & Gas operations and incident handling
  • Skill in managing incident responses and providing on-call support
  • Familiarity with monitoring tools such as Prometheus and Grafana
  • Proficiency in logging tools like ELK / Kibana and SEQ logging
  • Knowledge of orchestration and infrastructure solutions including SaltStack and Docker
  • Understanding of fundamental networking concepts like inbound / outbound rules and firewalls
  • Proficiency in tools for project management and issue tracking like Azure DevOps
  • Capability to manage source code with Git
  • Strong skills in creating documentation and disseminating knowledge
  • Competency in conducting detailed post-incident reviews
  • Excellent troubleshooting abilities and problem-solving skills
  • Effective communication skills, with an English level of at least B2

Nice to have

  • Experience using PagerDuty for incident handling
  • Competency in C# programming
  • Understanding of SQL and MongoDB databases
  • Background in Zededa infrastructure
  • Experience in supporting Oil & Gas field operations

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level: Associate
Employment type: Full-time
Job function: Engineering, Information Technology, and Business Development
Industries: Software Development, IT Services and IT Consulting, and Nanotechnology Research
