Site Reliability Developer 4

Oracle, Región Centro, Jalisco, Mexico
Posted 28 days ago
Job description

Overview

As part of the Site Reliability Engineering (SRE) team, you’ll contribute to designing, automating, and evolving mission-critical systems. You’ll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.

This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.

Responsibilities

What You’ll Do

  • Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
  • Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
  • Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
  • Build tooling to reduce manual operations and eliminate sources of toil.
  • Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability and efficiency.
  • Review and influence architectural designs for distributed systems with a focus on resilience, performance, and fault tolerance.
  • Lead and participate in post-incident reviews, capacity planning, and production-readiness assessments.
  • Provide on-call support on a rotational basis (12-hour shifts, 7-day coverage).

What We’re Looking For

  • Advanced Linux systems administration
  • Strong coding skills in Python (automation-focused)
  • Intermediate experience with Bash/shell scripting
  • Familiarity with networking principles and distributed systems behavior
  • Basic to intermediate knowledge of databases (e.g., SQL, NoSQL)
  • Understanding of unit testing and modern software engineering practices
  • Experience with CI/CD pipelines and deployment automation
  • Comfortable working in Agile development environments

Nice to Have

  • Exposure to monitoring/observability tools (e.g., Prometheus, Grafana, New Relic)
  • Experience building internal tools for operational efficiency
  • Participation in SRE culture: blameless postmortems, runbooks, and service design reviews