Lead, Site Reliability Engineer

Royal Caribbean International, Ciudad de México, Mexico
Posted 14 hours ago
Job description

Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world.

We are proud to be the vacation-industry leader with global brands — including Royal Caribbean International, Celebrity Cruises and Silversea Cruises — the most innovative fleet and private destinations, and the best people. Together, we are dedicated to turning the vacation of a lifetime into a lifetime of vacations for our guests.

Global eCommerce has an exciting career opportunity for a full-time Lead Site Reliability Engineer reporting to the Sr. Manager, Site Reliability Engineer.

This position will work on-site in Mexico City.

Position Summary

The Lead Site Reliability Engineer (Lead SRE) will report to the SRE Manager in support of the Royal Caribbean website by utilizing application and user performance data to guide informed decision-making. The Lead SRE will use application and user performance metrics collected from various sources and tools to support tasks such as initial triage of critical production incidents, bug analysis, implementation of best practices in site reliability engineering, infrastructure optimization, and seamless collaboration between internal teams and external service providers, among other operational initiatives.

The ideal candidate will have a deep understanding of, and a proven track record in, a senior IT support role and can provide leadership in the development of new employees. The ideal candidate will also keep an eye on the rapidly evolving technology landscape and provide leadership on advanced and emerging concepts, researching and implementing proactive and preventative measures that avoid technical incidents.

She/he must be able to work with multiple product and project teams simultaneously, thrive in a fast-paced and dynamic environment, and connect unexpected threads across disparate teams. The role will provide direct leadership over the individual contributors who provide Level 1 and Level 2 support. This leader must be able to direct teams during high-pressure, business-critical incidents and ensure that customer-focused decisions are made to minimize or eliminate impacts on the guest/employee experience.

Essential Duties and Responsibilities

At a high level, responsibilities for this role will include:

  • Product Health: Provides leadership over a large team of Level 1 and Level 2 support resources. Is responsible for the Incident Management, Application Performance, Configuration Management and Operational Readiness of the products within her/his ownership. Partners and collaborates closely with stakeholders from the various teams within IT to ensure that performance, configuration and monitoring tools meet the needs of her/his products.
  • Incident Management: Leads a team prepared to react quickly to production incidents, with the goal of restoring systems and applications to normal service operation as quickly as possible and minimizing the impact on guest/crew experience or business operations, so that the best possible service levels and availability are maintained. Reviews ticket analysis and approves closure of tickets/incidents. Understands the architecture of the Royal Caribbean website and escalates incidents as needed to the appropriate team for further triage. Synthesizes and communicates incident details to the production team and stakeholders, including executive-level stakeholders. Reviews postmortem/RCA documents and follows up.
  • Application Performance Management (APM): Ensures proactive monitoring and management of the performance and availability of the software applications within the products she/he is responsible for. Strives to detect and diagnose complex application performance problems to maintain an expected level of service. Builds the case for prioritizing bug and enhancement tickets. Creates reports on the performance of new deployment builds for product teams to ensure quality.
  • Configuration Management: Leads the team(s) in implementing and maintaining technology standards and practices across product definition and product configuration. Adjusts health thresholds and other monitoring settings based on historical performance. Creates and maintains performance dashboards used by support and product teams. Maintains the alerting, communication, and documentation toolchain to ensure it is up to date and efficient.
  • Change Control Governance: Ensures all production changes required by the product teams are carried out in a planned and authorized manner, within established change control policies and procedures, and that all changes are thoroughly tested and validated from the monitoring perspective.
  • Production Operations Readiness: Ensures all product implementations go through an operational readiness review. Establishes and maintains clear communication channels (e.g., Slack, Teams) with the scrum and marketing teams. Ensures all team members are informed about relevant updates and changes that may affect the website.

Qualifications, Knowledge and Skills

Technical Expertise:

Proficiency in cloud platforms such as AWS, AWS Elastic Beanstalk.

Understanding of API design principles: REST, SOAP, Graph.

Advanced knowledge of monitoring and logging tools (AppDynamics, Datadog, Splunk, New Relic, etc.).

Strong proficiency in Adobe AEM is crucial for guiding technical initiatives and mentoring teams.

Problem-Solving Skills:

Strong analytical and troubleshooting skills to diagnose and resolve complex production issues swiftly.

Ability to develop and implement effective incident response plans.

Communication and Collaboration:

Excellent written and verbal communication skills for effective interaction with cross-functional teams and documentation.

Ability to collaborate with Development, QA, IT, and external managed service providers to ensure seamless operations.

Work Environment:

The Lead SRE may be required to participate in an on-call rotation to handle urgent incidents and ensure 24x7 system reliability.

On-call duties may include evenings, weekends, and holidays as needed.

We know there’s a lot to consider. As you go through the application process, our recruiters will be glad to provide guidance and relevant details to answer any additional questions. Thank you again for your interest in Royal Caribbean Group. We hope to see you onboard soon!

It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.
