Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Site Reliability Engineer (Automation & Virtualization)
About the Role
We’re looking for a passionate and skilled Site Reliability Engineer (SRE) to join our Platform Engineering team. This role is pivotal in automating and managing VMware ESXi hypervisors across Dell and Cisco UCS platforms, ensuring high reliability, scalability, and performance of our infrastructure.
You’ll work at the intersection of infrastructure and software, driving automation, observability, and operational excellence across our virtualization stack.
Key Responsibilities
Deploy, configure, and patch ESXi hosts using tools like VMware Update Manager, iDRAC, and UCS Central.
Validate host readiness and enforce consistency across environments.
Build and maintain automation pipelines using PowerCLI, Python, Terraform, and Ansible.
Develop Infrastructure-as-Code (IaC) templates for scalable provisioning.
Administer NSX-T and NSX-V for logical switching, routing, and micro-segmentation.
Troubleshoot endpoint tagging and network performance issues between NSX and ESXi.
Implement observability stacks using Prometheus, Grafana, Splunk, and Dynatrace.
Define and track SLOs, SLIs, and error budgets.
Lead modernization efforts including UCS blade decommissioning and Dell R760 upgrades.
Optimize cluster and VM sizing for performance and cost efficiency.
Partner with application, storage, and network teams to align infrastructure with workload needs.
Communicate upgrade plans and maintenance schedules across teams.
Maintain build guides, validation checklists, and operational runbooks.
Contribute to internal wikis and onboarding materials.
Required Skills
Preferred Qualifications
Corporate Security Responsibility
Site Reliability Engineer • Mexico