HCLTech is a global technology company, home to more than 223,000 people in 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending September 2024 totaled $13.7 billion.
A Site Reliability Engineer (SRE) ensures the availability, performance, scalability, and resilience of production systems and services. This role bridges the gap between software development and IT operations, with a strong emphasis on automation, observability, and proactive problem-solving.
Key Responsibilities
- Design and implement highly available, scalable, and fault-tolerant systems.
- Automate operational tasks such as deployments, monitoring, backups, and disaster recovery.
- Implement and maintain observability tools: monitoring, logging, and tracing.
- Manage infrastructure as code (IaC) and CI/CD pipelines.
- Collaborate with development teams to enhance application reliability and performance.
- Participate in on-call rotations and incident response, including root cause analysis and post-mortems.
- Define and monitor service-level indicators (SLIs), service-level objectives (SLOs), and service-level agreements (SLAs).
- Continuously improve systems through testing, automation, and performance tuning.
Technical Skills
- Programming/Scripting: Python, Go, Bash, Java, or similar.
- Operating Systems: Advanced Linux system administration.
- Containers & Orchestration: Docker, Kubernetes.
- Cloud Platforms: AWS, Google Cloud Platform (GCP), or Microsoft Azure.
- CI/CD Tools: Jenkins, GitHub Actions, GitLab CI, ArgoCD, etc.
- Infrastructure as Code (IaC): Terraform, Ansible, Pulumi.
- Observability Tools: Prometheus, Grafana, ELK Stack, Datadog, New Relic, etc.
- Databases: Working knowledge of both relational and NoSQL databases.
- Networking & Security: Solid understanding of networking protocols, firewalls, VPNs, TLS/SSL.
Soft Skills
- Strong analytical and troubleshooting skills.
- Effective communication and collaboration across cross-functional teams.
- Ability to work under pressure and manage high-impact incidents.
- Proactive, with a mindset focused on automation and continuous improvement.
Education & Experience
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Previous experience in DevOps, system administration, or backend engineering with an infrastructure focus.
Preferred Certifications (Not Required)
- Google Professional Cloud DevOps Engineer / SRE
- AWS Certified DevOps Engineer
- Certified Kubernetes Administrator (CKA)
- HashiCorp Terraform Associate
Work Modality:
Onsite (Guadalajara, Jalisco)
We offer:
- Life insurance.
- Major medical expenses insurance.
- Minor medical expenses insurance.
- Savings fund.
- Food vouchers.
- 30 days’ Christmas bonus.
- 12 days of vacation in the first year.