Sr. Site Reliability Engineer || Mexico / Remote
The Senior SRE is ultimately responsible for the reliability, availability, and performance of the technology and systems that directly support our end customers and internal customers. They will work closely with the product development and platform engineering teams to build and maintain scalable systems and robust automation that support the company's business goals. The ideal candidate has a history of successfully implementing and using tools such as Terraform, Packer, Splunk, SignalFx, and other observability and infrastructure-as-code (IaC) tools on systems with around-the-clock availability requirements, along with sufficient software skills to scrutinize and troubleshoot the applications that support our customers. They should have a strong aptitude for learning new technologies and for embracing and driving solutions to challenging projects and problems. This role requires a seasoned, self-motivated engineer who can collaborate across multiple cross-functional teams, brings a rich set of problem-solving skills, and has a passion for quality.
Responsibilities
- Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance.
- Proactively gather and analyze both metric and log data from systems and applications to perform anomaly detection, performance tuning, capacity planning and fault isolation.
- Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security and performance standards.
- Partner closely with other teams on enterprise standards / best practices.
- Identify options for problem resolution and initiate corrective actions.
- Mentor junior team members, and document and share solutions.
- Collaborate cross-functionally.
Qualifications
- Minimum 4 years' experience in any combination of software engineering roles: SRE, DevOps, applications, services, tools / automation, release, etc.
- Minimum 3 years' experience with SRE / DevOps practices and automation tooling
- Experience with observability tools like Splunk, Datadog, SignalFx, etc.
- Experience deploying, maintaining and supporting software applications / services in the AWS ecosystem
- Proactive approach to identifying problems and solutions
- Experience writing code in one or more interpreted languages such as Python, PHP, Perl, Ruby, or Linux shell
- Experience with Terraform or CloudFormation scripting
- Experience with configuration management tools like Ansible, Chef or Puppet
- Experience with standard software development best practices and tools such as code repositories (Git preferred)
- Experience executing in an agile software development environment
- Good understanding of pricing / cost models across AWS services, especially compute, storage, and database offerings
- Must be able to multitask and work well with changing priorities in a fast-paced, 24x7 environment
- Must be highly collaborative and able to work in a team environment consisting of both technical and business people
- Excellent communication, problem-solving and customer service skills
- A strong ability to learn and adapt to new technologies
- Education: Bachelor's degree in computer science, science, or engineering, or workforce-equivalent technical certifications, preferred
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting