Job Summary

We are seeking a skilled System Reliability Specialist to join our team. This role will be responsible for ensuring the reliability, availability and performance of systems and services. You will work closely with development and operations teams to design, implement and maintain scalable and efficient infrastructure.

- Develop automation scripts to streamline system processes
- Monitor system health using various tools and dashboards
- Respond to incidents and outages, performing root cause analysis and implementing corrective actions

Key Responsibilities:

System Monitoring and Incident Response:
- Use data analytics to identify system performance issues
- Analyze system metrics to optimize performance and efficiency

Performance Optimization:
- Implement optimization strategies to enhance system efficiency
- Collaborate with cross-functional teams to improve system architecture and design

Continuous Improvement:
- Maintain comprehensive documentation of systems, processes and procedures
- Participate in post-mortem reviews to drive continuous improvement

Requirements:
- Proven experience in system reliability or a similar field
- Strong knowledge of cloud computing platforms (AWS)
- Proficiency in scripting and programming languages (e.g., Python, Go, Bash)

System Reliability Specialist • Guadalajara, Jalisco, México