Prediktive - LATAM, United States
We are looking for a Site Reliability Engineer based in Latin America to work on a long-term project for one of our clients, a software development company based in Los Angeles, CA.
Our client’s platform is a mobile-first CMMS, EAM & IIoT suite of solutions that helps teams streamline work orders, track assets, and schedule preventive maintenance, all in one place.
Responsibilities
- Configure and operate monitoring, logging, and tracing tools. Work with developers to improve application logging with a focus on problem detection. Introduce modern tooling or replace existing tooling as needed.
- Build dashboards, alerts, and automation workflows; define and track reliability metrics.
- Monitor system performance and reliability, and implement improvements as needed.
- Collaborate with software engineering teams to design and implement reliable systems.
- Write and maintain robust automation tasks for infrastructure and development processes.
- Participate in a 24/7 on-call rotation for alerts and incidents, and engage in root cause analysis and post-mortem meetings as needed.
- Implement and manage security and compliance best practices across infrastructure and pipelines.
- Manage and optimize AWS EKS Kubernetes clusters for deployment, scaling, and operation of containerized applications.
- Design, build, and maintain scalable customer-facing infrastructure on AWS using Terraform.
- Collaborate with developers and QA teams to streamline code deployment and testing workflows.
- Collaborate with Database Administrators to gather requirements and implement client-side configuration of database connections.
- Support Linux-based environments across development, staging, and production.
Qualifications
- Advanced level of English.
- 5+ years of professional experience in Site Reliability Engineering or a related role.
- 4+ years of experience scripting with Bash or Shell.
- 3+ years working with Linux-based systems and troubleshooting system-level issues.
- 3+ years of experience with CI/CD pipelines, preferably using GitHub Actions.
- 2+ years working with Kubernetes (monitoring, deployment, scaling, and networking).
- Expertise in SRE concepts, including SLIs/SLOs and Golden Signals.
- Understanding of AWS services (EC2, EKS, S3, IAM, CloudWatch, etc.).
- Familiarity with Docker and container registries.
- Strong problem-solving skills and the ability to work independently and collaboratively.
Bonus Points
- Bachelor’s degree in Computer Science, Systems Engineering, or a related field.
- Experience with Helm charts for Kubernetes deployments.
- Knowledge of networking fundamentals and security best practices.
What we offer
- Long-term positions
- Compensation in USD
- Paid time off
- Cool clients and products
- Work with great engineers
Posted: Monday, September 22, 2025
Job # 1165