Talent.com
Site Reliability Engineer (SRE)

OnHires • Mexico, Mexico
Posted more than 30 days ago
Job description

Position: Site Reliability Engineer (SRE)

Location: Fully Remote (offices in Limassol, Kyiv, London, Tbilisi)

Working Hours: Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.

Company Overview:

Our client is one of the fastest-growing B2B iGaming solutions providers in Europe, with over 100 remote team members across the continent. They specialize in delivering high-quality software platforms, payment solution integrations, marketing tools, and technical support to clients in the online casino and betting sectors. As they continue to expand, they are looking for a talented and growth-oriented individual to help enhance and streamline their infrastructure.

The company offers a dynamic and supportive environment where your input is valued and your professional growth is encouraged. Don’t miss the opportunity to join their exciting journey!

Role Overview:

As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure that services and platforms remain reliable, scalable, and performant, even under high transaction volumes and regulatory requirements.

You will work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation, improve observability, and respond to incidents.

Key Requirements:

Experience with AWS or hybrid data center setups

Ability to read logs and stack traces to determine the root cause of incidents

Infrastructure as Code: experience with Terraform, Helm, Ansible (optional: Werf)

Linux administration and container orchestration (K8s) skills

Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.

Strong understanding of TCP/IP, DNS, and load balancers

Familiarity with incident response, postmortems, and blameless culture

Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00

Bonus Skills:

Background in high-throughput environments (e.g., financial, trading, iGaming)

Experience with CDNs and real-time log aggregation

Proficiency in one or more scripting languages (Python, Bash, Go)

Knowledge of Java and PHP, along with their respective web development frameworks

Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.

Exposure to Kafka, Redis, or other event-driven systems

Key Responsibilities:

Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., live games, sports betting, KYC, payments)

Manage and support highly available, scalable infrastructure (K8s, cloud, and bare metal)

Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki, ELK)

Automate deployments and operations using CI/CD pipelines (e.g., Jenkins, ArgoCD, Helm)

Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR)

Participate in on-call rotation to ensure 24/7 system reliability

Secure infrastructure in line with regulations (e.g., player data integrity, jurisdictional compliance)

Collaborate with Dev, QA, DevOps, and Ops to improve services' stability and uptime
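
For candidates less familiar with the MTTR metric mentioned above, it is typically the average time between detecting an incident and restoring service. The following is a minimal sketch only: the function name and the sample incident timestamps are hypothetical and do not come from the client's tooling.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery time over (detected_at, resolved_at) datetime pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [resolved - detected for detected, resolved in incidents]
    # sum() needs a timedelta start value, since the default start is the int 0.
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incident log: one 30-minute and one 60-minute outage.
incidents = [
    (datetime(2024, 1, 5, 17, 0), datetime(2024, 1, 5, 17, 30)),
    (datetime(2024, 1, 9, 2, 15), datetime(2024, 1, 9, 3, 15)),
]
print(mean_time_to_recovery(incidents))  # 0:45:00
```

Tracking this value per service group over time is one simple way to check whether post-incident action items are actually shortening recovery.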

Success Metrics:

SLO of 99.95%

95% of infrastructure managed via code and automation

Documented runbooks and alert playbooks per service group
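
To put the 99.95% SLO in concrete terms, the allowed downtime (error budget) can be worked out with simple arithmetic. This is an illustrative sketch: the function name and the 30-day measurement window are assumptions, since the posting does not state how the SLO is measured.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    The 30-day default window is an assumption for illustration.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


budget = error_budget_minutes(99.95)
print(f"{budget:.1f} minutes of downtime per 30 days")  # 21.6 minutes of downtime per 30 days
```

Under these assumptions, a 99.95% target leaves roughly 21.6 minutes of downtime in a 30-day window.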

Why You'll Love Working Here:

International Team: Be part of a respectful, supportive, and goal-driven team.

Freedom & Responsibility: We trust you to take ownership of your work.

Competitive Salary: We offer competitive compensation based on your skills and experience.

Fully Remote: Work from anywhere, with optional access to our offices in Limassol, Kyiv, London, or Tbilisi.

Flexible Schedule: We measure performance, not time.

Unlimited Paid Time Off: Enjoy paid vacation and sick leave days for a great work-life balance.

Career Development: Opportunities for continuous learning and growth.

Team-Building & Fun: Enjoy awesome corporate parties and team-building events throughout the year.

Referral Bonuses: Earn rewards when you refer talented friends to join us.

Private Medical Insurance: Choose the right coverage for you, with full or partial compensation based on cost.

Flexible Benefits: Get compensated for activities and expenses like gym subscriptions, language courses, Netflix, spa days, etc.

Learning Foundation: Participate in our biannual raffle for the chance to learn something new outside of your role.
