Position : Site Reliability Engineer (SRE)
Location : Fully Remote (Offices in Limassol, Kyiv, London, Tbilisi)
Working Hours : Availability to work between 5 PM and 8 AM CET, in one of the following shifts : 17:00–01:00 or 00:00–08:00.
Company Overview :
Our client is one of the fastest-growing B2B iGaming solutions providers in Europe, with over 100 remote team members across the continent. They specialize in delivering high-quality software platforms, payment solutions integrations, marketing tools, and technical support to clients in the online casino and betting sectors. As they continue to expand, they are looking for a talented and growth-oriented individual to help enhance and streamline their infrastructure.
The company offers a dynamic and supportive environment where your input is valued and your professional growth is encouraged. Don’t miss the opportunity to join their exciting journey!
Role Overview :
As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure that services and platforms remain reliable, scalable, and performant, even under high transaction volumes and regulatory requirements.
You will work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation, improve observability, and respond to incidents.
Key Requirements :
Experience with AWS or hybrid data center setups
Reading logs and stacktraces to determine the root cause of incidents
Infrastructure as Code : Experience with Terraform, Helm, and Ansible (optional : Werf)
Linux administration and container orchestration (K8s) skills
Experience with monitoring/observability stacks : Prometheus, Grafana, ELK, Loki, etc.
Strong understanding of TCP/IP, DNS, and load balancers
Familiarity with incident response, postmortems, and blameless culture
Availability to work between 5 PM and 8 AM CET, in one of the following shifts : 17:00–01:00 or 00:00–08:00
Bonus Skills :
Background in high-throughput environments (e.g., financial, trading, iGaming)
Experience with CDNs and real-time log aggregation
Proficiency in one or more scripting languages (Python, Bash, Go)
Knowledge of Java or PHP and their respective web-development frameworks
Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.
Exposure to Kafka, Redis, or other event-driven systems
Key Responsibilities :
Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., live games, sports betting, KYC, payments)
Manage and support highly available, scalable infrastructure (K8s, cloud, and bare metal)
Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki, ELK)
Automate deployments and operations using CI/CD pipelines (e.g., Jenkins, ArgoCD, Helm)
Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR)
Participate in on-call rotation to ensure 24/7 system reliability
Secure infrastructure in line with regulations (e.g., player data integrity, jurisdictional compliance)
Collaborate with Dev, QA, DevOps, and Ops to improve services' stability and uptime
Success Metrics :
SLO of 99.95%
95% of infrastructure managed via code and automation
Documented runbooks and alert playbooks per service group
Why You'll Love Working Here :
International Team : Be part of a respectful, supportive, and goal-driven team.
Freedom & Responsibility : We trust you to take ownership of your work.
Competitive Salary : We offer competitive compensation based on your skills and experience.
Fully Remote : Work from anywhere, with optional access to our offices in Limassol, Kyiv, London, or Tbilisi.
Flexible Schedule : We measure performance, not time.
Unlimited Paid Time Off : Enjoy paid vacation and sick leave days for a great work-life balance.
Career Development : Opportunities for continuous learning and growth.
Team-Building & Fun : Enjoy awesome corporate parties and team-building events throughout the year.
Referral Bonuses : Earn rewards when you refer talented friends to join us.
Private Medical Insurance : Choose the right coverage for you, with full/partial compensation based on cost.
Flexible Benefits : Get compensated for activities and expenses like gym subscriptions, language courses, Netflix, spa days, etc.
Learning Foundation : Participate in our biannual raffle for the chance to learn something new outside of your role.