EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team.
As a Senior SRE, you will play a critical role in designing, developing, and maintaining highly reliable systems and processes to ensure optimal performance and scalability of applications and infrastructure across diverse environments.
Responsibilities
- Build and containerize applications and deploy them using open-source container management tools such as Docker or Podman
- Design and maintain Kubernetes resource manifests, deploying them into clusters on platforms like AKS or GKE
- Configure and deploy Prometheus agents to monitor infrastructure and application behaviors, raising alerts when necessary
- Create and manage continuous deployment pipelines using tools like Helm and ArgoCD
- Optimize observability by implementing monitoring, logging, and tracing solutions
- Maintain and manage CI/CD processes within Azure DevOps or similar environments
- Develop and implement solutions on cloud platforms, leveraging expertise in at least one provider (e.g., Microsoft Azure, GCP, AWS)
- Troubleshoot infrastructural and application issues by utilizing logs and traces to isolate events effectively
Requirements
- At least 3 years of programming experience, preferably in Go
- Hands-on experience with at least one scripting language (e.g., Bash or Python)
- Proficiency with Kubernetes, backed by at least 3 years of practical experience
- Fundamental knowledge of observability tools, with a focus on Prometheus or similar monitoring platforms
- Skills in configuring and managing CI/CD pipelines using Azure DevOps or tools like Helm and ArgoCD for GitOps-style continuous deployment
- Background in cloud platforms with competency in at least one provider (e.g., Microsoft Azure, Google Cloud, AWS)
- Flexibility to use open-source tools like Docker or Podman to containerize applications and manage their runtime environments effectively
Nice to have
- Familiarity with multiple cloud providers, including AWS and GCP alongside Azure
- Expertise in GitOps packaging and deployment tools like Argo CD and Helm
- Understanding of service meshes like Istio for Kubernetes-based microservices architectures
- Competency in infrastructure-as-code tools such as Terraform
- Background in software development with experience across multiple domains
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering, Information Technology, and Business Development
Industries
Software Development, IT Services and IT Consulting, and Nanotechnology Research