Overview
EPAM is a leading global provider of digital platform engineering and development services.
We are committed to having a positive impact on our customers, our employees, and our communities.
We embrace a dynamic and inclusive culture.
Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow.
No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a highly skilled Senior Operational Intelligence Developer to join our team, responsible for supporting, enhancing, and maintaining our Elastic & Observability Platform deployed across GCP and Elastic Cloud.
This role will involve developing innovative solutions, maintaining platform reliability, and enabling self-service capabilities to empower platform consumers while participating in an on-call rotation to oversee platform health and functionality.

Responsibilities
- Ensure availability, functionality, performance, and security of observability and search platforms to meet business SLAs
- Respond to incidents and resolve escalations promptly during on-call periods
- Maintain platform documentation, standard operating procedures, and operational guidelines
- Collaborate with stakeholders and vendors to manage operational requirements, installations, and upgrades
- Enhance platform features and self-service capabilities, including Elastic Synthetics and chargeback automation
- Design proof-of-concepts for operational improvements like AI-driven observability or Kubernetes migration
- Build, deploy, and maintain Elastic clusters using Infrastructure-as-Code (IaC) tools like Terraform and Ansible
- Perform platform lifecycle management activities such as component upgrades, capacity planning, and cost optimisation
- Fine-tune ELK stack performance across ingestion, indexing, and query layers
- Configure and manage comprehensive alerting and incident management workflows, including Kibana Rules, Watchers, and PagerDuty
- Support ingestion, enrichment, backup, and restoration of platform data
- Plan and manage SSL certificate rotations and cluster scalability requirements

Requirements
- 3+ years of experience in Operational Intelligence
- Proven expertise in implementing, operating, and managing Elastic clusters
- Knowledge of Elastic Stack components, including Elasticsearch, Kibana, and Logstash
- Proficiency in Infrastructure-as-Code (IaC) tools such as Terraform and Ansible, with flexibility to use Jenkins CI
- Skills in Python for automation and extending platform functionality
- Understanding of incident management workflows with tools like PagerDuty and Uptrends
- Background in troubleshooting and resolving complex platform issues efficiently
- Competency in managing scalable, fault-tolerant platforms with a focus on performance and security
- Strong communication skills in English (B2 level or higher) for collaborating with technical and non-technical stakeholders

Nice to have
- Familiarity with additional tools such as Groovy, Linux Administration, and Jenkins CI pipelines
- Capability to optimise observability workflows using advanced integrations in Uptrends and PagerDuty
- Showcase of previous work with Elastic Synthetics for advanced monitoring and testing

We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Senior Developer • México