Overview
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI / ML, and our people-first culture has earned us multiple Best Place to Work awards.
What you will do
- Shift: Monday–Thursday, 8 AM–7 PM PST (11 AM–10 PM EST), with rotating on-call
- On-call shifts: every 6 weeks, one week as primary responder and the following week as secondary
- Manage alerts daily, check systems, and escalate issues as needed
- Be part of a team that provides 24×7 on-call support for critical SaaS events
- Be available in case of emergencies when team members are not available or need help
- Document issues and remediation steps
- Proactively create appropriate monitors in the EKS / K8s ecosystem
- Deploy to EKS / K8s cluster using Terraform and Helm
- Learn and maintain existing infrastructure running under Docker Swarm
- Improve existing infrastructure health by implementing checks and scripts to correct known issues
- Maintain and develop deployment code
- Automate manual tasks
- Implement / integrate new technologies in our Cloud Infrastructure
- Collaborate with other teams and departments to provide the highest level of support and assistance
- Keep the customer at the forefront when planning deployments and updates, and consider the impact on them before making changes
- Work closely on solutions with Support, Customer Success, Migration, and Professional Services teams to provide best-in-class SaaS service to our customers
- Perform RCA and take necessary corrective actions to prevent the recurrence of issues
- Create and assign alert-related actions to the appropriate team after the investigation
- Handle support requests for environment-specific actions
- Identify and provide automation requirements to improve RCA
Must haves
- 2+ years of professional experience
- Experience working with Datadog
- Hands-on experience as an AWS Cloud Engineer
- Working knowledge of EKS / Terraform / Helm
- Working experience with Docker and Docker Swarm
- Good understanding of AWS IAM roles and policies
- Experience logging and monitoring AWS resources using CloudWatch logs
- Experience working in a Linux environment
- Proficient in Bash and / or Python scripting
- A strong understanding of web technologies such as REST APIs
- Working experience with monitoring solutions, such as Grafana and Prometheus
- Excellent oral and written communication skills; customer-facing communication skills to explain issues and RCAs
- Experience in Product / Application Support for SaaS-based products
- Understanding of APIs, Databases, Systems Architecture, and Design
- Designing, implementing, and operating in a DevSecOps environment
- Excellent communication skills, both written and verbal
- Ability to work independently as well as within a collaborative environment
- A technical aptitude with the desire to learn new and evolving technologies
- Upper-Intermediate English level
Nice to have
- Experience with GCP or Azure
- Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty
Perks and benefits
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work-life balance, with options for remote or in-office work to suit your needs.
Seniority level
Mid-Senior level
Employment type
Full-time
Industry
IT Services and IT Consulting