Site Reliability Engineer - Logging Metrics and Monitoring
Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
We are looking for a Site Reliability Engineer to join our Cloud Infrastructure Engineering & Operations division. Ultimately, your work will focus on improving the performance and efficiency of our teams by building world-class tools and automated workflows that produce improved outcomes for our business.
Infrastructure Engineering Automation is responsible for building tools and automating workflows for multiple engineering teams within the Cloud Engineering & Operations zone. Our services are highly visible and used every day by teams across athenahealth to develop, monitor, troubleshoot, and scale their web services. The team collects and hosts large volumes of metrics and log data by running large-scale, distributed, fault-tolerant systems.
Our team has a big impact on the productivity of hundreds of developers across athenahealth.
In a typical week, our engineers work on problems ranging from tuning performance and scaling services to debugging hard problems. They introduce new features and partner with development teams to solve their pressing monitoring and logging issues. We work on an agile, sprint-based schedule with daily standups, in both the private and public cloud.
- Automate deployment of logging and metrics services using configuration management with Puppet
- Work on production incidents and resolve them using your Linux administration and engineering skills
- Develop metrics dashboards and alert criteria to monitor and scale services
- Serve week-long on-call rotations alongside other team members
- Support development teams to refine their logging and metrics collection
- Ability to handle on-call rotations every several weeks
- 3 to 5 years of prior experience in a production environment, with exposure to AWS, Kubernetes, and on-prem infrastructure and their corresponding troubleshooting methodologies
- Hands-on experience with configuration management using Puppet, Chef, or Ansible
- Sysadmin and DevOps skills for running services in a Linux environment
- Experience operating production services in a Linux environment and serving in on-call rotations
- Experience with several of: Bash scripting, Ruby, Python, Perl, C++, Java, Golang
- Develop deployment templates for services in the public cloud using CloudFormation or Terraform
- Ability to be flexible and adapt to changing environment and business demands
- Solid understanding of the Linux operating system
- Experience managing large server fleets in production
- Experience with performance analysis of services
- Experience with relevant technologies: Fluentd, Kafka, Elasticsearch, Graphite, ClickHouse, Terraform, Prometheus, Grafana, Graylog, AWS CloudFormation, Docker containers, Jenkins, load balancers, Git
- Experience with tcpdump, Wireshark, or other protocol analyzers
Here’s our vision: To create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
What’s unique about our locations?
From a historic 19th-century arsenal to a converted landmark power plant, all of athenahealth’s offices were carefully chosen to represent our innovative spirit and promote the most positive and productive work environment for our teams. Our 10 offices across the United States and India, plus numerous remote employees, all work to modernize the healthcare experience, together.
Our company culture might be our best feature.
We don't take ourselves too seriously. But our work? That’s another story. athenahealth develops and implements products and services that support US healthcare: It’s our chance to create healthier futures for ourselves, for our family and friends, for everyone.
Our vibrant and talented employees — or athenistas, as we call ourselves — spark the innovation and passion needed to accomplish our goal. We continue to expand our workforce with amazing people who bring diverse backgrounds, experiences, and perspectives at every level, and foster an environment where every athenista feels comfortable bringing their best selves to work.
Our size makes a difference, too: We are small enough that your individual contributions will stand out — but large enough to grow your career with our resources and established business stability.
Giving back is integral to our culture. Our athenaGives platform strives to support food security, expand access to high-quality healthcare for all, and support STEM education to develop providers and technologists who will provide access to high-quality healthcare for all in the future. As part of the evolution of athenahealth’s Corporate Social Responsibility (CSR) program, we’ve selected nonprofit partners that align with our purpose and let us foster long-term partnerships for charitable giving, employee volunteerism, insight sharing, collaboration, and cross-team engagement.
What can we do for you?
Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces — some offices even welcome dogs.
In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. And we provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued.
We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.