Amazon Web Services announced Amazon DevOps Guru, a fully managed operations service that uses machine learning to help developers improve application availability by automatically detecting operational issues and recommending specific remediation actions.
Amazon DevOps Guru applies machine learning informed by years of Amazon.com and AWS operational excellence to automatically collect and analyze data such as application metrics, logs, events, and traces, and to identify behaviors that deviate from normal operating patterns (e.g., under-provisioned compute capacity, database I/O over-utilization, or memory leaks).
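To make the idea of "behaviors that deviate from normal operating patterns" concrete, here is a minimal sketch of deviation detection on a single metric. It is an illustration only and is not how Amazon DevOps Guru works internally; the announcement does not describe the service's models. The function name `find_anomalies`, the trailing-window z-score technique, the `window` and `threshold` parameters, and the sample latency values are all assumptions chosen for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate sharply from a trailing baseline.

    Hypothetical helper for illustration: each point is compared against the
    mean and sample standard deviation of the `window` points before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        # Flag the point if it lies more than `threshold` standard
        # deviations away from the baseline mean.
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up request latencies in milliseconds, with one spike at index 8.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]
print(find_anomalies(latency_ms))
```

A production system would likely account for seasonality, correlate several signals (metrics, logs, events, traces), and avoid letting a past spike inflate the baseline; this sketch does none of that.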
When Amazon DevOps Guru identifies anomalous application behavior (e.g., increased latency, error rates, or resource constraints) that could cause outages or service disruptions, it alerts developers via Amazon Simple Notification Service (SNS) and partner integrations like Atlassian Opsgenie and PagerDuty. Each alert includes issue details (e.g., resources involved, issue timeline, and related events) and specific recommendations for remediation, helping developers quickly understand the potential impact and likely causes of the issue.
Developers can use remediation suggestions from Amazon DevOps Guru to reduce time to resolution when issues arise and to improve application availability and reliability, with no manual setup or machine learning expertise required. There are no upfront costs or commitments with Amazon DevOps Guru, and customers pay only for the data Amazon DevOps Guru analyzes.
As more organizations move to cloud-based application deployment and microservice architectures to scale their businesses and operations globally without the limitations of on-premises deployments, applications have become increasingly distributed to meet customer needs. As a result, developers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues.