In Season 5, Episode 13, Karl and Jon discuss a packed lineup of AWS news, including the general availability of AWS DevOps Agent with autonomous incident response capabilities, support for EC2 instance store in Amazon ECS Managed Instances for latency-sensitive workloads, and the introduction of managed daemons for managed instances, similar to Kubernetes DaemonSets. They also cover how to build high-performance applications with AWS Lambda managed instances, a migration guide for moving from Amazon ElastiCache for Redis to ElastiCache for Valkey, and the European Commission data breach involving a compromised AWS account through a supply chain attack on Aqua Security’s Trivy vulnerability scanner. And along the way, the guys realize that Karl’s muscle memory for intro titles is apparently so bad that he could probably forget his own name if he took a week off.
03:24 - AWS DevOps Agent General Availability and Autonomous Incident Response with DevOps Agent
AWS DevOps Agent has officially moved from preview to general availability. This service acts as an autonomous incident investigation tool that can analyze logs, telemetry, and infrastructure metrics to help teams understand what's going wrong during incidents. Rather than replacing human SREs, it accelerates the investigation phase by correlating data from multiple sources (CloudWatch logs, monitoring tools, error messages) and reducing the time spent in manual troubleshooting. The tool can be integrated with existing monitoring platforms like PagerDuty, Datadog, New Relic, and Grafana. It supports "skills" (essentially runbooks or if-then rules) that can be customized for known failure patterns specific to an organization's infrastructure. Currently in GA, it can perform investigations but cannot yet execute remediation actions, though this is expected as a future capability. Notable customers in production include Western Governors University, ZenChef, T-Mobile, and Granola.
The second article, on autonomous incident response, provides a practical walkthrough for implementing DevOps Agent in AWS environments to handle incident response workflows. It demonstrates how to set up the integration between incident management systems and DevOps Agent, so that automated investigation workflows are triggered when alerts fire. The article shows bidirectional integration with services like PagerDuty (which can feed alerts into DevOps Agent) and Slack (for notifications), plus outbound capabilities to create incidents or update existing ones. The key value proposition is that the tool can handle approximately 80% of the incident investigation burden (the time-consuming process of correlating logs, metrics, and events), while human engineers remain responsible for decision-making and remediation approvals.
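The "skills" mentioned above are described as essentially if-then rules for known failure patterns. A minimal sketch of that idea in Python follows. It is not the DevOps Agent API: the `Skill` class, the alert fields (`message`, `metric`, `value`), and the example rules are all hypothetical, chosen only to illustrate matching an incoming alert against organization-specific patterns.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model, NOT the DevOps Agent API: each skill pairs an "if"
# condition on an alert with a "then" suggestion for the human responder.


@dataclass
class Skill:
    name: str
    condition: Callable[[dict], bool]  # the "if" part, evaluated on an alert
    suggestion: str                    # the "then" part, advice for a human


SKILLS = [
    Skill(
        name="db-connection-exhaustion",
        condition=lambda a: "too many connections" in a.get("message", "").lower(),
        suggestion="Check the connection pool size and recent deploys.",
    ),
    Skill(
        name="high-5xx-rate",
        condition=lambda a: a.get("metric") == "5xx_rate" and a.get("value", 0) > 0.05,
        suggestion="Compare error logs against the last deployment window.",
    ),
]


def match_skills(alert: dict) -> list[str]:
    """Return the names of every skill whose condition matches the alert."""
    return [s.name for s in SKILLS if s.condition(alert)]


def suggestions_for(alert: dict) -> list[str]:
    """Return the suggestions for every matching skill (investigation only)."""
    return [s.suggestion for s in SKILLS if s.condition(alert)]
```

Note that the sketch only produces suggestions and takes no action, mirroring the split described above in which investigation is automated while remediation stays with human engineers.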
14:44 - Amazon ECS Managed Instances Support for EC2 Instance Store and Amazon ECS Managed Daemons for Managed Instances
Amazon ECS Managed Instances now supports EC2 instance store volumes, which are high-performance local storage physically attached to the host machine. Instance store provides lower latency than EBS volumes because it is attached directly to the hardware rather than accessed over a network. The feature is primarily useful for highly latency-sensitive containerized workloads that require extremely fast disk access. Although the use cases are relatively niche, it enables scenarios where applications need local, high-speed temporary storage without the network latency overhead of EBS volumes. This is one of several enhancements to ECS Managed Instances announced recently.
ECS Managed Instances now supports managed daemons, a capability analogous to Kubernetes DaemonSets. This feature ensures that exactly one instance of a specified container runs on every node in an ECS cluster. This is particularly useful for system-level services that need to be present on all instances—such as monitoring agents (New Relic, Datadog), log collectors, or security scanning tools. Previously, this functionality was available for traditional self-managed EC2 compute but was missing from managed instances. The feature automatically scales with cluster size: adding a new instance to the cluster automatically deploys the daemon, and removing an instance removes it accordingly. This brings ECS Managed Instances to feature parity with self-managed EC2 deployments for daemon-like workloads.
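The "one daemon per instance" behavior described above can be pictured as a reconciliation step: compare the instances in the cluster with the instances already running the daemon, then start or stop copies to close the gap. The Python sketch below illustrates that general pattern only. The function name, the input shapes, and the instance and task IDs are hypothetical, and nothing here reflects how ECS implements the feature internally.

```python
def reconcile_daemons(
    instances: set[str], running: dict[str, str]
) -> dict[str, list[str]]:
    """Compute the actions needed so every instance runs exactly one daemon.

    instances: IDs of the instances currently in the cluster.
    running:   maps instance ID -> daemon task ID for daemons already running.
    """
    # Instances that joined the cluster but have no daemon yet.
    start_on = sorted(i for i in instances if i not in running)
    # Daemons left over on instances that are no longer in the cluster.
    stop_on = sorted(i for i in running if i not in instances)
    return {"start_on": start_on, "stop_on": stop_on}


# Example: i-b and i-c are new, i-d has been removed from the cluster.
plan = reconcile_daemons(
    instances={"i-a", "i-b", "i-c"},
    running={"i-a": "task-1", "i-d": "task-2"},
)
```

In this example the plan starts the daemon on `i-b` and `i-c` and stops it on `i-d`, which matches the scale-out and scale-in behavior described for managed daemons.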
20:10 - Building High-Performance Apps with AWS Lambda Managed Instances
AWS has published guidance on using Lambda managed instances for high-performance computing scenarios. Lambda managed instances allow developers to run Lambda functions on dedicated EC2 instances that AWS manages, providing higher resource availability than traditional Lambda. This hybrid approach enables use cases requiring consistent high CPU capacity, GPU access, or sustained high concurrency that traditional Lambda (which has memory/CPU scaling limits) cannot efficiently support. However, this represents a shift from Lambda's original value proposition of serverless simplicity. The article frames this as a solution for specialized scenarios where traditional Lambda's constraints become limiting. Experts note that it may better serve customers who already understand their infrastructure needs, and that the distinction between Lambda managed instances and containerized solutions like Fargate is becoming increasingly blurred.
25:00 - Migrating to Amazon ElastiCache for Valkey from Redis
This AWS database blog article provides best practices for migrating from Amazon ElastiCache for Redis to ElastiCache for Valkey. Valkey is an open-source fork of Redis, hosted by the Linux Foundation and backed by AWS among other contributors, that aims to provide API compatibility with Redis while offering approximately 30% cost savings on ElastiCache. The article presents a real-world case study of a global travel technology company that migrated successfully, achieving significant cost reduction (approximately $200/day in savings) with minimal downtime and only brief periods of slightly elevated latency. The migration can be performed using in-place upgrades or snapshot-based approaches. AWS provides console-based one-click migration tools, though for production workloads thorough testing in a staging environment first is recommended. The key appeal is that Valkey maintains feature parity with open-source Redis while reducing costs, making it an attractive option for organizations with substantial caching infrastructure investments.
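To put the quoted figures in perspective, the arithmetic can be worked through in a few lines of Python. The $200/day figure comes from the case study as summarized above. Treating the roughly 30% savings rate as applying to this particular workload is an assumption made here for illustration, so the implied prior spend is only a rough estimate.

```python
DAILY_SAVINGS_USD = 200  # figure quoted from the case study
SAVINGS_RATE = 0.30      # ASSUMPTION: the ~30% figure applies to this workload


def annual_savings(daily_savings: float, days: int = 365) -> float:
    """Project daily savings over a year of steady usage."""
    return daily_savings * days


def implied_prior_daily_spend(daily_savings: float, rate: float) -> float:
    """Estimate the pre-migration daily spend from the savings and the rate."""
    return daily_savings / rate


yearly = annual_savings(DAILY_SAVINGS_USD)
prior = implied_prior_daily_spend(DAILY_SAVINGS_USD, SAVINGS_RATE)
```

Under these assumptions the savings come to $73,000 per year, and the pre-migration cache spend would have been roughly $667 per day.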
31:25 - European Commission Data Breach via Supply Chain Attack
A data breach affected the European Commission's AWS environment, resulting in the theft of approximately 350 gigabytes of data from multiple databases. The root cause was not an AWS vulnerability but a compromise of the Commission's API keys through a supply chain attack. Specifically, hackers gained access to sensitive credentials through a GitHub Actions workflow vulnerability in Aqua Security's Trivy vulnerability scanner. That compromise allowed malicious code to be distributed, which in turn let attackers extract the Commission's AWS API keys. The incident exemplifies the broader cybersecurity trend of supply chain attacks, where adversaries find it easier to compromise upstream dependencies than to directly breach well-hardened targets. It also underscores that cloud security relies heavily on customer credential management and that zero-day vulnerabilities in widely used tools can have cascading effects across the organizations that depend on them.
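One commonly recommended hardening step against this class of attack is to pin third-party GitHub Actions to a full commit SHA instead of a mutable tag or branch. The Python sketch below flags `uses:` references that are not pinned that way. It is a simplified illustration, not a claim about what would have prevented this breach: the regular expressions handle only unquoted references, and the action name `example-org/example-scanner` and its SHA are made up.

```python
import re

# Captures the reference after "uses:" in a workflow step, e.g. "owner/repo@ref".
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)", re.MULTILINE)
# A full Git commit SHA is 40 lowercase hexadecimal characters.
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    unpinned = []
    for ref in USES_RE.findall(workflow_text):
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            unpinned.append(ref)
    return unpinned


# Hypothetical workflow fragment: one tag-pinned step, one SHA-pinned step.
SAMPLE = """
steps:
  - uses: actions/checkout@v4
  - name: Scan
    uses: example-org/example-scanner@0123456789abcdef0123456789abcdef01234567
"""

flagged = unpinned_actions(SAMPLE)
```

Here only the tag-pinned step is flagged. A tag can be moved to point at different code after the fact, whereas a full commit SHA identifies one specific revision.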