Corey Martin values storytelling as one way developers can share their experiences so that others can learn from them. To that end, this episode takes a close look at production issues from two different applications to examine what went wrong and how each was fixed.
Meg Viar is a Senior Software Developer at Nomadic Learning, an e-learning platform. One day, her team noticed that, for a certain group of users, a column of information in their database rows had been nulled. It didn't look like any user--either internally or externally--had intentionally changed these values, and there hadn't been any new code deployed in days. The only clue was that the data had all changed at the same time. It turned out that a weekly cron job was deleting some data from an in-memory list. However, the database ORM they use also overloads the delete keyword, so the job was actually deleting the production data. Restoring the data from a backup was easy, and reworking the code to not use the data was a quick fix. Going forward, Meg and her team came up with several ways to adjust the process around code changes to prevent incidents like this from occurring again.
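To make the failure mode concrete, here is a minimal sketch in Python. The episode notes do not name the language or the ORM involved, so everything here is an assumption for illustration: `FakeTable` and `BoundCollection` are hypothetical stand-ins, not a real library's API. The sketch shows how a collection that looks like an in-memory list can write through to stored data when `del` is overloaded, and how copying into a plain list first avoids that.

```python
class FakeTable:
    """Hypothetical stand-in for a production table: row id -> column value."""

    def __init__(self, rows):
        self.rows = dict(rows)


class BoundCollection:
    """Hypothetical ORM-style collection: behaves like a list, but writes through."""

    def __init__(self, table, ids):
        self._table = table
        self._ids = list(ids)

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, index):
        return self._table.rows[self._ids[index]]

    def __delitem__(self, index):
        # The overloaded `del`: drops the item AND nulls the stored value.
        row_id = self._ids.pop(index)
        self._table.rows[row_id] = None


table = FakeTable({1: "a", 2: "b", 3: "c"})
items = BoundCollection(table, [1, 2, 3])

# Safer: copy into a plain list first, so `del` only touches the copy.
snapshot = list(items)
del snapshot[0]
rows_after_safe_delete = dict(table.rows)

# Risky: `del` on the bound collection writes through to the table.
del items[0]
rows_after_bound_delete = dict(table.rows)
```

In this sketch the table is untouched after the first `del` and has a nulled value after the second, even though both statements look identical at the call site.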
Brendan Hennessy is the co-founder and CTO at Launchpad Lab, a studio that builds custom web and mobile applications. One of their clients is an SAT/ACT test prep app, and students complained that the app was extraordinarily slow. Brendan was accustomed to seeing such feedback on testing days, when heavy volume put added strain on servers, and the team accounted for this by increasing capacity. But this was different: there weren't any tests scheduled during the period. Instead, one of their own services was inadvertently DDoSing an endpoint: it expected a response, and when one didn't arrive, it just kept making requests. They reworked this code to make a request once and simply wait for a response without trying again. Going forward, they committed themselves to doing more in-person blitzes of new features, since issues like this only arise after multiple users use the app--something automated tests have trouble simulating.
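The difference between the two request patterns can be sketched as follows. This is an illustration in Python under stated assumptions, not the team's actual code: `SlowEndpoint`, `fetch_with_blind_retry`, and `fetch_once` are hypothetical names, the endpoint is simulated in memory rather than called over a network, and the retry loop is capped only so that the sketch terminates.

```python
class SlowEndpoint:
    """Hypothetical endpoint that never answers in time; counts incoming requests."""

    def __init__(self):
        self.calls = 0

    def request(self, timeout):
        self.calls += 1
        raise TimeoutError("no response within %.1fs" % timeout)


def fetch_with_blind_retry(endpoint, timeout, max_attempts):
    """Problem pattern: re-send immediately whenever no response arrives."""
    for _ in range(max_attempts):  # capped here so the sketch terminates
        try:
            return endpoint.request(timeout)
        except TimeoutError:
            continue
    return None


def fetch_once(endpoint, timeout):
    """Reworked pattern: one request, wait up to `timeout`, then give up."""
    try:
        return endpoint.request(timeout)
    except TimeoutError:
        return None


noisy = SlowEndpoint()
fetch_with_blind_retry(noisy, timeout=5.0, max_attempts=1000)

quiet = SlowEndpoint()
result = fetch_once(quiet, timeout=5.0)
```

Against an endpoint that is already struggling, the first pattern multiplies the load with every missed response, while the second sends a single request per call no matter how slow the endpoint is.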
Links from this episode
118. Why Writing Matters for Engineers
117. Open Source with Jim Jagielski
116. Success From Anywhere
115. Demystifying the User Experience with Performance Monitoring
114. Beyond Root Cause Analysis in Complex Systems
113. Principles of Pragmatic Engineering
112. Managing Public Key Infrastructure within an Enterprise
111. Gift Cards for Small Businesses
110. Scaling a Bernie Meme
109. Meditation for the Curious Skeptic
108. Building Community with the Wicked CoolKit
I Was There: Stories of Production Incidents II
107. How to Write Seriously Good Software
106. Growing a Self-Funded Company
105. Event Sourcing and CQRS
104. The Evolution of Service Meshes
103. Chaos Engineering
102. Whether or Not to Repeat Yourself: DRY, DAMP, or WET
101. Cloud Native Applications
100. Math for Programmers