J.T. Wolohan is the author of "Mastering Large Datasets with Python," a book that helps Python developers adopt a functional programming style in their project prototyping, in order to scale up to big data projects. Greg Nokes, a Master Technical Architect with Heroku, opens their conversation by laying out what Python is and what it's being used for. As a high-level scripting language, Python was primarily used by sysadmins as a way to quickly manipulate data. Over the years, an ecosystem of third-party packages has grown up around scientific and mathematical work. Similarly, its web frameworks have shifted towards asynchronous flows, allowing developers to ingest data, process it, and handle traffic more efficiently.
J.T.'s book is all about how to move from small datasets to larger ones. He lays out three phases that every project goes through. In the first phase, a developer can solve a problem on their individual PC. This phase typically deals with datasets that are manageable and can be processed with the compute hardware on hand. The second phase is one in which you still have enough compute power on your laptop to process the data, but the dataset itself is too large to keep on the machine. It's not unreasonable for a machine learning corpus to reach five terabytes, for example. The third phase is one where an individual developer has neither the compute resources to process the data nor the disk space to store it. In these cases, external resources are necessary, such as cluster computing and some type of distributed data system. J.T. argues that by exercising good programming practices in the first phase, the third "real world" phase will require little modification of your actual data processing algorithms.
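To make that last point concrete, here is a minimal sketch of the idea, not code from the book or the episode. It assumes the processing step is written as a pure function and applied with a map-and-reduce pattern; the names `count_words`, `add`, and `total_words` are hypothetical. Because the function that does the mapping is passed in as a parameter, the same pipeline can run with the built-in `map` on a laptop or with `multiprocessing.Pool.map` across several processes, and in principle with a cluster framework's equivalent later on.

```python
from functools import reduce
from multiprocessing import Pool


def count_words(line):
    """Pure function (hypothetical example): one line in, one word count out."""
    return len(line.split())


def add(a, b):
    """Combine two partial results."""
    return a + b


def total_words(lines, map_fn=map):
    """Map count_words over the lines, then reduce the counts to one total.

    map_fn is an assumption of this sketch: any callable with the same
    (function, iterable) shape as the built-in map can be swapped in.
    """
    return reduce(add, map_fn(count_words, lines), 0)


if __name__ == "__main__":
    sample = ["the quick brown fox", "jumps over", "the lazy dog"]

    # Phase one: everything runs sequentially in a single process.
    print(total_words(sample))

    # Same algorithm, parallel map: only the map_fn argument changes.
    with Pool(2) as pool:
        print(total_words(sample, map_fn=pool.map))
```

Both calls print the same total. The data processing logic in `count_words` and `total_words` stays untouched while the execution strategy changes, which is the kind of small modification the third phase is meant to require.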