J.T. Wolohan is the author of "Mastering Large Datasets with Python," a book that helps Python developers adopt functional programming styles in their their project prototyping, in other to scale up towards big data projects. Greg Nokes, a Master Technical Architect with Heroku, initiates their conversation by lying out what Python is and what it's being used for. As a high-level scripting language, Python was primarily used by sysadmins as a way to quickly manipulate data. Over the years, an ecosystem of third-party packages have manifested around scientific and mathematical approaches. Similarly, its web frameworks have shifted towards asynchronous flows, allowing developers to ingest data, process them, and handle traffic in more efficient ways.
J.T.'s book is all about how to move from small datasets to larger ones. He lays out three stages which every project goes through. In the first phase, a developer can solve a problem on their individual PC. This stage typically deals with datasets that are manageable, and can be processed with the compute hardware on hand. The second phase is one in which you still have enough compute power on your laptop to process data, but the data itself is too large. It's not unreasonable for machine learning corpus to reach five terabytes, for example. The third phase proposed is one where an individual developer has neither the compute resources to process the data nor the disk space to store it. In these cases, external resources are necessary, such as cluster computing and some type of distributed data system. J.T. argues that by exercising good programming practices in the first phase, the third "real world" phasing will require little modification of your actual data processing algorithms.
Links from this episode11. The Agony and Ecstasy of Maintaining Good Documentation
10. How to Learn Something New
9. Coordinating Remote Work
8. Sharing Data with Dataclips
7. Application Performance and Building SaaS on PaaS
6. Making Remote Work Work
5. Solving Social Problems with Data Science
4. Delivering Amazing Presentations
3. Spreading the Database Love
2. Ruby, Regexes and Risk: Aaron Patterson Explains Why Hiring Open Source Developers Will Make Your Company Stronger
1. Running Grails in Production
Create your
podcast in
minutes
It is Free
Insight Story: Tech Trends Unpacked
Zero-Shot
Fast Forward by Tomorrow Unlocked: Tech past, tech future
The Unbelivable Truth - Series 1 - 26 including specials and pilot
Lex Fridman Podcast