J.T. Wolohan is the author of "Mastering Large Datasets with Python," a book that helps Python developers adopt functional programming styles in their project prototyping in order to scale up towards big data projects. Greg Nokes, a Master Technical Architect with Heroku, initiates their conversation by laying out what Python is and what it's being used for. As a high-level scripting language, Python was primarily used by sysadmins as a way to quickly manipulate data. Over the years, an ecosystem of third-party packages has grown up around scientific and mathematical approaches. Similarly, its web frameworks have shifted towards asynchronous flows, allowing developers to ingest data, process it, and handle traffic in more efficient ways.
J.T.'s book is all about how to move from small datasets to larger ones. He lays out three stages that every project goes through. In the first phase, a developer can solve a problem on their individual PC. This stage typically deals with datasets that are manageable and can be processed with the compute hardware on hand. The second phase is one in which you still have enough compute power on your laptop to process the data, but the data itself is too large to store locally. It's not unreasonable for a machine learning corpus to reach five terabytes, for example. The third phase is one where an individual developer has neither the compute resources to process the data nor the disk space to store it. In these cases, external resources are necessary, such as cluster computing and some type of distributed data system. J.T. argues that by exercising good programming practices in the first phase, the third "real world" phase will require little modification of your actual data processing algorithms.
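To make the last point concrete, here is a minimal sketch of the kind of functional style discussed, assuming a simple word-counting task. It is not taken from the book or the episode; the names `count_words`, `add`, and `total_words` are hypothetical. The idea is that when the processing is written as a map step and a reduce step, the same functions can run serially on a laptop or be handed to a parallel map without being rewritten.

```python
from functools import reduce
from multiprocessing import Pool


def count_words(line):
    """Map step: turn one line of text into a word count."""
    return len(line.split())


def add(a, b):
    """Reduce step: combine two partial counts."""
    return a + b


def total_words(lines, mapper=map):
    """Count words across all lines.

    `mapper` is any function with the same shape as the built-in `map`.
    Swapping it changes where the work runs, not what the work is.
    """
    return reduce(add, mapper(count_words, lines), 0)


if __name__ == "__main__":
    sample = [
        "small data fits on one laptop",
        "large data needs a cluster",
    ]

    # Phase one: everything runs in a single process.
    print(total_words(sample))

    # Scaling up: the same functions, spread across worker processes.
    with Pool(processes=2) as pool:
        print(total_words(sample, mapper=pool.map))
```

Only the `mapper` argument changes between the two calls. A cluster framework would replace `pool.map` with its own distributed map, which is an assumption about how such a port might look rather than something stated in the episode.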