What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)
Data Science at Home

2020-07-19
In this episode I talk about the data transformation frameworks available to data scientists who write Python code. The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes grow and distributed algorithms come into play (following a map-reduce paradigm of computation), Pandas no longer performs as expected, and other frameworks have a role to play. In this episode I explain which frameworks are the best equivalents to Pandas in big data contexts. Don't forget to join our Discord channel to comment on previous episodes or propose new ones. This episode i...