#84- FineWeb, the best dataset to pre-train LLMs.
Life with AI

#84- FineWeb, the best dataset to pre-train LLMs.

2024-06-13

Hey guys, in this episode I talk about the FineWeb dataset, the best pre-training open source dataset to date. In the episode I explain how they created the dataset and I also share some results.


Link to the huggingface blog: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1

Instagram of the podcast: https://www.instagram.com/podcast.lifewithai

Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai

Comments (3)

More Episodes

All Episodes>>

Get this podcast on your phone, Free

Create Your Podcast In Minutes

  • Full-featured podcast site
  • Unlimited storage and bandwidth
  • Comprehensive podcast stats
  • Distribute to Apple Podcasts, Spotify, and more
  • Make money with your podcast
Get Started
It is Free