What happens to a neural network trained with random data?
Are massive neural networks just lookup tables or do they truly learn something?
Today’s episode is about memorisation and generalisation in deep learning, with Stanisław Jastrzębski from New York University.
Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on
What makes deep learning unique?
I asked him a few questions I had long been looking for answers to. For instance, what does deep learning bring to the table that other methods don’t, or cannot?
Stan believes that the one thing that makes deep learning special is representation learning. None of the competing methods, be it kernel machines or random forests, have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning, in the sense that it is what allows good representations to be found.
What really improves the training quality of a neural network?
We discussed how the accuracy of a neural network depends largely on how good Stochastic Gradient Descent is at finding minima of the loss function. What influences such minima?
Stan's answer revealed that training-set accuracy or loss is actually not that interesting. Given a large enough network and a large enough computational budget, it is relatively easy to overfit the data (i.e. achieve the lowest possible loss). What is fascinating is how strongly optimisation influences the shape of the minima and the performance on validation sets.
Optimisation early in the trajectory steers it towards minima with certain properties that go well beyond training accuracy.
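To make the point about overfitting concrete, here is a minimal sketch (not taken from the episode or from the papers listed below) of an over-parameterised model memorising purely random labels. It is a simplification under stated assumptions: instead of training a full network with SGD, it uses a fixed random ReLU hidden layer and fits only the output layer by least squares. All sizes and names (`n_hidden`, `features`, `accuracy`) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): far more hidden units than training samples.
n_train, n_val, n_in, n_hidden = 50, 1000, 10, 500

# Inputs and labels are drawn independently, so there is no signal to learn.
x_train = rng.normal(size=(n_train, n_in))
y_train = rng.choice([-1.0, 1.0], size=n_train)
x_val = rng.normal(size=(n_val, n_in))
y_val = rng.choice([-1.0, 1.0], size=n_val)

# Fixed random hidden layer (a stand-in for a trained network, not SGD).
w_hidden = rng.normal(size=(n_in, n_hidden))
b_hidden = rng.normal(size=n_hidden)


def features(x):
    """ReLU activations of the fixed random hidden layer."""
    return np.maximum(x @ w_hidden + b_hidden, 0.0)


# Minimum-norm least-squares fit of the output weights to the random labels.
w_out, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)


def accuracy(x, y):
    """Fraction of samples whose predicted sign matches the label."""
    return float(np.mean(np.sign(features(x) @ w_out) == y))


train_acc = accuracy(x_train, y_train)
val_acc = accuracy(x_val, y_val)
print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {val_acc:.2f}")
```

With this much capacity the model should fit every random training label, while validation accuracy stays near chance. That is why training accuracy alone says little about what a model has learned.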
As always we spoke about the future of AI and the role deep learning will play.
I hope you enjoy the show!
Don't forget to join the conversation on our new Discord channel. See you there!
References
Homepage of Stanisław Jastrzębski https://kudkudak.github.io/
A Closer Look at Memorization in Deep Networks https://arxiv.org/abs/1706.05394
Three Factors Influencing Minima in SGD https://arxiv.org/abs/1711.04623
Don't Decay the Learning Rate, Increase the Batch Size https://arxiv.org/abs/1711.00489
Stiffness: A New Perspective on Generalization in Neural Networks https://arxiv.org/abs/1901.09491