Some of the most powerful NLP models, such as BERT and GPT-2, have one thing in common: they are all built on the transformer architecture.
That architecture, in turn, rests on another concept already familiar to the community: self-attention.
In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
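For readers who prefer to see the idea in code, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation the transformer stacks into layers. This is an illustrative sketch, not code from the episode; the matrix names and dimensions are arbitrary.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X:           (seq_len, d_model) token embeddings
    Wq, Wk, Wv:  (d_model, d_k) learned projection matrices (illustrative)
    Returns:     (seq_len, d_k) context-aware representations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                         # each token becomes a weighted sum of values

# Toy example: 4 tokens, embedding size 8, single attention head
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Because every token attends to every other token in a single matrix operation, the whole sequence can be processed in parallel, which is a large part of why transformers scale so well compared to recurrent models.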
Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.