Multimodal Chain-of-Thought Reasoning in Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
InstructPix2Pix: Learning to Follow Image Editing Instructions
Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Why do Nearest Neighbor Language Models Work?
Text2Poster: Laying Out Stylized Texts on Retrieved Images
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Reversible Column Networks
The Forward-Forward Algorithm: Some Preliminary Investigations
Cramming: Training a Language Model on a Single GPU in One Day
TorchGeo: deep learning with geospatial data
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Editing Models with Task Arithmetic
What do Vision Transformers Learn? A Visual Exploration
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
Programming Is Hard - Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
DAMO-YOLO : A Report on Real-Time Object Detection Design