We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31 × 31, in contrast to the commonly used 3 × 3. RepLKNet greatly closes the performance gap between CNNs and ViTs, e.g., achieving results comparable or superior to Swin Transformer on ImageNet and a few typical downstream tasks, with lower latency. RepLKNet also scales well to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K, which is very competitive among state-of-the-art models of similar size. Our study further reveals that, in contrast to small-kernel CNNs, large-kernel CNNs have much larger effective receptive fields and are more biased toward shape than toward texture.
2022: Xiaohan Ding, X. Zhang, Yi Zhou, Jungong Han, Guiguang Ding, Jian Sun
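The contrast between a single 31 × 31 kernel and stacked 3 × 3 kernels can be illustrated with a small sketch. The abstract does not say how such large kernels are kept affordable. The code below *assumes* a depthwise (per-channel) convolution, a common way to use large kernels, and all function names are hypothetical. It is an illustrative pure-Python sketch, not the RepLKNet implementation. It shows three things: one large-kernel layer lets every output pixel see an impulse at the grid center, a 3 × 3 layer does not, and the weight count of depthwise versus dense layers differs sharply. Note that it computes *theoretical* receptive fields only. The abstract's claim concerns *effective* receptive fields, which this sketch does not measure.

```python
from typing import List

Grid = List[List[float]]


def depthwise_conv2d(x: List[Grid], w: List[Grid]) -> List[Grid]:
    """Stride-1, zero-padded ("same") depthwise convolution (hypothetical helper).

    x: C channels, each an H x W grid; w: C kernels, each K x K with K odd.
    Each channel is filtered only by its own kernel (cross-correlation, as in
    most deep-learning frameworks).
    """
    out = []
    for xc, wc in zip(x, w):
        h, wd, k = len(xc), len(xc[0]), len(wc)
        pad = k // 2
        oc = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < wd:  # zero padding
                            acc += xc[ii][jj] * wc[di][dj]
                oc[i][j] = acc
        out.append(oc)
    return out


def count_nonzero(grid: Grid) -> int:
    # Number of output pixels that "see" the input impulse.
    return sum(1 for row in grid for v in row if v != 0.0)


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    # Theoretical receptive field of stride-1, undilated convs:
    # each layer widens it by (kernel_size - 1) pixels.
    return 1 + num_layers * (kernel_size - 1)


def conv_weight_count(channels: int, kernel_size: int, depthwise: bool) -> int:
    # Weights of one conv layer with `channels` inputs and outputs, no bias.
    per_kernel = kernel_size * kernel_size
    return channels * per_kernel if depthwise else channels * channels * per_kernel


def impulse(n: int) -> Grid:
    # n x n grid that is zero everywhere except 1.0 at the center.
    c = n // 2
    return [[1.0 if (i, j) == (c, c) else 0.0 for j in range(n)] for i in range(n)]


def ones(k: int) -> Grid:
    return [[1.0] * k for _ in range(k)]


if __name__ == "__main__":
    x = [impulse(31)]
    big = depthwise_conv2d(x, [ones(31)])[0]
    small = depthwise_conv2d(x, [ones(3)])[0]
    print(count_nonzero(big), count_nonzero(small))  # 961 9
    print(stacked_receptive_field(3, 15))  # 31: fifteen 3x3 layers
    print(conv_weight_count(64, 31, depthwise=True))   # 61504
    print(conv_weight_count(64, 3, depthwise=False))   # 36864
    print(conv_weight_count(64, 31, depthwise=False))  # 3936256
```

With 64 channels, the assumed depthwise 31 × 31 layer needs fewer than twice the weights of a dense 3 × 3 layer, while a dense 31 × 31 layer would need over a hundred times more. Matching the single layer's theoretical receptive field with 3 × 3 kernels would take 15 stacked layers.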