Thank you for following along; readers working on related research are welcome to submit contributions.