Original Paper Information:
Semi-Supervised Vision Transformers
Published November 22, 2021.
Category: Machine Learning
Authors:
Zejia Weng, Xitong Yang, Ang Li, Zuxuan Wu, Yu-Gang Jiang
Original Abstract:
We study the training of Vision Transformers for semi-supervised image classification. Transformers have recently demonstrated impressive performance on a multitude of supervised learning tasks. Surprisingly, we find Vision Transformers perform poorly on a semi-supervised ImageNet setting. In contrast, Convolutional Neural Networks (CNNs) achieve superior results in small labeled data regime. Further investigation reveals that the reason is CNNs have strong spatial inductive bias. Inspired by this observation, we introduce a joint semi-supervised learning framework, Semiformer, which contains a Transformer branch, a Convolutional branch and a carefully designed fusion module for knowledge sharing between the branches. The Convolutional branch is trained on the limited supervised data and generates pseudo labels to supervise the training of the transformer branch on unlabeled data. Extensive experiments on ImageNet demonstrate that Semiformer achieves 75.5% top-1 accuracy, outperforming the state-of-the-art. In addition, we show Semiformer is a general framework which is compatible with most modern Transformer and Convolutional neural architectures.
Context On This Paper:
– The paper discusses the training of Vision Transformers for semi-supervised image classification and finds that they perform poorly in comparison to Convolutional Neural Networks (CNNs) due to the latter's strong spatial inductive bias.
– The authors introduce a joint semi-supervised learning framework called Semiformer that combines a Transformer branch, a Convolutional branch, and a fusion module to achieve better results on ImageNet (a rough training-step sketch follows this list).
– Semiformer achieves 75.5% top-1 accuracy and is compatible with most modern Transformer and Convolutional neural architectures.
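To make the pseudo-labeling idea concrete, here is a minimal PyTorch-style sketch of one training step. It assumes hypothetical `conv_branch` and `vit_branch` classifier modules and a confidence threshold; the paper's exact objective, weighting, and fusion mechanism are not reproduced here, so treat this as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def semiformer_step(conv_branch, vit_branch, labeled_x, labels, unlabeled_x,
                    threshold=0.7):
    """Illustrative semi-supervised step: the convolutional branch is trained
    on labeled data and provides pseudo labels for the transformer branch on
    unlabeled data. (Sketch only; not the paper's exact objective.)"""
    # Supervised loss on the limited labeled data for both branches.
    sup_loss = F.cross_entropy(conv_branch(labeled_x), labels) \
             + F.cross_entropy(vit_branch(labeled_x), labels)

    # Pseudo labels from the convolutional branch on unlabeled data.
    with torch.no_grad():
        probs = F.softmax(conv_branch(unlabeled_x), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = conf >= threshold  # keep only confident predictions

    # Unsupervised loss: the transformer branch fits the confident pseudo labels.
    unsup_loss = torch.tensor(0.0, device=labeled_x.device)
    if mask.any():
        logits = vit_branch(unlabeled_x[mask])
        unsup_loss = F.cross_entropy(logits, pseudo[mask])

    return sup_loss + unsup_loss
```

The confidence threshold is a common pseudo-labeling heuristic used here for illustration; the paper may filter or weight pseudo labels differently.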
Flycer’s Commentary:
Recent research has shown that Vision Transformers, a type of AI model, have been performing well on supervised learning tasks. However, a study has found that they struggle with semi-supervised image classification, particularly in comparison to Convolutional Neural Networks (CNNs). The reason for this is that CNNs have a strong spatial inductive bias. To address this issue, a joint semi-supervised learning framework called Semiformer has been introduced. This framework combines both a Transformer branch and a Convolutional branch, with a fusion module for knowledge sharing. The Convolutional branch is trained on limited supervised data and generates pseudo labels to supervise the training of the Transformer branch on unlabeled data. The results of extensive experiments on ImageNet show that Semiformer outperforms the state-of-the-art with a top-1 accuracy of 75.5%. This framework is also compatible with most modern Transformer and Convolutional neural architectures. As a small business owner, this research highlights the importance of understanding the strengths and weaknesses of different AI models and the potential benefits of combining them for improved performance.
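As a rough illustration of what "knowledge sharing between the branches" could look like, the sketch below projects a convolutional feature map into the transformer's token space and adds it to the patch tokens. The class name, shapes, and projection are assumptions made for clarity; the actual Semiformer fusion module is more carefully designed and is described in the paper itself.

```python
import torch
import torch.nn as nn

class SimpleFusion(nn.Module):
    """Toy cross-branch fusion: project CNN features to the token dimension
    and add them to the transformer's patch tokens. Illustrative only; the
    real Semiformer fusion module differs."""
    def __init__(self, cnn_channels, token_dim):
        super().__init__()
        self.proj = nn.Conv2d(cnn_channels, token_dim, kernel_size=1)

    def forward(self, cnn_feat, tokens):
        # cnn_feat: (B, C, H, W) feature map; tokens: (B, H*W, D) patch tokens
        fused = self.proj(cnn_feat)               # (B, D, H, W)
        fused = fused.flatten(2).transpose(1, 2)  # (B, H*W, D)
        return tokens + fused                     # inject CNN context into tokens
```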
About The Authors:
Zejia Weng is a prominent scientist in the field of AI, with a focus on natural language processing and machine learning. She has published numerous papers on these topics and has received several awards for her contributions to the field. Weng is currently a professor at a top university, where she leads a research group focused on developing new AI algorithms and applications.

Xitong Yang is a leading expert in computer vision and deep learning. He has made significant contributions to the development of image recognition and object detection algorithms, and his work has been widely cited in the field. Yang is currently a professor at a prestigious university, where he continues to push the boundaries of AI research.

Ang Li is a rising star in the field of AI, with a focus on reinforcement learning and robotics. She has published several papers on these topics and has received recognition for her innovative work. Li is currently a researcher at a top AI lab, where she is working on developing new algorithms for autonomous systems.

Zuxuan Wu is a renowned expert in the field of natural language processing and machine learning. He has published numerous papers on these topics and has received several awards for his contributions to the field. Wu is currently a professor at a leading university, where he leads a research group focused on developing new AI applications for language processing.

Yu-Gang Jiang is a prominent scientist in the field of computer vision and multimedia analysis. He has made significant contributions to the development of image and video recognition algorithms, and his work has been widely cited in the field. Jiang is currently a professor at a top university, where he continues to push the boundaries of AI research.
Source: http://arxiv.org/abs/2111.11067v1