In the last blog, we discussed the opportunities and risks of foundation models. Foundation models are trained on broad data at scale and can be adapted to a wide range of downstream tasks. In this blog, we extend that discussion to self-supervised learning, one of the technologies underpinning foundation models.
NLP has taken off thanks to Transformer-based pre-trained language models (T-PTLMs). Models like GPT and BERT combine three ideas: transformers, self-supervised learning, and transfer learning. In essence, these models build universal language representations from large volumes of text using self-supervised learning and then transfer this knowledge to downstream (subsequent) tasks. This means that you do not need to train the downstream models from scratch.
In supervised learning, training a model from scratch requires many labelled instances, which are expensive to generate. Various strategies have been used to overcome this problem. One of them is transfer learning, which reuses the knowledge learned on a source task to perform well on a target task. For this to work, the target task should be similar to the source task. The idea of transfer learning originated in computer vision, where large pre-trained CNN models are adapted to downstream tasks by adding a few task-specific layers on top of the pre-trained model and fine-tuning them on the target dataset.
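To make the "task-specific layers on top of a pre-trained model" idea concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not how a real CNN is adapted: `pretrained_features` is a hypothetical, hand-written stand-in for a frozen pre-trained model, and the six-point target dataset is made up for illustration. Only the small head on top is trained.

```python
import math

def pretrained_features(x):
    """Hypothetical stand-in for a frozen pre-trained model.

    Its weights are fixed and are never updated on the target task.
    """
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

def train_head(data, epochs=200, lr=0.5):
    """Train only a logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)              # reuse source knowledge
            logit = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-logit))
            err = p - y                             # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Tiny, made-up target dataset: label is 1 when x[0] + x[1] > 0.
target_data = [
    ([1.0, 1.0], 1), ([2.0, 0.5], 1), ([0.5, 1.5], 1),
    ([-1.0, -1.0], 0), ([-0.5, -2.0], 0), ([-1.5, 0.2], 0),
]

w, b = train_head(target_data)
predictions = [predict(x, w, b) for x, _ in target_data]
```

Because the feature extractor already captures something useful for the target task, a handful of labelled examples is enough to fit the head. If the target task were unrelated to what the extractor encodes, the same head would struggle, which is why the source and target tasks need to be similar.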
A second problem is that deep learning models like CNNs and RNNs cannot easily model long-range context. Transformers were proposed to overcome this problem. Transformers contain stacks of encoders and decoders, and they can learn complex sequences.
The idea of Transformer-based pre-trained language models (T-PTLMs) evolved in the NLP research community by combining transformers and self-supervised learning (SSL). Self-supervised learning allows transformers to learn from the pseudo supervision provided by one or more pre-training tasks. GPT and BERT were among the first T-PTLMs developed using this approach. SSL does not need a large amount of human-labelled data because the supervision signal is derived from the unlabelled pre-training data itself.
Thus, Self-Supervised Learning (SSL) is a new learning paradigm in which the model learns from the pseudo supervision provided by pre-training tasks. SSL finds applications in areas like robotics, speech, and computer vision.
SSL is similar to both unsupervised learning and supervised learning, but it is also different from both. SSL is similar to unsupervised learning in that it does not require human-labelled instances. However, like supervised learning, SSL still needs supervision; that supervision comes from the pre-training tasks rather than from human annotators.
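A short sketch can show where this pseudo supervision comes from. The code below builds training pairs from a raw sentence in two simplified ways: hiding tokens and asking for them back (in the spirit of BERT's masked language modelling) and predicting the next token from a prefix (in the spirit of GPT). The function names, the `[MASK]` string, the masking rate, and the example sentence are all assumptions made for illustration; real pre-training pipelines use subword tokenizers and more elaborate masking rules.

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Hide some tokens; the hidden tokens themselves become the labels."""
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)     # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)    # no loss is computed at this position
    return inputs, labels

def make_next_token_examples(tokens):
    """Pair every prefix of the text with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Raw, unlabelled text: no human annotation is involved anywhere.
sentence = "children learn language by listening to people around them".split()

masked_inputs, masked_labels = make_masked_example(
    sentence, mask_prob=0.3, rng=random.Random(42)
)
next_token_pairs = make_next_token_examples(sentence)
```

In both cases the labels are cut out of the text itself. That is why the approach needs no human-labelled instances (like unsupervised learning) while still training against explicit targets (like supervised learning).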
In the next blog, we will continue this discussion by exploring a survey of transformer-based models.
Source: Adapted from
Image source pixabay – Children learning without supervision