With its 175 billion parameters and the massive corpus it was trained on, GPT-3 is already enabling some innovative applications.
But GPT-3 could also pave the way for a new way of developing AI models.
The GPT-3 paper is called Language Models are Few-Shot Learners.
The main innovation of GPT-3 could be the uptake of approaches like few-shot learning.
Few-shot learning flips the development of AI models.
Traditionally, we start with data for a problem and develop the model based on the data.
The model is specific to the problem.
A new problem calls for a new model – and, in turn, new data for that model.
If you want to train a model to predict traffic patterns in New York, you build a model of New York traffic patterns.
If you want to model air pollution in New York, that’s a different model.
With GPT-3, you start with the model instead of the data.
You then use techniques like few-shot learning to answer a variety of questions, without supplying new data or retraining the model.
This is the main innovation behind GPT-3.
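As a rough sketch of what this looks like in practice, here is a few-shot prompt built around the English-to-French translation example from the paper. The `few_shot_prompt` helper is hypothetical, made up for illustration; the point is that the worked examples live in the prompt text, not in the model's weights:

```python
def few_shot_prompt(task_description, examples, query):
    """Pack a task description, a few worked examples, and a new
    query into a single prompt string for a pretrained model."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model completes this final line
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```

Sending this text to GPT-3 and reading the completion is the entire "training" step for the task: no gradients are computed and no weights change.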
The caveat, of course, is that we need a large model (such as GPT-3).
But the principle of a model that has learnt an entire domain is compelling.
So, you could model the entire NHS (the UK health system) and then create AI models relating to specific aspects of it using few-shot learning.
Returning to the example of New York, a single model of the city could answer multiple queries about it, such as traffic patterns or air pollution. In contrast, currently we need separate data and a separate model for each problem.
This flips AI model development on its head. Instead of

Problem – Data – Model – Inference

we can go

Model – Problem – Inference

i.e. just the forward pass, with no task-specific training step.
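The contrast can be sketched in toy Python. Here the "pretrained domain model" is just a lookup table standing in for GPT-3, and every name is made up for illustration; the point is that one model answers many problems with nothing but a prompt and a forward pass:

```python
def make_domain_model(knowledge):
    """Stand-in for a large pretrained model: answers any query it
    absorbed during 'pre-training' with a single forward pass."""
    def complete(prompt):
        # Read the unfinished last line of the prompt as the query.
        query = prompt.splitlines()[-1].rstrip(" =>")
        return knowledge[query]
    return complete

def few_shot_pipeline(model, task_examples, query):
    """Model -> Problem -> Inference: no training step. The task
    examples go into the prompt, not into the model's weights."""
    prompt = "\n".join(f"{q} => {a}" for q, a in task_examples)
    prompt += f"\n{query} =>"
    return model(prompt)

# One 'domain model' covering many New York questions...
new_york = make_domain_model({
    "traffic on Broadway": "heavy",
    "air quality in Queens": "moderate",
})

# ...handles problems that would traditionally each need their own
# dataset and their own trained model.
print(few_shot_pipeline(new_york, [("traffic on Broadway", "heavy")],
                        "air quality in Queens"))  # moderate
```

In the traditional Problem – Data – Model – Inference pipeline, the two queries above would each require collecting data and training a dedicated model; here they share one.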
This works as long as you have a massive model pre-trained for each domain (as with the NHS and New York examples above).
It took me a bit of reading to understand this idea, but the references below helped.
The original GPT-3 paper is a great reference.
So is this hour-plus video, which goes into detail about the paper:
GPT-3: Language Models are Few-Shot Learners (Paper Explained)
and this more concise blog post.
Image: zero-shot, one-shot and few-shot learning, from the paper Language Models are Few-Shot Learners.