
7 GenAI & ML Concepts Explained in 1-Min Data Videos

Not your typical videos: no one is talking, it’s the data itself that “talks”. More precisely, these are data animations that serve as 60-second tutorials. I selected them among those that I created in Python and posted on YouTube. Each frame represents a new dataset or training set (real or synthetic), a different model in a particular family, different parameters or hyperparameters, or a new iteration in some evolving system. The videos consist of hundreds of frames, shown at between 4 and 20 frames per second. For detailed explanations and Python code, see “source” below each video.

1. Gradient Descent

At the core of most ML and GenAI algorithms, this concept is fundamental to neural networks. This version involves no math and no learning rate, and it generalizes well beyond a 2-dimensional feature space.

Source: here.
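The video’s version is math-free with no learning rate; as a point of comparison, here is the textbook variant with an explicit learning rate, sketched in a few lines of NumPy. The function names and the example objective are illustrative, not taken from the video’s code.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Textbook gradient descent: repeatedly move against the gradient.

    grad  -- function returning the gradient at a point
    x0    -- starting point in feature space
    lr    -- learning rate (the video's version avoids this hyperparameter)
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        x -= lr * grad(x)
        path.append(x.copy())
    return np.array(path)

# Minimize f(x, y) = x^2 + 3*y^2, whose gradient is (2x, 6y).
path = gradient_descent(lambda p: np.array([2 * p[0], 6 * p[1]]),
                        x0=[4.0, -2.0])
print(path[-1])  # very close to the minimum at (0, 0)
```

Each row of `path` corresponds to one frame of such an animation: the current position of the descent in feature space.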

2. Sampling Outside the Observation Range

Many GenAI techniques produce poor results when the training set is small. The reason is that none of the existing methods can sample artificial yet realistic values outside the training set range: below the minimum, or above the maximum. Not even for a single feature, let alone in higher dimensions with correlated features. All of them rely on quantile generation at some point, and none of the quantile functions in Python offers this possibility. The classic workaround is to use ever bigger training sets, or trillions of weights, to mask the sampling issue. But you can do it a lot faster with much less data. The video below starts with the empirical distribution observed on a small training set, then extends it as if your training set were far bigger. Pure magic, like reconstructing invisible observations! And it generalizes easily to higher dimensions.

Source: here.
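The author’s exact technique is in the linked source, not reproduced here; the following is only a generic sketch of the core idea. It performs inverse-transform sampling from an empirical quantile function whose tails are extrapolated linearly, so that synthetic values can fall below the observed minimum or above the maximum. The function name and the linear tail rule are illustrative assumptions.

```python
import numpy as np

def sample_extended(data, n, rng=None):
    """Sample n synthetic values from a tail-extrapolated empirical
    quantile function. Values can land outside [min(data), max(data)].
    The linear tail extrapolation is an illustrative choice, not the
    method used in the video."""
    rng = np.random.default_rng(rng)
    x = np.sort(np.asarray(data, dtype=float))
    m = len(x)
    p = (np.arange(m) + 0.5) / m              # plotting positions in (0, 1)
    u = rng.uniform(0, 1, n)
    out = np.interp(u, p, x)                  # interior: empirical quantiles
    # Left tail: extend the slope of the two smallest quantiles.
    lo = u < p[0]
    out[lo] = x[0] + (x[1] - x[0]) / (p[1] - p[0]) * (u[lo] - p[0])
    # Right tail: extend the slope of the two largest quantiles.
    hi = u > p[-1]
    out[hi] = x[-1] + (x[-1] - x[-2]) / (p[-1] - p[-2]) * (u[hi] - p[-1])
    return out

data = np.random.default_rng(0).normal(size=50)   # small training set
sample = sample_extended(data, 10_000, rng=1)
print(sample.min() < data.min(), sample.max() > data.max())
```

With 10,000 synthetic values drawn from only 50 observations, a fraction of the sample lies outside the observed range, which plain quantile-based resampling can never produce.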

3. Cloud Regression

You can perform all types of regression with just one simple, distribution-free method. Why bother learning 200 types of regression models when a generic one encompasses all of them, and a lot more? Perhaps the most intriguing use cases are clustering and regression without a response: that is, when the Y feature is absent. In short, unsupervised regression! You will see what I mean by watching the video. Here each frame represents a different training set with its own model fitting. The method also comes with prediction intervals, despite the absence of probability distributions or statistical theory.

Source: here.
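Cloud regression itself is described in the linked source. To make “regression without a response” concrete, here is the simplest related idea I know of: fitting a line to a 2-D point cloud with no feature designated as Y, via total least squares (the leading principal component). This is a stand-in illustration, not the author’s method.

```python
import numpy as np

def cloud_line(points):
    """Fit a line to a 2-D point cloud with no designated response:
    total least squares via the leading principal direction (SVD)."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - center)
    return center, vt[0]          # point on the line, direction vector

# Noisy cloud scattered around the line y = 2x, with no X/Y roles assigned.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 200)
cloud = np.c_[t, 2 * t] + rng.normal(scale=0.05, size=(200, 2))
center, direction = cloud_line(cloud)
print(direction[1] / direction[0])  # slope, close to 2
```

Unlike ordinary least squares, this treats both coordinates symmetrically, which is what makes a response-free formulation possible.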

4. Approximate Nearest Neighbor Search

Fast approximate vector search is a core component of most LLM/GPT apps: it finds prompt-derived embeddings similar to existing ones stored in backend embedding tables built on crawled data. My xLLM system uses key-value rather than vector databases, and variable-length embeddings (VLE) rather than fixed-size ones, but nearest neighbor search applies to both architectures, and in many more contexts.

Source: here.
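As a minimal sketch of approximate nearest neighbor search (not the method from the video), here is random-hyperplane locality-sensitive hashing: vectors sharing the same bit signature land in the same bucket, and only that bucket is scanned and ranked by cosine similarity. All names and parameters below are illustrative.

```python
import numpy as np

def signatures(vecs, planes):
    """One bit per hyperplane: the sign of the projection."""
    return [tuple(row) for row in (vecs @ planes.T > 0)]

def ann_search(query, vecs, planes):
    """Scan only the bucket sharing the query's LSH signature, ranked by
    cosine similarity; fall back to a full scan if the bucket is empty."""
    sigs = signatures(vecs, planes)
    q_sig = signatures(query[None, :], planes)[0]
    idx = [i for i, s in enumerate(sigs) if s == q_sig] or list(range(len(vecs)))
    cand = vecs[idx]
    sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
    return [idx[i] for i in np.argsort(-sims)]

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 16))       # mock backend embedding table
planes = rng.normal(size=(6, 16))       # 6 random hyperplanes -> 64 buckets
query = emb[42]                          # look up a known embedding
print(ann_search(query, emb, planes)[0])  # → 42
```

The speedup comes from scanning roughly 1/64th of the table per query; the price is that a true neighbor hashed into a different bucket can be missed, which is what makes the search approximate.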

5. Synthetic Universe

What would happen if some stars had a negative mass, or the law of gravity were repulsive rather than attractive, or a combination of both? This is an example of agent-based modeling, one of the GenAI techniques used to simulate the evolution of complex systems. Blue stars have a positive mass; red ones have a negative mass. Depending on the parameters, some stars may collide. Actually, I used the technique to generate synthetic collision graphs. The video below is the only example where the Python code crashed when producing the last frame. If you watch till the end, it indeed feels like the whole thing is about to blow up violently, creating a singularity.

Source: here.
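A toy version of such an agent-based simulation can be sketched in a few lines: gravity-like pairwise forces with signed masses, so negative-mass stars repel positive ones. The integration scheme, softening constant, and parameter values below are illustrative assumptions, not those of the video.

```python
import numpy as np

def step(pos, vel, mass, dt=0.01, soft=0.05):
    """One Euler update of an N-body system with signed masses.
    `soft` regularizes the denominator so near-collisions don't divide
    by zero (the crash at the end of the video suggests the real system
    is less forgiving)."""
    diff = pos[None, :, :] - pos[:, None, :]            # diff[i, j] = pos[j] - pos[i]
    dist3 = (np.sum(diff**2, axis=-1) + soft**2) ** 1.5
    acc = np.sum(mass[None, :, None] * diff / dist3[:, :, None], axis=1)
    vel = vel + dt * acc
    return pos + dt * vel, vel

rng = np.random.default_rng(0)
n = 50
pos = rng.uniform(-1, 1, (n, 2))                # initial star positions
vel = np.zeros((n, 2))
mass = rng.choice([1.0, -1.0], n)               # blue = +1, red = -1
for _ in range(200):                            # each iteration = one frame
    pos, vel = step(pos, vel, mass)
```

Each loop iteration corresponds to one frame of the animation; rendering `pos` colored by the sign of `mass` reproduces the blue/red convention described above.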

6. GPU Classification: The Father of Neural Networks

These days, GPUs are used to train neural networks that have nothing to do with images or videos. Yet they were initially built to accelerate image processing and video games. Going back to the original usage, the classification method in this data animation does the opposite: it turns the training set (tabular data) into an image bitmap, performs fuzzy classification as a bitmap transform on the GPU, then turns the last frame back into tabular data. And voilà! You have performed classification on the GPU. Ironically, without neural networks, just using a high-pass image filter.

Well, you may argue that it is a neural network in disguise; indeed, image filtering was one of the first neural network use cases. A frame is just a deep layer. If the filtering window is very small, as in the video, the neural network is very sparse and very deep, with hundreds of hidden layers. If the filtering window is very large, one or two layers will do the job, and boundaries will be smoother. I won’t share my opinion on whether or not this is a neural network. Clearly, the computations and architecture are nearly identical.

The first frame is the original training set transformed into a bitmap. Black zones are regions not yet classified. After a while, the whole feature space is classified, with relatively stable group boundaries: in short, we observe stochastic convergence.

Source: here.
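A minimal CPU-only sketch of the bitmap-classification idea follows. It seeds one intensity channel per class from the training points, then repeatedly applies a small filter so class mass diffuses into the black (all-zero) pixels; the final label is the per-pixel argmax. Note one deliberate simplification: the video uses a high-pass filter, while this sketch uses a plain 3×3 box blur (a low-pass filter) with wraparound edges, chosen only because it is the simplest transform that fills the feature space.

```python
import numpy as np

def classify_bitmap(grid, iters=50):
    """Fuzzy classification as repeated image filtering. `grid` has one
    channel per class; each pass is a 3x3 box blur (low-pass stand-in
    for the video's high-pass filter), so class intensity spreads into
    unclassified pixels. Returns the per-pixel winning class."""
    g = grid.astype(float)
    for _ in range(iters):                       # each pass = one frame/layer
        g = sum(np.roll(np.roll(g, dy, axis=1), dx, axis=2)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return g.argmax(axis=0)

# Two classes seeded at opposite corners of a 32x32 feature-space bitmap.
grid = np.zeros((2, 32, 32))
grid[0, 4, 4] = 1.0        # a class-0 training point
grid[1, 28, 28] = 1.0      # a class-1 training point
labels = classify_bitmap(grid)
print(labels[4, 4], labels[28, 28])  # → 0 1
```

After enough passes every pixel carries a label, mirroring the video’s behavior where the black zones vanish and group boundaries stabilize. Running the same loop on a GPU array library instead of NumPy gives the GPU version described above.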

7. AI Art

I created a large collection of images and videos, as well as soundtracks and 3D videos, arising from number theory and chaotic dynamical systems. They are governed by a rather large number of parameters. You can classify these images into a number of groups and sub-groups. With experience, you know what kind of image a specific set of parameters will produce. This project is slowly turning into GenAI abstract art.

Source: here.
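To illustrate how a chaotic dynamical system with a few parameters generates an image, here is a generic sketch (not the author’s system): the orbit of the Hénon map accumulated into a density bitmap. Changing the parameters `a` and `b` changes the resulting picture, which is the parameter-to-image relationship described above.

```python
import numpy as np

def henon_bitmap(n=100_000, a=1.4, b=0.3, size=256):
    """Render the orbit of the Henon map x' = 1 - a*x^2 + y, y' = b*x
    as a density bitmap: brighter pixels are visited more often."""
    x, y = 0.0, 0.0
    img = np.zeros((size, size))
    for _ in range(n):
        x, y = 1 - a * x * x + y, b * x
        col = int((x + 1.5) / 3.0 * (size - 1))   # map x in ~[-1.5, 1.5]
        row = int((y + 0.5) / 1.0 * (size - 1))   # map y in ~[-0.5, 0.5]
        if 0 <= row < size and 0 <= col < size:
            img[row, col] += 1
    return img

img = henon_bitmap()
print(img.shape, int(img.sum()))
```

Saving `img` (e.g., with a log-scaled colormap) yields the familiar crescent-shaped strange attractor; sweeping the parameters frame by frame produces an animation in the same spirit as the videos above.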

For more videos, watch my YouTube channel, here.

Author

Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com and GenAItechLab.com, former VC-funded executive, author (Elsevier) and patent owner — one related to LLM. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. Follow Vincent on LinkedIn.