AI In Plain English, For Normal People. PART 4.b Of 7

LLMs: The magic is in the parameters

We continue with the best guide to understanding AI so you don't get left behind in the future!

Last week, we talked about the first stage of the creation of LLMs, or the training of the base model. In this one, we will complete it with:

  • LLMs: The first stage and the magic is in the parameters

And as always, at the bottom, you have a selection of the news of the week to stay updated and spark your curiosity, as well as cool tools and lessons that can enhance your life.

That’s damn right!

(approximate reading time: 5 minutes)

This week, we will continue talking about the first stage of the creation of our friend, the LLM, and how its parameters are created from the text corpus.

The magic is in the parameters.

What goes on inside the neural network to create this magic?

The lines of code (the neural network architecture) are understood by the creators. We know the structure on paper, and we know how to adjust and optimise the parameters over time so that the neural network (NN) as a whole gets better at next-word prediction. What we don't know is how these parameters work together to achieve that.
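To make that "adjusting" a bit more concrete, here is a toy sketch of the loop that tunes the parameters. It uses PyTorch and a made-up miniature network with invented sizes and stand-in data (real LLMs have billions of parameters and repeat this over trillions of words), so treat it as an illustration of the idea, not the actual recipe:

```python
# Toy sketch of how parameters get adjusted (assumes PyTorch; sizes are made up).
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 1000, 64, 4   # tiny, illustrative numbers

# A miniature "language model": look up word vectors, then score every word
# in the vocabulary as a candidate for the next word.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),
    nn.Linear(embed_dim * context_len, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

context = torch.randint(0, vocab_size, (1, context_len))  # 4 previous word IDs (stand-in data)
next_word = torch.randint(0, vocab_size, (1,))            # the word that actually came next

for step in range(100):                 # a real LLM repeats this billions of times
    scores = model(context)             # the network's guess for the next word
    loss = loss_fn(scores, next_word)   # how wrong was the guess?
    optimizer.zero_grad()
    loss.backward()                     # work out how each parameter contributed to the error
    optimizer.step()                    # nudge every parameter to do slightly better next time
```

We can write and run this loop; what we can't do is explain, parameter by parameter, how the resulting numbers cooperate to produce the behaviour we see.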

The knowledge stored in the model is strange and weird. For example, it often can't be accessed from multiple directions, only in the direction it was learned. The model answers this question correctly: Who is Tom Cruise's mother? Mary Lee Pfeiffer. Good. But if we ask the reverse: Who is Mary Lee Pfeiffer's son? The model won't give a correct answer.

What this means is that the creators of these models don't know why deep neural networks work. Mind-blowing.

To get an idea of how NNs work, let's review some concepts.

Language modeling: "I have some context; I will predict what comes next."

The NN is trying to predict the next word in a sequence of words. Input: "The cat is on the..." The input enters the model, and the NN works out which word is most likely to come next in the given context, assigning a probability to every candidate.
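If you're curious what that looks like in practice, here is a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model (my choice for illustration; any small language model would do the same job). It feeds in "The cat is on the" and prints the words the model thinks are most likely to come next, each with its probability:

```python
# Minimal sketch: ask a small open model which word should come next.
# Assumes the Hugging Face "transformers" library and the GPT-2 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "The cat is on the"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # a score for every word in the vocabulary

probs = torch.softmax(logits[0, -1], dim=-1)     # turn the scores into probabilities
top = torch.topk(probs, k=5)                     # the five most likely next words

for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)).strip():>8}  {p:.1%}")
# Typical output: everyday words like "floor", "bed" or "couch" near the top.
```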

What happens is that predicting the next word forces the LLM to learn A LOT about the world inside the parameters of the NN (in our case, about the cat), besides learning how to generate language. The LLM has to learn about the world of the cat. What does it look like? What does it like and dislike? What makes it do something or not, etc.

The model doesn't parrot back exact excerpts from the text corpus; it creates new text. It can use phrases similar to those in the corpus, dropping the parts it doesn't need and filling in the ones that serve its goal.

The LLM knows the information, the gestalt (an organized whole that is perceived as more than the sum of its parts), or the main idea of what it is talking about. It uses that compressed knowledge to predict what probably comes next and complete the phrase. When this imperfect magic is inaccurate or doesn't work, people call it hallucinating.
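To see why the result is "imperfect magic" rather than a database lookup, here is a tiny sketch with made-up numbers: the model has spread probability over several candidate next words, and the completion is drawn from that distribution. Most of the time it picks something sensible; occasionally it picks something that reads like a hallucination.

```python
# Made-up probabilities for the next word after "The cat is on the ..."
# (standard library only; the numbers are invented for illustration).
import random

next_word_probs = {
    "mat": 0.40,
    "roof": 0.25,
    "sofa": 0.20,
    "table": 0.10,
    "spreadsheet": 0.05,   # unlikely, but possible: if picked, it reads like a hallucination
}

words = list(next_word_probs)
weights = list(next_word_probs.values())

completion = random.choices(words, weights=weights, k=1)[0]  # sample, don't look up
print("The cat is on the", completion)
```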

We can only measure whether it works or not, and with what probability. Scientists will need more time to understand what is going on inside these models. In the meantime, these models could be developing intellectual superpowers, and we wouldn't even know it.

Large language models are still at the point of what might be called alchemy, and companies are racing to build them without a real, a priori sense of what the right design is for the right problem.

Scary?

Ok, cool! We are done.

Nope!

If you put this base model out there as it is, it is not going to behave the way we want. This magic NN on its own is not helpful; it acts as an internet-document sampler, and that is not what we want from it.

We want it to write us a poem. X)

To make this model useful, we need the second stage of the process: fine-tuning.

See you next week!

Thank you for your time.

My News Picks

  • AI that reads minds

Tools

  • Tendi is an AI-powered financial advisor

  • CrewAIAssistant is a GPT that helps you create autonomous AI agents.

  • Gedeon revolutionises intelligent decision-making

  • Informly validates your business ideas

Educational

  • Building an app with ChatGPT

And that’s all. I hope these insights, news, and tools help you prepare for the future!

Have a really nice week.

Stay kind.

Rafa TV