AI In Plain English, For Normal People. PART 4.a Of 7

LLMs: The first stage, or the training of the base model

December 30, 2023


Happy New Year to all of you!

We continue with the best guide to understanding AI so you don't get left behind in the future!

Last week, we talked about the AI buzz and its future. In this one, we will go through:

  • LLMs: The first stage, or the training of the base model

And as always, at the bottom, you have a selection of the news of the week to stay updated and spark your curiosity, as well as cool tools and lessons that can enhance your life.

That’s damn right!

(approximate reading time: 5 minutes)

How do these models work?

We can divide the whole process into two stages or parts. This week we will start talking about the first one: the training of the base model.

Let's get a little bit more technical here, but not a lot, don’t worry.

The base model is composed of two files (and a computer to run them).

  • One that contains the parameters, the billions of numbers the model learned from its text corpus, i.e., the large collection of text data used for training (this data includes a wide range of text from books, websites, articles, and other written sources),

  • and the other that contains the lines of code that run the neural network architecture, the program that actually uses those parameters.

I told you this was going to be easy. =)
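To make the two-files idea concrete, here is a minimal sketch in Python. The file names, sizes, and functions are made up for illustration; they are not the real LlaMa files.

```python
# Illustrative sketch of the "two files" that make up a base model.
# Names and sizes here are hypothetical, just to make the idea concrete.

import numpy as np

# File 1: the parameters, billions of numbers learned during training.
# A real model like LlaMa-2 70b stores about 70 billion of them (~140 GB);
# here we fake a tiny array so the example runs instantly.
parameters = np.random.rand(1_000)   # stand-in for a huge "parameters.bin" file

# File 2: the code that runs the neural network using those parameters.
def run_model(prompt: str, params: np.ndarray) -> str:
    """Stand-in for the run file: feeds the prompt through the network and returns text."""
    # A real implementation would be a few hundred lines of neural-network code.
    return prompt + " ... (text continued by the model)"

print(run_model("Once upon a time", parameters))
```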

First Stage: The training of the base model, or Generative Pre-Trained Transformer (GPT)

First of all, let me clarify what Large Language Models (LLMs) are in a general sense.

We can understand LLMs as a new computing paradigm.

Think of LLMs as operating systems (OS). There are closed ones like macOS and Windows, and there is open-source Linux, a shared structure that lets other operating systems be built on top of it, readily available to the general public.

The major disagreement between Elon Musk and OpenAI, a company he helped start, is due to Musk’s initial desire for the company to be open source, similar to Linux. However, OpenAI has chosen a different direction.

ChatGPT is only one model in a hungry and competitive industry, and as in the OS world, some models are open source and others are closed source.

Closed-source LLMs:

  • ChatGPT from OpenAI

  • Bard from Google

  • Bing Chat from Microsoft (powered by OpenAI's models)

  • Claude 2 from Anthropic (backed by Amazon)

  • Where are you, Apple?...

Open-source LLMs:

  • LlaMa from Meta

  • and others.

Let's go back to the two files with the example of the model LlaMa-2 70b.

LlaMa-2 refers to the second iteration of the LlaMa model, and 70b refers to the number of parameters in the model. In this case, there are 70 billion parameters.
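As a quick back-of-envelope check of what 70 billion parameters means on disk (a sketch, assuming each parameter is stored as a common 2-byte number):

```python
# Back-of-envelope: how big is the parameters file of a 70-billion-parameter model?
# Assumes each parameter is stored as a 2-byte (16-bit) number, a common choice.

params = 70_000_000_000        # 70 billion parameters
bytes_per_param = 2            # 16-bit floating-point numbers

size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.0f} GB")    # ~140 GB, roughly the size of LlaMa-2 70b's parameters file
```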

Wait, is there any difference between the text corpus and the parameters?

The text corpus provides the model with a diverse and extensive set of examples of human language (think of it as a snapshot of a big chunk of the internet: terabytes upon terabytes of text, encyclopedias, etc.). This allows the model to learn patterns, contexts, grammar, semantics, and various aspects of language from real-world text.

Essentially, the text corpus is the raw material from which the model learns about language.

The parameters are what the model "learns" as it processes the text corpus (relationships between words, the structure of sentences, the meaning conveyed in different contexts, etc.). Let's call them the rules and patterns that the robot learns and remembers, so it can understand and create language on its own.

In large models like GPT, there are billions of these parameters, and they collectively form the model's understanding of language. They are not the text itself, but rather the distilled knowledge and patterns extracted from the text.
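To see that difference in code terms, here is a toy sketch. It is not how real LLM training works; it only shows the shape of the idea: the corpus is plain text, and the parameters are numbers that get nudged as the model reads it.

```python
# Toy illustration: corpus = raw text, parameters = numbers adjusted while reading it.
# This is NOT real LLM training, just a sketch of the relationship between the two.

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# The "parameters": here, simple counts of which word follows which.
# A real LLM stores billions of learned numbers instead of a small table.
parameters = {}

for sentence in corpus:                       # the model "reads" the corpus...
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        parameters[(prev, nxt)] = parameters.get((prev, nxt), 0) + 1   # ...and updates its numbers

# After "training", the corpus can be thrown away; the parameters keep the patterns.
print(parameters[("sat", "on")])   # 2 -> it has learned that "on" often follows "sat"
```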

GPT-4 is estimated to have around 1 trillion parameters (the exact number hasn't been disclosed); to put that into context, the human brain has on the order of a hundred trillion synapses, its rough equivalent of parameters.

The more parameters, the more tasks the model can do, and the better it gets at doing stuff. The better the model gets at doing stuff, the more money it brings in. The more money there is, the more gets invested in a bigger text corpus and better models, and BAM! Welcome to the new scaling-LLM capitalism circus.

To train on all this data, these companies use a cluster of around 6,000 very advanced, high-power GPUs (not the ones you buy at Best Buy) running for days or weeks. That represents an investment of roughly $2 million, $10 million, or more if we speak about OpenAI or others (GPT-4 reportedly cost around $100 million to train).
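As a rough sketch of where a bill like that comes from (the GPU count, duration, and hourly price below are illustrative assumptions, not official figures from Meta or OpenAI):

```python
# Rough sketch of a training bill for a 70b-class model.
# GPU count, days, and hourly rental price are assumptions for illustration only.

gpus = 6_000                  # high-end data-center GPUs
days = 12                     # an assumed length for the training run
price_per_gpu_hour = 2.00     # assumed rental price in dollars

cost = gpus * days * 24 * price_per_gpu_hour
print(f"~${cost:,.0f}")       # ~$3,456,000 -> low single-digit millions of dollars
```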

And remember, where there’s money, there’s progress.

The interesting part in all of this is how the model learns from the corpus of text. The computational complexity comes in when we compute those parameters.

Let’s leave the magic of the parameters for next week!

See you then!

Thank you for your time.

My News picks

 

Cool Tools

  • SunoAI is an AI-based music creation platform, often described as Midjourney but for music, accessible to all.

  • Laterbase is an AI-powered bookmark manager that simplifies saving and searching bookmarks. Bookmarks, reimagined.

  • Reface is a face-swap AI video generator that packs complex AI technologies into easy-to-use products.

  • Polar Habits offers a unique, guilt-free approach to habit tracking.

     

Educational

  • OpenCVUniversity offers a 100-day AI career challenge designed to jumpstart your AI career.

And that’s all. I hope these insights, news, and tools help you prepare for the future!

Have a really nice week.

Stay kind.

Rafa TV