Understanding the basic architecture of LLMs
*Figure: Architecture of LLMs*
In this article, we're going to look at the key components of the general architecture that makes up a typical large language model.
Introduction to LLM Architecture:
Now, large language models have transformer neural networks as their underlying building block. A transformer model has a very large number of parameters attached to it, and it's this type of neural network at the core that lets an LLM understand input and generate accurate responses. In fact, you'll often see these LLMs referred to as foundation models. All that means is that the model is so large and impactful that it serves as its own class of artificial intelligence, one we can build on and experiment with.
Transformer Model Structure:
It's just a way for us to classify the models being used today. Now, what exactly is a transformer model? At its core, it's the most common architecture for a large language model, and it consists of multiple neural network layers organized into an encoder and a decoder; the transformer processes data by passing it through these encoder and decoder stacks. The training data comes from large textual datasets drawn from sources like Wikipedia, GitHub, and the Internet in general. These datasets consist of trillions of words, and their quality directly affects the large language model's performance. At this stage the model engages in unsupervised learning: it processes the datasets fed to it without specific instructions or labels.
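To make the encoder/decoder idea a bit more concrete, here is a minimal sketch using PyTorch's built-in transformer module. Everything in it is illustrative: the vocabulary size, layer counts, and random token IDs are made up, and a real LLM is trained on the huge datasets described above rather than dummy tensors.

```python
import torch
import torch.nn as nn

# Illustrative sizes only -- real LLMs are far larger.
vocab_size, d_model, n_heads, n_layers = 10_000, 512, 8, 6

embedding = nn.Embedding(vocab_size, d_model)      # token IDs -> vectors
transformer = nn.Transformer(
    d_model=d_model,
    nhead=n_heads,
    num_encoder_layers=n_layers,
    num_decoder_layers=n_layers,
    batch_first=True,
)
lm_head = nn.Linear(d_model, vocab_size)           # vectors -> scores over the vocabulary

# Dummy batch: 2 sequences of 16 source tokens and 16 target tokens.
src_ids = torch.randint(0, vocab_size, (2, 16))
tgt_ids = torch.randint(0, vocab_size, (2, 16))

hidden = transformer(embedding(src_ids), embedding(tgt_ids))
logits = lm_head(hidden)
print(logits.shape)                                # torch.Size([2, 16, 10000])
```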
Fine-Tuning and Prompt Tuning:
That's the idea behind the initial training. There's also something known as fine-tuning. In the training step we use an unlabeled dataset, and the model is simply learning things like whether the word "right" means correct or the opposite of left from context. Fine-tuning goes one step further: for a large language model to perform a specific task, such as translation, you really need to fine-tune it for that particular activity. It goes beyond the training step and answers the question of how to tune the model for the kinds of tasks you're interested in.
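As a rough sketch of what fine-tuning looks like in practice, here is a minimal example using the Hugging Face `transformers` and `datasets` libraries (my choice for illustration; the article doesn't prescribe a toolkit). It fine-tunes a small base model on a labeled sentiment dataset rather than translation, just to keep the example short; the model name, dataset, and hyperparameters are all placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # small example base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tuning needs labeled, task-specific examples.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(1000)),  # small subset for a quick demo
)
trainer.train()   # updates the pre-trained weights for this specific task
```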
Whether it's translation or anything else, fine-tuning optimizes performance for the specific tasks you're trying to use the LLM for. There's also something known as prompt tuning. It's similar to fine-tuning in that it optimizes performance, but here we're trying to get the model to perform a specific task through what's known as few-shot prompting or zero-shot prompting. That is, we want to keep the number of examples required in the prompt to generate the information as low as possible. And to be clear, when I refer to a prompt in this context: if you're using GPT-4, for example, or any other large language model, there's a place for you to enter your text or ask your question.
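To illustrate the difference, here is a small sketch contrasting a zero-shot prompt with a few-shot prompt. The sentiment-classification task and the example reviews are invented purely for illustration; either string would be pasted into the model's prompt box (or sent to its API) as-is.

```python
# Zero-shot: describe the task directly, with no worked examples.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: prepend a handful of worked examples so the model can
# infer the task and the answer format before seeing the real input.
few_shot_prompt = (
    "Review: I love how light this laptop is.\nSentiment: positive\n\n"
    "Review: The screen cracked within a week.\nSentiment: negative\n\n"
    "Review: The battery died after two days.\nSentiment:"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```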
Neural Network Layers in LLMs:
That input is what we refer to as a prompt at the inference step of our large language model. So, what are the pieces? We said that the neural network layers in LLMs are transformer neural networks, and those networks are built from certain components: recurrent layers, feedforward layers, embedding layers, and attention layers. Let's go through each of these in a little detail, starting with a simple one, the embedding layer. The embedding layer creates embeddings from the input text. A vector embedding is a way to convert words, sentences, and other data into numbers that capture their meaning and relationships, and that's what lets us build context around those words.
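Here is a tiny sketch of that idea using PyTorch's embedding layer. The five-word vocabulary and the eight-dimensional vectors are made up; in a trained model these vectors end up encoding meaning and relationships between words.

```python
import torch
import torch.nn as nn

# Toy vocabulary mapping words to integer token IDs (invented for illustration).
vocab = {"the": 0, "cat": 1, "sat": 2, "right": 3, "left": 4}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

token_ids = torch.tensor([vocab[w] for w in ["the", "cat", "sat"]])
vectors = embedding(token_ids)   # each word becomes an 8-dimensional vector
print(vectors.shape)             # torch.Size([3, 8])
```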
Now, what's interesting to know is that a large language model is good at generating text, but what's actually happening under the hood is that we're assigning numbers, that is, tokens and embeddings, to the words we provide to it. The model is very good at understanding relationships between numbers, not so much the words themselves. From there it can build context, and you can do things like sentiment analysis, translation, or really anything else. Then there's the feedforward layer. This is made up of multiple fully connected layers that transform the input embeddings, and they enable the model to understand higher-level abstractions of the data.
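As a minimal sketch (again with made-up sizes), the feedforward block in a transformer is typically two fully connected layers with a non-linearity in between, applied independently to the embedding at each position:

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048            # illustrative sizes

# Position-wise feedforward block: expand, apply a non-linearity, project back.
feedforward = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

x = torch.randn(2, 16, d_model)      # (batch, sequence length, embedding size)
out = feedforward(x)                 # same shape, transformed position by position
print(out.shape)                     # torch.Size([2, 16, 512])
```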
These feedforward layers help the model understand the user's intent behind the text input. There's also something known as the recurrent layer. This interprets the words in the input text in sequence; at its core, it's trying to capture the relationships between the words in a sentence. The attention mechanism is the one that's actually very interesting. It enables a large language model to focus on the parts of the input text that are relevant to the task at hand, and it gives us the most accurate outputs because the model concentrates only on the things that matter for the task you gave it.
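To make the attention idea concrete, here is a minimal self-attention sketch using PyTorch's built-in multi-head attention module (the sizes and random input are illustrative). The returned weight matrix shows how strongly each token focuses on every other token in the sequence.

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8

# Self-attention: queries, keys, and values all come from the same sequence.
attention = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads,
                                  batch_first=True)

x = torch.randn(2, 16, d_model)      # (batch, seq_len, d_model)
output, weights = attention(x, x, x)

print(output.shape)                  # torch.Size([2, 16, 512]) -- attended representations
print(weights.shape)                 # torch.Size([2, 16, 16])  -- per-token attention weights
```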
Conclusion:
Those are all the pieces of a basic transformer, and they're also the pieces of our large language models. Different architectures will keep appearing as time goes on, but the underlying engine will still be built from the main layers we've just discussed.