Let's understand basic to intermediate aspects of Large Language Models
![Large Language Models]()
In this article, we're going to discuss large language models and explain how they are used in the world of AI.
What is an LLM and how does it work?
Let's start with a brief definition of what large language models are. A large language model is a type of artificial intelligence algorithm that uses deep learning techniques and massively large data sets to understand, summarize, and even predict new content. Generative AI is closely connected with LLMs: you can think of an LLM as a type of generative AI that's specifically architected to generate text content for us. These LLMs use deep learning algorithms to perform a variety of natural language processing tasks.
LLMs are very good at understanding words, grammar, and semantics. A model looks at its training set and identifies keywords and how those words are used, along with the grammar, but it goes beyond that: it learns that similar words can have different meanings in different circumstances, so it can work out what a word means in the particular context of the input you're providing. As a result, large language models are really advanced at natural language processing. A common application is for a user to input a query in a natural language such as English or French, and the model generates a result in that same language.
Parameters and Training of LLMs:
Now, because LLMs are very advanced, they hold billions of parameters. It's hard to say exactly how many any given model has, but it's a very large number that has to be fit. When I refer to parameters, I mean the machine learning term for the variables present in the model: you provide the training data so that the model can select the right value for each of these parameters. So, if you think about it, you're going to require a large amount of data to train all of those parameters and tune them appropriately. The model then uses these parameters to infer new content later on, at the inference step, after training is complete.
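To get a feel for how parameter counts reach the billions, here is a rough back-of-the-envelope sketch of the weights in a stack of transformer blocks. The dimensions below are illustrative placeholders I've chosen, not the numbers of any specific published model, and the sketch ignores embeddings, biases, and normalization layers.

```python
# Rough estimate of parameters in a stack of transformer blocks, to show
# how counts reach the billions. Dimensions are illustrative, not taken
# from any specific published model.
d_model = 4096      # hidden size of each token representation
d_ff = 4 * d_model  # inner size of the feed-forward sublayer
n_layers = 32       # number of stacked transformer blocks

# Attention sublayer: query, key, value, and output projection matrices
attention = 4 * d_model * d_model
# Feed-forward sublayer: two projection matrices (up and down)
feed_forward = 2 * d_model * d_ff

per_layer = attention + feed_forward
total = n_layers * per_layer
print(f"{total / 1e9:.1f}B parameters (ignoring embeddings and biases)")
```

Even with modest assumptions like these, the weights alone land in the billions, which is why so much training data is needed to fit them all.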
Understanding some basic applications of LLM:
Now, LLMs have many different applications and use cases. The first one, obviously, is text generation: an LLM can generate text on any topic it has been trained on, and that's a primary use case for what we're seeing. Translation is another great one. LLMs are trained on multiple languages, so translating from one language to another is a really common use case. In fact, models like GPT-4 are trained on web pages from all over the Internet, written in many different languages, not just English. So you can ask it to translate text for you: it understands the context of the English text and can translate it into whatever language you'd like, such as French.
Chatbots are another great application. With conversational AI and chatbots, LLMs can hold a conversation with many users, and it's typically more natural than the older AI technologies that were tried for this use case before. It goes even beyond this: you can give the model text data and ask it to classify and categorize it, and it's able to categorize any content you provide. Summarization is another very important use case: you can provide some text and ask for a summary in, say, three sentences.
For example, you can take a paper somebody wrote and ask the model to summarize its main points, and it's able to do just that. Another great use for LLMs is sentiment analysis. Sentiment analysis is used a lot in marketing to determine whether a customer is happy with an exchange with a company, or what they think when they write a tweet about a product. You can feed the model a whole bunch of tweets and it can tell you the general sentiment it picks up. Are the customers happy, excited, angry? All sorts of insights come from that kind of analysis, and LLMs are great at it. Now, there are some advantages to using LLMs.
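To make the sentiment analysis task concrete, here is a deliberately tiny lexicon-based scorer. The word lists and function name are my own hand-picked illustrations; an LLM performs this far more robustly by understanding context, but the input and output of the task look the same.

```python
# Toy sentiment scorer to illustrate the task an LLM handles far more
# robustly. The word lists are tiny, hand-picked examples.
POSITIVE = {"happy", "love", "great", "excited"}
NEGATIVE = {"angry", "hate", "terrible", "broken"}

def sentiment(text: str) -> str:
    """Label text by counting positive vs. negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = [
    "I love this phone, great battery life",
    "My order arrived broken and support made me angry",
]
for t in tweets:
    print(sentiment(t), "-", t)
```

A word-counting approach like this breaks down on negation ("not great") or sarcasm, which is exactly where an LLM's understanding of context earns its keep.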
What are the advantages and limitations of it?
One of them is performance: modern LLMs are typically high-performing, and they generate rapid, low-latency responses. Another is accuracy: as the number of parameters and the volume of training data grow, an LLM delivers increasing levels of accuracy, so it scales. Flexibility is another great one: one LLM can be used for many different tasks and deployed across organizations, users, and applications. And ease of training is really important with these models; it's a little different.
Many LLMs are trained on unlabeled data, so you don't even have to use a supervised approach; an unlabeled dataset can help accelerate the training process. Finally, there's the extensibility and adaptability of these models: they can serve as a foundation for customized use cases. With additional training on top of an existing LLM, you can create a finely tuned model for a niche area of expertise or your organization's needs. For each advantage of LLMs there are some disadvantages, so it's important to consider these. One of them is bias. The risk with any AI trained on unlabeled data is bias: it's not always clear what these models will learn, because with unlabeled data we can't be sure that known biases have been removed from the dataset.
So, it's important to go through and make sure we're including the right information in our training set to try to remove bias, because the model itself won't be able to remove it. There are also costs associated with this: these models are expensive. Even after the training and development period, the cost of simply keeping your LLM available for inference is very high for the host organization. So it's important to weigh how much it's going to cost against the value you're going to get out of it. There's some complexity as well, with the billions of parameters you're going to see in these LLMs.
Modern LLMs are really complicated technologies, and they are hard to troubleshoot because of the size of these models. And hallucination is an important one. When I say AI hallucination, people sometimes wonder what that actually means: is the model imagining something? No, it has nothing to do with that. AI hallucination just refers to inaccurate responses that are not based on the training data; the model creates something out of thin air as a guess. In the case of a large language model, it could be making up facts to get its point across. They're not actually true, so it's something to watch out for when using these tools.
Types of LLMs:
Now, there are some common types of LLMs that I'd like to discuss. One is the generic or raw language model, which predicts the next word based on the language in the training data; you'll see these models perform information retrieval tasks primarily. Another is the instruction-tuned language model, which is trained to predict responses to the instructions given in its input.
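The "predict the next word" idea behind generic language models can be sketched with a toy bigram table. The function names here are my own, and a real LLM replaces the lookup table with billions of learned parameters, but the sketch also shows why no labels are needed: the next word in the raw text is the training target.

```python
# Minimal bigram "language model": predict the most frequent next word
# observed in the training text. Note the training text needs no labels;
# each word's successor in the raw text is the target.
from collections import Counter, defaultdict

def train_bigram(text: str):
    """Count, for each word, how often each next word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word: str) -> str:
    """Return the most frequently observed follower of `word`."""
    return counts[word.lower()].most_common(1)[0][0]

model = train_bigram(
    "the model reads text and the model predicts the next word"
)
print(predict_next(model, "the"))  # most frequent follower of "the"
```

Where this table can only echo pairs it has literally seen, an LLM's parameters generalize, so it can continue sentences that never appeared in its training data.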
That allows them to perform things like the sentiment analysis we discussed earlier, or to generate text or even code. Then there are dialog-tuned language models, which are trained to hold a dialog by predicting the next response. Think of chatbots or conversational AI: these are built on dialog-tuned large language models. And finally, there are a few more common types that I'd like to discuss. One is the fine-tuned or domain-specific model. Say, for example, you have a very general model such as GPT-4: you can ask it just about anything you'd like.
That's a very broad large language model. But what if we wanted a fine-tuned or domain-specific model, say, something for a medical professional to use to ask questions, troubleshoot problems they're experiencing, or help diagnose a rare illness? That's where we'd use a fine-tuned or domain-specific large language model to help those people do that job. There's also something known as a zero-shot model. This is a large generalized model, and it's exactly what you'd think: trained on a generic corpus of data, it's able to give fairly accurate results on general use cases without task-specific training. GPT-4 is an example of a zero-shot model. Then we have the language representation model.
An example of a language representation model is BERT. Developed by Google, it makes use of deep learning and transformers and is used for natural language processing tasks such as understanding text and answering questions about it. And then there's the multimodal model. Originally, LLMs were specifically tuned for text, but with a multimodal approach it's possible to handle both text and images, and some models such as GPT-4 allow you to do just that. So, that's a brief overview of what large language models are, the types of large language models, and how they work under the hood.
Also read: Walking through some benefits and limitations of ChatGPT