Large Language Models (LLMs): An Overview

JATIN SALVE
7 min read · Apr 9, 2023


A large language model is a type of artificial intelligence (AI) model designed to process and generate human language. These models are based on deep learning neural networks and are trained on massive amounts of text data.

The goal of a large language model is to understand and generate natural language in a way that is similar to how humans use language. This means that they are capable of understanding the nuances of language, including grammar, syntax, context, and semantics, and can generate human-like responses to questions or prompts.

The encoder and the decoder are the two components of the Transformer architecture. Structurally they are very similar, with a few differences: the decoder attends to the encoder's output and generates tokens autoregressively, one at a time. Additional background is provided in the Transformer primer, and Autoregressive vs. Autoencoder Models covers the advantages and disadvantages of the encoder and decoder stacks.

Examples of Large Language Models

There are several large language models in use today, with some of the most well-known being GPT-3, BERT, and T5. These models have significantly advanced the capabilities of natural language processing and have numerous practical applications in industries such as healthcare, finance, and customer service.

GPT-3 (Generative Pre-trained Transformer 3) is currently one of the most powerful and widely used large language models. It has been trained on a massive amount of text data and can generate highly coherent and contextually appropriate responses to a wide variety of prompts. GPT-3 has been used for a range of applications, including language translation, text generation, and chatbots.
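
As a concrete, hedged illustration, the snippet below sketches how a GPT-3 family model could be queried through the OpenAI Python library as it existed around the time of writing (the pre-1.0 Completion API). The model name, prompt, and API key are placeholders, not anything prescribed by this article.

```python
# Hedged sketch: querying a GPT-3 family model via the OpenAI Python library
# (pre-1.0 Completion API). Model name, prompt, and key are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3 family model
    prompt="Write a short, friendly reply to a customer asking about delivery times.",
    max_tokens=60,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())
```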

BERT (Bidirectional Encoder Representations from Transformers) is another powerful large language model. It is specifically designed to understand the context of language and can generate highly accurate responses to a range of prompts. BERT has been used for applications such as sentiment analysis, language translation, and chatbots.

T5 (Text-to-Text Transfer Transformer) is a large language model that is specifically designed for text-to-text tasks such as language translation, summarization, and question-answering. T5 is capable of generating highly accurate responses to a wide range of prompts and has been used for applications such as language translation and summarization.
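
To make the BERT and T5 use cases above more tangible, here is a small sketch using Hugging Face transformers pipelines. The default sentiment model is a BERT-family (DistilBERT) checkpoint and t5-small is just a convenient small T5 variant; neither choice is prescribed by the article.

```python
# Hedged sketch: a BERT-family model for sentiment analysis and T5 for
# summarization via Hugging Face pipelines. Model choices are illustrative.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # defaults to a DistilBERT checkpoint
print(sentiment("Large language models are surprisingly capable."))

summarizer = pipeline("summarization", model="t5-small")
text = ("Large language models are trained on vast amounts of text and can be "
        "adapted to tasks such as translation, summarization, and question answering.")
print(summarizer(text, max_length=30, min_length=5))
```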

How do LLMs work?

Large language models (LLMs) work by utilizing deep learning neural networks that are trained on vast amounts of text data. These models are typically based on the transformer architecture, which was first introduced in the 2017 paper “Attention Is All You Need” by Google researchers.

The transformer architecture is specifically designed for natural language processing tasks and consists of several layers of attention mechanisms. Attention mechanisms are a way for the model to focus on specific parts of the input data and give more weight to certain words or phrases based on their relevance to the task at hand.
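
As a rough illustration of the idea, the following sketch implements scaled dot-product attention, the basic attention operation used in transformers, with NumPy. The shapes and random inputs are purely illustrative.

```python
# Minimal sketch of scaled dot-product attention (the core transformer operation).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of every position to every other
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention weights sum to 1 per row
    return weights @ V                              # weighted combination of value vectors

# Toy example: a "sentence" of 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```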

During training, the LLM is presented with large amounts of text data and learns to recognize patterns and relationships between words and phrases. The model is then fine-tuned on specific tasks, such as language translation or text classification, to further improve its performance.
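
A hedged sketch of that fine-tuning step, using the Hugging Face Trainer on a text-classification dataset, might look like the following. The dataset, model checkpoint, and hyperparameters are arbitrary illustrative choices, not the method of any particular model discussed here.

```python
# Hedged sketch: fine-tuning a pretrained model for text classification.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")  # binary sentiment dataset, used only as an example
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

encoded = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)

args = TrainingArguments(output_dir="finetuned-model",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)))
trainer.train()
```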

Once the LLM has been trained, it can be used for a wide range of natural language processing tasks. For example, to generate text in response to a prompt, the model is first given the prompt as input. It then uses its knowledge of language and the attention mechanisms to generate a response that is contextually appropriate and grammatically correct.
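
The same prompt-to-response flow can be seen end to end with a small open model. The sketch below uses GPT-2 through transformers purely as a stand-in for a larger LLM; the prompt and sampling settings are arbitrary.

```python
# Hedged sketch: tokenize a prompt, generate a continuation, decode it back to text.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```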

One of the key advantages of LLMs is their ability to generate human-like responses to a wide range of prompts. This is made possible by the massive amounts of text the models are trained on, which enables them to recognize and learn from the nuances of human language.

In summary, LLMs work by utilizing deep learning neural networks that are trained on vast amounts of text data. The models use attention mechanisms to focus on specific parts of the input data and generate responses that are contextually appropriate and grammatically correct. The result is a powerful tool for natural language processing that has numerous practical applications across a wide range of industries.

Similarity Computation

Similarity computation is the process of measuring how similar or related two objects or data points are to each other. This is often used in machine learning and data analysis to compare items based on their features or characteristics.

There are several different methods for similarity computation, depending on the type of data being compared and the specific task at hand.

  1. Cosine similarity

Cosine similarity is a measure of similarity between two vectors in a multi-dimensional space. It is commonly used in natural language processing (NLP) to compare the similarity between two pieces of text.

In NLP, the vectors are typically created based on the frequency of occurrence of words in a piece of text. Each word in the text is assigned a weight based on its frequency, and these weights are used to represent the text as a vector. For example, consider the following two sentences:

  • The cat sat on the mat.
  • The dog lay on the rug.

To represent these sentences as vectors, we first create a list of all the unique words in both sentences:

  • The
  • cat
  • sat
  • on
  • mat
  • dog
  • lay
  • rug

Next, we assign a weight to each word based on its frequency in each sentence. For example, the vector representation of the first sentence would be:

[2, 1, 1, 1, 1, 0, 0, 0]

This indicates that the word “the” appears twice, while “cat”, “sat”, “on”, and “mat” each appear once. The remaining words do not appear in the sentence, so their weights are set to 0. The vector representation of the second sentence would be:

[2, 0, 0, 1, 0, 1, 1, 1]

This indicates that the word “the” appears twice, while “on”, “dog”, “lay”, and “rug” each appear once. The remaining words do not appear in the sentence, so their weights are set to 0.

Once we have the vector representations of the two sentences, we can calculate their cosine similarity. Cosine similarity is calculated as the cosine of the angle between the two vectors, and is defined as:

cos(x, y) = (x · y) / (||x|| * ||y||)

cosine_similarity = dot_product(vector1, vector2) / (magnitude(vector1) * magnitude(vector2))

where “dot_product” is the dot product of the two vectors, and “magnitude” is the magnitude of a vector. In other words, cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space.

In the case of the two example sentences, the cosine similarity is calculated as:

cosine_similarity = (2*2 + 1*0 + 1*0 + 1*1 + 1*0 + 0*1 + 0*1 + 0*1) / (sqrt(8) * sqrt(8)) = 5 / 8 = 0.625

This indicates that the two sentences are moderately similar, largely because they share the common words “the” and “on”. A score of 1 means the two vectors point in exactly the same direction (the same relative word frequencies), while a score of 0 means the sentences have no words in common.
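
The arithmetic above can be checked with a few lines of plain Python; the vectors simply restate the frequency counts from the worked example.

```python
# Verifying the worked example: cosine similarity of the two sentence vectors.
import math

v1 = [2, 1, 1, 1, 1, 0, 0, 0]   # "The cat sat on the mat."
v2 = [2, 0, 0, 1, 0, 1, 1, 1]   # "The dog lay on the rug."

dot = sum(a * b for a, b in zip(v1, v2))    # 5
mag1 = math.sqrt(sum(a * a for a in v1))    # sqrt(8)
mag2 = math.sqrt(sum(b * b for b in v2))    # sqrt(8)
print(dot / (mag1 * mag2))                  # 0.625
```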

Overall, cosine similarity is a widely used method for measuring the similarity between two pieces of text in NLP. It is a simple yet powerful technique that can be used to compare large volumes of text data and find patterns and relationships between words and documents.

Reasoning

  • Let’s now turn to how reasoning works in LLMs; we will define reasoning as the “ability to make inferences using evidence and logic” (source).
  • There are many varieties of reasoning, such as commonsense reasoning and mathematical reasoning.
  • Similarly, there are a variety of methods to elicit reasoning from the model, one of them being prompting (a minimal prompting sketch follows this list).
  • It’s important to note that the extent to which an LLM actually relies on reasoning to produce its final prediction is still unknown.
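
As promised above, here is a minimal, hedged sketch of eliciting step-by-step reasoning through prompting. The question and the exact instruction are illustrative, and ask_llm is a hypothetical placeholder for whatever model or API you use.

```python
# Hedged sketch: a prompt intended to elicit step-by-step reasoning.
def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder: call your LLM of choice here."""
    raise NotImplementedError

question = ("A train leaves at 9:40 and the journey takes 2 hours and 35 minutes. "
            "What time does it arrive?")

# Appending an instruction like this encourages many models to write out
# intermediate reasoning steps before committing to a final answer.
prompt = question + "\nLet's think step by step."
print(prompt)
# answer = ask_llm(prompt)
```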

Performance Enhancement

One of the best ways to enhance performance is to provide the model with external information.

  • Recent research and newly released chatbots have shown that LLMs can leverage knowledge and information that is not necessarily stored in their weights.
  • There are several ways to accomplish this, the first being to leverage another neural network or LM by calling it iteratively to extract the information needed.
  • The sketch below gives a glimpse of how such iterative LM calls can work:
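
This outline is only illustrative: call_lm and search are hypothetical placeholder stubs, not a specific library's API.

```python
# Hedged outline: iteratively calling an LM to gather external information
# before answering. `call_lm` and `search` are toy placeholder stubs.
def call_lm(prompt: str) -> str:
    return "DONE"  # placeholder: substitute a real model or API call

def search(query: str) -> str:
    return ""      # placeholder: substitute a search engine or document store

def answer_with_lookups(question: str, max_steps: int = 3) -> str:
    context = ""
    for _ in range(max_steps):
        # Ask the LM what it still needs to know, given what it has gathered so far.
        followup = call_lm(f"Question: {question}\nKnown so far: {context}\n"
                           "What should we look up next? Reply DONE if nothing.")
        if followup.strip() == "DONE":
            break
        context += search(followup) + "\n"  # fold retrieved text back into the prompt
    return call_lm(f"Question: {question}\nContext: {context}\nAnswer:")

# answer = answer_with_lookups("Who won the 2022 World Cup?")
```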

LLMs Summary

  • A summary table (source) of large language models lists each model's original release date, largest model size, and whether the weights are fully open-sourced to the public.

Conclusion

Large language models are a powerful tool for processing and generating human language. They have numerous practical applications across a wide range of industries, and their capabilities are continually improving. As the field of natural language processing continues to evolve, large language models are likely to play an increasingly important role in shaping the way we interact with language in the digital world.
