Michael Ehab · 8 min read · Jul 30, 2023

The Secrets of Large Language Model Parameters: How They Affect the Quality, Diversity, and Creativity of Generated Texts [With Examples]

[Image: a human hand and a robot hand reaching toward each other. Photo by Tara Winstead]

Introduction:

Have you ever wondered how large language models (LLMs) can generate texts that sound like humans wrote them? How do they know which words to use, how to structure sentences, and how to convey meaning and emotion?

Well, Spoiler Alert: the answer is not magic, but mathematics.

LLMs are basically computer programs that use mathematical formulas and algorithms to analyze huge amounts of data and learn the patterns and rules of natural language.

They can then use this knowledge to produce new texts based on some input, which we call a “prompt”. For example, if you give an LLM a sentence like “The sky is”, it can complete it with something like “blue”, “cloudy”, or “full of stars”.

But how does the LLM decide what word to choose? How can it make the text more interesting, diverse, and creative? How can it avoid repeating itself or making mistakes? Well, that’s where the LLM parameters come in.

What are LLM Parameters?

LLM Parameters are settings that you can adjust to control how the LLM generates texts. They can affect the quality, diversity, and creativity of the generated texts. Some of the common LLM parameters are temperature, number of tokens, top-p, presence penalty, and frequency penalty.

In this post, I will explain what these parameters are and how to use them effectively. I will also show you some examples of texts that can be generated by LLMs with different parameter values.
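To make these settings concrete before we go through them one by one, here is a minimal sketch of where the knobs live in a typical API call. It uses the OpenAI Python client, so the parameter names match the OpenAI API; the model name and values are just illustrative placeholders, and other providers expose similar settings:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": "The sky is"}],
    temperature=0.7,        # randomness of sampling
    max_tokens=50,          # upper bound on the completion length
    top_p=0.9,              # nucleus sampling threshold
    presence_penalty=0.5,   # discourage reusing a token at all
    frequency_penalty=0.5,  # discourage reusing a token often
)

print(response.choices[0].message.content)
```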

Temperature:

Temperature is a parameter that controls how much randomness is introduced in the text generation process. A higher temperature means more diversity and unpredictability, while a lower temperature means more coherence and consistency.

In other words, a higher temperature makes the model more willing to pick low-probability tokens, which can lead to more creative and unexpected text, while a lower temperature makes it stick to high-probability tokens, which can lead to more coherent and consistent text.

[Figure: Temperature Parameter Example]

To illustrate this, let’s imagine an LLM generating some texts based on the same prompt: “The best way to learn programming is”. I will use different temperature values and see how the texts change. Here are some examples:

  • Temperature = 0.1: The best way to learn programming is to practice a lot and follow some online tutorials. This is a very coherent and consistent text, but also very boring and predictable.
  • Temperature = 0.5: The best way to learn programming is to experiment with different languages and frameworks. This is a more diverse and interesting text, but still somewhat reasonable and logical.
  • Temperature = 1.0: The best way to learn programming is to travel back in time and meet the inventors of programming languages. This is a very diverse and unpredictable text: imaginative and adventurous, but no longer a practical answer.

As you can see, the temperature parameter can affect the quality, diversity, and creativity of the generated texts. You can use different temperature values depending on your purpose and preference.

When generating text, adjusting the temperature value can affect the style of the output. A lower temperature will result in more realistic and dependable text, while a higher temperature will produce more creative and humorous output. However, it's important to avoid extremes in temperature, as this can lead to nonsensical text.
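Under the hood, temperature simply rescales the model’s raw scores (logits) before they are turned into probabilities. Here is a minimal sketch of that mechanism; the toy vocabulary and logit values are made up for illustration:

```python
import numpy as np

def temperature_probs(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Hypothetical next-token scores for the prompt "The sky is"
vocab = ["blue", "cloudy", "full of stars", "a potato"]
logits = [4.0, 3.0, 2.0, 0.5]

for t in (0.1, 0.5, 1.0):
    probs = temperature_probs(logits, t)
    print(f"T={t}:", dict(zip(vocab, probs.round(3))))
# At T=0.1 almost all probability mass sits on "blue"; at T=1.0 the
# distribution flattens and unlikely words get a real chance.
```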

The number of tokens:

The number of tokens is a parameter that caps how long the generated text can be. A higher limit allows more detail and information, while a lower limit forces more conciseness and simplicity.

To illustrate this, let’s imagine an LLM generating some texts based on the same prompt: “What is artificial intelligence?”. I will use different “number of tokens” values and see how the texts change. Here are some examples:

  • Number of tokens = 10: What is artificial intelligence? It is a field of computer science. This is a very concise and simple text, but also very vague and incomplete.
  • Number of tokens = 50: What is artificial intelligence? It is a field of computer science that studies how to create machines and systems that can perform tasks that normally require human intelligence. This is a more detailed and informative text, but still somewhat general and broad.
  • Number of tokens = 100: What is artificial intelligence? It is a field of computer science that studies how to create machines and systems that can perform tasks that normally require human intelligence, such as reasoning, learning, planning, decision making, natural language processing, computer vision, speech recognition, etc. Artificial intelligence can be divided into subfields based on the goals, methods, or applications of the research. This is a very detailed and informative text, but also very specific and comprehensive.

As you can see, the “number of tokens” parameter can affect the length, detail, and information of the generated texts. You can use different “number of tokens” values depending on your purpose and preference.

A lower “number of tokens” value is suitable for generating brief and uncomplicated texts, while a higher value allows longer and more detailed texts. However, it is important not to set the value too low or too high: too low and the text may be cut off mid-sentence; too high and the model may pad the output with redundant or irrelevant content.
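Note that “tokens” are subword pieces, not words, so a token budget is not a word count. If you want to measure how many tokens a prompt uses, a small sketch with OpenAI’s tiktoken library (assuming it is installed and the model name matches the one you target) looks like this:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "What is artificial intelligence?"
token_ids = enc.encode(prompt)

print(len(token_ids), "tokens:", token_ids)
print([enc.decode([t]) for t in token_ids])  # the individual pieces
```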

Top-P:

Top-p (also known as nucleus sampling) is a parameter that controls how many tokens are considered candidates for the next position: the model samples only from the smallest set of tokens whose cumulative probability reaches p. A higher “Top-p” means more diversity and creativity, while a lower “Top-p” means more accuracy and reliability.

[Figure: Top-P Parameter Example]

To illustrate this, let’s imagine an LLM generating some texts based on the same prompt: “The most important skill for a programmer is”. I will use different “Top-p” values and see how the texts change. Here are some examples:

  • Top-p = 0.1: The most important skill for a programmer is problem-solving. This is a very accurate and reliable text, but also very common and expected.
  • Top-p = 0.5: The most important skill for a programmer is communication. This is a more diverse and interesting text, but still somewhat reasonable and logical.
  • Top-p = 0.9: The most important skill for a programmer is telepathy. This is a very diverse and creative text, but also very absurd and illogical.

As you can see, the “Top-p” parameter can affect the diversity, creativity, and accuracy of the generated texts. You can use different “Top-p” values depending on your purpose and preference.

If you aim to create consistent texts, it's recommended to use a lower “Top-p” value. However, if you want to generate creative and humorous content, a higher “Top-p” value may be more suitable. Finding a balance and avoiding extremes in the “Top-p” value is important, as this can affect the quality and diversity of the generated texts.
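Concretely, nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches p and samples only from that set. Here is a minimal sketch of the filtering step, with made-up probabilities:

```python
import numpy as np

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p; zero out the rest and renormalize."""
    order = np.argsort(probs)[::-1]              # most likely first
    cumulative = np.cumsum(probs[order])
    n_keep = np.searchsorted(cumulative, p) + 1  # tokens that survive
    keep = order[:n_keep]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Hypothetical next-token probabilities
probs = np.array([0.5, 0.3, 0.15, 0.05])

print(top_p_filter(probs, 0.5))  # [1. 0. 0. 0.] -- only the top token survives
print(top_p_filter(probs, 0.9))  # three tokens remain in the running
```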

Presence Penalty:

Presence penalty is a parameter that penalizes tokens that have already appeared in the output text so far, regardless of how many times. A higher “presence penalty” encourages the model to explore new topics and makes it less repetitive, while a lower “presence penalty” allows more repetition and less exploration.

To illustrate this, let’s imagine an LLM generating some texts based on the same prompt: “The best sport is”. I will use different presence penalty values and see how the texts change. Here are some examples:

  • Presence penalty = 0.0: The best sport is soccer, soccer, soccer. This is a very repetitive and emphatic text, but also very boring and monotonous.
  • Presence penalty = 0.5: The best sport is soccer, basketball, and tennis. This is a more varied and novel text, but still somewhat coherent and consistent.
  • Presence penalty = 1.0: The best sport is soccer, rugby, and chess. This is a very varied and novel text, but also unpredictable, drifting toward topics that barely fit the prompt.

As we can see, the “presence penalty” parameter penalizes the model when reusing tokens that have appeared in the output text so far, regardless of whether they are in the prompt or not. You can use different presence penalty values depending on your purpose and preference.

If you're looking to create focused, repetitive text (for emphasis, say), consider using a lower “presence penalty” value. Conversely, if you want more variety and exploration in your generated text, a higher “presence penalty” may be more appropriate. However, it's important to find the right balance and avoid setting the penalty too high or too low, as this can lead to irrelevant or vague output.
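Mechanically, the presence penalty can be thought of as a flat deduction from the score of every token that has already shown up in the output, applied once no matter how many times the token appeared. A minimal sketch of that idea (the token IDs and scores below are made up):

```python
import numpy as np

def apply_presence_penalty(logits, output_ids, alpha):
    """Subtract a flat penalty from each token already present in the
    output; presence is binary, so repeats don't increase the penalty."""
    penalized = np.array(logits, dtype=float)
    for token_id in set(output_ids):
        penalized[token_id] -= alpha
    return penalized

logits = np.array([5.0, 3.0, 2.0])  # scores for ["soccer", "basketball", "tennis"]
output = [0, 0, 0]                  # "soccer" already generated three times

print(apply_presence_penalty(logits, output, 1.0))
# [4. 3. 2.] -- "soccer" loses exactly 1.0, no matter how often it appeared
```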

Frequency Penalty:

The frequency penalty works much like the presence penalty, but it also counts tokens that are present in the prompt when applying the penalty.

It scales with how many times a token has appeared in the text so far, including the prompt: a token that has already appeared more often receives a higher penalty and therefore has a lower probability of appearing again.
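A sketch of the difference: the frequency penalty deduction grows with the token’s count (here counted over prompt plus output, following the description above), while the presence penalty from the previous section is a one-time deduction. The scores and token IDs are made up:

```python
from collections import Counter
import numpy as np

def apply_penalties(logits, history_ids, alpha_freq, alpha_pres):
    """Frequency penalty scales with how often a token has appeared;
    presence penalty is a flat, one-time deduction."""
    penalized = np.array(logits, dtype=float)
    for token_id, count in Counter(history_ids).items():
        penalized[token_id] -= count * alpha_freq  # grows with repetition
        penalized[token_id] -= alpha_pres          # applied once
    return penalized

logits = np.array([5.0, 3.0, 2.0])  # scores for ["soccer", "football", "fun"]
history = [0, 0, 0]                 # "soccer" seen three times, prompt included

print(apply_penalties(logits, history, 1.0, 0.0))
# [2. 3. 2.] -- three occurrences cost "soccer" 3.0 under the frequency penalty
```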

To illustrate this, let’s imagine an LLM generating some texts based on the same prompt: “The best sport is soccer”. I will use different “frequency penalty” values and see how the texts change. Here are some examples:

  • Frequency penalty = 0.0: The best sport is soccer, soccer is fun, and soccer is exciting. This is a very repetitive and emphatic text, but also very boring and monotonous.
  • Frequency penalty = 0.5: The best sport is soccer, soccer is fun and exciting. This is a more varied and novel text with fewer repetitions, but still somewhat coherent and consistent.
  • Frequency penalty = 1.0: The best sport is soccer, football is fun and exciting. This is a very varied and novel text, but also quite unpredictable; notice that it didn’t repeat the word “soccer” from the prompt.

As you can see, the “frequency penalty” parameter can affect the variation, novelty, and repetition of the generated texts. You can use different “frequency penalty” values depending on your purpose and preference.

If you're looking to create persuasive and impactful text, try a lower “frequency penalty” value. If you want more unique and varied text, use a higher “frequency penalty” value. However, it's important to find a balance and avoid setting the penalty too high or low, as this can result in text that lacks coherence or meaning.

Conclusion:

In this post, we have learned about LLM parameters and how they can affect the quality, diversity, and creativity of generated texts.

We have seen what these parameters are and how to use them effectively.

I have also shown you some examples of texts that can be generated by large language models with different parameter values.

Thank you for reading this post. I hope you have found it informative and interesting.
