How to Limit LLM Response Length with Max Tokens in Python

In this post, we'll briefly learn what max tokens means in the context of large language models, how it controls the length of generated responses, and how to set it effectively for different tasks in Python. The tutorial covers:

  1. What are Max Tokens?
  2. How Tokens are Counted
  3. Installation and Setup
  4. Setting Max Tokens for Short Responses
  5. Setting Max Tokens for Long Responses
  6. Detecting a Truncated Response
  7. Max Tokens for Structured Output Control
  8. Estimating Token Count Before Sending
  9. Choosing the Right Max Tokens Value
  10. Conclusion

Let's get started.

How to Use Top-P and Top-K Sampling in LLMs

In this post, we'll briefly learn what Top-K and Top-P sampling are, how they differ from temperature, and how to tune them to control the quality and diversity of LLM output in Python. The tutorial covers:

  1. What are Top-K and Top-P Sampling?
  2. How Top-K Sampling Works
  3. How Top-P Sampling Works
  4. Installation and Setup
  5. Effect of Top-K on Output
  6. Effect of Top-P on Output
  7. Comparing Top-K and Top-P Directly
  8. Combining Temperature, Top-K, and Top-P
  9. Choosing the Right Sampling Parameters
  10. Conclusion

Let's get started.

How to Control LLM Output Randomness with Temperature in Python

In this post, we'll briefly learn what temperature is in the context of large language models, how it controls the randomness of generated text, and how to set it correctly for different tasks in Python. The tutorial covers:

  1. What is Temperature?
  2. How Temperature Works
  3. Installation and Setup
  4. Comparing Temperature Values Side by Side
  5. Low Temperature for Factual and Structured Tasks
  6. High Temperature for Creative Tasks
  7. Temperature and Top-P Sampling
  8. Choosing the Right Temperature
  9. Conclusion

Let's get started.

How to Use System Prompts to Control LLM Behavior

In this post, we'll briefly learn what a system prompt is, why it is one of the most powerful levers for controlling LLM behavior, and how to craft effective system prompts for a variety of real-world scenarios in Python. The tutorial covers:

  1. What is a System Prompt?
  2. How System Prompts Work
  3. Installation and Setup
  4. Setting Tone and Persona
  5. Constraining the Output Format
  6. Restricting the Topic Domain
  7. Controlling Response Length and Style
  8. Chaining System and Few-Shot Prompts
  9. Conclusion

Let's get started.

How to Run a Local LLM in Python with Ollama

In this post, we'll briefly learn what Ollama is, how to set it up, and how to run a local large language model (LLM) entirely on your own machine using Python. The tutorial covers:

  1. What is Ollama?
  2. Installation and Setup
  3. Pulling a Model
  4. Basic Chat Completion
  5. Streaming Responses
  6. Multi-turn Conversation
  7. Generating Embeddings
  8. Using the OpenAI-Compatible API
  9. Conclusion

Let's get started.

Semantic Text Similarity with LLM Embeddings in Python

In this post, we'll briefly learn what semantic text similarity is, how LLM embeddings enable it, and how to measure the semantic closeness between sentences in Python. The tutorial covers:

  1. What is Semantic Text Similarity?
  2. What are LLM Embeddings?
  3. Installation
  4. Loading an Embedding Model
  5. Cosine Similarity Between Two Sentences
  6. Ranking Sentences by Similarity
  7. Batch Similarity with a Query
  8. Similarity Heatmap for a Sentence Set
  9. Conclusion

Let's get started.