AI Context: Making the Most Out of Your LLM Context Length

AI Context: Making the Most Out of Your LLM Context Length

As the influence of Large Language Models (LLMs) continues to expand across various industries, the task of selecting the most suitable LLM becomes a critical decision for companies. This choice is not just about the model's capabilities, but also about how well it aligns with their unique workflows and long-term objectives. One key factor that plays a pivotal role in this decision-making process is the LLM’s context length.

In this article, we delve into the concept of LLM context length, its significance, and the advantages and disadvantages of varying context lengths. Furthermore, we will explore how you can enhance the performance of your model by applying specific AI context in Pieces Copilot.

What is Context Length in LLMs?

Context length in Large Language Models (LLMs) refers to the maximum number of tokens that a model can process simultaneously. It's akin to the model's memory or attention span and is a predefined attribute in transformer-based models like ChatGPT and Llama.

Tokens are the model's method of encoding words into numerical representations, through a process called positional encoding. For instance, approximately 130 tokens represent 100 words. If a model encounters an unfamiliar word, it dissects the word into multiple tokens.

The context length of an LLM determines the maximum volume of information it can accept as input for a query. In simpler terms, a larger context length or LLM context window allows a user to input more information into a prompt to elicit a response.

While it's intuitive to consider LLM context length in terms of words, language models actually quantify content based on token length. Typically, a token corresponds to four characters in English or roughly ¾ of a word. Therefore, 100 tokens equate to about 75 words.

With that in mind, here are the context lengths of some of the most prominent LLMs.

Llama: 2K

Llama 2: 4K

GPT-3.5-turbo: 4K. However, GPT-3.5-16k has a context length of 16K.

GPT-4: 8K. Similarly, GPT-4-32k has a context window of up to 32K.

Mistral 7B: 8K

Palm-2: 8K

Gemini: 32k token context length​​.

Why is LLM Context Length Important?

Large context LLMs hold significant importance due to the following reasons:

  1. Input Complexity: It sets a limit on the number of words a language model can process, thereby controlling the complexity of the input.
  2. Summarization Capability: The length of an article a model can summarize is directly proportional to its context length. Longer context lengths allow for summarizing more extensive articles, leading to higher AI context awareness.
  3. Long-term Planning: Tasks that require long input sequences necessitate a larger context length.
  4. Richer Output: Longer and more complex inputs, facilitated by greater context lengths, tend to generate more detailed and richer output content.
  5. Memory and Coherence: In chat applications, the context length determines the extent of the previous conversation that the model can recall. Models like ChatGPT or Llama 2-Chat are stateless and do not inherently remember past interactions. They can reference past conversations only if they are included in the current input before being processed by the model.

The significance of AI context extends to various aspects of an LLM's functionality and effectiveness:

  1. Input Scope and Complexity: Increasing the context length enhances an LLM's capacity to manage more intricate and detailed inputs. This, in turn, influences its applicability and the extent of its use. For instance, a 4K context window, as seen in models like GPT 3.5 or Llama 2, is equivalent to six pages, while a 32K context length corresponds to 49 pages. The context length, therefore, plays a pivotal role in determining an LLM's suitability for tasks such as summarization, which are constrained by the context length.
  2. Coherence: Since LLMs lack inherent memory, the context length dictates the amount of previous input it can recall, thereby influencing the coherence and precision of the output.
  3. Accuracy: A larger context window increases the likelihood of the model generating a relevant response by enabling a more comprehensive understanding of the input.

In essence, context length is vital for comprehending and generating coherent and pertinent responses. However, extending the context length is not a simple task. It necessitates a deep understanding of the model's architecture and behavior, along with rigorous testing and modifications to ensure the model's ability to handle longer sequences.

Problems of Having Large Context Windows vs Using Retrieval Instead?

At first glance, a large context window might seem advantageous — simply feed the model with all your training data and let it do the heavy lifting. However, this "context stuffing" strategy often comes with drawbacks and underperforms expectations in the real world:

The quality of responses can deteriorate, and the risk of generating irrelevant or nonsensical outputs, known as "hallucinations," increases. Costs also rise linearly with larger contexts. Processing extensive contexts demands more computational resources, and since LLM providers charge per token, a long context (i.e., more tokens) makes each query pricier.

Studies indicate that LLMs yield superior results when provided with fewer, but more pertinent documents in the context, as opposed to a large volume of unfiltered documents.

This is where retrieval systems prove their worth. Retrieval systems have been developed and optimized over decades, and are specifically designed to extract relevant information on a large scale, doing so at a significantly reduced cost. Furthermore, the model parameters of these AI systems are explicitly adjustable, offering more flexibility and opportunities to optimize compared to LLMs. The strategy of using retrieval systems to supply context to LLMs is known as Retrieval Augmented Generation (RAG).

LLMs can struggle to discern valuable information when overwhelmed with vast amounts of unfiltered data. Employing a retrieval system to locate and supply concise, relevant information enhances the model's efficiency per token, leading to reduced resource usage, improved accuracy, and a more context-aware AI.

Using AI Context in Pieces Copilot

Pieces Copilot now has a more contextual AI than ever, thanks to Retrieval Augmented Generation. Your interactions within the Pieces Desktop Application and plugins continuously re-ground your personal AI engine, tailoring its responses to your unique requirements.

Moreover, Pieces Copilot empowers you to set AI context in real-time and ask context-specific questions. The contextual AI assistant responds based on the provided context, rather than relying solely on internet references. Even if your context lacks the necessary data, rest assured, Pieces Copilot has got it covered. You can enrich the response from the copilot by adding context from the folders, files, code snippets, websites, messages, and seamless IDE integration.

By applying AI context, you can:

  • Obtain comprehensive explanations and use cases of the provided code
  • Identify potential capabilities and limitations of the code
  • Ask questions specific to the provided code
  • Generate additional code snippets to tackle tasks using the applied context

To further personalize your experience, you can set specific contexts for each conversation, including personal repositories, code snippets, website URLs, and videos, for highly relevant results tailored to your unique project.

Asking questions specific to your code can often be beneficial. Therefore, you might want to provide additional information to the copilot before posing a question. With Pieces Copilot, you can set your context based on a specific set of files or a directory.

Let’s take a look at how you can set context in Pieces Copilot inside Pieces For Developers:

  1. Open the Copilot Chats view from the dropdown next to your search bar.
  2. Click "Set your Context" at the bottom of the copilot chat.
  3. Choose how you want to set your AI context:

  4. Directories: Select one or multiple directories from the file picker. Pieces will recursively use the files in these folders to answer your questions

  5. Files: Add individual code files for reference when asking your copilot questions
  6. Code Snippets: Utilize code snippets that you've previously created and saved to Pieces to assist you in asking questions about your code
  7. Websites: You can add websites for reference, and Pieces Copilot will scrape the related copy and code snippets from the page to add to the AI context recognition
  8. Message: Once you’ve started a conversation, you can add individual messages to context to avoid hallucinations with longer conversations

If you're interested in learning how to build your own copilot using Pieces OS SDK and add context to it, read the linked blog post and join the Discord hosted by Open Source by Pieces.

Setting Context in Different Developer Tools with Pieces Copilot

Our VS Code, Jetbrains, Obsidian, JupyterLab, Chrome, and other web extension integrations all offer the same copilot chat features, ensuring a consistently powerful experience wherever you need it.

Similar to the Desktop App, you can use files, folders, and snippets for context in AI conversations with your personalized copilot.

In some of our integrations like VS Code, you can utilize directives to quickly reuse materials as context. You can create your own custom creative, which allows you to define your own frequently used context sets for your questions and there are some default directives such as:

- @recent utilize the recently opened files as context

- @workspace utilize your current workspace as context

If the copilot used a file as relevant context, it will show it in the chat window, and you can click on it to view it.

Conclusion

In conclusion, understanding and effectively utilizing LLM context length is crucial in harnessing the full potential of Large Language Models. While a larger context length can initially seem advantageous, it's important to consider the trade-offs, such as increased computational costs and potential decrease in response quality.

Pieces Copilot, enhanced with Retrieval Augmented Generation, offers a solution that balances these factors. It allows for a more personalized and contextual AI coding experience, enabling you to set specific contexts for each conversation and ask context-specific questions. This approach not only improves the relevance and accuracy of the responses, but also optimizes resource usage.

Whether you're summarizing extensive documentation, asking questions specific to your code, or generating additional code snippets, the advanced AI context of Pieces Copilot can significantly enhance your productivity and coding experience. By continually refining and adapting to your unique requirements, Pieces Copilot exemplifies the power of AI and context, and their transformative impact on developer workflows.