How to Build a Copilot Using Local Large Language Models with Pieces Client

Open Source by Pieces: Build Your Own Copilot.

In this article, we’ll walk you through how to switch between different cloud-hosted Large Language Models (LLMs), and how to download Local Large Language Models and use them entirely on-device, all through the Pieces OS Client. By the end of this series, you’ll know how to build a copilot using Open Source by Pieces.

You can download the Pieces Vanilla Typescript Project to take a look at the example code.

Prerequisites

We suggest reading the first article in this series if you haven’t already, and you will need to install Pieces OS.

Understanding the Models

When you chat with generative AI tools like ChatGPT or Pieces Copilot, each tool runs on a specific large language model (LLM). It could be Mistral, Llama2, or any of the other models that are introduced to the market nearly daily. Each model brings a new perspective to the LLM environment. With Pieces OS Client, you gain access to many of these models and can switch between them as needed while you move between tasks in your day.

NOTE: If you plan on cloning the Typescript Project, we recommend starting with a clean slate by deleting any Local LLMs (LLLMs) you may have downloaded in your Pieces Suite. If you do not, you will notice that some of the radio buttons on the Example Copilot page are inactive, and the page will show buttons for deleting each of the models as soon as you enter the application. We currently only support local instances of the Llama2 7B parameter models, but we plan to add support for Mistral and other models soon.

Adding Radio Buttons for Model Selection

If we pick up from the previous article in this series, we have our copilot chats added and can get an entire copilot conversation back in the chat. Next, we will start to add visual radio buttons so you can simply select each model. Here are the models that we will be working with:

  • Llama2 7B CPU (local)
  • Llama2 7B GPU (local)
  • GPT 3.5 (cloud)
  • GPT 4 (cloud)

You'll notice that these models are either hosted in the cloud or hosted locally on your machine. We can use each of these models in the same location, all by pressing one button. There is one additional step to download the model itself, but we will get to that further in this article.

First, we will add the radio buttons. You can take the code snippet below and use it to add radio buttons to the HTML of your project if needed. These radio buttons also serve as the controls for switching between the LLLMs and the Cloud models provided:



llama2-7b-cpu
gpt-4
gpt-35
llama2-7b-gpu


You can see that the form above contains all of the radios, followed by the models’ downloads containers, which hold each of the buttons that are connected to downloadable chat models.

Each of these buttons will represent a single selectable model. As you adjust these values, each model that is selected is set to the active model, EXCEPT for when we first load the page. On first load, we default to the GPT 3.5 model and use it to perform our first request. When you add in these inputs, they are each connected to model download logic that lives in the new ModelProgressController.ts file.

This is where all of the magic lives! The Pieces endpoints handle the heavy lifting with our LLLMs and Cloud LLMs, which will allow us to build and functionally switch between them quickly. Why build your own copilot if it’s only powered by a single LLM?

Setting up the Models for Selection

In ModelProgressController.ts, we first declare a public variable to store our list of models. We can iterate through this list later, as we get the proper enum and value for each model that is returned from the Pieces.ModelsApi:

// first initialize the value here:
public models: Promise<Pieces.Models>;

// then get its value inside of the constructor.
private constructor() {
  // access the model snapshot here and set it to the variable that was just created above.
  this.models = new Pieces.ModelsApi().modelsSnapshot();

  this.models.then((models) => {
    this.initSockets(
      // then use filter to set the initial value for the models download.
      models.iterable.filter(
        (el) =>
          el.foundation === Pieces.ModelFoundationEnum.Llama27B &&
          el.unique !== 'llama-2-7b-chat.ggmlv3.q4_K_M'
      )
    );
  });
}


Using Pieces.ModelsApi().modelsSnapshot(), you can get a list of all the available models. Then, the check el.foundation === Pieces.ModelFoundationEnum.Llama27B is used to match entries on the iterable list. This is the first time that this enum has been used thus far, but it certainly won’t be the last. We use it in combination with a few variables to create our onclick functions in src/index.ts.
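The filter passed to initSockets can also be pulled out into a small pure function, which makes it possible to check the selection logic without Pieces OS running. The following is only a sketch under stated assumptions: ModelSummary is a hypothetical, trimmed-down stand-in for Pieces.Model, and the string 'LLAMA_2_7B' is a placeholder for Pieces.ModelFoundationEnum.Llama27B rather than the SDK's actual value.

```typescript
// Hypothetical, minimal stand-in for the fields of Pieces.Model used by the filter.
interface ModelSummary {
  id: string;
  foundation: string;
  unique: string;
}

// Assumed placeholder for Pieces.ModelFoundationEnum.Llama27B (not the real enum value).
const LLAMA_2_7B = 'LLAMA_2_7B';

// The one Llama2 build that the constructor in the article excludes.
const EXCLUDED_UNIQUE = 'llama-2-7b-chat.ggmlv3.q4_K_M';

// Keep only the Llama2 7B models whose download progress should be tracked.
function selectTrackedModels(models: ModelSummary[]): ModelSummary[] {
  return models.filter(
    (model) => model.foundation === LLAMA_2_7B && model.unique !== EXCLUDED_UNIQUE
  );
}

// Usage with made-up data: only model 'a' passes both conditions.
const trackedExample = selectTrackedModels([
  { id: 'a', foundation: LLAMA_2_7B, unique: 'llama-2-7b-example-build' },
  { id: 'b', foundation: LLAMA_2_7B, unique: EXCLUDED_UNIQUE },
  { id: 'c', foundation: 'GPT_4', unique: 'gpt-4-example' },
]);
console.log(trackedExample.map((model) => model.id).join(','));
```

In the real controller, the result of this function would be what gets handed to initSockets.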

Moving over to the main() function, we can use .getInstance() to retrieve our ModelProgressController and its list of models, then create variables to store each model's value. We also set the initially selected model to GPT 3.5:

async function main() {
  // copilot stream controller.
  CopilotStreamController.getInstance();

  // get the values we have stored on the controller.
  const modelProgressController = ModelProgressController.getInstance();
  const models = await modelProgressController.models;

  // set all model values that we are going to use in this example.
  const gpt35 = models.iterable.find(
    (model) => model.foundation === ModelFoundationEnum.Gpt35 && !model.name.includes('16k')
  )!;
  const gpt4 = models.iterable.find((model) => model.foundation === ModelFoundationEnum.Gpt4)!;
  const llama27bcpu = models.iterable.find(
    (model) => model.foundation === ModelFoundationEnum.Llama27B && model.cpu
  )!;
  const llama27bgpu = models.iterable.find(
    (model) => model.foundation === ModelFoundationEnum.Llama27B && !model.cpu
  )!;

  // set your model id here for gpt-3.5
  CopilotStreamController.selectedModelId = gpt35.id;
...
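The `!` non-null assertions above keep the compiler quiet, but if a model is ever missing from the snapshot, the failure only surfaces later as an access on undefined. One possible alternative is a small helper that throws a descriptive error at lookup time. This is a sketch and not part of the Pieces SDK: findOrThrow and CatalogModel are hypothetical names, and the foundation strings are placeholders for the ModelFoundationEnum values.

```typescript
// Hypothetical, minimal stand-in for the fields of Pieces.Model used in the lookups.
interface CatalogModel {
  id: string;
  name: string;
  foundation: string;
  cpu?: boolean;
}

// Return the first matching item, or throw with a label that says what was expected.
function findOrThrow<T>(items: T[], predicate: (item: T) => boolean, label: string): T {
  const match = items.find(predicate);
  if (match === undefined) {
    throw new Error(`expected a model matching: ${label}`);
  }
  return match;
}

// Made-up catalog; the foundation strings are placeholders, not real enum values.
const exampleCatalog: CatalogModel[] = [
  { id: 'm1', name: 'GPT-3.5 16k', foundation: 'GPT_3.5' },
  { id: 'm2', name: 'GPT-3.5', foundation: 'GPT_3.5' },
  { id: 'm3', name: 'Llama2 7B', foundation: 'LLAMA_2_7B', cpu: true },
];

// Mirrors the gpt35 lookup: same foundation, but skip the 16k variant.
const gpt35Example = findOrThrow(
  exampleCatalog,
  (model) => model.foundation === 'GPT_3.5' && !model.name.includes('16k'),
  'GPT 3.5 (non-16k)'
);
console.log(gpt35Example.id);
```

The trade-off is a slightly longer call site in exchange for an error message that names the missing model.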


Then we can start to wire up each button using a similar pattern. First, we look up the radio element, then set its onclick handler so that it assigns the appropriate CopilotStreamController.selectedModelId whenever that dial is selected:

...
const gpt35Button: HTMLElement | null = document.getElementById('gpt-35-radio') as HTMLInputElement | null;
if (!gpt35Button) throw new Error('expected id gpt35Button');
gpt35Button.onclick = () => {
  CopilotStreamController.selectedModelId = gpt35.id;
};
...


The selectedModelId is used when any message is sent from the input of the copilot chat and passed in as the model parameter on Pieces.QGPTStreamInput.question.model:

// in askQGPT in CopilotStreamController.ts
const input: Pieces.QGPTStreamInput = {
  question: {
    query,
    relevant: { iterable: [] },
    // the updated parameter here:
    model: CopilotStreamController.selectedModelId,
  },
};


Then we need to repeat this for each of the radio dials that we are creating. We ensure that each button is present and throw an error if it is not found, using if (!gpt35Button) throw new Error('expected id gpt35Button'). This gets more interesting for the Llama2 7B CPU and Llama2 7B GPU models, since we will download those files locally. Remember that Pieces OS is installed and acts as the database and storage location for the models, so it handles storing them and providing access to them via these endpoints.
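Because the same lookup-check-assign pattern repeats for every dial, it can also be expressed as one data-driven loop. The sketch below is an illustration and not the project's code: ClickableRadio, SelectedModel, and wireModelRadios are hypothetical names, the 'gpt-4-radio' id and the model id strings are made up, and the element lookup is injected so the example runs without a browser.

```typescript
// Minimal shape needed from a radio element (hypothetical; only onclick is used).
interface ClickableRadio {
  onclick: ((...args: any[]) => unknown) | null;
}

// Hypothetical stand-in for CopilotStreamController.selectedModelId.
class SelectedModel {
  static id: string | undefined;
}

// For each radio id, find the element, fail loudly if it is missing,
// and make a click select the matching model id.
function wireModelRadios(
  radioIdToModelId: Record<string, string>,
  lookup: (elementId: string) => ClickableRadio | null
): void {
  for (const radioId of Object.keys(radioIdToModelId)) {
    const modelId = radioIdToModelId[radioId];
    const radio = lookup(radioId);
    if (!radio) throw new Error(`expected element with id ${radioId}`);
    radio.onclick = () => {
      SelectedModel.id = modelId;
    };
  }
}

// Usage with in-memory fakes instead of real DOM elements.
const fakeGpt35Radio: ClickableRadio = { onclick: null };
const fakeGpt4Radio: ClickableRadio = { onclick: null };
const lookupFake = (id: string): ClickableRadio | null =>
  id === 'gpt-35-radio' ? fakeGpt35Radio : id === 'gpt-4-radio' ? fakeGpt4Radio : null;

wireModelRadios({ 'gpt-35-radio': 'gpt35-id', 'gpt-4-radio': 'gpt4-id' }, lookupFake);
fakeGpt4Radio.onclick?.(); // simulate clicking the GPT 4 dial
console.log(SelectedModel.id);
```

In a browser, the lookup argument would presumably be a thin wrapper around document.getElementById.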

Once each of the radio buttons is created, the end result will show 4 radio dials that are set up for swapping between each of the models and setting the proper value each time it is selected:

The model selection module.

Downloading the Llama2 7B CPU Model

Now we can create the logic for downloading the CPU model when we click a button below the radio dials. This will detect if the model is already downloaded and give the appropriate option to delete it if it has already been downloaded.

Below where the llama27bcpuButton is created, we can add a check to see if the CPU model is present in the Client. Because we create the buttons with JavaScript, we can conditionally add the 'Download Llama2 7B CPU' text to the button only when the model is not downloaded. When it is downloaded, we can provide the option to delete.

...
// this checks the model value on the models.iterable to see if it is downloaded.
if (!llama27bcpu?.downloaded) {
  // if the model is not downloaded, then we cannot select the radio button.
  llama27bcpuButton.setAttribute('disabled', 'true');

  // create the container for the download button.
  const downloadLLama27bcpuContainer = document.createElement('div');
  modelDownloadsContainer.appendChild(downloadLLama27bcpuContainer);

  // create the button and set the appropriate text.
  const downloadLlama27bcpuButton = document.createElement('button');
  downloadLlama27bcpuButton.innerText = 'Download Llama2 7B CPU';
  downloadLLama27bcpuContainer.appendChild(downloadLlama27bcpuButton);
...
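The download-or-delete decision can be summarized as a pure function of the model's downloaded flag, separate from the DOM calls. This is a sketch with hypothetical names (LocalModelState, DownloadControl, describeDownloadControl); the label text follows the pattern used in the article, but the exact strings are illustrative.

```typescript
// Hypothetical, minimal stand-in for the fields of Pieces.Model used here.
interface LocalModelState {
  name: string;
  downloaded?: boolean;
}

// What the page should render for one local model.
interface DownloadControl {
  radioDisabled: boolean; // a model that is not downloaded cannot be selected
  buttonLabel: string;
  action: 'download' | 'delete';
}

// Downloaded models get a delete button and an enabled radio;
// everything else gets a download button and a disabled radio.
function describeDownloadControl(model: LocalModelState): DownloadControl {
  if (model.downloaded) {
    return { radioDisabled: false, buttonLabel: `Delete ${model.name}`, action: 'delete' };
  }
  return { radioDisabled: true, buttonLabel: `Download ${model.name}`, action: 'download' };
}

// Usage with made-up model states.
const notDownloaded = describeDownloadControl({ name: 'Llama2 7B CPU' });
const alreadyDownloaded = describeDownloadControl({ name: 'Llama2 7B CPU', downloaded: true });
console.log(notDownloaded.buttonLabel, '|', alreadyDownloaded.buttonLabel);
```

Treating a missing downloaded flag as "not downloaded" matches the !llama27bcpu?.downloaded check above.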


After we create the buttons, we can set the specific model that we want to download when we click that button. We use ModelApi().modelSpecificModelDownload() and pass in the .id value of the Pieces.Model stored in llama27bcpu:

...

downloadLlama27bcpuButton.onclick = (e) => {
  new ModelApi()
    .modelSpecificModelDownload({ model: llama27bcpu.id })
    .then(console.log)
    .catch(console.error);
};
...


Showing Download Progress

We also want to show our download progress. To do that, we create a progress element (shown here for the GPU model as llama27bgpuDownloadProgress) and give it a unique id derived from the model's id:

...

const llama27bgpuDownloadProgress = document.createElement('div');
downloadLLama27bgpuContainer.appendChild(llama27bgpuDownloadProgress);

// creates the ID we need here using the unique value.
llama27bgpuDownloadProgress.id = `download-progress-${llama27bgpu.id}`

...


Over in the WebSocket code found in ModelProgressController, you can see how we use model.id to open a WebSocket for the corresponding model, which emits values as the download proceeds. When event data comes back, the progress element is looked up using the progress ID that we created above, and the model's display name is taken from Pieces.Model.name.

The emitted event carries either event.percentage or event.status, depending on whether the model download has started. When a percentage is present, we display that number followed by a percent sign; otherwise, we display the status. We use all of that to set the downloadProgressElement.innerText value.

Here is all of that together:

private connect(model: Model) {
  // setup the appropriate web socket.
  const ws: WebSocket = new WebSocket(
    `ws://localhost:${1000}/model/${model.id}/download/progress`
  );

  this.sockets[model.id] = ws;

  ws.onmessage = (evt) => {
    const event = Pieces.ModelDownloadProgressFromJSON(JSON.parse(evt.data));
    const downloadProgressElement = document.getElementById(`download-progress-${model.id}`);
    if (!downloadProgressElement) return;

    // setting the inner text of the element.
    downloadProgressElement.innerText =
      `${model.name} download progress: ${event.percentage ?? event.status}` +
      (event.percentage ? '%' : '');

...
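The URL and message formatting can be isolated into two small helpers. One detail worth noting: the truthiness check event.percentage ? '%' : '' omits the percent sign if a percentage of exactly 0 is ever emitted, so the sketch below checks for presence instead. The names DownloadProgressLike, progressSocketUrl, and formatDownloadProgress are hypothetical, the 'UNKNOWN' fallback string is an assumption, and whether Pieces OS actually emits a 0 percentage is not confirmed here.

```typescript
// Hypothetical, minimal stand-in for the fields read from the progress event.
interface DownloadProgressLike {
  percentage?: number | null;
  status?: string;
}

// Build the progress socket URL; port 1000 is the value used in the article's snippet.
function progressSocketUrl(modelId: string, port: number = 1000): string {
  return `ws://localhost:${port}/model/${modelId}/download/progress`;
}

// Prefer the percentage when one is present (including 0); otherwise show the status.
function formatDownloadProgress(modelName: string, event: DownloadProgressLike): string {
  const hasPercentage = event.percentage !== undefined && event.percentage !== null;
  // 'UNKNOWN' is an assumed fallback for events that carry neither field.
  const detail = hasPercentage ? `${event.percentage}%` : String(event.status ?? 'UNKNOWN');
  return `${modelName} download progress: ${detail}`;
}

// Usage with made-up events.
console.log(progressSocketUrl('abc'));
console.log(formatDownloadProgress('Llama2 7B', { percentage: 42 }));
console.log(formatDownloadProgress('Llama2 7B', { status: 'INITIALIZED' }));
```

Keeping the formatting separate from ws.onmessage means the text logic can be checked without opening a socket.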


Showing Individual Model Download Progress or Status

Now we can react to the event.status that is returned. We compare it to the ModelDownloadProgressStatusEnum.Initialized value, which indicates that the button has been pressed on the page but the download has not officially begun yet.

The model download itself can be canceled, so once the status is Initialized, we create and append the cancel download button. This button calls ModelApi().modelSpecificModelDownloadCancel({ model: model.id }) inside of its onclick function.

If event.status is ModelDownloadProgressStatusEnum.Failed, ModelDownloadProgressStatusEnum.Unknown, or ModelDownloadProgressStatusEnum.Completed, we remove the button that allows the download to be canceled. If the event status is ModelDownloadProgressStatusEnum.Completed, we also refresh the page to ensure that the new download is added and the radio dials are in sync:

...
  if (event.status === ModelDownloadProgressStatusEnum.Initialized) {
    // creates the cancel button and sets its ID.
    const cancelDownloadButton = document.createElement('button');
    cancelDownloadButton.id = `cancel-download-button-${model.id}`;
    downloadProgressElement.insertAdjacentElement('afterend', cancelDownloadButton);
    cancelDownloadButton.innerText = `Cancel ${model.name} download`;

    // ModelApi().modelSpecificModelDownloadCancel() <-- how to cancel a model's download
    cancelDownloadButton.onclick = () => {
      new ModelApi().modelSpecificModelDownloadCancel({ model: model.id });
    };
  } else if (
    event.status === ModelDownloadProgressStatusEnum.Failed ||
    event.status === ModelDownloadProgressStatusEnum.Unknown ||
    event.status === ModelDownloadProgressStatusEnum.Completed
  ) {
    document.getElementById(`cancel-download-button-${model.id}`)?.remove();
  }

  if (event.status === ModelDownloadProgressStatusEnum.Completed) {
    window.location.reload();
  }
}
...
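The branching above amounts to a small lookup from status to UI action, which can be written as a pure function and checked in isolation. This is a sketch with hypothetical names (StatusPlan, planForStatus); the uppercase status strings are placeholders for the ModelDownloadProgressStatusEnum members named in the article, not confirmed SDK values.

```typescript
// What the page should do in response to one progress event.
interface StatusPlan {
  cancelButton: 'create' | 'remove' | 'keep';
  reloadPage: boolean;
}

// Map a status to the UI actions described in the article.
// The string literals are assumed placeholders for the enum members.
function planForStatus(status: string | undefined): StatusPlan {
  switch (status) {
    case 'INITIALIZED':
      // The download was requested, so offer a way to cancel it.
      return { cancelButton: 'create', reloadPage: false };
    case 'FAILED':
    case 'UNKNOWN':
      // Nothing left to cancel.
      return { cancelButton: 'remove', reloadPage: false };
    case 'COMPLETED':
      // Remove the cancel button and refresh so the radio dials are in sync.
      return { cancelButton: 'remove', reloadPage: true };
    default:
      // Any other status (or none) leaves the page as it is.
      return { cancelButton: 'keep', reloadPage: false };
  }
}

// Usage: print the plan for each status the article mentions.
for (const status of ['INITIALIZED', 'FAILED', 'UNKNOWN', 'COMPLETED']) {
  const plan = planForStatus(status);
  console.log(status, plan.cancelButton, plan.reloadPage);
}
```

The onmessage handler would then only need to carry out the returned plan against the DOM.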


The result is something like this:

Download progress on a LLLM.

Deleting a Downloaded Model

Now, if a model is already downloaded, we want to make it easy to remove the model. This is useful if we don’t need the model anymore, or if we want to re-download the entire thing. We’ll set up the delete function using the ModelsApi().modelsDeleteSpecificModelCache() with the model ID for each corresponding model back in index.ts to remove it from its downloaded location.

The snippet below shows the llama27bgpu model; the same pattern applies to the llama27bcpu model, and examples for both the CPU and GPU models are included in the full repo.

...

else {
  const deleteLlama27bgpuButton = document.createElement('button');
  modelDownloadsContainer.appendChild(deleteLlama27bgpuButton);
  deleteLlama27bgpuButton.innerText = 'Delete Llama27b GPU';

  // onclick we call modelsDeleteSpecificModelCache()
  deleteLlama27bgpuButton.onclick = () => {
    new ModelsApi()
      .modelsDeleteSpecificModelCache({ model: llama27bgpu.id, modelDeleteCacheInput: {} })
      .then(() => {
        window.location.reload();
      });
  };
}


The model should then be deleted, and the buttons and radio dials will return to normal once the page refreshes. That refresh is built into the deleteLlama27bgpuButton.onclick function above, which calls window.location.reload() after the delete request resolves. Your result will look like this:

The Delete Llama27b CPU button.

Upon pressing delete, the Llama2 Model should be deleted and the buttons will return to their original state, allowing you to download the models again!

Ready to Build Your Own Copilot?

This guide is a good introduction to downloading LLLMs, setting them up on your machine in an easy-to-use environment, and observing their status with Pieces OS Client. From here, you can build your own copilot for your application, project, or any other solution you need throughout your workflow.

To get the entire project, you can visit the repo and clone it to get started. All of the above code can be found there and used as copy-and-paste examples for whatever use case you may have.

The next article in this series on how to build a copilot will cover setting up context in each conversation to show the different ways you can combine specific models with particular contexts to get more pointed and useful responses. Keep an eye out for that article soon.

If you are interested in contributing to Open Source by Pieces, join our community on Discord or check out our OpenSource repository on GitHub for projects and other open-source initiatives. If you haven't checked out the Pieces for Developers Desktop App and Pieces OS before now, go learn about all of the functionality that is available to help you power your workflow and enhance your work as a developer!