Local Large Language Models (LLLMs) and Copilot Integrations

In our latest Ask Me Anything (AMA) session, we announced the AI integration of Pieces Copilot into a few of our most popular plugins, as well as our latest product release which debuted Local Large Language Models (LLLMs) support for code generation and question-answering in Pieces Copilot.

Our panel, consisting of Tsavo Knott (CEO), Cole Stark (Head of Growth), Caleb Anderson (Plugin Developer), Mark Widman (Head of Engineering), and Brian Lambert (Lead ML Developer), shared their insights on the latest features of Pieces Copilot, gave extensive demos of our VS Code and Obsidian plugins, and answered questions from our community. The AMA also touched upon the future pricing model for Pieces.

The main focus of this livestream was on the local LLM support in Pieces Copilot, continuous re-grounding with Retrieval Augmented Generation (RAG), and the integration of our copilot into VS Code and Obsidian. This session is a must-watch for anyone interested in understanding how these advanced features can augment their existing workflow and enhance productivity.

If you missed the live event, don't worry. We've summarized the key takeaways, highlighted some of the most thought-provoking questions and answers, and even shared video snippets of the discussion to get you up to speed!

Topics Covered

  • Introduction (0:11) - Meet the team and get a brief outline of the session.
  • Unique Copilot Features (3:28) - Learn about the latest features of Pieces Copilot and how they can augment your existing workflow.
  • What Are LLLMs? (5:45) - Understand what Local Large Language Models are and their role in Pieces Copilot.
  • Pieces For Developers & Pieces OS (9:27) - Get an overview of Pieces for Developers and the state of the art operating system that powers it.
  • Right-Click Copilot Integrated & Continuous Regrounding (12:54) - Discover how the Copilot is integrated into the system and how it continuously re-grounds itself based on the context.
  • RAG & Copilot Demo on Desktop (15:15) - Watch a live demo of Retrieval Augmented Generation (RAG) and the Copilot in action on the desktop.
  • Workflow Activity & Proactive Saving (24:06) - Understand how Pieces Copilot handles workflow activity and proactive saving, and what has improved in both.
  • Copilot in VS Code Demo & Value of Context Setting (26:34) - See the Copilot in action in VS Code, including local LLM selection, and learn about the importance of context setting.
  • Obsidian Demo & Future Copilot Capabilities (39:43) - Get a glimpse of the future capabilities of Copilot with an Obsidian demo highlighting the fully offline Pieces Copilot, contextualized by your vault.
  • Support Highlights (50:55) - Learn about the support system in place for Pieces Copilot users.
  • What Makes Pieces Unique? (52:31) - Understand what sets Pieces apart from other developer productivity startups.
  • End of Obsidian Demo (54:16) - A wrap-up and overview of the Obsidian demo.
  • Do LLMs Support Multi-Language? (55:11) - Find out if LLMs support multiple languages.
  • Pro, Pro + Cloud & Teams Plans (57:21) - Learn about the different plans available for Pieces for Developers.
  • Upcoming Features & Closing Remarks (58:42) - Get a sneak peek into the upcoming features and wrap up the session.

So, get comfortable, prepare your favorite beverage, and let's dive into the key moments of our most recent livestream with the developer community.

Unique Copilot Features

The AMA discussion opened with the unique features of Pieces Copilot and how it can augment the workflows of both individual and enterprise developers.

Proactive and Easy Access

Pieces Copilot is designed to be proactive and provide easy access to the things you have saved. It's not just about helping you write code, but also about interacting with your workflow at large. We've made it easy to save things to Pieces and search for them, but we've also focused on how you interact with these materials beyond just generating code.

Understanding Your Team

One of the key features of our Copilot is its understanding of your team and what people work on. It's not just about understanding the code you're trying to write, but also what you're trying to build and the people you need to interact with.

Workflow Activity

Another key point is timing: when you work on things. You've probably seen the Workflow Activity view inside the Pieces app. Picking up where you left off and getting back into flow is a task we're making the copilot available for. It can help you remember what you were doing yesterday, where you left off, and which important things to solve today.

So What is Pieces Copilot?

In conclusion, the unique features of Pieces Copilot go beyond just code generation. It's about understanding your workflow, your team, and what you're trying to build. It's about capturing that context, accessing it, and enabling the copilot to serve that back to you in a natural conversational format.

Introduction to Local Large Language Models

We then turned to the star of the AMA: Local Large Language Models (LLLMs) and their role in Pieces Copilot. A Language Model (LM) is a machine learning model that outputs a sequence of words given some input sequence.

The 'Large' in LLLMs (as opposed to small language models) refers to models that have been trained on much more data and have many more parameters, making them capable of knowing much more information and providing that back to the user.
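To make the definition concrete, here is a minimal sketch of a very small language model: a bigram counter that extends an input sequence one word at a time. It is only an illustration of "sequence in, sequence out"; the function names and the toy corpus are assumptions for this example, and real LLMs use neural networks with billions of parameters rather than word counts.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    follows: dict[str, Counter] = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(model: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily extend the prompt with the most frequent next word."""
    output = prompt.lower().split()
    if not output:
        return ""
    for _ in range(max_words):
        candidates = model.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)


if __name__ == "__main__":
    toy_model = train_bigram_model("the model reads the prompt and the model writes code")
    print(generate(toy_model, "the", max_words=1))
```

A "large" model differs from this toy in scale, not in the basic contract: more training data and more parameters let it predict far better continuations.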

Open Source Trend

Recently, there has been a trend in machine learning towards open source language models. Companies like Meta AI have released their LLMs openly, including the research paper and the model weights. This groundbreaking move in artificial intelligence (AI) allows us to take these models and put them on a user's hardware, customizing the model as we see fit.

Retrieval Augmented Generation

One of the ways we customize these open source models is by attaching personalized data to them in a process called Retrieval Augmented Generation. Having access to an open-source, non-black box model allows us to attach smaller modules to it that customize and personalize it based on your data.
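As a rough illustration of the idea, the sketch below retrieves the saved items most relevant to a question and places them in the prompt before it would be sent to a model. This is not the Pieces implementation: the word-overlap scoring, the function names, and the prompt wording are all assumptions chosen to keep the example self-contained.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all so they do not pollute the prompt.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble a prompt that puts retrieved context ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    saved = [
        "websocket channels connect to a server",
        "grpc services use protobuf",
        "obsidian stores notes in a vault",
    ]
    print(build_prompt("how do websocket channels work", saved))
```

A production system would typically score relevance with embeddings rather than word overlap, but the retrieve-then-generate shape stays the same.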

Privacy and Security

We also touched on the importance of privacy and security. While maintaining a high fidelity output has been a challenge, we've been able to convince enterprises of the security and privacy of our models by demonstrating a large language model run locally with Pieces, right there on the device.

Edge ML and Extensibility

We've been working for some time to build our own LoRA multimodal AI models that do all the enrichment, tagging, and embeddings. We're also looking at further enterprise AI solutions, such as the ability for enterprises to bring their own pre-trained models to understand and generate code.

So, we've not only focused on making models edge capable, but also making them modular and extensible on top of our platform.

Running Large Language Models Locally

In conclusion, locally run large language models are a significant part of Pieces Copilot. They allow us to provide a high-quality output while maintaining privacy and security. They also allow us to customize and personalize the models based on your data, enhancing the functionality of the Copilot.

Pieces For Developers & Pieces OS

In this part of the session, we discussed the technology behind the Pieces desktop application and the difference between Pieces for Developers and Pieces OS.

Pieces OS

Pieces OS is a centralized application that runs locally on your computer. It enables any one of the Pieces plugins to connect to Pieces OS, providing a centralized backend for all plugins. This ensures that all data is shared and not replicated.

Downloading Local Large Language Models

We've built a system that downloads the best Local Large Language Models, such as Llama 2, at runtime. This approach avoids shipping them directly with your Pieces OS build, which would result in long update times. The download process is smooth and doesn't interfere with other Pieces actions. No download is required for cloud models such as GPT-4 or PaLM 2.
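The download-at-runtime approach can be sketched as a "fetch on first use" helper. This is a simplified illustration, not how Pieces OS does it: `ensure_model`, the `.bin` file naming, and the `fetch` callable (which stands in for a real network download) are hypothetical.

```python
from pathlib import Path
from typing import Callable


def ensure_model(model_dir: Path, name: str, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path of a model, downloading it only on first use."""
    model_dir.mkdir(parents=True, exist_ok=True)
    target = model_dir / f"{name}.bin"
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # never leaves a half-written file under the final name.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(fetch(name))
        partial.replace(target)
    return target
```

Because the model file lives outside the application build, application updates stay small and the download cost is paid only once per model.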

Loading and Using Models

Once the open source Large Language Model is downloaded, we can start to load and use the model for natural language processing and other use cases. This process also runs smoothly and allows you to continue saving, sharing, and reusing. The model runs in an isolate to ensure that the main thread of your Pieces OS application continues to run smoothly.

WebSocket Connection

We set up a WebSocket connection from Pieces for Developers to Pieces OS. This connection allows us to make a request to the isolate to load the model, or if it's already loaded into memory, use it immediately. We communicate changes via stream controllers, ensuring that Pieces OS runs smoothly and delivers individual changes to any of the WebSocket connections to Pieces OS.
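The "load on request, reuse if already in memory" pattern can be approximated in Python with a background thread and queues. This is only an analogy: Pieces OS uses isolates and stream controllers, while the `ModelWorker` class, its stub model, and the queue-based messaging below are assumptions made for this sketch.

```python
import queue
import threading


class ModelWorker:
    """Runs a stub model on a background thread, loading it on first use."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._model = None
        self.load_count = 0  # how many times the model was loaded
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            prompt, reply = self._requests.get()
            if prompt is None:  # shutdown signal sent by close()
                break
            if self._model is None:
                # Stand-in for an expensive load; it happens once, then the
                # model stays in memory for later requests.
                self._model = lambda text: f"echo: {text}"
                self.load_count += 1
            reply.put(self._model(prompt))

    def ask(self, prompt: str) -> str:
        """Send a prompt to the worker and wait for its answer."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put((prompt, reply))
        return reply.get(timeout=5)

    def close(self) -> None:
        self._requests.put((None, None))
        self._thread.join(timeout=5)
```

In this sketch `ask` blocks its caller until the reply arrives; a UI would instead subscribe to results as they stream in, which is the role the stream controllers play in the description above.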

Memory Management

Finally, we discussed memory management. If a model is loaded in the isolate, it occupies RAM associated with your CPU or GPU. To manage this, we proactively unload the model if it hasn't been used for a while, or you can manually unload it once you're done using it to free up memory.
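One simple way to picture the unload policy is an idle timer checked periodically. The sketch below is illustrative only: `ManagedModel`, the idle threshold, and the injectable clock are hypothetical, and the "model" is a stub function rather than real weights.

```python
import time


class ManagedModel:
    """Holds a stub model and unloads it after a period of inactivity."""

    def __init__(self, idle_seconds: float, clock=time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self._clock = clock  # injectable so the policy can be tested without sleeping
        self._model = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def use(self, prompt: str) -> str:
        if self._model is None:
            self._model = lambda text: text.upper()  # stand-in for loading weights
        self._last_used = self._clock()
        return self._model(prompt)

    def unload(self) -> None:
        """Manual unload, for when the user is done with the model."""
        self._model = None

    def unload_if_idle(self) -> bool:
        """Call periodically; frees the model once it has sat idle too long."""
        if self.loaded and self._clock() - self._last_used >= self.idle_seconds:
            self.unload()
            return True
        return False
```

The next call to `use` after an unload simply reloads the model, trading a one-time load delay for freed memory in between.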

Summary of Our Product Architecture

In conclusion, Pieces for Developers and Pieces OS provide a smooth and efficient way to run large language models locally, or in the cloud across macOS, Windows, and Linux. The system ensures that all data is shared, not replicated, and manages memory usage effectively.

Right-Click Copilot Action & Continuous Regrounding

In this part of the session, we introduced a new set of features within VS Code and a few other plugins, which we've labeled "Ask Pieces Copilot About".

Introduction to New Features

These features allow you to right-click on a file in your File Explorer or select some code, right-click on it, and click the "Ask Pieces Copilot About" button. This action initializes a Copilot chat with the context you've just selected with your mouse.

Streamlining the Process

These features streamline the process of question-answering about the content within your workspace as you're working, all within VS Code.

You don't need to go out and copy and paste code into ChatGPT and open up your web browser to utilize these models. With this new feature, the process is very streamlined, allowing you to stay in flow and make generative AI more contextual and easy to use.

Integration Across Platforms

We've made a large effort to ensure that the models are powering Pieces Copilot in all of our integrations, including JupyterLab, JetBrains, Obsidian, Microsoft Teams, VS Code, and soon, Chrome and our other browser extensions. We've introduced the concept of continuous re-grounding to make this happen.

You have one model on your device that's running, but as you switch between your desktop app, your Chrome extension, and VS Code, you need to package up that context in an efficient way, send it to Pieces OS, and re-ground that model before the question is asked. This process provides a smooth experience, as you'll see in the demos.
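The flow described here, one shared model that is re-grounded with whatever context the calling integration sends, can be sketched as follows. The `ContextPackage` shape, the `SharedModel` class, and the formatted answer are hypothetical; the real payload exchanged with Pieces OS is not described in this article.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextPackage:
    """Context a client sends along with a question (hypothetical shape)."""
    source: str  # for example "vscode", "desktop", or "chrome"
    snippets: list[str] = field(default_factory=list)


class SharedModel:
    """One model instance, re-grounded before a question when context changes."""

    def __init__(self) -> None:
        self.grounding: Optional[ContextPackage] = None
        self.reground_count = 0

    def ask(self, question: str, context: ContextPackage) -> str:
        if context != self.grounding:  # skip the work if nothing changed
            self.grounding = context
            self.reground_count += 1
        joined = " | ".join(context.snippets)
        # Stand-in for real generation: show what the answer was grounded on.
        return f"[{context.source}] {question} (grounded on: {joined})"
```

Comparing the incoming package with the current grounding means repeated questions from the same tool avoid redundant re-grounding, while a switch from VS Code to the desktop app triggers it.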

Feature Summary

In conclusion, the right-click "Ask Pieces Copilot About" feature and continuous re-grounding capabilities provide a streamlined and efficient way to utilize the copilot within your workspace. These features are easy to use and allow you to stay in flow as you work.

RAG & Copilot Demo in Desktop

In this part of the session, we demonstrated how to set up the context for the copilot and how it can assist in generating code.

Setting Up Context

We started by choosing a runtime model and setting the context. This involved copying and pasting a tutorial link into Pieces, which then fetched all the context from that website. We also added some previously saved notes about WebSockets and client connections, and a directory where we were doing some gRPC work.

Asking the Copilot

Now running a Large Language Model locally with the context set, we asked the copilot a question. The copilot was able to provide high fidelity outputs quickly, thanks to the context we had provided. We asked it to help us get started with some WebSockets for a Dart project. The copilot provided detailed instructions, including the dependencies we needed and how to connect to localhost 1000.

Multichannel WebSocket Connections

We then asked the copilot how to support multichannel WebSocket connections. The copilot provided a detailed response, showing how to connect to multiple WebSocket servers simultaneously and how to track each channel.

Model Capabilities

This demonstration showcased the capabilities of running a 7 billion parameter model with a ton of context on a MacBook Air. The quality of the output was impressive, especially considering that the model we used, Llama 2, is estimated to be only about 2% the size of GPT-4.

Saving Generated Code

We concluded the demonstration by showing how you can save the generated code to Pieces for future context. We also showed how you can find similar snippets, related people, and relevant links to documentation. This feature essentially serves as an offline Google Search for your code.

Concluding the Pieces Copilot Demo

In conclusion, the RAG and Pieces Copilot demo in the desktop app showcased the power of context setting and the high-quality code generation capabilities of the copilot. Even using a relatively small model, Pieces Copilot was able to provide detailed and accurate responses to our queries.

Workflow Activity & Proactive Saving

In this part of the session, we discussed the concept of workflow activity and how it relates to the copilot.

Workflow Activity

Workflow Activity is a feature that helps you understand when you engage with certain elements, what you're copying and pasting, and how you're interacting with your Pieces drive. It captures subtle usage of the material, such as updates, references, and searches.

Copilot's Understanding

This feature is particularly useful for the Copilot as it can understand what you were doing previously or where you left off. It can also understand what is most important from a perspective of Retrieval Augmented Generation (RAG). This is achieved by using searches, references, deletes, updates, and shares as proxies for relevance to your current workflow activity.

Future Capabilities

In the near future, you'll be able to ask the Pieces Copilot, "What was I doing yesterday?" and it will pick up exactly where you left off.

Proactive Saving

We also introduced the concept of proactive saving, which will be released towards the middle of Q4. Currently, you can copy and paste things into Pieces, and it will do all the enrichment, tagging, and context adding. However, in the future, things will be automatically saved in an invisible layer that we call Ghost Assets.

Ghost Assets

Ghost Assets are materials, people, links, and so on that were either generated in a conversation or not actively saved by you. As you move from the browser to the IDE to the collaborative space, we recognize patterns to identify important materials.

We maintain a pool of roughly 200-300 Ghost Assets that are always time-decayed and re-ranked to provide automatic retrieval augmented generation. This feature enhances the functionality of workflow activity, proactive saving, and code generation.
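A time-decayed, re-ranked pool can be sketched with an exponential half-life. The article does not say which decay function Pieces uses, so the half-life formula, the 24-hour default, and the `GhostAsset` fields below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GhostAsset:
    name: str
    base_score: float  # relevance from activity signals (assumed to be given)
    age_hours: float   # time since the asset was last touched


def decayed_score(asset: GhostAsset, half_life_hours: float = 24.0) -> float:
    """Halve the score for every half_life_hours of age."""
    return asset.base_score * 0.5 ** (asset.age_hours / half_life_hours)


def rerank(assets: list[GhostAsset], pool_size: int = 300) -> list[GhostAsset]:
    """Sort by decayed score and keep only the top pool_size assets."""
    return sorted(assets, key=decayed_score, reverse=True)[:pool_size]
```

With this scheme a highly relevant but stale asset gradually yields its place to fresher material, and capping the pool keeps retrieval fast.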

Copilot in VS Code Demo & Value of Context Setting

In the next part of our session, Caleb demonstrated the Copilot features within VS Code, one of our most popular integrations.

VS Code Extension Context

For context, we used a mono repo of multiple example extensions for JupyterLab, one of our integrations. The JupyterLab team maintains this repository as a comprehensive set of examples on creating extensions in JupyterLab.

Cell Toolbar Button Feature

One feature we highlighted was the addition of a Cell toolbar button in the Pieces JupyterLab extension. This was a challenging task during implementation, and we found that tools with a transformer architecture like ChatGPT weren't exactly helpful. However, after implementation, we realized that this would be a great use case for RAG.

'Ask Pieces Copilot' Feature

To demonstrate, we right-clicked on the toolbar button folder and clicked 'Ask Pieces Copilot'. We asked a question about the code contained within the directory: "How can I add a Cell toolbar button to my JupyterLab extension?"

The Pieces Copilot set the context and provided a step-by-step guide. Although it didn't immediately provide the exact answer, it did give a list of relevant files. Clicking on the schema plugin file led us to the answer we were looking for: a plugin JSON file where you can write a bit of JSON to add a Cell toolbar button.

Context Selector Feature

We also demonstrated the context selector feature. This feature displays your entire directory structure within your workspace, allowing you to set your context by clicking any of the checkboxes.

'Snippetizer' Algorithm

We discussed the challenge of extracting the right context from a large grouping of files. Our solution involves a 'snippetizer' algorithm that ranks the best or most snippet-like snippets. This process involves a bit of machine learning and heuristics processing, including bracket matching and tree parsing.
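To give a feel for the heuristic side, the sketch below uses bracket matching as one signal of how "snippet-like" a candidate is. It is not the actual snippetizer: the scoring rule and function names are assumptions, and this version deliberately ignores brackets inside strings and comments, which a tree parser would handle.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(text: str) -> bool:
    """True when every bracket in text is closed in the right order."""
    stack: list[str] = []
    for char in text:
        if char in PAIRS.values():
            stack.append(char)
        elif char in PAIRS:
            if not stack or stack.pop() != PAIRS[char]:
                return False
    return not stack  # leftover openers mean the candidate is cut off


def rank_snippets(candidates: list[str]) -> list[str]:
    """Put balanced candidates first, then prefer those with more lines."""
    def score(snippet: str) -> tuple[bool, int]:
        return (brackets_balanced(snippet), len(snippet.strip().splitlines()))
    return sorted(candidates, key=score, reverse=True)
```

A candidate that ends in the middle of a block, such as `if (x) {`, fails the balance check and sinks in the ranking, which matches the intuition that a good snippet is a complete unit.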

Use of Small Models

We also touched on the use of small local language models that run on the device. These models create embeddings that we rely on for various activities, such as ranking your search results and associating similarities between people, events, snippets, workflow, and context. These small local models run very fast and can process information at scale before feeding it into the large language model.
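Ranking with embeddings usually comes down to comparing vectors. The sketch below ranks items by cosine similarity to a query vector; the two-dimensional vectors are made up for the example, whereas real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: list[float], items: dict[str, list[float]]) -> list[str]:
    """Return item names ordered from most to least similar to the query."""
    return sorted(items, key=lambda name: cosine_similarity(query, items[name]), reverse=True)


if __name__ == "__main__":
    toy_embeddings = {
        "websocket note": [1.0, 0.0],
        "grpc note": [0.0, 1.0],
        "mixed note": [1.0, 1.0],
    }
    print(rank_by_similarity([1.0, 0.1], toy_embeddings))
```

Because these comparisons are cheap, a small embedding model can filter thousands of items down to a handful before anything reaches the large language model.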

Unrelated Workspace Query

In the final part of the demo, we asked the Pieces Copilot a question unrelated to the current workspace. We asked, "How do I await a WebSocket connection in JavaScript?" The copilot provided a detailed response, generating JavaScript code using the Promise API.

Summing Up Pieces Copilot in VS Code

In conclusion, the Pieces Copilot in VS Code demo showcased the value of context setting and the usefulness of having this tool right inside your IDE. Even if the Copilot doesn't provide an exact answer, it points you in the right direction, saving you time and making your job easier.

The combination of proactive saving, workflow shadowing, and a copilot that can navigate all the things in Pieces elevates your workflow and provides a world-class experience.

Obsidian Demo & Future Copilot Capabilities

As we moved into the second half of our session, we showcased our integration with Obsidian, a popular note-taking application and one of our most beloved integrations, to show how you can run an LLM locally in our Obsidian plugin.

Auto-Enrichment Feature

Caleb, one of our plugin developers, led this part of the session. We started with a demonstration of our auto-enrichment feature, which generates titles, tags, and links to similar code snippets within your Obsidian vaults. This feature is designed to help you manage and navigate your notes more effectively, especially if you have a large number of notes in your vault.

Privacy and Security

We also discussed the privacy concerns of many Obsidian users. To address these concerns, we've developed a feature that allows you to use Llama 2, a locally run large language model. This means you can use Pieces without sharing your data online, providing a secure, air-gapped experience.

Obsidian Pieces Copilot

We then moved on to discuss the copilot within Obsidian. Similar to our other integrations, the Obsidian Pieces Copilot allows you to choose your runtime and context. Caleb demonstrated how you can use the copilot to answer questions based on your notes, providing a detailed summary of the information in your notes.

Future of Pieces

We also touched on the future of Pieces. We're working on features that will allow you to create copilot profiles and workspaces, which will enable you to customize your copilot's verbosity and scope its context to specific workspaces. We're also exploring ways to further use the copilot to power Global Search, providing you with relevant files and suggested search queries.

Integration Testing for Plugins

We concluded the Obsidian demo with a discussion on the importance of integration testing for plugins. Caleb demonstrated how you can use the copilot to get a detailed overview of the steps involved in a full integration test.

With that, we wrapped up the Obsidian demo and moved on to the next part of our session.

Support Highlights

Similar to our previous AMA, we took plenty of time to address some of the live questions being asked during the AMA. We made sure to cover questions about the support for the LLLMs on different device architectures.

GPU and CPU Support

We're proud to offer support for both GPU and CPU, and users can switch between whichever they prefer to use. For macOS users, we utilize Apple Metal shaders to accelerate on the GPU. For Linux or Windows users, we use the Vulkan SDK, which allows us to access either an Nvidia or an AMD GPU.

System Requirements

This means our system should work on all consumer GPUs out there. However, please note that our model is fairly large and takes up between five and six gigabytes of VRAM, so please check your system's capabilities before you download and use it.
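The choice between CPU and GPU backends can be pictured as a small decision function. This is a hedged sketch rather than the logic Pieces OS uses: `select_backend`, the backend labels, and the 6 GB threshold (taken from the upper end of the figure above) are assumptions, and the caller must supply the free VRAM value because Python's standard library has no portable way to query it.

```python
import platform


def select_backend(system: str, prefer_gpu: bool, free_vram_gb: float) -> str:
    """Pick a backend label from the OS name, user preference, and free VRAM."""
    required_vram_gb = 6.0  # assumed threshold, upper end of the stated range
    if not prefer_gpu or free_vram_gb < required_vram_gb:
        return "cpu"
    if system == "Darwin":  # value of platform.system() on macOS
        return "metal"
    if system in ("Linux", "Windows"):
        return "vulkan"
    return "cpu"  # unknown platform: fall back to the safest option


if __name__ == "__main__":
    # The 8.0 here is a placeholder; a real check would query the GPU driver.
    print(select_backend(platform.system(), prefer_gpu=True, free_vram_gb=8.0))
```

Falling back to the CPU whenever the GPU path looks doubtful keeps the feature usable on machines with little video memory, at the cost of slower responses.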

Experimental Feature and Support

We also want to emphasize that this feature is experimental. If you encounter any issues, such as long loading times or selecting the incorrect model, don't worry. You can always reach out to our support team for help. You can also restart your Pieces OS in the top right corner of the toolbar and retry with a different model.

Sharing Tech Specs and Experience

We encourage you to share your tech specs and your overall experience with us, so we can assist you throughout the process, fix any bugs in the wild, and learn from our users.

What Makes Pieces Unique?

We took a moment to reflect on what sets Pieces apart from other generative AI startups.

Plug-and-Play Local Large Language Models

We're proud to be one of the first companies to ship plug-and-play local language models across macOS, Linux, and Windows. This is a significant achievement, and we believe it's a testament to the hard work and dedication of our team.

Unique Offline Capabilities

Our team member Caleb highlighted the unique offline capabilities of Pieces. The ability to perform a wide range of tasks without a Wi-Fi connection is truly remarkable and sets us apart in the market.

Model Distillation, Fine-Tuning, and Retrieval Augmented Generation

We've developed extensive capabilities around model distillation, fine-tuning, and retrieval augmented generation. As technology evolves, we're seeing a convergence of big models in the cloud and local models on devices. We're at the forefront of this shift, building blended capabilities that offer privacy, security, and trust.

Obsidian Demo

Caleb then continued with the Obsidian demo, showcasing the relevant notes feature, the direct links to information sources, and the suggested follow-up questions.

Local LLM Capabilities

He demonstrated how users can ask another question, and the local LLM installed on the machine (Llama 2 in this case) will provide the answer. This seamless experience is a testament to the power and capabilities of Pieces.

With that, we concluded the Obsidian demo and moved on to the next part of our session.

Do LLMs Support Multi-Language?

As we neared the end of our session, we addressed a common question about the language support of Large Language Models (LLMs).

Spoken Languages

When it comes to spoken languages, Llama 2, the open source LLM run locally on Pieces, was trained primarily on English data. A significant part of training large language models involves reinforcement learning from human feedback (RLHF), where human evaluators assess the output of the model.

These evaluations are based on English output, so the model will almost always respond in English. However, it might understand queries input in other languages, as data from other languages might have inadvertently been included in the training dataset.

Programming Languages

As for programming languages, LLMs are trained on extensive open-source code datasets, likely from GitHub and other sources. Therefore, they support all common programming languages.

As demonstrated in our session, even less common languages like Dart are supported, with the model able to generate syntactically correct Dart code. This is because LLMs, like seasoned developers, can understand the syntax and general structure of code, allowing them to adapt to new languages quickly.

Testing Language Support

We encourage our users to test the limits of our LLMs' language support. If you find a language that our models don't support, we'd love to hear about it through our support channels. We're always looking to improve and expand the capabilities of our models.

Pro, Pro + Cloud & Teams Plans

As we approached the end of our session, we wanted to touch upon our upcoming plans for Pieces as a business.

Pricing Plan

We're excited to announce that we're in the process of developing our Pro, Pro + Cloud, and Teams plans. These plans are all in the later stage of development, and we have the entire release roadmap laid out.

Series A Funding Round

However, we're currently in the midst of our Series A funding round, and there are some nuances around when to turn on monetization. We're receiving advice from advisors and investors, and we're working through these considerations at the business level.

Enterprise Pilots

We also want to mention that our Enterprise pilots are paid and are currently out there. We're thrilled about the support we've received so far and are committed to taking good care of our early adopters. If you’re interested in starting a pilot at your company, reach out to us here.

Supporting Pieces

If you're interested in supporting us in ways beyond the financial, feel free to reach out to us. We're always open to collaboration and are excited about the future of Pieces. Stay tuned for more updates on our plans!

Upcoming Features & Closing Remarks

We concluded the session by sharing some exciting features coming to Pieces that we are really stoked about.

Persisted Chats

One of the features we're eagerly looking forward to is persisted chats. While it's still in the scoping phase, we're confident that it will be coming soon.

Additional Models

We're also planning to add additional models, both local and cloud-based ones. The infrastructure is already in place to deploy LLMs locally, and we're now focusing on adding support for individual models and integrating them into the pipeline.

Integration of Models from Various Sources

We're looking forward to integrating models from various sources, including OpenAI, Google, Meta, and Project Gemini.

Larger Models

What excites Brian the most is the prospect of larger models. For users with powerful machines, like the 6800 or an Arm-based Mac, especially an M2, we can serve bigger models that will significantly increase the quality of the output on your machine.

Conclusion

We'd like to express our sincere gratitude to everyone who participated in our discussion about Local Large Language Models (LLLMs) and their role in Pieces Copilot.

Our team at Pieces for Developers is constantly pushing the boundaries of what's possible in the realm of intelligent code generation and management.

We're particularly excited about the potential of Local LLMs to enhance the functionality of Pieces Copilot, providing high-quality output while maintaining privacy and security. The ability to customize and personalize these models based on your data is a game-changer, and we're thrilled to be at the forefront of this innovation.

We're also looking forward to introducing new features and improvements that we've discussed today, and we hope you share our excitement. Stay tuned for announcements about our next AMA; we can't wait to delve deeper into the unique capabilities of Pieces and how it can revolutionize your coding workflow.

Whether you're a current user with feedback, a developer intrigued by the possibility to run LLM locally, or someone just curious about the technology, we invite you to join our next session and become part of our vibrant community on Discord!

Once again, thank you for your invaluable participation and for being a part of our community. Your feedback and engagement are what drive us to continue innovating. Until the next AMA, happy coding!