
Running Large Language Models Locally: A Guide to Open-Source LLMs

Anass Nabil

Hello! In today's blog, we'll explore how to set up and run a Large Language Model (LLM) locally on your own machine. Running an LLM locally means that, instead of relying on cloud services, the model—similar to ChatGPT or DeepSeek—runs directly on your hardware.

It's important to note, however, that the models behind popular services like ChatGPT are closed-source and cannot be downloaded or run locally. Fortunately, open-source alternatives exist. For instance, OpenAI recently released its first open-weight models, gpt-oss-20b and gpt-oss-120b, which you can freely download, experiment with, and integrate into your own projects.

Why Run LLMs Locally?

Before we dive into the setup, let's look at the pros and cons of using a local LLM.

# 🔒 Privacy and Security

One of the most important advantages is privacy and security. When using cloud-based LLMs, all the data and information you provide are stored on their servers. With a local LLM, your data never leaves your machine, which is especially valuable when handling sensitive information such as:

  • Healthcare records
  • Financial data
  • Internal company documents

# 📡 Offline Access

Another key benefit is offline access. A local LLM can function without an internet connection, allowing you to use the model anytime, anywhere, without relying on network availability.


Requirements

# 1. Ollama

Ollama is a free, open-source tool that simplifies running large language models (LLMs) like LLaMA 3 and Mistral directly on your computer.

Download it from: https://ollama.com/

Ollama website homepage
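On macOS and Windows, Ollama ships as a standard installer; on Linux, the website provides a one-line install script. Once it's installed, a quick sanity check from the terminal looks something like this:

```bash
# Confirm the Ollama CLI is on your PATH
ollama --version

# List locally installed models (empty right after a fresh install)
ollama list
```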
# 2. Continue Extension
Continue extension logo

Continue is an open-source AI coding assistant (similar to GitHub Copilot) that you can install in VS Code. With this extension, you can use models running locally (LLaMA, Mistral, etc., via Ollama) or connect to remote APIs (like OpenAI).

Download it from the VS Code Extensions marketplace, or visit https://www.continue.dev/
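You can also install it from the terminal using the `code` CLI; the extension ID below is the one the Marketplace lists at the time of writing:

```bash
# Install the Continue extension into VS Code from the command line
code --install-extension Continue.continue
```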

Continue extension demo

Installation Process

# Step 1: Installing the Models

You have two main options for downloading models:

Option A: Direct from Ollama

You can download a specific model directly from Ollama:

Ollama model library
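For example, pulling a model from the Ollama library and chatting with it takes two commands (swap llama3.1 for whichever model suits your hardware):

```bash
# Download the model weights from the Ollama library
ollama pull llama3.1

# Start an interactive chat session in the terminal
ollama run llama3.1
```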

Option B: From Hugging Face

Alternatively, you can download models from Hugging Face. I usually use https://huggingface.co/unsloth — simply select the model you want and download it in GGUF format:

Hugging Face model download

Once downloaded, you can load the model into Ollama and connect it with Continue in VS Code.
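For a GGUF file from Hugging Face, Ollama can import it through a small Modelfile. A minimal sketch (the file name is just a placeholder for whatever you downloaded):

```bash
# Create a minimal Modelfile that points at the downloaded GGUF
echo 'FROM ./my-model.Q4_K_M.gguf' > Modelfile

# Register it with Ollama under a name of your choosing, then run it
ollama create my-model -f Modelfile
ollama run my-model
```

Newer Ollama releases can also pull GGUF models straight from Hugging Face with `ollama run hf.co/<user>/<repo>`, which skips the manual download entirely.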

# Step 2: Configure Continue to Use Ollama

The config.json file (usually located at ~/.continue/config.json) is used by the Continue extension to specify how it should run and connect to your AI models. Essentially, it acts as a settings file for your local or remote LLMs.

Continue configuration interface

Adding a Model

Under the "models" section, you can add a chat model in the following format:

```json
{
  "title": "Llama3.1 Chat",
  "model": "llama3.1",
  "contextLength": 4096,
  "provider": "ollama"
}
```

Task-Specific Model Configuration

You can assign different models for different tasks (a sample configuration is shown after this list), such as:

  • Chat → For interactive conversations
  • Autocomplete → To help you write code or text faster
  • Edit → To refine or improve existing code/text
Model configuration example
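As a rough sketch, a config.json that uses one local model for chat and a smaller one for autocomplete might look like this (the model names are only examples; use whatever you've pulled into Ollama):

```json
{
  "models": [
    {
      "title": "Llama3.1 Chat",
      "model": "llama3.1",
      "contextLength": 4096,
      "provider": "ollama"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5 Coder Autocomplete",
    "model": "qwen2.5-coder:1.5b",
    "provider": "ollama"
  }
}
```

Small, fast models tend to work best for autocomplete, since completions are requested almost constantly as you type; the larger chat model only runs when you actually ask it something.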

With this setup, Continue can seamlessly interact with your local LLMs, giving you AI-powered assistance directly in VS Code.


🎉 You're All Set!

And that's it — you're all set! You can now use the chat and autocomplete features locally on your device.

Continue extension in action

⚠️ Important Note

Hardware Compatibility Warning: Make sure to choose a model that fits your computer's hardware. As a rough rule of thumb, a 7–8B parameter model with 4-bit quantization needs on the order of 5 GB of RAM/VRAM, and requirements climb quickly for larger models. Using a model that's too heavy may cause your PC to lag... or worse, "die" under the load 😉


Happy coding! 🚀

Enjoy this post? Like and share!