How To Run Large Language Models Locally On Your Laptop (Using Ollama)
Unlock the potential of Large Language Models on your laptop. Explore our guide to deploy any LLM locally without the need for high-end hardware.
In the wake of ChatGPT’s debut, the AI landscape has undergone a seismic shift. Large Language Models (LLMs) are no longer a niche; they have become the cornerstone of modern machine learning and represent the pinnacle of AI advancements. From tech giants to startups, everyone seems to be unveiling their own LLM and integrating it into their products.
One of the biggest hurdles with Large Language Models has always been their demanding hardware requirements, which make it seem nearly impossible to run them on everyday laptops or desktops. Yet here lies the breakthrough: you can harness the power of LLMs right on your local machine, sidestepping the need for high-end GPUs. This guide shows you how to run LLMs on your laptop.
Deploy LLMs Locally with Ollama
For this tutorial we will be using Ollama, a nifty tool that allows everyone to install and deploy LLMs very easily. So let’s begin.
Important notes:
- For this tutorial we will be deploying Mistral 7B, which has a minimum requirement of 16GB of memory.
- 3B, 7B and 13B models require 8GB, 16GB and 32GB of memory respectively.
- Inference runs locally, meaning no internet connection is required beyond the initial download of the model.
- Mistral 7B is licensed under Apache 2.0, allowing anyone to use and build on it.
Update July 2024: Meta released its latest and most powerful Llama 3.1 models. The process for running them is the same; the article has been updated with the required commands for the Llama 3.1 8B, 70B and 405B models. If your hardware allows it, the Llama 3.1 models are now the recommended ones to download and run locally on your laptop.
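For reference, these are the commands the update refers to (model tags as they appear in the Ollama library; the 70B and 405B variants are far too large for a typical laptop):

```shell
ollama run llama3.1:8b      # ~4.7GB download; the laptop-friendly option
ollama run llama3.1:70b     # ~40GB download; workstation- or server-class memory
ollama run llama3.1:405b    # ~231GB download; server-class hardware only
```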
Time needed: 10 minutes
- Installing Ollama.
Head over to Ollama.ai and click on the download button.
- Download Ollama for your OS.
You can download Ollama for macOS and Linux, and a Windows version is also available.
- Move Ollama to Applications.
Move Ollama to the Applications folder; this step is only for Mac users. Once moved, Ollama will be installed on your local machine. Click the next button.
- Installing Command Line.
Now we need to install the command line tool for Ollama. Simply click on the ‘install’ button.
- Deploying Mistral/Llama 2 or other LLMs.
Install the LLM you want to use locally. Head over to the Terminal and run the following command: ollama run mistral (example commands are shown after these steps).
- Downloading Mistral 7B
Ollama will now download Mistral, which can take a couple of minutes depending on your internet speed (Mistral 7B is 4.1GB). Once it’s installed you can start talking to it.
That’s it! It is as simple as that.
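If you prefer to do everything from the terminal, the whole setup boils down to a handful of commands. The Linux install one-liner below is the script Ollama publishes (check ollama.com for the current version); the rest works the same on macOS once the app is installed:

```shell
# Linux alternative to the GUI installer (see ollama.com for the current script)
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the command line tool is available
ollama --version

# Pull and start chatting with Mistral 7B (the 4.1GB download happens on first run)
ollama run mistral

# Inside the chat prompt, type /bye to exit; back in the shell you can
# list the models you have downloaded so far
ollama list
```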
Which LLM To Download? Mistral 7B vs Llama 2?
Which Large Language Model to deploy might be a hard question. There are many factors to consider, including your own requirements and the hardware you have. A good rule of thumb is that the higher the parameter count, the better the model performs. But you might have noticed something different in this tutorial: we are using Mistral 7B.
There are two major reasons for this. The first is that Mistral 7B outperforms almost all open-source 7B models currently available; it outperforms Llama 2 13B on all benchmarks, which is quite impressive, and even outperforms Llama 1 34B on many benchmarks. The second is that Mistral 7B requires 16GB of memory, which is more doable than the 32GB memory requirement of 13B models.
| Model | Parameters | Size | Download |
|---|---|---|---|
| Mistral | 7B | 4.1GB | ollama run mistral |
| Llama 2 | 7B | 3.8GB | ollama run llama2 |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b |
| Llama 2 70B | 70B | 39GB | ollama run llama2:70b |
| Llama 3.1 8B | 8B | 4.7GB | ollama run llama3.1:8b |
| Llama 3.1 70B | 70B | 40GB | ollama run llama3.1:70b |
| Llama 3.1 405B | 405B | 231GB | ollama run llama3.1:405b |
| Orca Mini | 3B | 1.9GB | ollama run orca-mini |
| Vicuna | 7B | 3.8GB | ollama run vicuna |
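Whichever model you pick from the table, the same few Ollama commands manage it. A quick sketch, using the Llama 2 13B tag as the example:

```shell
# Download a model ahead of time without opening a chat session
ollama pull llama2:13b

# Start an interactive session with it whenever you are ready
ollama run llama2:13b

# Remove it again to free up disk space
ollama rm llama2:13b
```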
Why Run LLMs Locally?
Another important question you should be asking is: why even bother running these massive LLMs on your laptop? The short answer is that you own everything.
Using a tool like Ollama allows you to run LLMs on your own machine, which means there is no need to rely on third-party APIs or cloud services. Running models locally means your data never has to leave your machine, offering a higher level of privacy and security. There is also a high degree of flexibility, allowing you to customise the model according to your needs. Ollama also offers a REST API.
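As a quick illustration of that REST API: once Ollama is running it listens on localhost port 11434 by default, and you can send a prompt to any downloaded model over plain HTTP. A minimal sketch using Mistral:

```shell
# Ask the local Mistral model a one-off question via the REST API
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain what a large language model is in one sentence.",
  "stream": false
}'
```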
It is only a matter of time before even more powerful models can run locally on a laptop or desktop, and perhaps even on a phone in the not-so-far future.