How to run a local AI model on your computer with LM Studio

We break down how to use LM Studio to download and run an open-source model like DeepSeek (a distilled version, anyway) on your own computer.

Your Complete Guide to Running AI Models Safely on Your Computer with LM Studio

This guide combines insights from multiple expert YouTube tutorials and additional research to help you run artificial intelligence models (like DeepSeek) on your own computer safely and privately with a tool called LM Studio.

Understanding Technical Terms

Before we begin, let us define some important terms you will encounter:

  • LLM (Large Language Model): An AI program trained on massive amounts of text to understand and generate human-like responses.
  • Parameters: Numbers that determine how smart and capable the AI model is. More parameters usually mean better performance but require more computer power.
  • Quantization: A process that makes AI models smaller and faster by reducing the precision of their calculations. Q4_K_M means a specific type of compression that balances quality and size.
  • GPU (Graphics Processing Unit): A computer chip originally designed for graphics that can also speed up AI calculations significantly.
  • VRAM (Video RAM): Memory specifically used by your graphics card. More VRAM allows you to run larger AI models.
  • API (Application Programming Interface): A way for different computer programs to communicate with each other.
  • GGUF format: A special file format designed specifically for running AI models efficiently on personal computers.
  • Distilled model: A smaller AI model that has been trained to mimic the behavior of a much larger model, making it faster and more accessible.
  • Tokens per second: A measure of how fast an AI model can generate text - higher numbers mean faster responses.

What Are AI Models and Why Run Them Locally?

Large Language Models (LLMs) are artificial intelligence programs that can understand and generate human-like text. Examples include ChatGPT, Claude, and DeepSeek. Normally, these models run on company servers "in the cloud," which means your conversations are sent over the internet to their computers.

Running models locally means downloading the AI program to your own computer so it works without sending any data to outside companies.

Privacy Concerns with Cloud-Based Models

When you use online AI services, several privacy issues arise:

  • DeepSeek stores your data on servers in China, where it becomes subject to Chinese cybersecurity laws that give authorities broad access to user information.
  • Cloud services track and analyze all your conversations for business purposes (and according to recent legal precedent, these companies may not actually be deleting your chats when you ask).
  • API costs (the fees companies charge for each question you ask) can add up quickly if you use the service frequently.
  • No control over model updates - companies can change how the AI works or remove access at any time.

Benefits of Running Models Locally

When you run AI models on your own computer, you get several advantages:

  • Complete privacy: Your data never leaves your computer - all processing happens locally on your machine.
  • Works completely offline (see the security warning below): Once you download the model, you can use it without any internet connection.
  • No subscription fees: You avoid recurring monthly costs or pay-per-use charges.
  • Full control: You decide which model version to use and how it behaves.
  • Access to different types of models: You can run both filtered models (that refuse certain requests) and unfiltered models (that will answer most questions).

🚨 CRITICAL SECURITY WARNING

Most tutorials miss this important point: Local Large Language Model (LLM) tools like LM Studio may still have internet access by default. This means they could potentially share your data online, which defeats the purpose of running them locally for privacy. We will show you how to verify and ensure your AI model truly works offline.


Step 1: Install LM Studio

LM Studio is a user-friendly desktop application that makes it easy to download and run AI models on your computer.

If you only have 3 minutes, check out Futurepedia's simple explainer on LM Studio (it really is that easy to follow).

System Requirements

Before installing, make sure your computer meets these minimum requirements:

  • Memory (RAM): At least 16 gigabytes recommended. This is temporary storage your computer uses while running programs.
  • Processor: A modern CPU (Central Processing Unit) with AVX2 support. Most computers made after 2015 have this.
  • Graphics Card: A GPU with 8GB or more VRAM for better performance. This is optional but highly recommended.
  • Storage Space: At least 50 gigabytes of free space on your hard drive for the software and AI models.
  • Operating System: Windows 10 or 11, macOS 10.15 or newer, or a modern Linux distribution.

Installation Process

  1. Download LM Studio: Go to the website lmstudio.ai using your web browser.
  2. Choose your operating system: Click the download button for Windows, Mac, or Linux depending on what computer you have.
  3. Run the installer: Double-click the downloaded file and follow the on-screen instructions.
  4. Use default settings: The installer will suggest settings that work for most people - you can accept these.

The installation process is straightforward and should take only a few minutes.

Step 2: Choose and Download Your First AI Model

Once LM Studio is installed, you need to download an AI model to use. Different models have different capabilities and hardware requirements.

Choosing the Right DeepSeek Model for Your Computer

DeepSeek is one of the top open source AI model providers and offers several different versions of their AI models, each designed for different computer capabilities. Understanding which model to choose is crucial for getting the best performance from your hardware.

We also have an in-depth guide on what kind of hardware you need to run DeepSeek R1 (DeepSeek's top "reasoning" model), as well as a simple guide to how DeepSeek's architecture works.

Hardware Requirements by Model Size

Understanding what your computer can handle will help you choose the right model:

For the 1.5B Model (Most Accessible):

  • Minimum requirements: Any modern processor with 8GB of system memory.
  • No graphics card needed: Can run entirely on your main processor.
  • Storage space: About 2-3 gigabytes of free disk space.
  • Performance: Good for basic questions and reasoning tasks.

For 7B-8B Models (Recommended for Most Users):

  • System memory: 16GB of RAM for smooth operation.
  • Graphics card: NVIDIA RTX 3060 (12GB VRAM) or similar for best performance.
  • Storage space: About 4-8 gigabytes per model.
  • Performance: Significantly better at complex reasoning and general knowledge.

For 14B-32B Models (Power Users):

  • Graphics card: Needs 12-24GB of VRAM for optimal performance.
  • System memory: 32GB+ of RAM recommended.
  • Storage space: 10-20 gigabytes per model.
  • Performance: Excellent for professional and research applications.

For 70B Models (Enthusiasts Only):

  • Graphics card: Requires 48GB+ of VRAM (usually multiple high-end cards).
  • System memory: 64GB+ of RAM.
  • Storage space: 40+ gigabytes.
  • Performance: Near-professional quality but requires expensive hardware.


The Most Popular Choice: DeepSeek-R1-Distill-Qwen-1.5B

We're not kidding about it being popular; this small, distilled version of DeepSeek-R1 has been downloaded over a million times.

What "1.5B" means: This model has 1.5 billion parameters, making it the smallest in the DeepSeek family. Think of parameters like brain cells - more parameters usually mean smarter responses, but they also require more computer power.

Why it's popular:

  • Extremely accessible: Can run on almost any modern computer, even without a powerful graphics card.
  • Still very capable: Despite being small, it maintains the reasoning abilities that make DeepSeek special.
  • Privacy-focused: Runs completely offline with no need for internet connections.
  • Free to use: No API fees or subscription costs.

Where to download: You can grab GGUF versions of this model from Hugging Face (search for "DeepSeek-R1-Distill-Qwen-1.5B GGUF") or straight from LM Studio's built-in Discover tab, as covered in the download steps below.

GGUF? "What is that?" you might be asking. Well...

Understanding Model Formats

GGUF files: These are optimized versions of AI models designed specifically for running on personal computers. They use advanced compression techniques to make models smaller and faster without losing much quality.

Quantization levels: Different compression amounts (a quick size estimate follows this list):

  • Q8_0: Highest quality, largest file size (about 8GB for a 7B model).
  • Q4_K_M: Good balance of quality and size (about 4GB for a 7B model) - Recommended.
  • Q3_K: Smaller file, slightly reduced quality (about 3GB for a 7B model).
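
If you want to sanity-check those sizes yourself, a rough rule of thumb is parameters × bits per weight ÷ 8 = gigabytes. Here's a quick back-of-the-envelope check (our sketch, not from the tutorials; real GGUF files add a little overhead for metadata and a few higher-precision layers):

bash

# Rough size of a 7B model at Q4_K_M (roughly 4.8 bits per weight on average):
echo "7 * 4.8 / 8" | bc -l
# => ~4.2 gigabytes, in the ballpark of the ~4GB figure above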

Recommended Starting Models by Computer Type

For most Windows/Linux computers with 16GB+ RAM:

  • DeepSeek-R1-Distill-Qwen-7B: The best balance of capability and speed for this class of hardware.

For Mac computers with 16GB+ RAM:

  • DeepSeek-R1-Distill-Qwen-7B: Optimized for Apple Silicon chips, excellent performance.

For older or lower-powered computers:

  • DeepSeek-R1-Distill-Qwen-1.5B: Will run on almost any modern computer.

For high-end gaming computers:

  • DeepSeek-R1-Distill-Qwen-14B (or 32B if you have 24GB of VRAM): Noticeably stronger reasoning if your graphics card can handle it.

Special Consideration for Mac Users

For 16GB MacBook Air or MacBook Pro users, the sweet spot is the DeepSeek-R1-Distill-Qwen-7B model:

Why 7B is better than 1.5B for Macs:

  • Much better performance: Scores 92.8% on mathematical reasoning tests compared to 83.9% for the 1.5B version.
  • Faster on Mac hardware: Apple's M-series chips handle the 7B model very efficiently.
  • Still fits in memory: A 16GB Mac can comfortably run the 7B model with room for other applications.
  • Real-world speed: Testing shows 53-55 tokens per second on an M2 MacBook Pro, which feels very responsive.

Easy installation on Mac:

The same LM Studio download works natively on Apple Silicon - nothing extra is needed beyond a 16GB M-series Mac. This is a pretty standard setup that the average Mac user either already has or could attain at not much cost.

Download Process

  1. Open the Discover tab: In LM Studio, click the magnifying glass icon (🔍) labeled "Discover."
  2. Search for models: Type "DeepSeek R1" in the search box.
  3. Choose your model size: Select the model that matches your computer's capabilities.
  4. Choose quantization level: Look for versions labeled Q4_K_M - this provides a good balance between quality and file size.
  5. Check hardware compatibility: Green icons next to models mean your computer can run them with full GPU acceleration (using your graphics card for speed).
  6. Start the download: Click the download button and wait. Download times vary:
    • 1.5B model: 10-30 minutes on average internet.
    • 7B model: 30-90 minutes on average internet.
    • Larger models: Several hours depending on your internet speed.

Direct Download Links for Advanced Users

If you prefer to download models directly from HuggingFace (the main repository for open AI models):

Original uncompressed models:

  • The official deepseek-ai account on huggingface.co hosts DeepSeek-R1-Distill-Qwen-1.5B, 7B, 14B, and 32B, plus Llama-based 8B and 70B variants.

Pre-optimized GGUF versions (recommended):

  • Search Hugging Face for "DeepSeek-R1-Distill" GGUF uploads from accounts like lmstudio-community or bartowski, which package the models for tools like LM Studio.

The GGUF versions are pre-compressed and optimized for LM Studio, making them faster to download and easier to use.

Step 3: Load and Test Your Model

After downloading, you need to load the model into memory so you can use it.

Loading Process

  1. Go to the Chat tab: Click the speech bubble icon in LM Studio.
  2. Select your model: Click the dropdown menu at the top that says "Select a model."
  3. Choose your downloaded model: Pick the DeepSeek model you just downloaded.
  4. Configure settings: Before loading, you can adjust:
    • GPU offload: How much of the model runs on your graphics card (higher = faster).
    • Context length: How much conversation the AI remembers (longer = more memory used).
    • Temperature: How creative the responses are (0.6 is recommended for reasoning tasks).
  5. Load the model: Click the "Load Model" button and wait. This process puts the AI into your computer's active memory.
  6. Wait for loading: Depending on your hardware, this can take 30 seconds to several minutes.
  7. Start chatting: Once loaded, you can type questions in the text box at the bottom.

Testing Your Model

Try asking a simple question to make sure everything works:

"Hello! Can you explain what you are and what you can help me with?"

For reasoning models like DeepSeek R1, you might also try:

"If a plant doubles in size every day and covers a lake in 30 days, how long would it take two plants to cover half the lake?"

You should see the model "think" through the problem step by step, showing its reasoning process before giving the final answer. (For the record, the answer is 28 days: two plants always have double one plant's coverage, putting them exactly one doubling ahead, and a single plant covers half the lake on day 29.)

🔒 CRITICAL: Verify True Offline Operation

This is the most important security step that many tutorials skip. Just because an AI model is "local" does not automatically mean it is private.

The Hidden Problem

Even when running "local" AI models, the software might still access the internet in the background. This could happen to:

  • Check for software updates.
  • Download additional model components.
  • Send usage statistics back to the company.
  • Access current information to answer questions about recent events.

If your goal is privacy, you need to verify that your AI model truly works offline.

Method 1: Simple Verification Test

Before trusting your setup with sensitive information, test it with this specific question (from LM Studio Official documentation):

"Answer the following question using ONLY the local LLM model with no internet access: What happened in the news today?"

What the results mean:

  • Good response: The AI says it does not know current news, cannot access recent information, or explains that it only knows information up to its training date.
  • Bad response: The AI provides recent news stories, which means it is somehow accessing the internet.

Method 2: Airplane Mode Test

For complete verification:

  1. Disconnect from internet: Turn on airplane mode or unplug your ethernet cable.
  2. Test the model: Ask several questions including current events.
  3. Verify responses: The model should still work for general questions but should not know anything recent.
  4. Reconnect: Turn internet back on only after testing.

Method 3: Network Connection Monitoring (Advanced)

For users comfortable with technical tools, you can monitor what network connections LM Studio makes.

On Windows:

  1. Open PowerShell: Press Windows key + X, then select "Windows PowerShell (Admin)"
  2. Run this command:

powershell

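# List every TCP connection owned by the LM Studio process: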
Get-Process "LM Studio" | ForEach-Object { Get-NetTCPConnection -OwningProcess $_.Id -ErrorAction SilentlyContinue }

  3. Check the results: You should only see local connections (addresses starting with 127.0.0.1 or localhost). Any other IP addresses mean the program is connecting to the internet.

Expected safe results:

  • Connections to 127.0.0.1:1234 (local API server).
  • Connections to localhost addresses.
  • No external IP addresses when chatting with the model.
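
On a Mac or Linux machine, you can run a similar check from the terminal (our rough equivalent, not from the original tutorials; lsof may shorten the process name in its output):

bash

# Show every open network connection, then filter for the LM Studio process.
lsof -i -n -P | grep -i "lm stud"

As with the PowerShell check, anything beyond 127.0.0.1 or localhost addresses deserves a closer look.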

Method 4: Complete Isolation with Docker (Most Secure)

For maximum security, advanced users can run AI models in complete isolation using Docker, a technology that creates secure containers for applications.

What Docker does: Docker creates a virtual container that isolates the AI model from your main operating system and network. Think of it like putting the AI in a secure, soundproof room where it cannot communicate with the outside world.

Prerequisites:

  • Install Docker Desktop for your operating system
  • For Windows users: Install WSL2 (Windows Subsystem for Linux)
  • Basic comfort with command-line interfaces

Docker isolation setup:

  1. Install Docker: Download Docker Desktop from docker.com
  2. Use Ollama instead of LM Studio: Ollama is another local AI tool that works well with Docker
  3. Run this command to create an isolated container:

bash

# Note: ports cannot be published with --network none, so none are mapped here.
docker run -d --gpus all \
 --name ollama-secure \
 --security-opt no-new-privileges \
 --read-only \
 --tmpfs /tmp \
 --network none \
 ollama/ollama

This command creates a container that:

  • Cannot access the internet (--network none)
  • Cannot modify files (--read-only)
  • Has limited system privileges (--security-opt no-new-privileges)
  • Still uses your GPU for speed (--gpus all)
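
One practical wrinkle (our note, not from the tutorials): with no network, the container cannot download models, so you have to supply them yourself and talk to Ollama through docker exec rather than a published port. A minimal sketch, assuming you already pulled models into your host's ~/.ollama folder:

bash

# Add this volume flag to the docker run command above so the container can
# see models you downloaded beforehand:
#   -v ~/.ollama:/root/.ollama
# Then chat entirely inside the isolated container (the model tag is an example):
docker exec -it ollama-secure ollama run deepseek-r1:1.5b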

Advanced Features of LM Studio

Once you have verified your setup is secure, you can explore additional features.

Document Chat Feature

LM Studio allows you to upload documents and ask questions about them privately.

How it works:

  1. Drag and drop files: You can drag PDF files, text documents, or other files directly into LM Studio.
  2. Local processing: All document analysis happens on your computer - nothing is uploaded to the internet.
  3. Ask questions: You can then ask the AI questions about the content of your documents.
  4. Private analysis: This is particularly useful for analyzing confidential business documents, personal files, or sensitive research.

API Server for Developers

LM Studio includes a built-in server that allows programmers to use the AI model in their own applications.

How to access it:

  1. Go to Developer tab: Click the gear icon in LM Studio.
  2. Start the server: Click "Start Server" button.
  3. Use the API: The server creates an endpoint at http://localhost:1234 that works like OpenAI's API.
  4. Build applications: Developers can now build custom applications that use your local AI model instead of cloud services.

This feature is valuable for programmers who want to create AI-powered applications without relying on external cloud services.
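
For example, here is what a request might look like with curl (a sketch assuming the default port, with the model identifier matching whatever LM Studio displays for your loaded model):

bash

# Ask the local server a question via the OpenAI-style chat endpoint.
curl http://localhost:1234/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
   "model": "deepseek-r1-distill-qwen-7b",
   "messages": [{"role": "user", "content": "Explain quantization in one sentence."}],
   "temperature": 0.6
 }'

Because the endpoint mimics OpenAI's API, most OpenAI client libraries will work if you simply point their base URL at http://localhost:1234/v1.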

Model Comparison with Playground

The Playground feature allows you to test multiple AI models simultaneously to see which one gives better answers for your specific needs.

How to use it:

  1. Load multiple models: Download and load several different AI models.
  2. Access Playground: Click the Playground tab in LM Studio.
  3. Enter a question: Type the same question for all models.
  4. Compare responses: See how different models answer the same question.
  5. Choose the best: This helps you identify which model works best for your specific use case.

Performance Optimization Tips

Getting the best performance from your local AI models requires understanding how to optimize your computer's settings.

GPU Acceleration Settings

What GPU acceleration means: Using your graphics card instead of just your main processor to run AI calculations. This can be 10-50 times faster.

How to enable it:

  1. Check model loading settings: When loading a model, look for "GPU offload" settings.
  2. Increase the slider: Move the GPU offload slider to a higher number.
  3. Monitor performance: Higher numbers mean more of the model runs on your graphics card.
  4. Find the sweet spot: Start high and reduce if you experience crashes or errors.

Memory Management

Context Length: How much of your conversation the AI can remember at once. Longer context uses more memory but allows for more complex discussions.

Recommended settings:

  • Short conversations: 2048 tokens (saves memory).
  • Long discussions: 8192 tokens (uses more memory but remembers more).
  • Document analysis: 16384+ tokens (for working with large documents).

Temperature Settings: Controls how creative or conservative the AI's responses are:

  • 0.1-0.3: Very focused and factual responses.
  • 0.6: Balanced creativity and accuracy (recommended for reasoning models).
  • 0.8-1.0: More creative and varied responses.

For more on this, read Akshay's brilliant (and simple) thread on X that explains the concept of temperature well.


Censored vs Uncensored Models

Understanding the difference between filtered and unfiltered AI models is important for choosing the right tool for your needs.

Censored (Filtered) Models

Most AI models, including standard DeepSeek versions, have built-in filters that prevent them from:

  • Providing information about illegal activities.
  • Giving medical or legal advice.
  • Answering questions about sensitive political topics.
  • Helping with potentially harmful requests.

These filters exist to prevent misuse and protect both users and the companies that create the models.

Uncensored (Unfiltered) Models

Some models, like the Dolphin series, have fewer restrictions and will attempt to answer most questions. These can be useful for:

  • Academic research: Studying controversial topics objectively.
  • Creative writing: Generating content without artificial limitations.
  • Technical education: Learning about cybersecurity, forensics, or other technical fields.
  • Philosophical discussions: Exploring complex ethical questions.

David talks about one potential uncensored model you can install, but we'll leave the explanation up to him if you're curious. He offers an important reminder to use uncensored models responsibly and in accordance with your local laws and ethical guidelines.

Alternative Deployment Options

Another local alternative to LM Studio is Ollama:

  • Command-line focused tool for running local AI models.
  • More technical but very powerful.
  • Works well with Docker for security.
  • Preferred by developers and technical users.
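
To give you a feel for the workflow (a quick sketch; the model tag comes from Ollama's public library):

bash

# Download a model once, then chat with it from the terminal.
ollama pull deepseek-r1:1.5b
ollama run deepseek-r1:1.5b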

Troubleshooting Common Issues

Here are solutions to problems you might encounter when running local AI models.

Model Size and Performance Issues

Problem: "Model too large" or out of memory errors.

Solutions:

  • Download a smaller quantized version (Q3_K instead of Q8_0).
  • Close other applications to free up RAM.
  • Try a model with fewer parameters (1.5B instead of 7B).
  • Restart your computer to clear memory.

Problem: Very slow response times

Solutions:

  • Enable GPU acceleration in model settings.
  • Reduce context window size in advanced settings.
  • Try a smaller, faster model.
  • Make sure your graphics card drivers are updated.

Network and Privacy Concerns

Problem: Uncertain whether the model is truly offline

Solutions:

  • Always perform the verification test described earlier.
  • Use airplane mode during sensitive conversations.
  • Use network monitoring tools to check for unexpected connections.
  • Consider Docker isolation for maximum security.

Problem: Model seems to have current information it should not have

Investigation steps:

  • Check if you are accidentally using a cloud-based model instead of local.
  • Verify that LM Studio is not set to use online search features.
  • Test with airplane mode enabled on your computer.
  • Review LM Studio settings for any internet-connected features.

Software and Installation Issues

Problem: LM Studio will not start or crashes

Solutions:

  • Check that your computer meets minimum system requirements.
  • Update your graphics card drivers.
  • Try running as administrator (Windows) or with sudo (Linux).
  • Reinstall LM Studio with a fresh download.

Problem: Cannot download models

Solutions:

  • Check your internet connection stability.
  • Try downloading during off-peak hours for faster speeds.
  • Clear LM Studio's cache and restart the application.
  • Try downloading smaller models first to test the connection.


Security Best Practices Summary

Following these practices will help ensure your local AI setup maintains the privacy and security you are seeking.

Before Using with Sensitive Data

  1. Always test offline capability first: Use the verification questions to confirm the model cannot access current information.
  2. Start with public information: Test the model with non-sensitive questions before using it for confidential matters.
  3. Monitor resource usage: Watch for unusual network activity, high CPU usage, or unexpected behavior.

During Regular Use

  1. Use airplane mode when needed: For highly sensitive conversations, temporarily disable your internet connection.
  2. Use Docker isolation for maximum security: If you are handling very confidential information, consider the Docker approach.
  3. Keep models updated but verify changes: When you update LM Studio or download new models, repeat your security verification tests.

For Business or Professional Use

  1. Document your security measures: Keep records of what verification steps you have taken.
  2. Train team members: Make sure anyone using the system understands the security requirements.
  3. Regular security audits: Periodically re-test your setup to ensure it still meets your security standards.
  4. Have a backup plan: Know what you will do if your local setup fails and you need alternatives.

Conclusion

LM Studio provides an excellent way to run powerful AI models on your own computer while maintaining privacy and control. However, the privacy benefits are only real if you properly configure and verify your setup.

Key takeaways:

  • Local AI models can provide ChatGPT-like capabilities without sending data to external companies
  • Always verify that your "local" setup is truly offline before using it with sensitive information
  • Choose the right model size for your hardware - the 1.5B model works on almost any computer, while 7B models provide much better performance for those with adequate hardware
  • Different models offer different capabilities - choose based on your hardware and needs
  • Security requires ongoing vigilance, not just initial setup

Remember: The goal is not just to run AI models locally, but to do so in a way that actually protects your privacy and gives you the control you are seeking. Do not assume that "local" automatically means "private" - test and confirm that your setup meets your specific security requirements.

By following this guide and understanding both the capabilities and limitations of local AI models, you can make informed decisions about how to incorporate this powerful technology into your work while maintaining the privacy and security standards that matter to you.

Source Attribution

Special shout out to the YouTube tutorials we followed to build this guide: 

  1. Network Chuck: https://youtu.be/7TR-FLWNVHY?si=gL_k0zUtS2V_mO21 (Security focus, Docker isolation concepts)
  2. David Bombal: https://youtu.be/A2CqSfd5I4I?si=oepghJ0B0Awg0-YC (Censored vs uncensored models, privacy concerns)
  3. Futurepedia: https://youtu.be/Mq2oe9UIAVs?si=Yiuh69K-GJ2fNbM4 (Basic installation and model selection)
  4. Kevin Stratvert: https://youtu.be/ygUEbCpOOLg?si=KAuhgxqkSOBRZfEm (LM Studio tutorial, document chat features)
  5. Matthew Berman: https://youtu.be/0PxbXnIlRno?si=9K44t066VL9wjsLg (Overall great overview and covers alternative deployment options for DeepSeek)

See you cool cats on X!
