
Recommended Local Models

Run AI skills entirely on your Mac. No API keys, no cloud, no cost. Ever.

This is an advanced feature for power users comfortable with installing local software. VoxChimp works great out of the box with cloud providers too.

What are AI skills?

VoxChimp's AI skills let you do more than just dictate. Speak a command and the AI writes SOAP notes, drafts emails, summarises text, translates, rewrites for tone, and more. Skills need an AI model to work. You can use a paid cloud provider, or run one locally for free.

Total privacy

With a local model, your voice commands and text never leave your Mac. No data sent to OpenAI, Google, or anyone else. Perfect for medical notes, legal work, or anything confidential.

Zero ongoing cost

Cloud providers charge per token. The more you use skills, the more you pay. Local models are completely free. Use them as much as you want, forever. No API keys, no billing surprises.

Works offline

No Wi-Fi? No problem. Local models run entirely on your hardware. Use AI skills on a plane, in a rural clinic, or anywhere without an internet connection.

Cloud providers still have their place. Services like Claude, GPT-4, and Gemini offer the highest quality output and require no local setup, just an API key. Local models are the best choice when privacy, cost, or offline access matter most to you.

Pick your model

These models work great with VoxChimp via LM Studio or Ollama. All free, all local.

| Model | Size | Min RAM | Best For | Tool Calling | Arena Elo |
|---|---|---|---|---|---|
| Gemma 4 31B | 31B | 24 GB | Reasoning + agent tasks | Excellent | 1452 |
| Gemma 4 26B MoE | 26B | 16 GB | Efficient reasoning (fewer active params) | Excellent | 1441 |
| Qwen 3.5 27B | 27B | 16 GB | Complex reasoning + search | Excellent | 1450 |
| Qwen 2.5 7B Instruct | 7B | 8 GB | General + search | Excellent | - |
| Gemma 3 12B | 12B | 16 GB | Fast reasoning | Good | - |
| Llama 3.1 8B Instruct | 8B | 16 GB | General purpose | Good | - |
| Phi-4 Mini 3.8B | 3.8B | 8 GB | Lightweight tasks | Good | - |
| Nemotron 3 Nano 4B | 4B | 8 GB | Lightweight + search | Good | - |

Gemma 4 by Google DeepMind

Open-weight models built for reasoning and agent tasks, not just chat. The 31B model ranks #3 on the Arena leaderboard, outperforming models up to 20x larger. The 26B MoE variant ranks #6 and uses fewer active parameters per step, making it more efficient on consumer hardware.

VoxChimp uses smart pre-search for local models, so web search works even without native tool calling. The "Tool Calling" column reflects each model's general capability.

Get started in minutes

Pick a runner, download a model, and connect it to VoxChimp. That's it.

Setting up LM Studio

Recommended for beginners. A polished Mac app for running AI models locally

LM Studio is a free desktop app that lets you browse, download, and run open-source AI models with a friendly interface. It handles all the technical details (quantisation, GPU offloading, server configuration) so you can focus on picking a model and using it. Think of it as an app store for local AI.

  1. Download LM Studio from lmstudio.ai and install it
  2. Search for a model (we recommend Qwen 2.5 7B Instruct for most users) and download it
  3. Go to the Developer tab and start the local server (runs on localhost:1234)
  4. In VoxChimp, open Settings > Agent and select "LM Studio (Local)" as your AI provider
  5. Enter the model name exactly as shown in LM Studio, then hit Test Connection
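Once the server is running, you can sanity-check it outside VoxChimp. LM Studio exposes an OpenAI-compatible chat endpoint on localhost:1234; the sketch below sends a single prompt to it. The model name `qwen2.5-7b-instruct` is a placeholder, so swap in the exact name LM Studio shows for your download.

```python
# Minimal sketch: query LM Studio's OpenAI-compatible local server.
# Assumes the Developer-tab server is running on localhost:1234.
import json
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask(model: str, prompt: str) -> str:
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    try:
        print(ask("qwen2.5-7b-instruct", "Summarise: local models keep data private."))
    except OSError as e:  # URLError subclasses OSError
        print("LM Studio server not reachable:", e)
```

If this prints a reply, VoxChimp's Test Connection should succeed with the same model name.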

Setting up Ollama

Lightweight, fast, and runs from the command line

Ollama is a free, open-source tool that makes running large language models dead simple. It runs quietly in the background on your Mac and serves models through a local API. VoxChimp's default local model (Llama 3.2) was created by Meta and is one of the best open models available. It punches well above its weight for a 3B parameter model.

  1. Install Ollama: brew install ollama (or download from ollama.ai)
  2. Pull a model: ollama pull llama3.2
  3. Ollama runs automatically on localhost:11434. No extra setup needed
  4. In VoxChimp, open Settings > Agent and select "Ollama (Local)"
  5. Default model is llama3.2. Hit Test Connection and you're good to go
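You can verify the Ollama server the same way. The sketch below uses Ollama's /api/generate endpoint with "stream": false, which returns one JSON object instead of a stream of chunks; the model tag must match what you pulled.

```python
# Minimal sketch: call Ollama's local API on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for a single JSON response rather than chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    try:
        print(generate("llama3.2", "One sentence on why dictation saves time."))
    except OSError as e:
        print("Ollama not reachable:", e)
```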

Free web search with SearXNG

Give your local AI access to live web results, no API key needed

  1. Install Docker Desktop from docker.com/products/docker-desktop. It's free for personal use. Open it once to finish setup.
  2. Download our ready-made setup files: searxng-setup.zip. Unzip anywhere on your Mac.
  3. Open Terminal (in Applications > Utilities), navigate to the unzipped folder, and run:
    docker compose up -d
    This downloads and starts SearXNG, a private search engine running on your Mac.
  4. In VoxChimp, open Settings > Web Search and select "SearXNG"
  5. Done! SearXNG restarts automatically with Docker Desktop.
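To confirm SearXNG is answering, you can hit its JSON search API directly. The port (8080) and the availability of format=json are assumptions here: the port depends on the compose file, and JSON output must be enabled in SearXNG's settings.yml, so adjust both to match your setup.

```python
# Minimal sketch: query a local SearXNG instance's JSON search API.
# Assumes SearXNG on localhost:8080 with the JSON format enabled.
import json
import urllib.parse
import urllib.request

SEARXNG_URL = "http://localhost:8080/search"

def build_search_url(query: str) -> str:
    """URL-encode the query and request JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{SEARXNG_URL}?{params}"

def search(query: str) -> list:
    with urllib.request.urlopen(build_search_url(query)) as resp:
        return json.load(resp).get("results", [])

if __name__ == "__main__":
    try:
        for hit in search("local llm inference")[:3]:
            print(hit.get("title"), "-", hit.get("url"))
    except OSError as e:
        print("SearXNG not reachable:", e)
```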

Don't want Docker? You can also use a paid search provider (Brave, Tavily, or SerpAPI) with just an API key. Configure it in the same settings panel.

Choosing the right model for your Mac

Match the model to your hardware for the best experience

8 GB RAM

Qwen 2.5 7B, Phi-4 Mini 3.8B, or Nemotron Nano 4B

16 GB RAM

Gemma 3 12B or Gemma 4 26B MoE (quantised)

24 GB RAM

Gemma 4 31B, Qwen 3.5 27B, or Gemma 4 26B MoE (full)

32+ GB RAM

Any model at a higher-precision quant (Q8_0) for best quality

Performance tips

Get the most out of your local setup.

Pick the right quant

Q4_K_M is the sweet spot for most users. Good quality, fits in less RAM. Q8_0 gives near-full quality at 2x the size. If RAM is tight, go Q4_K_S.
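The size tradeoff is easy to estimate. The back-of-the-envelope sketch below multiplies parameter count by an approximate bits-per-weight figure for each quant; the figures are rough averages for llama.cpp-style formats (treat them as assumptions), and real files add overhead for embeddings and metadata, plus extra RAM at runtime for the KV cache.

```python
# Back-of-the-envelope: approximate weight size for a given quant level.
# Bits-per-weight values are rough averages, not exact format specs.
BITS_PER_WEIGHT = {"Q4_K_S": 4.3, "Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Estimate on-disk weight size in GB: params x bits / 8."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * bits / 8  # billions of bytes ~= GB

if __name__ == "__main__":
    for quant in ("Q4_K_S", "Q4_K_M", "Q8_0"):
        print(f"7B at {quant}: ~{approx_size_gb(7, quant):.1f} GB")
```

For a 7B model this works out to roughly 4 GB at Q4_K_M versus about 7.5 GB at Q8_0, which is where the "near 2x the size" rule of thumb comes from.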

GPU offloading

Apple Silicon Macs use Metal automatically in both Ollama and LM Studio. More GPU layers = faster inference. On M-series chips, expect 2-3x speedup over CPU-only.

Benchmark your setup

In Ollama, run a model and check the eval rate. In LM Studio, the UI shows tok/s in real-time. Aim for 10+ tok/s for comfortable use, 20+ feels instant.
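To see what those targets mean in practice, divide a typical output length by the generation speed. The 300-token figure below is illustrative, roughly the length of a short drafted note.

```python
# Worked example for the tok/s targets above: wait time for a skill's output.
def wait_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate a response of the given length."""
    return output_tokens / tokens_per_second

if __name__ == "__main__":
    for tps in (10, 20):
        print(f"{tps} tok/s -> {wait_seconds(300, tps):.0f} s for a 300-token draft")
```

At 10 tok/s a 300-token draft takes 30 seconds; at 20 tok/s it takes 15, which is why 20+ starts to feel instant.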

Ready to go local?

Download VoxChimp and connect your favourite model in minutes.

Download for macOS