Running models locally
#local-llms
#placeholder
This is a placeholder post for the local LLMs topic. Replace it with real content when ready.
What goes here
Posts about running language models on your own hardware: quantization, inference engines, VRAM budgets, benchmarks, and the tradeoffs between running locally and calling an API.
Ideas for first real posts:
- Ollama vs llama.cpp vs vLLM: when to use what
- How much VRAM you actually need for different model sizes
- Quantization formats compared: GGUF, GPTQ, AWQ, EXL2
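Since VRAM budgeting is one of the ideas above, here is the usual back-of-envelope rule: weight memory is roughly parameter count × bits per weight ÷ 8, plus headroom for the KV cache and activations. A minimal sketch, where the 1.2× overhead factor is an illustrative assumption, not a measured value:

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for holding model weights.

    params_billions: model size in billions of parameters (e.g. 7 for a 7B model)
    bits_per_weight: quantization level (16 = fp16, 8, 4, ...)
    overhead: fudge factor for KV cache and activations (assumed, not measured)
    """
    weight_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead

# A 7B model at 4-bit quantization fits comfortably in ~5 GB:
print(estimate_vram_gb(7, 4))   # ~4.2 GB
# The same model at fp16 needs roughly 4x that:
print(estimate_vram_gb(7, 16))  # ~16.8 GB
```

Real usage varies with context length (the KV cache grows linearly with it) and the inference engine, so treat this as a sanity check, not a guarantee.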