Local LLMs

Running models locally

#local-llms #placeholder
Terminal window running ollama with a local model

This is a placeholder post for the local LLMs topic. Replace it with real content when ready.

What goes here

Posts about running language models on your own hardware. Quantization, inference engines, VRAM budgets, benchmarks, and the tradeoffs between running locally versus calling an API.

Ideas for first real posts:

  • Ollama vs llama.cpp vs vLLM: when to use what
  • How much VRAM you actually need for different model sizes
  • Quantization formats compared: GGUF, GPTQ, AWQ, EXL2
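Until the VRAM post exists, a back-of-envelope rule of thumb: weight memory is roughly parameter count times bytes per weight, plus headroom for the KV cache and activations. A minimal sketch (the 1.2× overhead factor is an assumed fudge factor, not a measured value):

```python
def estimate_vram_gib(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GiB: weights only, scaled by an
    assumed overhead factor for KV cache and activations."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# A 7B model quantized to 4 bits per weight:
print(round(estimate_vram_gib(7, 4), 1))  # → 3.9
```

Real usage varies with context length, batch size, and engine, so treat this as a floor, not a budget.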