Running models locally
#local-llms
#placeholder
This is a placeholder post for the local LLMs topic. Replace it with real content when ready.
What goes here
Posts about running language models on your own hardware: quantization, inference engines, VRAM budgets, benchmarks, and the tradeoffs between running locally and calling an API.
Ideas for first real posts:
- Ollama vs llama.cpp vs vLLM: when to use what
- How much VRAM you actually need for different model sizes
- Quantization formats compared: GGUF, GPTQ, AWQ, EXL2
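Since VRAM budgeting is one of the ideas above, here is the usual back-of-envelope rule: weight memory is roughly parameter count × bits per weight ÷ 8, plus headroom for the KV cache and activations. A minimal sketch, where the 1.2× overhead factor is an illustrative assumption, not a measured value:

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for holding model weights.

    params_billions: model size in billions of parameters (e.g. 7 for a 7B model)
    bits_per_weight: quantization level (16 = fp16, 8, 4, ...)
    overhead: fudge factor for KV cache and activations (assumed, not measured)
    """
    weight_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead

# A 7B model at 4-bit quantization fits comfortably in ~5 GB:
print(estimate_vram_gb(7, 4))   # ~4.2 GB
# The same model at fp16 needs roughly 4x that:
print(estimate_vram_gb(7, 16))  # ~16.8 GB
```

Real usage varies with context length (the KV cache grows linearly with it) and the inference engine, so treat this as a sanity check, not a guarantee.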