Model architectures overview
#models
#placeholder
This is a placeholder post for the models topic. Replace it with real content when ready.
What goes here
Posts about model architectures, training, fine-tuning, evaluation, and the research side of ML: transformers, diffusion models, mixture-of-experts, and whatever comes next.
Ideas for first real posts:
- The attention mechanism explained without the usual hand-waving
- Fine-tuning versus prompting: where the line is now
- Reading ML papers without a PhD
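As a seed for the first idea above, here is a minimal sketch of scaled dot-product attention in NumPy; the shapes and the single-head, unbatched setup are simplifying assumptions, not how production implementations are written.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    # Each output row is a weighted average of the rows of V,
    # with weights given by query-key similarity.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: 4 tokens, 8-dimensional keys/queries/values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The `sqrt(d_k)` scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into near-one-hot territory and shrink its gradients.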