Model architectures overview
#models
#placeholder
This is a placeholder post for the models topic. Replace it with real content when ready.
What goes here
Posts about model architectures, training, fine-tuning, evaluation, and the research side of ML: transformers, diffusion models, mixture-of-experts, and whatever comes next.
Ideas for first real posts:
- The attention mechanism explained without the usual hand-waving
- Fine-tuning versus prompting: where the line is now
- Reading ML papers without a PhD
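As a seed for the first idea above, here is a minimal sketch of scaled dot-product attention in NumPy; the shapes and the single-head, unbatched setup are simplifying assumptions, not how production implementations are written.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    # Each output row is a weighted average of the rows of V,
    # with weights given by query-key similarity.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: 4 tokens, 8-dimensional keys/queries/values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The `sqrt(d_k)` scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into near-one-hot territory and shrink its gradients.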