AI learning workspace
Avenire
Deep Tutor: Transformer Self-Attention (In Progress)
Explain how self-attention works in transformer models using my uploaded notes
Apollo 1
Memory › ML › attention-notes.md
Transformer Attention — Concept Graph
Self-Attention Core Idea
Each token computes a relevance score against every other token, so its representation is re-mixed dynamically based on context.
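A minimal single-head sketch of this idea in NumPy; the toy dimensions, random inputs, and the function name `self_attention` are illustrative assumptions, not values from the notes:

```python
# Minimal sketch of single-head self-attention (illustrative assumptions:
# toy dimensions, random inputs and projection matrices).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model). Returns one context-mixed vector per token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # relevance-weighted mix of values

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (5, 4)
```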
Scaling Insight
Dot products of dₖ-dimensional vectors grow in magnitude as dₖ increases (their variance scales with dₖ), so dividing by √dₖ keeps scores in a range where softmax does not saturate and gradients stay usable.
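A quick numerical check of that claim, assuming random Gaussian queries and keys with dimensions chosen only for illustration: unscaled scores have standard deviation near √dₖ and push the softmax toward one-hot, while scaled scores stay order one.

```python
# Numerical check of the scaling insight: dot products of random vectors
# have variance ~ d_k, so without 1/sqrt(d_k) the softmax saturates.
# Dimensions and seed are illustrative assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_k, n_keys = 512, 10
q = rng.normal(size=d_k)
K = rng.normal(size=(n_keys, d_k))

raw = K @ q                   # std grows like sqrt(d_k), ~22.6 here
scaled = raw / np.sqrt(d_k)   # std stays ~1 regardless of d_k

print(softmax(raw).max())     # ~1.0: near one-hot, gradients vanish elsewhere
print(softmax(scaled).max())  # moderate: softmax stays well-conditioned
```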
Study Artifacts
Plot Preview
Attention distributions evolve from diffuse to focused.
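One way to make that diffuse-to-focused shift measurable is the entropy of an attention row; the two rows in this sketch are synthetic stand-ins, not data from the plot:

```python
# Quantifying "diffuse vs focused": entropy of an attention row.
# Both example rows are synthetic stand-ins, not data from the plot.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

diffuse = np.full(8, 1 / 8)                # near-uniform attention
focused = np.array([0.02] * 7 + [0.86])    # mass concentrated on one token

print(entropy(diffuse))   # ln(8) ~ 2.08, maximal for 8 tokens
print(entropy(focused))   # ~ 0.68, much lower as attention sharpens
```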
3 Tasks
Summarize Q/K/V geometrically
Compare dot-product vs cosine similarity (see the sketch after this task list)
Generate derivation flashcards
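As a starting point for the comparison task, a small sketch with random illustrative vectors: the dot product is magnitude-sensitive, while cosine similarity normalizes norms away.

```python
# Dot-product vs cosine similarity for the same query/key pair.
# Vectors are random illustrative inputs, not from the notes.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=64)
k = rng.normal(size=64)

dot = q @ k                                          # magnitude-sensitive score
cos = dot / (np.linalg.norm(q) * np.linalg.norm(k))  # normalized to [-1, 1]

# Scaling a key by 10 multiplies the dot product by 10 but leaves cosine
# unchanged, which is why dot-product attention can be dominated by
# high-norm keys.
print(dot, q @ (10 * k))
print(cos, (q @ (10 * k)) / (np.linalg.norm(q) * np.linalg.norm(10 * k)))
```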
Built for thinkers, builders, and curious minds.