AI learning workspace
Avenire
Deep Tutor: Transformer Self-Attention (In Progress)
Explain how self-attention works in transformer models using my uploaded notes
Apollo 1
Memory › ML › attention-notes.md
Transformer Attention — Concept Graph
Self-Attention Core Idea
Each token computes a relevance score against every other token, so its representation is re-mixed dynamically based on context.
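A minimal single-head sketch of this idea in NumPy; the toy dimensions, random inputs, and the function name `self_attention` are illustrative assumptions, not values from the notes:

```python
# Minimal sketch of single-head self-attention (illustrative assumptions:
# toy dimensions, random inputs and projection matrices).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model). Returns one context-mixed vector per token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # relevance-weighted mix of values

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (5, 4)
```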
Scaling Insight
Dot products of dₖ-dimensional vectors grow in magnitude as dₖ increases (their variance scales with dₖ), so dividing by √dₖ keeps scores in a range where softmax does not saturate and gradients stay usable.
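A quick numerical check of that claim, assuming random Gaussian queries and keys with dimensions chosen only for illustration: unscaled scores have standard deviation near √dₖ and push the softmax toward one-hot, while scaled scores stay order one.

```python
# Numerical check of the scaling insight: dot products of random vectors
# have variance ~ d_k, so without 1/sqrt(d_k) the softmax saturates.
# Dimensions and seed are illustrative assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_k, n_keys = 512, 10
q = rng.normal(size=d_k)
K = rng.normal(size=(n_keys, d_k))

raw = K @ q                   # std grows like sqrt(d_k), ~22.6 here
scaled = raw / np.sqrt(d_k)   # std stays ~1 regardless of d_k

print(softmax(raw).max())     # ~1.0: near one-hot, gradients vanish elsewhere
print(softmax(scaled).max())  # moderate: softmax stays well-conditioned
```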
Study Artifacts
Plot Preview
Attention distributions evolve from diffuse to focused.
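One way to make that diffuse-to-focused shift measurable is the entropy of an attention row; the two rows in this sketch are synthetic stand-ins, not data from the plot:

```python
# Quantifying "diffuse vs focused": entropy of an attention row.
# Both example rows are synthetic stand-ins, not data from the plot.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

diffuse = np.full(8, 1 / 8)                # near-uniform attention
focused = np.array([0.02] * 7 + [0.86])    # mass concentrated on one token

print(entropy(diffuse))   # ln(8) ~ 2.08, maximal for 8 tokens
print(entropy(focused))   # ~ 0.68, much lower as attention sharpens
```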
3 Tasks
Summarize Q/K/V geometrically
Compare dot-product vs cosine similarity (see the sketch after this task list)
Generate derivation flashcards
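As a starting point for the comparison task, a small sketch with random illustrative vectors: the dot product is magnitude-sensitive, while cosine similarity normalizes norms away.

```python
# Dot-product vs cosine similarity for the same query/key pair.
# Vectors are random illustrative inputs, not from the notes.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=64)
k = rng.normal(size=64)

dot = q @ k                                          # magnitude-sensitive score
cos = dot / (np.linalg.norm(q) * np.linalg.norm(k))  # normalized to [-1, 1]

# Scaling a key by 10 multiplies the dot product by 10 but leaves cosine
# unchanged, which is why dot-product attention can be dominated by
# high-norm keys.
print(dot, q @ (10 * k))
print(cos, (q @ (10 * k)) / (np.linalg.norm(q) * np.linalg.norm(10 * k)))
```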
Built for thinkers, builders, and curious minds.