Sequence-to-Sequence with RNNs
![Screen Shot 2022-07-21 at 8.29.26 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e62692e0-7b0c-452a-a815-f94163454c2a/Screen_Shot_2022-07-21_at_8.29.26_PM.png)
Context Vector
- Transfers information between the encoding and decoding sequences
- Summarizes all the information the decoder needs to generate the output sentence
Problem - The entire input sequence is bottlenecked through a single fixed-size vector (sketched below)
- Idea: Use a new context vector at each step of the decoder
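To make the bottleneck concrete, here is a minimal NumPy sketch of a vanilla RNN encoder whose final hidden state is the single fixed-size context vector. All names, sizes, and weights are made up for illustration, not from the lecture:

```python
import numpy as np

def encode(x_seq, Wxh, Whh, b):
    """Vanilla RNN encoder: the final hidden state is the lone context vector."""
    h = np.zeros(Whh.shape[0])
    for x in x_seq:                        # one step per input token
        h = np.tanh(Wxh @ x + Whh @ h + b)
    return h                               # same size no matter how long x_seq is

rng = np.random.default_rng(0)
x_seq = [rng.normal(size=8) for _ in range(5)]   # 5 tokens, input dim 8
Wxh = 0.1 * rng.normal(size=(16, 8))             # hidden size 16 (assumed)
Whh = 0.1 * rng.normal(size=(16, 16))
b = np.zeros(16)
context = encode(x_seq, Wxh, Whh, b)
print(context.shape)                             # (16,) -- the fixed-size bottleneck
```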
Sequence-to-Sequence with RNNs and Attention
![Screen Shot 2022-07-21 at 8.41.14 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e94e87b8-1d10-4f12-a2f4-9e2a38429c36/Screen_Shot_2022-07-21_at_8.41.14_PM.png)
Getting the Context Vector for Each Step
Using a different context vector at each time step of the decoder
- Input sequence is not bottlenecked through a single vector
- At each time step of the decoder, the context vector “looks at” different parts of the input sequence
Compute (scalar) alignment scores
- Each score compares the previous hidden state of the decoder with one hidden state of the encoder
- How much should we attend to each hidden state of the encoder, given the current hidden state of the decoder?
$$
e_{t,i} = f_{att}(s_{t-1}, h_i), \qquad \text{where } f_{att} \text{ is an MLP}
$$
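A minimal sketch of one possible $f_{att}$, here a one-hidden-layer additive MLP in the style of Bahdanau attention; the weight names and the attention size A are assumptions, not from the notes:

```python
import numpy as np

def f_att(s_prev, h_i, W1, W2, v):
    """Additive (MLP) scoring of one encoder state against the decoder state."""
    z = np.tanh(W1 @ s_prev + W2 @ h_i)    # hidden layer of the scoring MLP
    return float(v @ z)                     # scalar alignment score e_{t,i}

rng = np.random.default_rng(1)
H, A = 16, 32                               # hidden size, attention size (assumed)
s_prev = rng.normal(size=H)                 # decoder state s_{t-1}
enc_states = rng.normal(size=(5, H))        # encoder states h_1 .. h_5
W1, W2 = rng.normal(size=(A, H)), rng.normal(size=(A, H))
v = rng.normal(size=A)
scores = np.array([f_att(s_prev, h, W1, W2, v) for h in enc_states])
print(scores)                               # one scalar per encoder time step
```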
- Normalize the alignment scores with a softmax to get attention weights
$$
0 < a_{t,i} < 1, \qquad \sum_i a_{t,i} = 1
$$
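The softmax is what enforces both constraints at once; a tiny sketch:

```python
import numpy as np

def softmax(e):
    e = e - e.max()          # shift for numerical stability; result unchanged
    w = np.exp(e)
    return w / w.sum()

a = softmax(np.array([2.0, 0.5, -1.0, 0.1, 1.2]))
print(a, a.sum())            # each weight in (0, 1); they sum to 1.0
```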
- Compute the context vector as a linear combination of the encoder hidden states
$$
c_t = \sum_ia_{t,i}h_i
$$
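Putting the three steps together, a sketch of one full attention step of the decoder; `dot_score` is a simple dot-product stand-in for the $f_{att}$ MLP above, and all shapes are toy values:

```python
import numpy as np

def attention_step(s_prev, enc_states, score_fn):
    """One decoder time step: scores -> softmax -> context vector c_t."""
    e = np.array([score_fn(s_prev, h) for h in enc_states])  # alignment scores
    a = np.exp(e - e.max())
    a = a / a.sum()                         # attention weights a_{t,i}
    c = a @ enc_states                      # c_t = sum_i a_{t,i} * h_i
    return c, a

rng = np.random.default_rng(2)
enc_states = rng.normal(size=(5, 16))       # h_1 .. h_5
s_prev = rng.normal(size=16)                # s_{t-1}
dot_score = lambda s, h: float(s @ h)       # stand-in for the f_att MLP
c_t, a_t = attention_step(s_prev, enc_states, dot_score)
print(c_t.shape, a_t.sum())                 # (16,) context; weights sum to 1
```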