Sequence-to-Sequence with RNNs
![Screen Shot 2022-07-21 at 8.29.26 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e62692e0-7b0c-452a-a815-f94163454c2a/Screen_Shot_2022-07-21_at_8.29.26_PM.png)
Context Vector
- Transfers information between the encoding and decoding sequences
- Summarizes all the information the decoder needs to generate the output sentence
Problem - The entire input sequence is bottlenecked through a single fixed-size vector (sketched below)
- Idea: Use a new context vector at each step of the decoder
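To make the bottleneck concrete, here is a minimal NumPy sketch of a vanilla RNN encoder whose final hidden state is the single fixed-size context vector. All names, sizes, and weights are made up for illustration, not from the lecture:

```python
import numpy as np

def encode(x_seq, Wxh, Whh, b):
    """Vanilla RNN encoder: the final hidden state is the lone context vector."""
    h = np.zeros(Whh.shape[0])
    for x in x_seq:                        # one step per input token
        h = np.tanh(Wxh @ x + Whh @ h + b)
    return h                               # same size no matter how long x_seq is

rng = np.random.default_rng(0)
x_seq = [rng.normal(size=8) for _ in range(5)]   # 5 tokens, input dim 8
Wxh = 0.1 * rng.normal(size=(16, 8))             # hidden size 16 (assumed)
Whh = 0.1 * rng.normal(size=(16, 16))
b = np.zeros(16)
context = encode(x_seq, Wxh, Whh, b)
print(context.shape)                             # (16,) -- the fixed-size bottleneck
```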
Sequence-to-Sequence with RNNs and Attention
![Screen Shot 2022-07-21 at 8.41.14 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e94e87b8-1d10-4f12-a2f4-9e2a38429c36/Screen_Shot_2022-07-21_at_8.41.14_PM.png)
Getting the Context Vector for Each Step
Using a different context vector at each time step of the decoder
- Input sequence is not bottlenecked through a single vector
- At each time step of the decoder, the context vector “looks at” different parts of the input sequence
Compute (scalar) alignment scores
- Each score compares the previous hidden state of the decoder with one hidden state of the encoder
- How much should we attend to each hidden state of the encoder, given the current hidden state of the decoder?
$$
e_{t,i} = f_{att}(s_{t-1}, h_i), \qquad \text{where } f_{att} \text{ is an MLP}
$$
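A minimal sketch of one possible $f_{att}$, here a one-hidden-layer additive MLP in the style of Bahdanau attention; the weight names and the attention size A are assumptions, not from the notes:

```python
import numpy as np

def f_att(s_prev, h_i, W1, W2, v):
    """Additive (MLP) scoring of one encoder state against the decoder state."""
    z = np.tanh(W1 @ s_prev + W2 @ h_i)    # hidden layer of the scoring MLP
    return float(v @ z)                     # scalar alignment score e_{t,i}

rng = np.random.default_rng(1)
H, A = 16, 32                               # hidden size, attention size (assumed)
s_prev = rng.normal(size=H)                 # decoder state s_{t-1}
enc_states = rng.normal(size=(5, H))        # encoder states h_1 .. h_5
W1, W2 = rng.normal(size=(A, H)), rng.normal(size=(A, H))
v = rng.normal(size=A)
scores = np.array([f_att(s_prev, h, W1, W2, v) for h in enc_states])
print(scores)                               # one scalar per encoder time step
```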
- Normalize the alignment scores with a softmax to get attention weights
$$
0 < a_{t,i} < 1, \qquad \sum_i a_{t,i} = 1
$$
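The softmax is what enforces both constraints at once; a tiny sketch:

```python
import numpy as np

def softmax(e):
    e = e - e.max()          # shift for numerical stability; result unchanged
    w = np.exp(e)
    return w / w.sum()

a = softmax(np.array([2.0, 0.5, -1.0, 0.1, 1.2]))
print(a, a.sum())            # each weight in (0, 1); they sum to 1.0
```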
- Compute the context vector as a linear combination of the encoder hidden states
$$
c_t = \sum_ia_{t,i}h_i
$$
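Putting the three steps together, a sketch of one full attention step of the decoder; `dot_score` is a simple dot-product stand-in for the $f_{att}$ MLP above, and all shapes are toy values:

```python
import numpy as np

def attention_step(s_prev, enc_states, score_fn):
    """One decoder time step: scores -> softmax -> context vector c_t."""
    e = np.array([score_fn(s_prev, h) for h in enc_states])  # alignment scores
    a = np.exp(e - e.max())
    a = a / a.sum()                         # attention weights a_{t,i}
    c = a @ enc_states                      # c_t = sum_i a_{t,i} * h_i
    return c, a

rng = np.random.default_rng(2)
enc_states = rng.normal(size=(5, 16))       # h_1 .. h_5
s_prev = rng.normal(size=16)                # s_{t-1}
dot_score = lambda s, h: float(s @ h)       # stand-in for the f_att MLP
c_t, a_t = attention_step(s_prev, enc_states, dot_score)
print(c_t.shape, a_t.sum())                 # (16,) context; weights sum to 1
```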