Similarities among ML domains
| NN | Transformer | CNN |
|---|---|---|
| - | Multi-head attention | Multiple channels (feature maps) |
| - | Skip (residual) connections | Skip connections (ResNet) |
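
The skip connection shared by Transformers and ResNets can be illustrated with a minimal sketch: the block's output is its input plus whatever the inner layer computes. The names `residual_block` and `toy_layer` are hypothetical and chosen only for illustration; `toy_layer` stands in for a real convolution or attention sublayer.

```python
def residual_block(x, f):
    """Return x + f(x): the skip connection adds the input back to the layer output."""
    fx = f(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def toy_layer(x):
    # Hypothetical stand-in for a convolution or attention sublayer.
    return [0.1 * xi for xi in x]


def zero_layer(x):
    # A layer that contributes nothing; the block then reduces to the identity.
    return [0.0 for _ in x]


x = [1.0, 2.0, 3.0]
y = residual_block(x, toy_layer)
identity = residual_block(x, zero_layer)
```

Because the input is added back unchanged, a block whose inner layer outputs zeros behaves as the identity, which is part of why deep stacks of such blocks remain trainable.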
Progress of Natural Language Processing
| Model | Main Disadvantage | Solved by | How? |
|---|---|---|---|
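| NN | Can’t handle variable-length input | RNN | Applies the same weights at every time step and carries a hidden state, so it can process sequences of any length |
| RNN | Vanishing gradient problem | LSTM | Gates and an additively updated cell state let gradients flow across many time steps |
| LSTM | Not parallelizable across time steps | Transformer | Self-attention attends over all positions at once rather than step by step |
| Transformer | Loses sequence order | Transformer | Positional encoding adds position information to the token embeddings |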
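
To make the last row concrete, here is a minimal sketch of one common positional encoding, the sinusoidal scheme from the original Transformer paper. The table above does not say which variant is meant, so this choice is an assumption; the function names `sinusoidal_positional_encoding` and `add_positions` and the toy embedding values are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency; even uses sin, odd uses cos.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings):
    """Add the position encoding to each token embedding, element by element."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(erow, prow)] for erow, prow in zip(embeddings, pe)]


# The same (made-up) token embedding repeated at three positions.
token = [0.5, -0.2, 0.1, 0.9]
encoded = add_positions([token, token, token])
pe = sinusoidal_positional_encoding(3, 4)
```

Without the added encoding, the three identical tokens would be indistinguishable to self-attention; after it, each position carries a different vector, which is how order information is restored.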