CUDA Programming Notes

GPUs can be much faster than CPUs for some computations, such as matrix multiplication, because they are designed to be fast at exactly this kind of work. I find it very important to learn how to use such hardware for more advanced tasks and further optimizations, especially when it comes to deep learning! Thus, this blog post contains my notes about CUDA programming, which I collected during my learning process.

Interesting Papers from ICASSP 2020 Conference

In May, I attended my first (virtual) conference, the “International Conference on Acoustics, Speech, and Signal Processing (ICASSP)”. I presented my paper, entitled “Layer-normalized LSTM for Hybrid-HMM and End-to-end ASR”, in which we investigated different ways of applying layer normalization inside the internal recurrence of the LSTM. In addition, there were some quite interesting papers (also relevant to my research) which I would like to highlight here.

Hidden Markov Model

First, we need to define what a Markov model is and why we have the additional word “Hidden”. Then, I am going to explain the structure of an HMM and how to compute the likelihood of an observation sequence using the Forward algorithm. Moreover, I am going to explain the Viterbi decoding algorithm, which is used to compute the most likely state sequence. After that, I am going to dig into the mathematical details behind training HMMs using the Forward-Backward algorithm.

Let's Pay Attention

I am first going to explain briefly how recurrent neural networks work, since they are a main component of seq2seq models. Then, I am going to talk about the encoder-decoder architecture and its problems. After that, I am going to introduce the Attention model, which led to a huge improvement in the performance of these models. In the end, we will look at some experiments and results.

Will Capsule Networks Replace CNNs?

In this blog post, I am first going to discuss the main problems of CNNs. Then, I will move on to capsule theory by discussing how capsules work and the main algorithm behind it, the Dynamic Routing algorithm. After that, I will go over the CapsNet architecture by explaining its layers, and finally, we will look at some experiments and results.