- Reinforcement Learning with TensorFlow
- Sayon Dutta
Long Short Term Memory Networks
RNNs practically fail to handle long-term dependencies. As the gap between the relevant input data point in the input sequence and the output data point in the output sequence increases, RNNs fail to connect the information between the two. This usually happens in text-based tasks such as machine translation and audio-to-text transcription, where the sequences are long.
Long Short Term Memory networks, also known as LSTMs (introduced by Hochreiter and Schmidhuber), are capable of handling these long-term dependencies.

The key feature of the LSTM is the cell state, $C_t$, which lets information flow along the sequence unchanged. We start with the forget gate layer, $f_t$, which takes the concatenation of the last hidden state, $h_{t-1}$, and the current input, $x_t$, and trains a neural network that outputs a number between 0 and 1 for each number in the last cell state, $C_{t-1}$, where 1 means keep the value and 0 means forget it. Thus, this layer identifies what information to forget from the past and what to retain:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

Next we come to the input gate layer, $i_t$, and the tanh layer, $\tilde{C}_t$, whose task is to identify what new information to add to the information received from the past, that is, to update the cell state. The tanh layer creates a vector of new candidate values, while the input gate layer identifies which of those values to use for the update:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

We combine this new information, $i_t \ast \tilde{C}_t$, with the information retained by the forget gate layer, $f_t \ast C_{t-1}$, to update our information, that is, the cell state, $C_t$.
Thus, the new cell state is:

$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$$
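To make the gating arithmetic concrete, here is a toy example with hand-picked gate values (purely illustrative; in a trained LSTM the gates output values strictly between 0 and 1):

```python
import numpy as np

f_t = np.array([1.0, 0.0])       # forget gate: keep the first value, drop the second
C_prev = np.array([2.0, 5.0])    # last cell state C_{t-1}
i_t = np.array([0.0, 1.0])       # input gate: write new information only to the second slot
C_tilde = np.array([0.5, -0.3])  # candidate values from the tanh layer

C_t = f_t * C_prev + i_t * C_tilde
print(C_t)  # [ 2.  -0.3]
```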
Finally, a neural network is trained at the output gate layer, $o_t$, which decides which values of the cell state, $C_t$, to output as the hidden state, $h_t$:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t \ast \tanh(C_t)$$
Thus, an LSTM cell takes the last cell state, $C_{t-1}$, the last hidden state, $h_{t-1}$, and the current time step input, $x_t$, and outputs the updated cell state, $C_t$, and the current hidden state, $h_t$.
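As a minimal sketch of these equations (this is not the book's own implementation; the function name `lstm_step`, the parameter names `W_f`, `b_f`, and so on, and the weight shapes are illustrative assumptions), one forward step of an LSTM cell can be written directly in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    # Concatenate the last hidden state h_{t-1} and the current input x_t
    concat = np.concatenate([h_prev, x_t])

    # Forget gate: a value in (0, 1) per entry of the last cell state
    f_t = sigmoid(W_f @ concat + b_f)
    # Input gate: which candidate values to write into the cell state
    i_t = sigmoid(W_i @ concat + b_i)
    # tanh layer: vector of new candidate values
    C_tilde = np.tanh(W_C @ concat + b_C)

    # Updated cell state: retained old information plus selected new information
    C_t = f_t * C_prev + i_t * C_tilde

    # Output gate: which values of the cell state to expose as the hidden state
    o_t = sigmoid(W_o @ concat + b_o)
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

# Usage with random parameters, just to show the shapes involved
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
weights = [rng.normal(size=(hidden_dim, hidden_dim + input_dim)) * 0.1 for _ in range(4)]
biases = [np.zeros(hidden_dim) for _ in range(4)]
params = [p for pair in zip(weights, biases) for p in pair]  # W_f, b_f, W_i, b_i, ...
x_t = rng.normal(size=input_dim)
h_t, C_t = lstm_step(x_t, np.zeros(hidden_dim), np.zeros(hidden_dim), *params)
print(h_t.shape, C_t.shape)  # (3,) (3,)
```

Each gate has its own weight matrix acting on the same concatenated input $[h_{t-1}, x_t]$, which is why the four matrices above share the shape (hidden_dim, hidden_dim + input_dim).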