Notoriously, learning with recurrent neural networks (RNNs) on long sequences is a difficult task. There are three major challenges: 1) extracting complex dependencies, 2) vanishing and exploding gradients, and 3) efficient parallelization. In this paper, we introduce a simple yet effective RNN connection structure, the DILATEDRNN, which simultaneously tackles all these challenges. The proposed architecture is characterized by multi-resolution dilated recurrent skip connections and can be combined flexibly with different RNN cells. Moreover, the DILATEDRNN reduces the number of parameters and enhances training efficiency significantly, while matching state-of-the-art performance (even with Vanilla RNN cells) in tasks involving very long-term dependencies. To provide a theory-based quantification of the architecture’s advantages, we introduce a memory capacity measure – the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures. We rigorously prove the advantages of the DILATEDRNN over other recurrent neural architectures.
The light weighted demo demonstrates how to construct a multi-layer DilatenRNN with different cells, hidden structures and dilations for the task of permuted sequence classification on MNist. Although most of the code is straightforward, we would like to provide examples for different network constructions.
Below is an example that constructs a 9-layer DilatedRNN with vanilla RNN cells. The hidden dimension is 20. And the number of dilatation starts with 1 at the bottom layer and ends with 256 at the top layer.
cell_type = “RNN”
hidden_structs =  * 9
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256]
The current version of the code supports three types of cell: “RNN”, “LSTM”, and “GRU”. Of course, the code also supports the case where the dilation rate at the bottom layer is greater than 1 (as shown on the right hand side of figure 2 in our paper).
An example of constructing a 4-layer DilatedRNN with GRU cells is shown below. The number of dilations starts at 4. It is worth mentioning that, the number of dilations does not necessarily need to increase with the power of 2. And the hidden dimensions for different layers do not need to be the same.
cell_type = “GRU”
hidden_structs = [20, 30, 40, 50]
dilations = [4, 8, 16, 32]
Tested environment: Tensorflow 1.3 and Numpy 1.13.1.
That’s all for now and hope this repo is useful to your research. For any questions, please create an issue and we will get back to you as soon as possible.