Comparing Shallow and Deep Neural Networks: A Detailed Analysis
When designing a neural network, one common decision is whether to use a shallow or a deep architecture. Each has advantages and disadvantages, depending on the task at hand. In this post, we’ll compare two fully connected ReLU networks that map a scalar input x to a scalar output y: a shallow network with one hidden layer and a deep network with ten hidden layers.
The Neural Networks
Shallow Network
- Structure: Single hidden layer
- Hidden Units: 95
Deep Network
- Structure: 10 hidden layers
- Hidden Units per Layer: 5
Calculating Parameters
Shallow Network
- Input to Hidden Layer:
  - Weights: 1 × 95 = 95
  - Biases: 95
  - Total: 95 + 95 = 190
- Hidden Layer to Output Layer:
  - Weights: 95 × 1 = 95
  - Biases: 1
  - Total: 95 + 1 = 96
- Total Parameters: 190 + 96 = 286
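The same bookkeeping can be written as a general formula. Assuming a scalar input, a scalar output, and D hidden units (the symbol D is introduced here for illustration), the input-to-hidden connection contributes D weights and D biases, and the hidden-to-output connection contributes D weights and one bias:

```latex
N_{\text{shallow}} = \underbrace{(D + D)}_{\text{input} \to \text{hidden}}
                   + \underbrace{(D + 1)}_{\text{hidden} \to \text{output}}
                   = 3D + 1,
\qquad D = 95:\; 3 \cdot 95 + 1 = 286.
```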
Deep Network
- Input to First Hidden Layer:
  - Weights: 1 × 5 = 5
  - Biases: 5
  - Total: 5 + 5 = 10
- Hidden Layer to Hidden Layer (9 connections between the 10 hidden layers):
  - Weights per connection: 5 × 5 = 25
  - Biases per connection: 5
  - Total per connection: 25 + 5 = 30
  - Total for all connections: 9 × 30 = 270
- Last Hidden Layer to Output Layer:
  - Weights: 5 × 1 = 5
  - Biases: 1
  - Total: 5 + 1 = 6
- Total Parameters: 10 + 270 + 6 = 286
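As a worked generalization, assume K hidden layers of D units each (K and D are illustrative symbols), again with scalar input and output. The first and last connections match the shallow case, and each of the K - 1 hidden-to-hidden connections adds D² weights and D biases:

```latex
N_{\text{deep}} = (D + D) + (K - 1)(D^2 + D) + (D + 1)
                = 3D + 1 + (K - 1)\,D\,(D + 1),
\qquad D = 5,\; K = 10:\; 16 + 9 \cdot 30 = 286.
```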
Both networks have the same number of parameters: 286.
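The arithmetic can also be checked mechanically. The sketch below assumes a plain fully connected network described only by its list of layer widths; `count_params` is a hypothetical helper written for this post, not a library function. It also shows that the two networks split their 286 parameters differently between weights and biases.

```python
def count_params(widths):
    """Return (weights, biases) for a fully connected network.

    widths lists the layer sizes from input to output, e.g. [1, 95, 1].
    """
    pairs = list(zip(widths[:-1], widths[1:]))
    weights = sum(n_in * n_out for n_in, n_out in pairs)
    biases = sum(n_out for _, n_out in pairs)  # one bias per non-input unit
    return weights, biases


for name, widths in [("shallow", [1, 95, 1]), ("deep", [1] + [5] * 10 + [1])]:
    w, b = count_params(widths)
    print(f"{name}: {w} weights + {b} biases = {w + b} parameters")
# shallow: 190 weights + 96 biases = 286 parameters
# deep: 235 weights + 51 biases = 286 parameters
```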
Linear Regions
Shallow Network
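- Number of Linear Regions: at most 96. With a scalar input, each ReLU hidden unit bends the output at exactly one point (x = -b/w for a unit computing ReLU(w·x + b)), so 95 units create at most 95 breakpoints and therefore at most 96 linear regions.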
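For a scalar input, the region count of a shallow ReLU network can be computed exactly rather than estimated. The sketch below assumes the network f(x) = Σ v[i]·ReLU(w[i]·x + b[i]) + c with hypothetical randomly drawn weights (the names `w`, `b`, `v`, and `shallow_linear_regions` are illustrative). It locates every breakpoint, probes the slope on each interval between them, and counts how often the slope actually changes; with random weights it should report 96.

```python
import numpy as np


def shallow_linear_regions(w, b, v):
    """Count linear regions of f(x) = sum_i v[i] * relu(w[i] * x + b[i]) + c."""
    # A unit with w[i] != 0 bends the output only at x = -b[i] / w[i];
    # units with w[i] == 0 are constant and add no breakpoint.
    mask = w != 0
    kinks = np.unique(-b[mask] / w[mask])
    if kinks.size == 0:
        return 1
    # One probe point inside every interval between consecutive kinks,
    # plus one probe beyond each end.
    probes = np.concatenate(
        ([kinks[0] - 1.0], (kinks[:-1] + kinks[1:]) / 2, [kinks[-1] + 1.0])
    )
    active = (np.outer(probes, w) + b > 0).astype(float)  # which units are "on"
    slopes = active @ (v * w)                              # df/dx on each interval
    # Neighbouring intervals are separate regions only if the slope changes.
    return 1 + int(np.count_nonzero(np.diff(slopes)))


rng = np.random.default_rng(0)
w, b, v = rng.standard_normal((3, 95))  # hypothetical random weights, 95 units
print(shallow_linear_regions(w, b, v))
```

Counting slope changes, rather than just breakpoints, matters because units whose kinks coincide can cancel each other and merge two regions.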
Deep Network
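- Number of Linear Regions: up to (5 + 1)^10 = 6^10 = 60,466,176. Each hidden layer can multiply the number of regions created by the layers before it, so the maximum grows exponentially with depth.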
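With exactly the same number of parameters, the deep network can represent functions with vastly more linear regions than the shallow one (over 60 million versus 96), because depth lets regions multiply rather than add. Both figures are maximums; networks with typical weights realize far fewer regions.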
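The bounds above are easy to evaluate, and the multiplicative effect of depth can be seen directly in a small example. The sketch below computes the upper bounds D + 1 (shallow, scalar input) and (D + 1)^K (K hidden layers of width D, scalar input). It then uses the well-known "tent map" construction, which is not one of the two networks in this post: a width-2 ReLU layer that folds [0, 1] onto itself, so that stacking K such layers yields 2^K linear pieces. All function names are hypothetical.

```python
import numpy as np


def max_regions_shallow(D):
    # Scalar input: D ReLU units give at most D breakpoints, hence D + 1 regions.
    return D + 1


def max_regions_deep(D, K):
    # Upper bound on linear regions for a scalar-input network
    # with K hidden layers of D ReLU units each.
    return (D + 1) ** K


def tent_layer(h):
    # A width-2 ReLU layer, relu(h) and relu(h - 0.5), recombined linearly.
    # On [0, 1] this is the tent map: 2h for h <= 0.5, 2 - 2h otherwise.
    return 2.0 * np.maximum(h, 0.0) - 4.0 * np.maximum(h - 0.5, 0.0)


def tent_network_pieces(K):
    """Count the linear pieces on [0, 1] after composing K tent layers."""
    n = 2 ** (K + 2)          # dyadic grid: every kink j / 2**K lies on it
    x = np.arange(n + 1) / n
    y = x
    for _ in range(K):
        y = tent_layer(y)
    slopes = np.diff(y) * n   # exact in float64: all values are dyadic rationals
    return 1 + int(np.count_nonzero(np.diff(slopes)))


print(max_regions_shallow(95), max_regions_deep(5, 10))  # 96 60466176
for K in range(1, 6):
    print(K, tent_network_pieces(K))  # the piece count doubles with each layer
```

Each tent layer uses only two hidden units, yet every extra layer doubles the number of pieces; a single hidden layer would need about 2^K units to match.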
Runtime Performance
Shallow Network
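- Likely to run faster. Its forward pass needs only two matrix products, and all 95 hidden units can be computed in parallel. (The total arithmetic is similar for both networks: 190 weights to multiply here versus 235 in the deep network.)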
Deep Network
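- Likely slower in practice, even with the same number of parameters, because its 10 hidden layers must be evaluated one after another: each layer needs the previous layer's output, so the computation cannot be parallelized across depth.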
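A minimal NumPy sketch of both forward passes makes the difference concrete. It assumes hypothetical random weights and a batch of scalar inputs; `make_layers` and `forward` are illustrative helpers. It reports the number of sequential matrix products and multiply-adds per input for each network, then times them with the standard `timeit` module. Timings depend on hardware, and on a CPU with NumPy part of any gap comes from per-layer Python overhead rather than from lost parallelism alone.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def make_layers(widths):
    # Hypothetical random (W, b) pairs; W has shape (n_out, n_in).
    return [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]


def forward(layers, x):
    """Evaluate a fully connected ReLU network on a batch of scalars x."""
    h = x[:, None]                  # shape (batch, 1)
    for W, b in layers[:-1]:
        # Each step needs the previous layer's output, so steps run in sequence.
        h = relu(h @ W.T + b)
    W, b = layers[-1]
    return (h @ W.T + b)[:, 0]      # linear output layer, shape (batch,)


shallow = make_layers([1, 95, 1])
deep = make_layers([1] + [5] * 10 + [1])

for name, layers in [("shallow", shallow), ("deep", deep)]:
    steps = len(layers)                        # sequential matrix products
    macs = sum(W.size for W, _ in layers)      # multiply-adds per input
    print(f"{name}: {steps} sequential steps, {macs} multiply-adds per input")
# shallow: 2 sequential steps, 190 multiply-adds per input
# deep: 11 sequential steps, 235 multiply-adds per input

x = np.linspace(-1.0, 1.0, 10_000)
for name, layers in [("shallow", shallow), ("deep", deep)]:
    seconds = timeit.timeit(lambda: forward(layers, x), number=100)
    print(f"{name}: {seconds:.4f} s for 100 forward passes")
```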
Conclusion
- Shallow Network: Faster to evaluate, but can represent far fewer linear regions for the same number of parameters, which limits how complex a function it can model.
- Deep Network: Can represent far more complex piecewise-linear functions, but its sequential layers are likely to make it slower to run.
In summary, the choice between a shallow and a deep neural network depends on the requirements of your task. If speed is crucial, a shallow network may be preferable; if modeling complex patterns matters more, a deep network is likely the better choice.