WideResNet 28-2

To keep it short, we use the following shorthands.

A Conv layer is annotated with a rounded rectangle with:
- shape of the kernel _x_
- number of channels _C
- stride _S
- padding _P
BN and ReLU are annotated with a circle at the end of a line ---o
A node that is doubly circled is a data block, it's not an operation, it's just to indicate the current data's channel count.
A (+) indicates an element-wise addition.
Other operations are annotated with a rectangular box.

Model Diagram

Block Repeats

Blocks above annotated with x (N - 1) are repeated N - 1 times. N - 1 is because the first block is already counted in the Group Projection.

Projection Layers

Projection Layers are Convolutional Layers with a kernel size of 1x1. They are mainly used to manipulate the number of channels.

In the above diagram, projection layers are used in the Group X projection blocks. They are necessary as a skip connection will not work if the number of channels added are different.

Strides

In some convolutional layers, strides of 2 are used to reduce the size of the output. This is done on Group 1 and 2

Model Diagram Pseudocode WideResNet 28-2

Conv torch.Size([16, 3, 3, 3]) Stride 1 Pad 1 Group 0 Block 0 BN -> ReLU = X Conv torch.Size([32, 16, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 = Z X -> Conv torch.Size([32, 16, 1, 1]) Stride 1 Pad 0 Z + X Block 1 BN -> ReLU = X Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 = Z Z + X Block 2 BN -> ReLU = X Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 = Z Z + X Block 3 BN -> ReLU = X Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([32, 32, 3, 3]) Stride 1 Pad 1 = Z Z + X Group 1 Block 0 BN -> ReLU = X Conv torch.Size([64, 32, 3, 3]) Stride 2 Pad 1 BN -> ReLU Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 = Z X -> Conv torch.Size([64, 32, 1, 1]) Stride 2 Pad 0 Z + X Block 1 BN -> ReLU = X Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 = Z Z + X Block 2 BN -> ReLU = X Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 = Z Z + X Block 3 BN -> ReLU = X Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([64, 64, 3, 3]) Stride 1 Pad 1 = Z Z + X Group 2 Block 0 BN -> ReLU = X Conv torch.Size([128, 64, 3, 3]) Stride 2 Pad 1 BN -> ReLU Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 = Z X -> Conv torch.Size([128, 64, 1, 1]) Stride 2 Pad 0 Z + X Block 1 BN -> ReLU = X Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 = Z Z + X Block 2 BN -> ReLU = X Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 = Z Z + X Block 3 BN -> ReLU = X Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 BN -> ReLU Conv torch.Size([128, 128, 3, 3]) Stride 1 Pad 1 = Z Z + X BN -> ReLU AvgPool 8 Stride 1 Pad 0 View Linear torch.Size([10, 128])

Last modified: 26 November 2023