vgg-rule

A simple 'rule' actually coined in the resnet paper for the design of convolutional stacks: "the convolutional layers mostly have 3x3 filters and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; and (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer. We perform downsampling directly by convolutional layers that have a stride of 2." It was, however, named after VGGnet because it was inspired by that paper (contra alexnet).

So in the case that we have a stride of $2$ in our convolution, or we feed the output into a max pool of stride $2$, we would double the number of filters/kernels/features in our next convolution. To break with the 'VGG' rule can be thought of as a decision to throw away some time complexity.

vgg-rule

Models which express this meme