alexnet-head
An architectural pattern in which the final layers of a convnet are large, fully connected (dense) layers, which often account for the bulk of the model's parameter count. Their output is fed into a softmax that produces a categorical (multinomial) distribution over the labels, conditioned on the input. The name comes from AlexNet, which ends in three such layers. For example, in PyTorch:
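import torch.nn as nn

# Example sizes (assumption): AlexNet's 256*6*6 flattened feature map,
# 4096-unit hidden layers, and 1000 ImageNet classes.
final_out_dim, dense_mid_dim, num_labels = 256 * 6 * 6, 4096, 1000

head = nn.Sequential(
    nn.Flatten(),                             # (N, C, H, W) -> (N, C*H*W)
    nn.Linear(final_out_dim, dense_mid_dim),
    nn.ReLU(),
    nn.Linear(dense_mid_dim, dense_mid_dim),
    nn.ReLU(),
    nn.Linear(dense_mid_dim, num_labels),     # raw logits: no ReLU before the softmax
    nn.Softmax(dim=1),                        # drop if training with nn.CrossEntropyLoss (expects logits)
)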
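To make the claim about parameter count concrete, the sketch below counts weights and biases in AlexNet's convolutional trunk versus its dense head using plain arithmetic. It assumes the layer sizes of torchvision's single-GPU AlexNet variant (conv channels 64, 192, 384, 256, 256; a 9216-dimensional flattened feature; two 4096-unit hidden layers; 1000 classes) rather than the original two-GPU layout from the paper. The helper names `conv_params` and `linear_params` are illustrative only.

```python
# Parameter count of AlexNet's convolutional trunk vs. its dense head.
# Assumption: layer sizes follow torchvision's single-GPU AlexNet variant,
# not the original two-GPU paper layout. Helper names are hypothetical.

def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * k * k + out_ch

def linear_params(in_features: int, out_features: int) -> int:
    """Weight matrix (in * out) plus one bias per output unit."""
    return in_features * out_features + out_features

# (in_channels, out_channels, kernel_size) for the five conv layers
conv_layers = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
# (in_features, out_features) for the three dense layers of the head
dense_layers = [(256 * 6 * 6, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(*layer) for layer in conv_layers)
dense_total = sum(linear_params(*layer) for layer in dense_layers)
total = conv_total + dense_total

print(f"conv trunk: {conv_total:,}")               # conv trunk: 2,469,696
print(f"dense head: {dense_total:,}")              # dense head: 58,631,144
print(f"head share: {dense_total / total:.1%}")    # head share: 96.0%
```

Most of the head's weight sits in its first layer, because it connects every element of the flattened feature map to every hidden unit.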