I reproduced a classic mech interp paper

Mark Henry

Here is the classic visualization result from "Visualizing and Understanding Convolutional Networks" (Zeiler and Fergus, 2013; hereafter ZF2013):

Zeiler and Fergus, 2013

I trained a small neural network on MNIST with two conv layers. Implementing the ZF2013 visualization is made trivial by torch's MaxUnpool2d module, which inverts max pooling using the pooling indices ("switches") recorded during the forward pass.
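A minimal sketch of the setup, with hypothetical layer sizes matching the figures below (8 and 16 features). The forward pass records the pooling indices, and the deconvolutional pass reverses each operation in order: unpool with MaxUnpool2d, rectify, then apply the transposed convolution using the same filter weights, as in ZF2013. The class and method names here are my own, not from the post's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Hypothetical two-conv-layer MNIST network with deconv visualization."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, padding=1)    # 8 layer-1 features
        self.conv2 = nn.Conv2d(8, 16, 3, padding=1)   # 16 layer-2 features
        # return_indices=True records the "switch" locations for unpooling
        self.pool1 = nn.MaxPool2d(2, return_indices=True)
        self.pool2 = nn.MaxPool2d(2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2)

    def forward(self, x):
        a1 = F.relu(self.conv1(x))        # (B, 8, 28, 28)
        p1, idx1 = self.pool1(a1)         # (B, 8, 14, 14)
        a2 = F.relu(self.conv2(p1))       # (B, 16, 14, 14)
        p2, idx2 = self.pool2(a2)         # (B, 16, 7, 7)
        return p2, (idx1, idx2)

    def deconv_layer2(self, p2, idx1, idx2):
        # Reverse the forward pass: unpool with stored switches, rectify,
        # then transpose-convolve with the same weights (ZF2013's deconvnet).
        x = self.unpool(p2, idx2)                              # back to 14x14
        x = F.relu(x)
        x = F.conv_transpose2d(x, self.conv2.weight, padding=1)
        x = self.unpool(x, idx1)                               # back to 28x28
        x = F.relu(x)
        x = F.conv_transpose2d(x, self.conv1.weight, padding=1)
        return x                                               # (B, 1, 28, 28)
```

To visualize a single feature, you would zero out every other channel of `p2` before calling `deconv_layer2`, so only that feature's activation is projected back to pixel space.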

Here is the output of my deconvolutional visualization for layer 1, which has 8 features:

Layer 1

And here is layer 2, which has 16 features:

Layer 2

In each figure, there is a 3x3 group of images for each feature in the layer. Each group of nine images shows the deconvolutional visualization for each of the nine input images that most strongly activated that feature.
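Finding those top-nine inputs can be sketched as a single pass over the dataset, scoring each image by its strongest spatial response for every feature. This is an illustrative helper of my own, assuming a model whose forward pass returns the pooled layer-2 activations as its first output:

```python
import torch

def top9_per_feature(model, loader, n_features=16):
    """For each feature, collect the 9 images with the highest activation.

    Assumes model(images) returns a (B, n_features, H, W) activation map
    as its first output. Scores each image by its max over spatial positions.
    """
    scores = [[] for _ in range(n_features)]  # per-feature (score, image) pairs
    with torch.no_grad():
        for images, _ in loader:
            acts, *_ = model(images)
            per_image = acts.amax(dim=(2, 3))  # (B, n_features)
            for f in range(n_features):
                for b in range(images.shape[0]):
                    scores[f].append((per_image[b, f].item(), images[b]))
    # keep only the 9 highest-scoring images per feature
    return [sorted(s, key=lambda t: -t[0])[:9] for s in scores]
```

Each returned entry can then be run back through the deconvolutional pass to produce one tile of the 3x3 grid.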

Just as ZF2013 observed, we see a hierarchy of features. I am pleased with the diagonal lines on display in the layer 1 visualization, but layer 2 is cloudy and indistinct. I believe the cloudiness in layer 2 is because the network is trained on MNIST, a very simple dataset that doesn't take advantage of the capacity of the network.

My code is available on GitHub.