The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

Alper Yıldırım

cs.CV, cs.AI, cs.LG

TL;DR

AI image classifiers identify objects primarily from the 'phase' (structural pattern) of their internal representations rather than the 'magnitude' (intensity), mirroring a classic finding about how people recognize images.

Summary

In 1981, Oppenheim and Lim showed that people can still recognize an image after its intensity information (Fourier magnitude) is discarded, provided its structural information (Fourier phase) is kept. This study asks whether modern image classifiers treat visual data the same way inside their hidden layers. To test this, the author ran 'transplant' experiments: at a chosen internal layer, the phase of one image's representation was combined with the magnitude of another's, and the model's prediction was checked to see which image it followed. For PRISM2D, GFNet, and ViT-B/16, the prediction followed the phase donor, which suggests that the model's encoding of object identity is carried by phase. Deleting image-specific magnitude barely changed accuracy, indicating that this magnitude is largely dispensable for recognition.

One architecture, ResNet-50, at first appeared to behave differently: transplanting sign information after its ReLU activation functions had no effect. Applying the intervention before the ReLU instead revealed a strong sign code in the model's late blocks, so the underlying pattern holds. The study concludes that these models share a common way of encoding identity through phase or sign, but expose it differently depending on their design. This offers a mechanistic account of why convolutional networks and attention-based models differ in how much they rely on texture versus shape.
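The transplant described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `phase_transplant` is hypothetical, the inputs are assumed to be real-valued 2D arrays (or stacks of them, such as a channels-by-height-by-width feature map), and the Fourier transform is taken over the last two axes.

```python
import numpy as np

def phase_transplant(phase_donor, magnitude_donor):
    """Combine the 2D Fourier phase of one array with the magnitude of another.

    Hypothetical helper for illustration; both inputs must share a shape.
    """
    spec_phase = np.fft.fft2(phase_donor, axes=(-2, -1))
    spec_mag = np.fft.fft2(magnitude_donor, axes=(-2, -1))
    # Keep |F| from one input and the angle of F from the other.
    hybrid = np.abs(spec_mag) * np.exp(1j * np.angle(spec_phase))
    # Both inputs are real, so the hybrid spectrum is conjugate-symmetric
    # and the imaginary part of the inverse is only rounding noise.
    return np.fft.ifft2(hybrid, axes=(-2, -1)).real

# Toy stand-ins for two images (or two hidden-layer feature maps).
rng = np.random.default_rng(0)
image_a = rng.standard_normal((8, 8))
image_b = rng.standard_normal((8, 8))
hybrid_ab = phase_transplant(image_a, image_b)  # phase of a, magnitude of b
```

In the experiment, a hybrid like `hybrid_ab` would be fed onward through the rest of the network, and the question is whether the prediction matches the phase donor or the magnitude donor.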

Abstract

Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside their hidden layers, and we test it causally: given two images, we transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows. In PRISM2D, GFNet, and ViT-B/16 the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy, so identity rides on phase while image-specific magnitude is largely dispensable to the readout. ResNet-50 at first seems to break the pattern, because transplanting sign after its ReLUs does nothing; a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry, which gives a mechanistic account of the texture-shape gap between CNNs and attention models.
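The sign transplant and the DC-only control mentioned for ResNet-50 can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration: the helper names `sign_transplant`, `relu`, and `dc_only` are hypothetical, feature maps are assumed to have shape (channels, height, width), and the treatment of exact zeros (which `np.sign` maps to 0) may differ from the paper's.

```python
import numpy as np

def sign_transplant(sign_donor, magnitude_donor):
    """Elementwise sign of one array applied to the absolute value of another."""
    return np.sign(sign_donor) * np.abs(magnitude_donor)

def relu(x):
    """Rectification: negative entries become zero."""
    return np.maximum(x, 0.0)

def dc_only(features):
    """Replace every channel by its spatial mean (the DC Fourier component)."""
    mean = features.mean(axis=(-2, -1), keepdims=True)
    return np.broadcast_to(mean, features.shape).copy()

rng = np.random.default_rng(0)
pre_a = rng.standard_normal((3, 4, 4))  # toy pre-ReLU activations, image a
pre_b = rng.standard_normal((3, 4, 4))  # toy pre-ReLU activations, image b

# Before the ReLU, activations carry both signs, so a transplant is meaningful.
hybrid_pre = sign_transplant(pre_a, pre_b)

# After the ReLU no entry is negative, so there is no negative sign left to move.
post_a = relu(pre_a)
has_negative_sign = bool((np.sign(post_a) < 0).any())

# Global average pooling sees the same value with or without spatial detail.
flat = dc_only(post_a)
pooled_equal = bool(np.allclose(flat.mean(axis=(-2, -1)), post_a.mean(axis=(-2, -1))))
```

The last check reflects why a DC-only control is informative for an architecture whose classifier head reads a channel-wise spatial average: removing all spatial variation leaves that average unchanged.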
