Not aligned: Even when models represent similar concepts, their channels are not directly comparable. Distances between features from different models (DINO, CLIP) are not meaningful.
No descriptor: What do a model's hidden units mean? Plotting a 768-dimensional feature vector is not informative. We need a feature space in which each hidden unit has a meaningful descriptor.
To enable joint analysis across models, we align feature channels into a shared representation space.
We use the human visual cortex as a universal reference frame. Given an image:
Once trained, features from different models can be expressed in the same brain-referenced space.
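As a minimal sketch of this idea, assuming a ridge-regression alignment from model features to voxel responses (the data below are random stand-ins, and the actual training procedure may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N images, 768-d model features, V brain voxels.
N, D, V = 200, 768, 1000
feats_dino = rng.normal(size=(N, D))   # stand-in for DINO features
brain = rng.normal(size=(N, V))        # stand-in for fMRI voxel responses

# Fit a ridge-regularized linear map W: feature space -> voxel space.
lam = 1.0
W = np.linalg.solve(feats_dino.T @ feats_dino + lam * np.eye(D),
                    feats_dino.T @ brain)   # shape (D, V)

# Any model's features can now be expressed in the shared voxel space.
brain_referenced = feats_dino @ W           # shape (N, V)
print(brain_referenced.shape)               # (200, 1000)
```

Fitting one such map per model sends every model's channels into the same voxel coordinate system.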
After transforming into the brain-referenced space, features from different models can be compared directly, e.g., by computing cosine similarity between features from different models (DINO, CLIP) in that shared space.
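A small illustration of such a comparison, using synthetic brain-referenced features (the second matrix is a noisy copy of the first, so high similarity is built in by construction):

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two matched feature matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

rng = np.random.default_rng(0)
# Hypothetical brain-referenced features for the same 50 images from two models.
dino_brain = rng.normal(size=(50, 1000))
clip_brain = dino_brain + 0.1 * rng.normal(size=(50, 1000))  # correlated copy

sims = cosine_sim(dino_brain, clip_brain)
print(sims.mean())  # close to 1: the two models agree in the shared space
```

The same computation on raw, unaligned channels would not be meaningful, which is the point of the shared space.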
Each hidden unit in the brain-referenced space has a meaningful descriptor, e.g., there are low-level regions (V1), body-selective regions (EBA), and face-selective regions (FFA).
Once channels are aligned:
One of the most consistent patterns discovered by AlignedCut is figure–ground separation. In CLIP, DINO, and MAE, foreground and background pixels cluster into distinct spectral groups.
Across models, the same visual concept (figure–ground) corresponds to similar brain activation patterns in brain space.
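The spectral grouping behind such figure–ground splits can be sketched with a normalized-cut-style eigenvector computation (random two-cluster patch features here stand in for real foreground and background tokens; this is an illustration of the technique, not the paper's exact pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical patch features: two clusters standing in for figure vs. ground.
fg = rng.normal(loc=1.0, size=(30, 8))
bg = rng.normal(loc=-1.0, size=(30, 8))
feats = np.vstack([fg, bg])

# Affinity from cosine similarity, then the symmetric normalized Laplacian.
X = feats / np.linalg.norm(feats, axis=1, keepdims=True)
A = np.exp(X @ X.T)                              # positive affinities
d = A.sum(axis=1)
L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))

# The Fiedler vector (2nd-smallest eigenvector) splits figure from ground.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
print(labels[:30].mean(), labels[30:].mean())  # one group near 0, the other near 1
```

Thresholding the Fiedler vector at zero recovers the two-way cut, which is why foreground and background pixels fall into distinct spectral groups.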
Beyond figure–ground separation, later layers exhibit category-specific clusters.
Across models, the same visual concept (category) corresponds to similar brain activation patterns in brain space.
AlignedCut also allows us to visualize how representations evolve through layers in the network. By embedding tokens from all layers into the same brain-referenced space, we can track trajectories of the tokens across layers.
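The trajectory idea can be sketched as follows: stack token features from every layer, embed them jointly into one low-dimensional space, and read off one point per layer per token. PCA is used here purely as a stand-in for the brain-referenced embedding, and the drifting random features are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical token features from 12 layers (same 5 tokens at every layer),
# with a small drift per layer so trajectories actually move.
n_layers, n_tokens, dim = 12, 5, 64
per_layer = [rng.normal(size=(n_tokens, dim)) + i * 0.3 for i in range(n_layers)]

# Embed tokens from *all* layers into one shared 2-D space via PCA.
all_feats = np.vstack(per_layer)            # shape (n_layers * n_tokens, dim)
centered = all_feats - all_feats.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ Vt[:2].T                # shape (n_layers * n_tokens, 2)

# Trajectory of token 0 across layers: one 2-D point per layer.
traj = coords.reshape(n_layers, n_tokens, 2)[:, 0, :]
print(traj.shape)  # (12, 2)
```

Because every layer is embedded in the same space, connecting a token's points across layers gives its trajectory through the network.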