Your Unembedding Matrix Is Secretly a Feature Lens for Text Embeddings

Songhao Wu, Zhongxin Chen, Yuxuan Liu, Heng Cui, Cong Li, Rui Yan

EmbedFilter improves LLM-based text embeddings by filtering out the "edge spectrum" subspace that encodes uninformative high-frequency tokens.

Why do LLM-based text embeddings often perform poorly on zero-shot tasks, and how can we filter the unembedding matrix to improve them?

Large language models (LLMs) often perform poorly as zero-shot embedding models because their internal representations are biased toward frequent, semantically irrelevant tokens. This "representation collapse" forces embeddings into a narrow, anisotropic region of the vector space, masking nuanced semantic features. The authors identify that the model's unembedding matrix acts as a "feature lens" that actively writes these high-frequency tokens into the embedding space. They introduce EmbedFilter, a linear transformation that filters out this specific "edge spectrum" subspace to unmask the underlying semantic representations. This post-processing step consistently improves zero-shot performance across multiple LLM backbones, achieving up to a 14.1% gain on the Massive Text Embedding Benchmark (MTEB) while simultaneously enabling efficient dimensionality reduction.

Paper Primer

EmbedFilter works by isolating the "edge spectrum"—the singular vectors associated with the largest and smallest singular values of the unembedding matrix—and removing them from the embedding. This is like a noise-canceling filter that identifies the specific frequency band where "average" token bias lives and mutes it, leaving only the semantic bulk of the signal.

EmbedFilter significantly boosts zero-shot embedding performance without training.

Evaluations on the MTEB benchmark across Qwen, Llama, and Mistral backbones.

Why does the unembedding matrix cause this bias in the first place?

The unembedding matrix is designed to map hidden states to vocabulary probabilities; the authors show it encodes a latent subspace corresponding to an "average" token—a frequency-weighted representation of the training corpus—which pulls raw embeddings toward common, uninformative tokens.

How does this differ from standard dimensionality reduction or whitening techniques?

Unlike whitening, which typically requires a calibration dataset to compute statistics, EmbedFilter is a training-free, heuristic post-processing step that leverages the model's existing unembedding matrix parameters to identify and remove the biased subspace.

Researchers can now treat the LLM unembedding matrix as a diagnostic tool to refine embeddings, enabling high-performance, low-dimensional retrieval without the need for additional fine-tuning or complex prompt engineering.

The abstract identifies noisy high‑frequency token bias and proposes EmbedFilter to clean LLM embeddings.

Large language models excel at zero‑shot tasks but underperform as off‑the‑shelf embedding generators because their vectors over‑represent high‑frequency tokens. By recognizing that the unembedding matrix actively injects these noisy components, the authors propose EmbedFilter—a simple linear projection that excises the edge spectrum subspace, thereby sharpening semantic features, shrinking dimensionality, and accelerating retrieval without sacrificing quality.

https://github.com/CentreChen/EmbFilter

Introduction

LLMs embed noisy high‑frequency components, and a simple filter can recover cleaner semantics.

Large language models excel on many benchmarks, yet when their hidden states are used directly as text embeddings they fall short of the quality needed for downstream retrieval and similarity tasks. Prior attempts such as prompt engineering only provide fragile, modest improvements, leaving a systematic gap between LLM capabilities and embedding performance.

Our analysis, powered by the Logit Lens, reveals that raw embeddings are biased toward high‑frequency tokens that carry little semantic content. This “representation collapse” is consistent across model families, suggesting a universal hidden subspace—named the Edge Spectrum Subspace—that steers vectors toward an average, frequency‑weighted token. By linearly projecting out this subspace (the EmbedFilter), we can recover more faithful semantic features with negligible overhead.

LLM embeddings contain a hidden component that pulls them toward common, semantically weak tokens, obscuring the true meaning of the input.

Mechanistic Interpretability Tools

Background: key tools for probing embeddings and their semantics.

We first review how large language models produce text embeddings and the interpretability lenses we use to inspect them.

The unembedding matrix is the final linear layer that maps hidden states to vocabulary logits, i.e., it decides which token each hidden vector would predict.

The Logit Lens projects any intermediate hidden representation directly into the vocabulary space, showing which tokens that layer would predict if it were the final layer.
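A minimal sketch of the Logit Lens idea, assuming a toy two-dimensional hidden state and a hypothetical four-token vocabulary (the `vocab` and `unembed` values are made up for illustration, and a real model would typically apply its final normalization layer before unembedding):

```python
import math

def logit_lens(hidden, unembed, vocab, top_k=3):
    """Decode a hidden vector into its top-k tokens via an unembedding matrix.

    hidden:  list of d floats
    unembed: d x |vocab| matrix as a list of rows (hypothetical toy weights)
    """
    # logits[j] = sum_i hidden[i] * unembed[i][j]
    logits = [sum(h * row[j] for h, row in zip(hidden, unembed))
              for j in range(len(vocab))]
    # Numerically stable softmax over the vocabulary
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by decoding probability, highest first
    ranked = sorted(zip(vocab, probs), key=lambda tp: tp[1], reverse=True)
    return ranked[:top_k]

vocab = ["the", "of", "lens", "logit"]      # toy vocabulary
unembed = [
    [2.0, 1.5, 0.1, 0.0],                   # weights for hidden dim 0
    [0.0, 0.2, 1.0, 1.2],                   # weights for hidden dim 1
]
hidden = [1.0, 0.5]
top_tokens = logit_lens(hidden, unembed, vocab, top_k=2)
print(top_tokens)
```

In this toy setup the hidden state leans on dimension 0, so the frequent-looking tokens dominate the decoded distribution, which mirrors the bias the word clouds illustrate.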

**Figure 1.** Logit Lens applied to text embeddings from three LLM backbones. Word clouds show the top-aligned tokens with the highest decoding probabilities, which are primarily high-frequency yet semantically uninformative. The input text encoded by the embeddings is: "We call this a ‘lens’ because it is one way of extracting information from GPT’s internal activations. I imagine there is other information present in the activations that cannot be understood by looking at logits over tokens. The logit lens show us some of what is going on, not all of it." This passage comes from the original description of the logit lens.

Discovery of the Edge Spectrum

We expose the edge‑spectrum subspace that drives high‑frequency token bias in LLM embeddings.

LLM embeddings are highly anisotropic, clustering in a narrow subspace, and they tend to align with frequent, low‑semantic tokens. This suggests that the dominant subspace encodes the edge‑spectrum of high‑frequency tokens, motivating its isolation.

We evaluate models from Qwen‑2.5 (0.5 B) to Mistral‑v0.3‑Instruct (7 B) and Llama‑3.1‑Instruct (8 B), using the RedPajama corpus to approximate the true word‑frequency distribution $p$ with $\hat{p}$.

With $U$ denoting the unembedding matrix and $W$ the projection, the logits for a token satisfy $\mathbf{h}W^{\top}U = \log(\mathbf{q}) + b$. Using the Moore–Penrose pseudo‑inverse $W^{+}$ we solve for the hidden state that would produce the empirical frequency $\hat{p}$, yielding the average token embedding $\hat{\mathbf{h}} = \log(\hat{p})\,W^{+}U$.
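The pseudo-inverse step can be sketched numerically. The snippet below is an illustration under stated assumptions, not the authors' code: `M` is a hypothetical stand-in for the model's full hidden-to-logit map, `p_hat` is a made-up Zipf-like distribution rather than RedPajama statistics, and the constant offset $b$ is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab_size = 8, 50

# Hypothetical stand-in for the hidden-to-logit map (d x |V|).
M = rng.normal(size=(d, vocab_size))

# Made-up Zipf-like token frequencies standing in for corpus statistics.
ranks = np.arange(1, vocab_size + 1)
p_hat = (1.0 / ranks) / np.sum(1.0 / ranks)

# Least-squares hidden state whose logits best match log(p_hat).
h_avg = np.log(p_hat) @ np.linalg.pinv(M)

# At the least-squares optimum the residual is orthogonal to the rows of M.
residual = h_avg @ M - np.log(p_hat)
print(h_avg.shape, np.abs(M @ residual).max())
```

Because the vocabulary is much larger than the hidden size, the system is overdetermined, so the pseudo-inverse returns the best least-squares fit rather than an exact solution.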

The edge spectrum is a thin band of singular directions at the extremes of the $WU$ spectrum that disproportionately amplifies the logits of frequent tokens while contributing little to semantic variation.

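As a toy example, let $\mathbf{h} = [1,\,2,\,3,\,4]$ and suppose the edge spectrum is spanned by the standard basis vectors $e_3$ and $e_4$. Projection onto the edge spectrum: $\mathbf{p}= (\mathbf{h}\!\cdot\!e_3)\,e_3 + (\mathbf{h}\!\cdot\!e_4)\,e_4 = [0,\,0,\,3,\,4]$.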

Removing the edge component: $\mathbf{h}^{\text{filtered}} = \mathbf{h} - \mathbf{p} = [1,\,2,\,0,\,0]$.

Assuming a frequent token’s unembedding direction is $e_3 + e_4$, so that its logit is the dot product of that direction with $\mathbf{h}$, the original logit is $3+4=7$; after filtering it drops to $0$, illustrating the edge spectrum’s boost.

Eliminating the edge directions removes the disproportionate boost given to high‑frequency tokens while preserving the core semantic components (the first two dimensions).
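The toy arithmetic above can be checked in a few lines. This sketch assumes the four-dimensional vector $[1, 2, 3, 4]$ implied by the numbers in the example and a hypothetical frequent-token direction `frequent_token` that lies entirely in the edge subspace:

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

h = [1.0, 2.0, 3.0, 4.0]
e3 = [0.0, 0.0, 1.0, 0.0]   # assumed edge direction
e4 = [0.0, 0.0, 0.0, 1.0]   # assumed edge direction

# Projection of h onto span{e3, e4}
p = [dot(h, e3) * a + dot(h, e4) * b for a, b in zip(e3, e4)]

# Remove the edge component
h_filtered = [x - y for x, y in zip(h, p)]

# Hypothetical frequent-token direction inside the edge subspace
frequent_token = [0.0, 0.0, 1.0, 1.0]
logit_before = dot(h, frequent_token)
logit_after = dot(h_filtered, frequent_token)
print(p, h_filtered, logit_before, logit_after)
```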

**Figure 2.** $\Delta\pi$ distribution for Qwen, Llama and Mistral.

The EmbedFilter Method

EmbedFilter linearly removes the edge spectrum to sharpen text embeddings.

After identifying the edge spectrum subspace, we now describe how EmbedFilter excises it with a single linear step.

EmbedFilter projects each embedding onto the middle singular vectors of the unembedding matrix, discarding the extreme directions that carry noisy, high‑frequency token information.

How does EmbedFilter differ from a standard low‑rank PCA projection that also drops extreme components?

PCA would keep the top singular vectors, whereas EmbedFilter deliberately discards them and retains the middle spectrum; this flips the usual variance‑preserving intuition to suppress high‑frequency token influence instead of preserving it.
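A minimal NumPy sketch of the middle-spectrum projection follows. It is an illustration only: the random matrices stand in for real model weights and embeddings, and the centered choice of the kept band (`left`) is an assumption, since the exact rule for picking $l_{\tau}$ and $r_{\tau}$ is not given here.

```python
import numpy as np

def embed_filter(embeddings, unembedding, tau=2, left=None):
    """Keep only a middle band of right singular directions.

    embeddings:  (n, d) array of text embeddings
    unembedding: (vocab, d) array standing in for the unembedding weights
    tau:         reduction factor; d // tau directions are kept
    left:        start index of the kept band (assumed centered by default)
    """
    d = unembedding.shape[1]
    keep = d // tau
    if left is None:
        left = (d - keep) // 2   # assumption: center the band in the spectrum
    # Rows of vt are right singular vectors, sorted by descending singular value.
    _, _, vt = np.linalg.svd(unembedding, full_matrices=False)
    v_bulk = vt[left:left + keep].T   # (d, keep), orthonormal columns
    return embeddings @ v_bulk        # (n, keep) reduced coordinates

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 16))   # toy "unembedding" matrix
E = rng.normal(size=(5, 16))     # toy embeddings
E_filtered = embed_filter(E, W, tau=4)
print(E_filtered.shape)
```

A PCA-style projection would instead slice `vt[:keep]`, keeping the leading directions; the only difference in this sketch is which rows of `vt` are retained.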

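As a toy example, take $V = I_{6}$, an embedding $e=(1,\,2,\,3,\,4,\,5,\,6)$, and a bulk band covering dimensions 2‑4. Extract the bulk matrix $V[l_{\tau}:r_{\tau}] = \begin{bmatrix}0&0&0\\1&0&0\\0&1&0\\0&0&1\\0&0&0\\0&0&0\end{bmatrix}$ (columns 2‑4 of $I_{6}$).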

Form the projector $\Phi_{\tau}=V[l_{\tau}:r_{\tau}]\,V[l_{\tau}:r_{\tau}]^{\top}$, which zeroes out dimensions 1, 5, 6 and leaves dimensions 2‑4 unchanged.

Apply $\hat{e}=e\Phi_{\tau}^{\top}$: the result is $(0,\,2,\,3,\,4,\,0,\,0)$ – the edge dimensions are removed.

Compute the distance between two toy embeddings $e^{(1)}=(1,2,3,4,5,6)$ and $e^{(2)}=(6,5,4,3,2,1)$ after filtering: $\|e^{(1)}\Phi_{\tau}^{\top}-e^{(2)}\Phi_{\tau}^{\top}\|_{2}= \sqrt{(2-5)^2+(3-4)^2+(4-3)^2}= \sqrt{9+1+1}= \sqrt{11}$, identical to the distance computed directly on the retained dimensions.

EmbedFilter removes only the edge dimensions; distances among the retained bulk dimensions are preserved exactly, so the filtered embeddings can be stored in the reduced bulk coordinates without changing any similarity computed after filtering.
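The six-dimensional toy example can be verified directly. This sketch hard-codes the special case $V = I_{6}$ with dimensions 2‑4 retained, so the projector simply zeroes the edge coordinates; the helper names are illustrative:

```python
import math

kept = [1, 2, 3]  # zero-based indices of dimensions 2-4 (columns 2-4 of I_6)

def apply_projector(e):
    """Multiply by Phi = V_bulk V_bulk^T when V = I_6: zero the edge dims."""
    return [x if i in kept else 0.0 for i, x in enumerate(e)]

def reduce_dims(e):
    """Multiply by V_bulk only: keep the three bulk coordinates."""
    return [e[i] for i in kept]

e1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
e2 = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

# Distance after the full 6-d projection vs. in the reduced 3-d coordinates
dist_projected = math.dist(apply_projector(e1), apply_projector(e2))
dist_reduced = math.dist(reduce_dims(e1), reduce_dims(e2))
print(apply_projector(e1), dist_projected, dist_reduced)
```

Both distances equal $\sqrt{11}$, which is why the reduced coordinates can replace the projected vectors in an index.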

**Figure 3.** Re-running the logit lens analysis from Section 1 with text embeddings refined by EmbedFilter. The top-6 tokens from the logit lens are displayed, with colored entries indicating tokens that have literal connections with the input text. EmbedFilter suppresses the expression of frequent tokens and enhances the semantic richness of text embeddings.

By excising the edge spectrum, EmbedFilter yields embeddings that are both cleaner and amenable to aggressive dimensionality reduction without sacrificing task performance.

Experimental Setup

We evaluate EmbedFilter on the MTEB benchmark across three LLM backbones.

We built the evaluation pipeline on the official MTEB implementation, reporting each task’s standard metric.

The benchmark aggregates a suite of downstream tasks—semantic similarity, classification, clustering, and retrieval—to measure how well a text embedding captures useful semantics.

We run EmbedFilter on three popular LLM backbones, setting the reduction factor $τ$ to 2, 4, and 8, which scales the output dimensionality to $1/τ$ of the original size.

On Qwen‑2.5‑0.5B, EmbedFilter yields the largest relative gain at $τ\!=\!8$, improving the overall MTEB score by +9.0%.

Table 1 shows a +9.0% increase (from 68.81 to 74.41) for the $\tau$=8 column, the highest improvement among all $\tau$ settings for this model.

Llama‑3.1‑8B‑Instruct benefits most at $τ\!=\!2$, where EmbedFilter raises the MTEB score by +7.8%.

The $\tau$=2 column in Table 1 records a +7.8% jump (from 53.47 to 57.70), the strongest lift for the Llama backbone.

For Mistral‑7B‑Instruct‑V0.3, the $\tau$=4 setting delivers a +5.8% improvement over the vanilla baseline.

In Table 1 the $\tau$=4 column rises from 51.43 to 57.32, a +5.8% gain that stands out among the Mistral results.

Main Results

EmbedFilter removes edge‑spectrum noise, yielding stronger text embeddings.

EmbedFilter removes from each text embedding the edge‑spectrum subspace identified in the unembedding matrix, yielding cleaner semantic embeddings.

EmbedFilter boosts the MTEB average score by up to 14.1% across all evaluated setups.

Table 1 reports the performance gains for PromptEOL and ECHO configurations.

Ablation Studies

We examine how the filtering ratio and design choices affect EmbedFilter’s performance.

The hyperparameter $τ$ controls the filtering ratio: embeddings are shrunk to $1/τ$ of their original size, which cuts index storage by the same factor and yields an approximate $τ$‑fold speedup in similarity search.
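To make the storage claim concrete, here is a back-of-the-envelope sketch. The corpus size of one million vectors, the 4096-dimensional embedding, and float32 storage are illustrative assumptions, not figures from the paper:

```python
def index_bytes(num_vectors, dim, bytes_per_value=4):
    """Size of a dense flat index storing float32 values by default."""
    return num_vectors * dim * bytes_per_value

d = 4096          # assumed original embedding dimension
n = 1_000_000     # assumed number of indexed documents

for tau in (1, 2, 4, 8):
    reduced = d // tau
    gib = index_bytes(n, reduced) / 2**30
    # Brute-force dot-product search cost also scales linearly with `reduced`.
    print(f"tau={tau}: dim={reduced}, index ~ {gib:.2f} GiB")
```

The speedup is only approximate in practice, because fixed per-query overheads and approximate-nearest-neighbor index structures do not scale purely with dimension.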

Even at a high filtering ratio $τ$ = 8, EmbedFilter remains competitive on the Massive Text Embedding Benchmark (MTEB), confirming that the gains are not solely due to dimensionality reduction.

Applying EmbedFilter to Llama‑3.1‑8B‑Instruct reduces the representation size while improving downstream scores, surpassing strong baselines such as SimCSE and coCondenser despite the smaller dimension.

We ablate five filtering configurations on Qwen2.5‑0.5B with PromptEOL ($\tau$ = 2). Config 1 keeps only the first half of the dimensions (Matryoshka‑style truncation), and Config 2 randomly drops half of them; both underperform the baseline.

Configs 3–5 target singular subspaces: dominant, secondary, and bulk. EmbedFilter (which removes the secondary subspace) yields the best downstream results, while the inverse operation (Config 5) performs worst. Config 4 (bulk filtering) markedly outperforms Config 5, aligning with the $\Delta\pi$ analysis that the secondary subspace encodes frequent tokens more strongly than the dominant one.

Comparison and Discussion

We compare EmbedFilter against whitening and other calibration baselines on MTEB.

Recall that the unembedding matrix contains an edge spectrum subspace that injects noisy high‑frequency components; EmbedFilter projects away this subspace to yield cleaner embeddings.

Whitening decorrelates embedding dimensions and rescales them to unit variance, turning an anisotropic embedding space into an isotropic one.

EmbedFilter outperforms whitening without any calibration data. It is akin to cleaning a blurry photo without a reference image, whereas whitening relies on a sample palette for adjustment.
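For contrast, a minimal sketch of the whitening baseline shows where the calibration data enters. This is a generic eigendecomposition-based whitening, not necessarily the exact variant compared in the paper, and `calib` is synthetic data standing in for a calibration set of embeddings:

```python
import numpy as np

def fit_whitening(calibration):
    """Estimate a mean and whitening transform from calibration embeddings."""
    mu = calibration.mean(axis=0, keepdims=True)
    cov = np.cov(calibration - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each eigenvector column by 1/sqrt(eigenvalue) -> unit variance.
    transform = eigvecs / np.sqrt(eigvals)
    return mu, transform

rng = np.random.default_rng(0)
# Synthetic correlated "embeddings" used as a calibration set (assumption).
calib = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))

mu, T = fit_whitening(calib)
whitened = (calib - mu) @ T
cov_after = np.cov(whitened, rowvar=False)
print(np.round(cov_after, 3))
```

The statistics `mu` and `T` must be estimated from sample embeddings, whereas the EmbedFilter projection depends only on model weights.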

**Table 6.** MTEB results for EmbedFilter and whitening. Best results are highlighted in bold.

Appendices

Supplementary details, prompts, figures, and proofs that support the main text.

We examined why large language models underperform on zero‑shot text‑embedding tasks, identified an “edge spectrum” subspace in the unembedding matrix that injects high‑frequency noise, and introduced EmbedFilter to linearly remove that subspace. Applying EmbedFilter consistently improves zero‑shot performance and reduces embedding dimensionality, which cuts storage costs and speeds up retrieval.

This work was funded by Lenovo Group; we thank Ang Lv for writing suggestions, Yuhan Liu and Yankai Lin for computational resources, and the anonymous KDD reviewers for their constructive feedback.

Section A details the experimental protocol: we evaluate every task in the Massive Text Embedding Benchmark (MTEB), including STS, Classification, Clustering, Pair Classification, Re‑ranking, Retrieval, and Summarization. For each model we use the PromptEOL and ECHO templates shown in Figure code2, and we offset indices for Mistral‑7B‑Instruct‑V0.3 until $l_{\tau} = 128$.

Section B provides the equivalence‑transformation proof. Starting from the projection matrix $\Phi_{\tau}=V[l_{\tau}:r_{\tau}]\,V[l_{\tau}:r_{\tau}]^{\top}$, we define $V_{\tau}=V[l_{\tau}:r_{\tau}]$ and show that $\|x\Phi_{\tau}^{\top}-y\Phi_{\tau}^{\top}\|^{2}=\|xV_{\tau}-yV_{\tau}\|^{2}$ by letting $z=x-y$ and using the identity $V_{\tau}^{\top}V_{\tau}=I$, which holds because the columns of $V_{\tau}$ are orthonormal singular vectors.
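The key step can be written out explicitly. The derivation below is a sketch that assumes row-vector embeddings, sets $z = x - y$, and relies on the columns of $V_{\tau}$ being orthonormal, so that $V_{\tau}^{\top}V_{\tau}=I$ (note that $V_{\tau}V_{\tau}^{\top}=\Phi_{\tau}$ is a projector, not the identity):

```latex
\begin{aligned}
\|x\Phi_{\tau}^{\top} - y\Phi_{\tau}^{\top}\|^{2}
  &= \|z\,V_{\tau}V_{\tau}^{\top}\|^{2} \\
  &= z\,V_{\tau}V_{\tau}^{\top}\,V_{\tau}V_{\tau}^{\top}\,z^{\top} \\
  &= z\,V_{\tau}\,(V_{\tau}^{\top}V_{\tau})\,V_{\tau}^{\top}\,z^{\top} \\
  &= z\,V_{\tau}V_{\tau}^{\top}\,z^{\top} \\
  &= \|z\,V_{\tau}\|^{2} = \|xV_{\tau} - yV_{\tau}\|^{2}.
\end{aligned}
```

The left-hand side is a distance between full-size projected vectors, while the right-hand side is a distance between the reduced coordinates, which is what licenses storing only the latter.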

### PromptEOL

**Qwen** `Summarize the sentence: "{text}" in one word:`

**Llama** `Summarize the sentence: "{text}" in one word:`

**Mistral** `This sentence: "{text}" means [MASK]`

**Figure 4.** $\Delta\pi$ distribution for high-frequency, low-frequency and randomly sampled tokens on the Qwen model.

**Table 7.** Evaluation metrics used for MTEB tasks.

This table maps various task categories to their corresponding evaluation metrics.

Questions & answers

What is the main contribution of this paper?

The paper introduces EmbedFilter, a training-free linear projection that removes the 'edge spectrum subspace' from LLM hidden-state embeddings by leveraging the model's own unembedding matrix, improving zero-shot text embedding quality by up to 14.1% on the Massive Text Embedding Benchmark (MTEB) without any fine-tuning or calibration data.

What problem does EmbedFilter address?

LLMs perform poorly as zero-shot embedding models because their internal representations are biased toward frequent, semantically irrelevant tokens—a phenomenon the authors call 'representation collapse'—which forces embeddings into a narrow, anisotropic region of the vector space and masks nuanced semantic features.

Why do raw LLM embeddings suffer from representation collapse?

The unembedding matrix encodes a latent subspace corresponding to an 'average' token—a frequency-weighted representation of the training corpus—which pulls raw embeddings toward common, uninformative tokens, causing them to cluster in a narrow, anisotropic region.

How does EmbedFilter work technically?

EmbedFilter identifies the 'edge spectrum'—the singular vectors associated with the largest and smallest singular values of the unembedding matrix—and linearly projects them out of the embedding, retaining only the middle-spectrum singular subspace that encodes semantic content.

What is the 'edge spectrum subspace' and how is it identified?

The edge spectrum subspace is the dominant and secondary singular-vector subspace of the unembedding matrix U that encodes high-frequency token bias; it is identified by computing the average token embedding as ĥ = log(p̂) W⁺U using the Moore–Penrose pseudo-inverse of the projection W and the empirical token frequency distribution p̂ approximated from the RedPajama corpus.

How does EmbedFilter differ from standard PCA or whitening?

Unlike PCA, which retains the top (highest-variance) singular vectors, EmbedFilter deliberately discards the extreme singular vectors and retains the middle spectrum, inverting the usual variance-preserving intuition. Unlike whitening, EmbedFilter requires no calibration dataset and uses only the model's existing unembedding matrix parameters.

What datasets and benchmarks were used to evaluate EmbedFilter?

Evaluation was conducted on the Massive Text Embedding Benchmark (MTEB), covering all task types including STS, Classification, Clustering, Pair Classification, Re-ranking, Retrieval, and Summarization, using the official MTEB implementation and reporting each task's standard metric.

Which LLM backbones were tested?

The paper evaluates EmbedFilter on models ranging from Qwen-2.5 (0.5B) to Mistral-v0.3-Instruct (7B) and Llama-3.1-Instruct (8B), using PromptEOL and ECHO prompt templates.

What are the key quantitative results?

EmbedFilter achieves up to a 14.1% gain on MTEB across tested LLM backbones. Applied to Llama-3.1-8B-Instruct, it surpasses strong baselines such as SimCSE and coCondenser even at reduced embedding dimensionality. At the highest filtering ratio τ=8 (reducing embeddings to 1/8 of original size), EmbedFilter remains competitive on MTEB.

What does the hyperparameter τ control, and what are its practical effects?

The hyperparameter τ controls the filtering ratio: embeddings are reduced to 1/τ of their original dimensionality, cutting index storage by the same factor and yielding an approximate τ-fold speedup in similarity search. The paper tests τ values of 2, 4, and 8.

What do the ablation studies reveal about which subspace to filter?

Ablations on Qwen2.5-0.5B with PromptEOL (τ=2) show that removing the secondary singular subspace (EmbedFilter's approach) yields the best downstream results, while removing the bulk subspace performs worst; simple truncation (Matryoshka-style) and random dimension dropping both underperform the baseline, confirming that targeting the specific edge spectrum is critical.

What are the limitations of EmbedFilter as acknowledged in the paper?

The paper describes EmbedFilter as a heuristic post-processing step and does not claim it is theoretically optimal. The paper does not extensively discuss failure cases, generalization to all model families, or tasks beyond those in MTEB.

How does EmbedFilter compare to prior approaches like prompt engineering?

Prior approaches such as prompt engineering provide only fragile, modest improvements to LLM embedding quality, whereas EmbedFilter is a systematic, training-free linear transformation that consistently improves performance across multiple model families and task types on MTEB.

Is EmbedFilter training-free and how can it be reproduced?

Yes, EmbedFilter is training-free and requires no additional fine-tuning or calibration data; it uses only the model's existing unembedding matrix to compute the projection matrix Φ_τ = V[l_τ:r_τ] V[l_τ:r_τ]^⊤, where V[l_τ:r_τ] contains the middle-spectrum singular vectors of the unembedding matrix.

What interpretability tool is used to diagnose the embedding bias?

The paper uses the Logit Lens, a mechanistic interpretability tool, to reveal that raw LLM embeddings are biased toward high-frequency tokens with little semantic content, motivating the identification and removal of the edge spectrum subspace.

Who funded this work and where was it published?

This work was funded by Lenovo Group, and the paper acknowledges KDD reviewers, suggesting it was submitted to or published at the KDD conference. The paper does not specify the exact publication year beyond what is inferable from the arXiv identifier (2606.07502).

Key terms

EmbedFilter: A training-free linear post-processing method that removes the edge spectrum subspace from LLM embeddings using the model's unembedding matrix to improve semantic quality and reduce dimensionality.
unembedding matrix: A weight matrix in a language model that maps internal hidden states to vocabulary token probabilities, used here as a diagnostic lens to identify and remove frequency-biased subspaces from embeddings.
edge spectrum subspace: The subspace of the unembedding matrix spanned by singular vectors associated with the largest and smallest singular values, which the paper shows encodes high-frequency token bias that degrades embedding quality.
representation collapse: A phenomenon where LLM embeddings cluster in a narrow, anisotropic region of the vector space due to over-representation of frequent, semantically uninformative tokens.
anisotropic embeddings: Embeddings that are not uniformly distributed across the vector space but instead concentrated along certain directions, reducing their ability to distinguish between different semantic meanings.
Logit Lens: A mechanistic interpretability technique that projects intermediate LLM hidden states through the unembedding matrix to inspect which tokens the model is predicting at each layer.
average token embedding: A theoretical hidden state ĥ = log(p̂) W⁺U that would produce the empirical token frequency distribution, representing the frequency-weighted 'average' of the training corpus that biases raw embeddings.
MTEB (Massive Text Embedding Benchmark): A comprehensive benchmark for evaluating text embedding models across diverse tasks including semantic textual similarity, classification, clustering, retrieval, and summarization.
PromptEOL: A prompt template used to extract embeddings from LLMs, evaluated in this paper as one of two prompting strategies alongside the ECHO template.
ECHO template: A prompt template used to elicit text embeddings from LLMs, used alongside PromptEOL in the paper's experimental evaluation.
filtering ratio τ: A hyperparameter in EmbedFilter that controls how aggressively the embedding is compressed, reducing the output dimensionality to 1/τ of the original size.
Moore–Penrose pseudo-inverse: A generalization of the matrix inverse that provides the best least-squares solution for non-square or singular matrices, used here to solve for the average token hidden state from the unembedding matrix.
SimCSE: A contrastive learning method for sentence embeddings used as a baseline comparison in the paper's experiments.
coCondenser: A dense retrieval pre-training method used as a baseline comparison in the paper's experiments.
RedPajama corpus: A large open-source text dataset used in this paper to approximate the empirical token frequency distribution p̂ for computing the average token embedding.
Matryoshka representation: A dimensionality reduction approach that keeps only the leading portion of the embedding dimensions and truncates the rest, used as an ablation baseline (Config 1) that underperforms EmbedFilter.
zero-shot embedding: Using a pre-trained model's hidden states directly as text embeddings without any task-specific fine-tuning or training on embedding-specific objectives.
