Your GFlowNet Secretly Learns an Optimal Transport Plan

Non-acyclic GFlowNets implicitly solve optimal transport problems by minimizing trajectory flow.

How do non-acyclic GFlowNets mathematically relate to the Optimal Transport (OT) problem, and can we use this connection to solve OT tasks?

Optimal transport (OT) requires moving probability mass between distributions while minimizing cost, but finding these transport plans in large combinatorial spaces is computationally expensive. This paper proves that non-acyclic Generative Flow Networks (GFlowNets) are mathematically equivalent to Kantorovich optimal transport: the GFlowNet's minimum-flow objective acts as a cost-minimizing transport plan, where the learned policy defines the coupling. Experiments confirm that GFlowNets recover exact OT solutions on tractable graphs and scale to complex environments like permutation spaces where traditional solvers fail.

Paper Primer

The core mechanism hinges on fixing the initial edge-flow distribution to match a source distribution $L(u)$. By minimizing the total flow in a non-acyclic graph, the GFlowNet objective reduces to a linear program that is dual to the Kantorovich OT problem, where the ground cost is defined by the shortest-path distance in the graph.

GFlowNet minimum-flow objectives are equivalent to Kantorovich optimal transport.

Theorem 3.2 establishes that the optimal flow $F^\star$ in a non-acyclic GFlowNet induces a coupling $\Pi$ that solves the OT problem between source $L$ and target $R$. The learned policy recovers optimal transport costs $OT^\star$ across various hypergrid and permutation environments.

The framework scales to high-dimensional combinatorial spaces.

In permutation environments of size $n=20$, where exact OT solvers are intractable, the GFlowNet successfully approximates the transport plan. Empirical $L_1$ error for fixed-point probabilities remains low (0.002) even as the state space grows.

Why use GFlowNets for OT instead of standard linear programming solvers?

Standard OT solvers require explicit cost matrices and struggle with large combinatorial spaces; GFlowNets use neural parameterization to learn the transport plan implicitly through local, feasible moves, enabling scalability to environments where the state space is too large to enumerate.

What is the role of the regularization coefficient $\lambda$ in this framework?

The parameter $\lambda$ controls the trade-off between the expected trajectory length (transport cost) and the accuracy of the sampled distribution; higher values bias the sampling to reduce path costs, while lower values prioritize exact reward matching.

Introduction and Motivation

We pose the problem of recasting non‑acyclic GFlowNets as an entropy‑regularized optimal transport task.

Generative Flow Networks (GFlowNets) generate structured objects by traversing a stochastic graph from a start state to a terminal sink. In non‑acyclic settings the flow on each edge must satisfy conservation at every intermediate node, and the expected trajectory length becomes a key efficiency metric. The paper asks whether these flow constraints can be interpreted as the marginal constraints of an optimal transport coupling, turning the learning objective into an entropy‑regularized OT problem.

By viewing the edge‑wise flow of a non‑acyclic GFlowNet as a transport plan, the network’s flow conservation exactly matches the marginal constraints of an optimal coupling, so learning the GFlowNet policy is equivalent to solving an entropy‑regularized optimal transport problem on the graph.

Assign flow $f_{s_0\to a}=0.6$ and $f_{s_0\to b}=0.4$ leaving $s_0$ (conservation satisfied).

At node $a$, forward flow $f_{a\to s_f}=0.6$; at node $b$, forward flow $f_{b\to s_f}=0.4$ (conservation satisfied at $a$ and $b$).

The resulting flow matrix $\{f_{s_0\to a},f_{s_0\to b},f_{a\to s_f},f_{b\to s_f}\}$ defines a transport plan moving $0.6$ mass along the $s_0\to a\to s_f$ path and $0.4$ along $s_0\to b\to s_f$.

The total transport cost is $0.6\cdot2 + 0.4\cdot2 = 2$, which is the optimal Kantorovich cost for this simple source‑target pair.

This toy example shows how edge flows encode a coupling that simultaneously satisfies flow conservation and minimizes the summed path costs, exactly as an optimal transport plan does.

The paper bridges GFlowNets and optimal transport.

Background and Definitions

We introduce non‑acyclic GFlowNets and the Optimal Transport framework that underpins them.

A GFlowNet is a stochastic sampler that walks a directed graph; “non‑acyclic” means the graph may contain cycles, so trajectories can revisit states before terminating.

Optimal Transport formalizes moving probability mass from one distribution to another at minimal cost, using a coupling that respects the marginals of both distributions.

The forward–backward policy equality (Equation 1) ensures that the distribution over sampled trajectories is consistent, and together with the reward matching condition (Equation 2) it enforces detailed balance across every edge.

In the non‑acyclic setting, the total expected flow $\sum_{s\in S}F(s)$ equals the expected trajectory length, so minimizing flow directly reduces sampling cost.

Theoretical Equivalence

Formalizing the link between GFlowNet flow constraints and optimal‑transport marginals.

Under Assumption 3.1 (mass‑balance $\sum_{u}L(u)=\sum_{x}R(x)=1$), the reduced GFlowNet minimum‑flow linear program (edge‑flow variables $F(s\to s')$ with constraints $F(s_0\to u)=L(u)$ and $F(x\to s_f)=R(x)$) attains the same optimal value as the Kantorovich optimal‑transport problem (8) between $L$ on $U$ and $R$ on $X$.

GFlowNet edge flows play the role of a transport plan: the amount of flow leaving a source state $u$ matches the source distribution $L(u)$, and the amount arriving at a target state $x$ matches the target distribution $R(x)$. Thus the flow‑conservation equations are exactly the marginal constraints of an optimal‑transport coupling.

Set the first‑step flows: $F(s_0\to u_1)=0.6$, $F(s_0\to u_2)=0.4$ (matching $L$).

Choose an optimal OT coupling $\Pi^\star$ (e.g., $\Pi_{u_1,x_1}=0.5$, $\Pi_{u_1,x_2}=0.1$, $\Pi_{u_2,x_1}=0.0$, $\Pi_{u_2,x_2}=0.4$) that satisfies the marginal constraints.

For each non‑zero $\Pi_{u,x}$, allocate its mass uniformly along a shortest path $\tau_{u,x}$; e.g., $\Pi_{u_1,x_1}=0.5$ adds $0.5$ flow to each edge of the path $u_1\to s_a\to s_b\to x_1$.

Summing contributions yields edge flows: $F(s_a\to s_b)=0.5+0.1+0.4=1.0$, $F(s_b\to x_1)=0.5$, $F(s_b\to x_2)=0.5$, etc.

Check sink constraints: $F(x_1\to s_f)=0.5$, $F(x_2\to s_f)=0.5$, matching $R$.

The constructed flow respects all GFlowNet constraints and reproduces the OT coupling, illustrating concretely how edge‑flow variables encode a transport plan.

**Figure 1.** Visualization of the solution to the GFlowNet LP problem and Kantorovich OT optimal plan on a hypergrid environment. OT induces optimal coupling that connects source and target states and how much mass is transfered between them. GFlowNet on the other hand builds a policy that samples target states starting from source states, thus inducing a coupling with respect to the shortest path metric. Both formulations achieve the same transport cost of 4.351.

How does the GFlowNet flow‑matching constraint differ from a naïve “assign any probability mass to edges” approach?

In a naïve assignment the edge probabilities could violate the source/target marginals, leading to excess or missing mass. The flow‑matching constraint forces the total out‑flow from the source to equal $L$ and the total in‑flow to the sink to equal $R$, which is exactly the marginal feasibility condition of an optimal‑transport coupling.

Neural Training Algorithm

Train a non‑acyclic GFlowNet with a trajectory‑balance loss that enforces optimal‑transport flow constraints.

We adopt the non‑acyclic GFlowNet training pipeline and specialize it with the trajectory‑balance (TB) objective, which directly ties the learned flow to the optimal‑transport formulation.

Assuming reward‑matching ($P_B(s\mid s_f)=R(s)$) and forward‑matching ($P_F(s\mid s_0)=L(s)$), minimizing the regularized trajectory‑balance loss $L_{TB}(\theta,\tau)$ yields a flow that satisfies the marginal constraints of the entropy‑regularized optimal‑transport coupling between the target distribution $L$ and the reward distribution $R$.

We adjust the forward policy $P_F$ and backward policy $P_B$ of a non‑acyclic GFlowNet so that the induced stochastic flow matches the optimal‑transport coupling between the target distribution $L$ and the reward distribution $R$.

Compute the numerator of the log‑ratio: $L(s_0)\cdot R(s_1)=1\times0.5=0.5$.

Compute the denominator: $R(s_0)\cdot P_F(s_f\mid s_1)=1\times0.6=0.6$ (here $R(s_0)=1$ by definition).

Log‑ratio term: $\log(0.5/0.6)\approx -0.182$.

Path probability product: $P_F(s_1\mid s_0)\,P_F(s_f\mid s_1)\,P_B(s_0\mid s_1)\,P_B(s_1\mid s_f)=0.8\times0.6\times0.7\times0.9=0.3024$.

Loss contribution: $-0.182 + \log(0.3024) + \lambda \approx -0.182 -1.196 + 0.1 = -1.278$.

The example shows how the TB loss simultaneously aligns the start‑state target, the intermediate rewards, and the full forward‑backward path probability; the regularizer $\lambda$ nudges the terminal flow toward the reward value.

How does the trajectory‑balance objective differ from the standard GFlowNet loss that only penalizes flow mismatch?

Standard GFlowNet losses enforce $F(s)=\sum_{a}F(s\!\rightarrow\!s')$ without explicitly tying the forward start distribution to the target $L$ or the backward distribution to the reward $R$. The TB objective adds log‑ratio terms that directly match $P_F(s\mid s_0)$ to $L(s)$ and $P_B(s\mid s_f)$ to $R(s)$, and it includes the regularizer $\lambda\,R(s)/P_F(s_f\mid s)$ to enforce terminal‑state flow, yielding a tighter connection to the optimal‑transport formulation.

Table 1 reports total‑variation distances and average trajectory lengths on several hypergrid sizes, confirming that the learned sampler nearly matches the perfect sampler (TV*). Table 2 explores the effect of the regularization weight $\lambda$ on permutation tasks, showing that smaller $\lambda$ reduces the L1 error $C(k)_{L1}$ while increasing the expected trajectory length.

Hypergrid Empirical Results

Hypergrid experiments confirm the method recovers optimal OT plans.

Recall that a non‑acyclic GFlowNet can be seen as solving an entropy‑regularized Optimal Transport problem, with flow constraints matching OT marginals.

A hypergrid is a D‑dimensional lattice where each state is a coordinate vector, and a transition changes a single coordinate by ±1 until a terminal state is reached.

How does the single‑coordinate move rule differ from a standard random‑walk on the grid?

In a random walk the agent may change any subset of coordinates at once, which blurs the marginal constraints. The single‑coordinate rule forces each step to respect the grid’s axis‑aligned structure, so the accumulated flow exactly matches the OT marginal equations.

We solve the GFlowNet LP (11) on a $10\times10$ hypergrid and compare the learned sampler to the exact Kantorovich OT solution (8).

The learned policy recovers the optimal OT plan on all tested reward shapes without bias.

Table 1 shows total‑variation (TV) distances that are within $0.01$ of the perfect‑sampler reference TV* and mean trajectory lengths that match the OT* optimum for moon, corner, and ball distributions.

**Figure 2.** Visualization of the distributions used in the experiments: moon-shaped (left), corner-shaped (center), and ball-shaped (right).

Ablating the state‑flow regularization parameter $\lambda$ reveals that larger $\lambda$ values push the learned distribution away from the OT optimum, as reflected by higher TV distances in Table 1.

The method successfully recovers OT plans on hypergrids.

Permutation Experiments and Ablations

We evaluate our method on permutation environments and discuss its broader implications.

The state space is the Cayley graph of the symmetric group $S_n$: each node encodes a permutation of $n$ items and edges correspond to swapping two neighboring entries.

How does the permutation environment differ from the hypergrid environment used earlier?

In the hypergrid, states are points in a rectangular grid and moves change a single coordinate; in the permutation environment, states are permutations and moves are adjacent swaps, so the connectivity is defined by the Cayley graph rather than axis‑aligned edges, leading to a factorial‑size state space.

**Table 2.** Results obtained on permutations environment with ablation of $\lambda$. Reference OT* cost obtained via the POT solver (Flamary et al., 2021) for $n = 4$ is 0.567 and is $n = 8$ is 1.008, for larger permutations OT* is intractable. $C(k)L^1$ is the empirical $L_1$ error between fixed points probabilities obtained from model and the true ones (Morozov et al., 2025) $\mathbb{E}|\tau|$ is the expected trajectory length. All results are averaged over 3 seeds.

Varying the regularization coefficient $\lambda$ trades off trajectory length against sampling optimality: larger $\lambda$ shortens $\mathbb{E}|\tau|$ but biases the sampler, while smaller $\lambda$ yields near‑optimal sampling at the cost of longer trajectories.

Appendix and References

Technical proofs and experimental specifics that support the main theory.

This appendix supplies the missing proofs for the LP reformulation of the minimum‑flow problem and the associated duality results, then details the experimental setup used in the paper.

We first show that the original minimum‑flow objective can be expressed as a linear program (LP) by treating the edge flows as decision variables and enforcing flow conservation with linear constraints.

Instead of storing a separate forward probability table, we can encode the same information directly in the edge‑flow variables, provided they are non‑negative and sum to the node’s total outflow.

Analogously, the backward transition probabilities can be folded into the incoming edge‑flow variables, again requiring only non‑negativity and a sum‑to‑node constraint.

With both policies eliminated, the optimisation reduces to a pure LP over node and edge flows.

We now turn to the dual of this lifted LP, which reveals a connection to shortest‑path potentials.

The cost vector $c$ assigns a unit cost to every interior state and zero cost to the source and sink; dual variables $\alpha_s$ and $\beta_s$ correspond to the outgoing‑ and incoming‑flow constraints, while $\eta_x$ enforces the terminal constraints.

By forming the Lagrangian of the primal LP and demanding finiteness, we obtain a maximisation problem over potentials that upper‑bounds the primal objective.

The optimal dual potentials are exactly the shortest‑path distances from the source to each state, measured in unit‑cost edges.

Finally, we observe that the dual of the extended primal problem matches the dual of the entropy‑regularised OT formulation, establishing the claimed equivalence between GFlowNets and OT.

Experimental details: all models are two‑layer MLPs (hidden size 128) taking a one‑hot state encoding; the three heads $F_\theta$, $P_F$, and $P_B$ share the backbone but have separate linear output layers. Training uses on‑policy batches of 512, AdamW ($\text{lr}=10^{-3}$, weight decay $10^{-4}$) on CPUs.

Three synthetic landscapes—moon, ball, and corner—are defined on a discretised hypergrid to test the GFlowNet’s ability to learn complex, multimodal reward structures.

For a permutation, $C(k)$ denotes the probability of observing exactly $k$ fixed points; the $L_1$ error measures how well the GFlowNet’s empirical distribution matches this target.

Questions & answers

What is the main contribution of this paper?

The paper proves that non-acyclic Generative Flow Networks (GFlowNets) are mathematically equivalent to Kantorovich optimal transport (OT): the GFlowNet's minimum-flow objective reduces to a linear program dual to the Kantorovich OT problem, and the learned policy defines the coupling (transport plan).

What problem does this paper address and why does it matter?

The paper addresses the computational difficulty of finding optimal transport plans in large combinatorial spaces, where standard OT solvers require explicit cost matrices and become intractable. By connecting GFlowNets to OT, it enables scalable, neural-parameterized transport plan learning without enumerating the full state space.

Why use GFlowNets for optimal transport instead of standard linear programming solvers?

Standard OT solvers require explicit cost matrices and struggle with large combinatorial spaces, whereas GFlowNets use neural parameterization to learn the transport plan implicitly through local, feasible moves, enabling scalability to environments where the state space is too large to enumerate.

How does the theoretical equivalence between GFlowNets and optimal transport work?

By fixing the initial edge-flow distribution to match a source distribution L(u) and minimizing total flow in a non-acyclic graph, the GFlowNet objective reduces to a linear program that is dual to the Kantorovich OT problem, where the ground cost is defined by the shortest-path distance in the graph.

How does the GFlowNet flow-matching constraint enforce valid transport marginals?

The flow-matching constraint forces the total out-flow from the source to equal L and the total in-flow to the sink to equal R, which is exactly the marginal feasibility condition of an optimal-transport coupling, preventing the excess or missing mass that would arise from naïve edge-probability assignment.

What is the trajectory-balance (TB) objective and how does it differ from the standard GFlowNet loss?

The trajectory-balance objective adds log-ratio terms that directly match the forward start distribution P_F(s|s_0) to L(s) and the backward distribution P_B(s|s_f) to R(s), and includes a regularizer λR(s)/P_F(s_f|s) to enforce terminal-state flow, yielding a tighter connection to the OT formulation than the standard loss, which only penalizes flow conservation violations.

What is the role of the regularization coefficient λ in this framework?

The parameter λ controls the trade-off between expected trajectory length (transport cost) and sampling accuracy: higher λ values shorten expected trajectory length but bias the sampler away from the OT optimum, while lower λ values yield near-optimal sampling at the cost of longer trajectories.

What datasets or environments were used in the experiments?

The paper uses two experimental environments: a hypergrid (a rectangular grid where moves change a single coordinate) and a permutation space (where states are permutations and moves are adjacent swaps on the Cayley graph). The paper does not specify any external real-world datasets.

What are the key empirical results on the hypergrid?

Table 1 reports total-variation (TV) distances and average trajectory lengths on several hypergrid sizes, confirming that the learned sampler nearly matches the perfect sampler (TV*). The method successfully recovers exact OT plans on a 10×10 hypergrid, and larger λ values produce higher TV distances from the OT optimum.

What are the key empirical results on permutation tasks?

Table 2 shows that on permutation tasks, smaller λ reduces the L1 error C(k)_L1 while increasing expected trajectory length, confirming the λ trade-off. The GFlowNet approach scales to the factorial-size permutation state space where traditional solvers fail.

How does the permutation environment differ from the hypergrid environment?

In the hypergrid, states are points in a rectangular grid and moves change a single coordinate; in the permutation environment, states are permutations and moves are adjacent swaps, so connectivity is defined by the Cayley graph rather than axis-aligned edges, resulting in a factorial-size state space.

How does the single-coordinate move rule on the hypergrid enforce valid OT marginals?

The single-coordinate rule forces each step to respect the grid's axis-aligned structure, so accumulated flow exactly matches the OT marginal equations; a standard random walk that changes any subset of coordinates at once would blur these marginal constraints.

What are the limitations of this approach?

The paper does not explicitly enumerate limitations, but the framework requires a non-acyclic GFlowNet setting and the regularization parameter λ introduces a bias-cost trade-off that prevents simultaneously achieving minimal trajectory length and exact OT optimality.

How does this work compare to prior GFlowNet formulations?

Prior GFlowNet formulations operate on acyclic graphs and do not connect the flow objective to optimal transport; this paper extends GFlowNets to non-acyclic graphs and proves the minimum-flow objective is dual to the Kantorovich OT problem, a connection not established in prior work.

How would a practitioner reproduce or apply this method?

Models are two-layer MLPs with hidden size 128 taking one-hot state encodings; three heads (F_θ, P_F, P_B) share a backbone with separate linear output layers; training uses on-policy batches of 512 with AdamW (learning rate 1e-3, weight decay 1e-4) on CPUs, using the trajectory-balance objective with a chosen λ.

Where and when was this paper published?

The paper is available on arXiv at https://arxiv.org/abs/2606.06272; the paper does not specify a conference venue or publication date beyond the arXiv identifier.

Key terms

GFlowNet (Generative Flow Network): A generative model that constructs structured objects by traversing a stochastic graph from a start state to a terminal sink, with edge flows trained to match a target reward distribution.
Optimal Transport (OT): A mathematical framework for finding the most cost-efficient way to move probability mass from one distribution to another.
Kantorovich OT: The linear-programming formulation of optimal transport that finds a joint coupling (transport plan) between two distributions minimizing expected transport cost subject to marginal constraints.
Transport plan (coupling): A joint probability distribution over pairs of source and target states whose marginals match the given source and target distributions, representing how mass is moved between them.
Non-acyclic GFlowNet: A GFlowNet defined on a graph that may contain cycles, unlike the standard acyclic (DAG-based) formulation, allowing the agent to revisit states during trajectory generation.
Minimum-flow objective: A training objective that minimizes the total expected flow (equivalently, expected trajectory length) in a GFlowNet, which the paper shows is equivalent to minimizing OT cost.
Trajectory-balance (TB) objective: A GFlowNet training loss that ties the forward start distribution to the source distribution L and the backward distribution to the reward R, providing a tighter connection to the OT formulation than standard flow-conservation losses.
Flow conservation (flow-matching constraint): The requirement that at every intermediate node in a GFlowNet, the total incoming flow equals the total outgoing flow, analogous to marginal feasibility in optimal transport.
Regularization coefficient λ: A scalar hyperparameter that controls the trade-off between minimizing expected trajectory length (transport cost) and accurately matching the target reward distribution in the GFlowNet-OT framework.
Hypergrid: A rectangular grid environment used as a benchmark where each state is a point in a multi-dimensional grid and each move changes exactly one coordinate.
Permutation space (Cayley graph): An environment where states are permutations of elements and moves are adjacent transpositions, forming a graph whose connectivity is defined by the Cayley graph structure and whose size grows factorially.
Total-variation (TV) distance: A measure of the difference between two probability distributions, equal to half the sum of absolute differences over all states, used here to evaluate how closely the learned sampler matches the OT solution.
Shortest-path distance: The minimum number of edges (or minimum cost path) between two nodes in a graph, which serves as the ground cost in the GFlowNet-OT equivalence.
Linear program (LP): An optimization problem in which the objective function and all constraints are linear, used here to formalize both the GFlowNet minimum-flow problem and the Kantorovich OT problem.
Dual problem (LP duality): A related optimization problem derived from a linear program whose optimal value equals that of the original (primal) problem, used here to establish the equivalence between the GFlowNet LP and the Kantorovich OT formulation.
Entropy-regularized OT: A variant of optimal transport that adds an entropy penalty to the transport plan, making the problem strictly convex and easier to solve, which the paper connects to the GFlowNet flow objective.
Forward policy P_F: The learned probability distribution over actions taken from each state in the forward (source-to-sink) direction of a GFlowNet trajectory.
Backward policy P_B: The learned probability distribution over predecessor states in the reverse (sink-to-source) direction of a GFlowNet trajectory, used to enforce consistency with the reward distribution.

Read the original paper

Open the simplified reader on Paperglide