Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Ruiqi Wang, Qimin Chen, Daniel Ritchie, Angel X. Chang, Manolis Savva, Kai Wang, Hao Zhang

Function2Scene generates 3D indoor layouts by iteratively refining furniture arrangements against functional design constraints.

How can we generate 3D indoor furniture layouts that satisfy complex, human-centric functional requirements rather than just aesthetic or geometric ones?

Existing text-to-scene models prioritize object-centric prompts, often producing layouts that are visually plausible but functionally unusable for specific human activities. Function2Scene reframes this by parsing natural-language design briefs into a taxonomy of spatial, ergonomic, and activity-based constraints, then iteratively repairing the layout through a tool-augmented check-and-repair loop. In perceptual studies, this functionality-first approach is preferred in 94.3% of pairwise comparisons against state-of-the-art LLM-based baselines.

Paper Primer

The core move is a "check-and-repair" loop: the system treats layout generation as an iterative optimization problem where an LLM acts as a designer, using specialized tools to measure geometric, ergonomic, and environmental compliance against a 17-point constraint taxonomy.

Function2Scene significantly outperforms existing LLM-based scene synthesis methods in functional quality.

This claim is supported by a 2AFC perceptual study comparing its layouts against Holodeck, iDesign, and LayoutVLM across 30 professional design cases.

The system's effectiveness hinges on the integration of evaluation tools; ablation studies show that iterative refinement is actually counterproductive without the grounded spatial feedback provided by these tools to guide the LLM's adjustments.

Why does this approach require a custom constraint taxonomy instead of relying on the LLM's internal knowledge?

LLMs struggle to directly generate functional layouts because functional requirements are heterogeneous and high-level; the taxonomy provides an explicit, interpretable framework that allows the system to measure and repair specific failures like poor circulation or glare.

What is the scope of this framework—does it handle arbitrary user requests?

The current implementation assumes a detailed, professionally written functional specification as input; it does not yet include a conversational interface to help non-expert users articulate vague or short initial demands.

For researchers in scene synthesis, this paper shifts the focus from "placing objects" to "supporting human use," demonstrating that explicit, tool-verified design principles are more effective than implicit statistical priors for functional layout generation.

Introduction

We frame indoor layout generation as a functional‑specification problem and outline our solution.

Existing indoor‑scene generators treat a room as a bag of objects, ignoring how the space will be used. Function2Scene instead takes a functional specification—who will occupy the room and what they need to do—and drives layout generation with explicit design constraints, closing the gap between aesthetic plausibility and real‑world usability.

A functional specification is a short natural‑language brief that describes the occupants (personas), their activities, and the functional needs the space must satisfy, rather than listing concrete furniture items.

With a candidate layout of N = 5 objects, a pairwise distance check costs O(N²), i.e., up to 5² = 25 operations per constraint.

Evaluating all five constraints therefore requires 5 × 25 = 125 geometric checks.

A naïve generator that ignores constraints would need to repeat this full evaluation for every sampled layout, quickly becoming intractable.

Even a modest functional brief can explode the combinatorial search space, motivating the need for an iterative, constraint‑driven refinement loop.
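The arithmetic above can be reproduced with a short Python sketch. The function name `count_checks` and the budget of 1,000 sampled layouts are illustrative assumptions, not values from the paper. Note that the N² figure counts ordered pairs as an upper bound, while only N(N-1)/2 distinct pairs exist.

```python
from math import comb


def count_checks(num_objects: int, num_constraints: int) -> dict:
    """Count geometric checks for one candidate layout (illustrative only)."""
    per_constraint_bound = num_objects ** 2   # the N^2 bound quoted in the text
    distinct_pairs = comb(num_objects, 2)     # unordered pairs that actually differ
    return {
        "per_constraint_bound": per_constraint_bound,
        "distinct_pairs": distinct_pairs,
        "total_bound": per_constraint_bound * num_constraints,
    }


cost = count_checks(num_objects=5, num_constraints=5)
print(cost)

# Hypothetical sampling budget: a naive generator pays the full cost per sample.
sampled_layouts = 1_000
print(cost["total_bound"] * sampled_layouts)
```

Under these assumptions the per-layout bound stays small, but it is multiplied by every sampled layout, which is what makes blind sampling expensive.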

**Fig. 1.** We present Function2Scene, a framework for generating 3D indoor layouts from functional specifications. Given a detailed functional specification, our method decomposes it into functional design constraints, which are then used to iteratively evaluate and refine a generated scene. Please refer to the supplementary material for the full input prompt and more detailed visualizations.

The key shift is moving from aesthetic‑only generation to a function‑driven design paradigm.

Related Work

We position our work among prior scene synthesis, LLM‑driven methods, and evaluation approaches.

Early indoor scene synthesis relied on hand‑crafted design rules, simple statistical relationships, or bespoke programs, demanding extensive manual effort and offering little flexibility for open‑ended scenarios. These approaches could not scale to the diversity of modern applications.

Earlier pipelines evolved from explicit rule‑based systems to data‑driven deep models, yet they never encoded the functional, ergonomic constraints that determine whether a room truly supports its intended activities.

The advent of large language models enabled direct prediction of object coordinates from open‑vocabulary prompts, yet most systems still describe only objects, relations, and absolute positions. As a result, functional ergonomics and activity support remain under‑specified.

Current evaluation pipelines measure geometric validity, semantic coherence, navigability, and collision avoidance, and they employ iterative refinement loops such as VLM feedback, multi‑turn reinforcement learning, differentiable VLM optimization, or vision‑language editing. Only recent work begins to assess functional affordances, leaving a gap that our typed verification tools aim to fill.

Functional Design Constraints

We encode user‑specific needs as prioritized functional constraints for layout generation.

Different occupants impose conflicting placement demands—e.g., a wheelchair user needs wide clearances while an entertainer values open social zones. A one‑size‑fits‑all constraint set cannot satisfy both.

Constraints are grouped into four intuitive categories—Spatial, Ergonomic, Activity, Environmental—and each is parameterized by the specific activity × persona pair, then ordered into six priority tiers so the most essential rules are enforced first.

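Consider a 4 × 4 m room containing a sofa and a desk, evaluated for a wheelchair‑user persona and an entertainer persona.

Spatial: Geometry Validity (S1) checks that both pieces fit inside the 4 × 4 m boundary; the check passes.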

Ergonomic: Circulation (E1) computes the narrowest path between sofa and desk; for the wheelchair persona the path width is 0.6 m < 0.8 m → violation.

Ergonomic: Reachability (E3) evaluates desk height against the seated reach of the wheelchair user; height 0.75 m > 0.7 m → violation.

Activity: Zone Allocation (A1) measures the free floor area around the sofa; for the entertainer persona the available 1.2 m × 1.2 m zone is insufficient for the required 1.5 m × 1.5 m → violation.

Priority: All T1 checks (S1, E1) are evaluated first; only after any T1 violations are resolved does the system consider the higher‑tier A1 violation.

The same geometry can satisfy the mobility‑limited user after widening the corridor, yet still fail the entertainer’s social‑zone requirement, illustrating why constraints must be both persona‑specific and tiered.
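The checks in this example can be sketched as persona-parameterized predicates sorted by tier. This is a minimal illustration, not the paper's implementation: the `Constraint` class, the layout field names, and the tier numbers assigned to E3 and A1 are assumptions. The thresholds (0.8 m, 0.7 m, 1.5 m) and measurements come from the example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    code: str                       # taxonomy code such as "E1"
    tier: int                       # lower tier = checked first
    persona: str                    # occupant the threshold is tuned for
    check: Callable[[dict], bool]   # returns True when the layout satisfies it


def violations(layout: dict, constraints: list) -> list:
    """Return codes of violated constraints, lowest tier first."""
    failed = [c for c in constraints if not c.check(layout)]
    return [c.code for c in sorted(failed, key=lambda c: (c.tier, c.code))]


constraints = [
    Constraint("S1", 1, "any", lambda s: s["inside_boundary"]),
    Constraint("E1", 1, "wheelchair user", lambda s: s["path_width_m"] >= 0.8),
    Constraint("E3", 2, "wheelchair user", lambda s: s["desk_height_m"] <= 0.7),  # tier assumed
    Constraint("A1", 3, "entertainer", lambda s: min(s["free_zone_m"]) >= 1.5),   # tier assumed
]

layout = {
    "inside_boundary": True,
    "path_width_m": 0.6,
    "desk_height_m": 0.75,
    "free_zone_m": (1.2, 1.2),
}

before = violations(layout, constraints)
widened = {**layout, "path_width_m": 0.9}   # hypothetical repair: widen the corridor
after = violations(widened, constraints)
print(before, after)
```

Widening the corridor clears E1 but leaves the entertainer's A1 violation in place, which mirrors the point made above about persona-specific thresholds.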

How does this differ from traditional hard‑constraint layout optimizers?

Classic optimizers treat every rule as a static Boolean that must hold for all users. Our design encodes each rule with a persona‑specific threshold and a tier, so the system can relax lower‑priority aesthetic constraints while always enforcing safety‑critical ones for the current occupant.

**Fig. 2. Constraints Taxonomy.** We organize interior design constraints into four categories: Spatial (S1–S5), Ergonomic (E1–E4), Activity (A1–A4), and Environmental (N1–N4), each illustrated with representative examples of how they shape furniture placement in a typical room layout.

Constraints are ordered into six priority tiers (T1–T6); lower tiers (e.g., geometry and clearance) are verified before higher tiers such as visual composition or ventilation.

The Function2Scene Pipeline

This section details the pipeline that turns functional prompts into refined 3D layouts.

Initial furniture layouts generated directly from the functional prompt often contain overlaps or violate ergonomic needs, exposing the core weakness that the later refinement must fix.

The pipeline converts a user’s functional description into a 3D layout by first extracting constraints, then iteratively refining the furniture placement with LLM‑driven feedback until all prioritized constraints are satisfied.

Parsing extracts the two constraints and produces a JSON scene description.

Room Structure Generation creates a $4\times4$ m room with walls encoded in the DSL.

Furniture Initialization places the desk against the north wall and the chair directly in front of it, violating constraint 1.

Constraint 1 is evaluated; the tool reports a $15$ cm clearance shortfall.

The LLM shifts the chair $15$ cm toward the south, satisfying constraint 1.

Constraint 2 is already satisfied; no change is made.

Even with only two objects, the loop resolves the spatial conflict in a single iteration, demonstrating how local edits quickly achieve global feasibility.
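The single-iteration repair in this example can be traced with a small sketch. Everything here is an assumption made for illustration: positions are measured in centimetres southward from the north wall, the required clearance is set to 45 cm so that the shortfall is 15 cm, and a deterministic rule stands in for the LLM's edit.

```python
def clearance_shortfall_cm(desk_front_cm: int, chair_cm: int, required_cm: int) -> int:
    """Stand-in for a geometric tool: how many cm of clearance are missing."""
    gap_cm = chair_cm - desk_front_cm
    return max(0, required_cm - gap_cm)


def repair_chair(desk_front_cm: int, chair_cm: int, required_cm: int, max_iters: int = 5):
    """Shift the chair south until the clearance constraint holds."""
    iterations = 0
    while iterations < max_iters:
        shortfall = clearance_shortfall_cm(desk_front_cm, chair_cm, required_cm)
        if shortfall == 0:
            break
        chair_cm += shortfall   # deterministic stand-in for the LLM's targeted edit
        iterations += 1
    return chair_cm, iterations


# Assumed geometry: desk front edge at 60 cm, chair at 90 cm, 45 cm required.
final_chair_cm, iterations = repair_chair(desk_front_cm=60, chair_cm=90, required_cm=45)
print(final_chair_cm, iterations)
```

With these numbers the tool reports a 15 cm shortfall, one edit moves the chair to 105 cm, and the re-check passes.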

How does this LLM‑driven loop differ from a traditional constraint solver?

Traditional solvers try to satisfy all constraints simultaneously, often requiring expensive global optimization. Our loop lets the LLM address constraints one by one, using tool feedback to make localized edits while preserving higher‑priority constraints that are already met.

Parsing: extract a structured constraint set and a reformulated scene description from the functional prompt.

Room Structure Generation: encode walls, floor, ceiling, doors, and windows in a JSON‑based DSL and present it for user verification.

Furniture Initialization: generate an initial placement of furniture items based on the parsed description.

Constraint Evaluation: for each constraint in priority order, invoke the appropriate verification tool.

LLM Interpretation: the LLM reads the tool output, decides whether the constraint is satisfied, and if not, produces a justification and a targeted refinement action.

Layout Refinement: apply the LLM‑suggested edit locally, then re‑evaluate the same constraint before moving to the next one.

Termination: after all constraints have been processed, re‑evaluate Tier 1 spatial constraints to ensure no later edits broke foundational requirements; output the final layout.
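The evaluation, interpretation, refinement, and termination steps above can be sketched as one loop. This is a hedged outline rather than the authors' code: the `Verdict` type, the `max_attempts` cap, the toy tools, and `toy_editor` (a stand-in for the LLM) are all hypothetical.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    satisfied: bool
    detail: str


def check_and_repair(layout: dict, constraints: list, propose_edit: Callable,
                     max_attempts: int = 3):
    """constraints: list of (code, tier, tool), where tool(layout) -> Verdict."""
    ordered = sorted(constraints, key=lambda c: c[1])      # priority order
    edits = []
    for code, _tier, tool in ordered:
        for _ in range(max_attempts):
            verdict = tool(layout)
            if verdict.satisfied:
                break
            layout = propose_edit(layout, code, verdict)   # local edit, then re-check
            edits.append(code)
    # Termination: re-check Tier 1 in case later edits broke it.
    tier1_ok = all(tool(layout).satisfied for _c, tier, tool in ordered if tier == 1)
    return layout, edits, tier1_ok


def inside_room(layout: dict) -> Verdict:    # toy Tier 1 tool
    ok = layout["chair_x_m"] <= 3.5
    return Verdict(ok, "chair inside room" if ok else "chair outside room")


def clearance(layout: dict) -> Verdict:      # toy Tier 2 tool
    ok = layout["gap_cm"] >= 45
    return Verdict(ok, "ok" if ok else f"{45 - layout['gap_cm']} cm short")


def toy_editor(layout: dict, code: str, verdict: Verdict) -> dict:
    """Stand-in for the LLM: apply a fixed, targeted fix per constraint."""
    fixes = {"inside": {"chair_x_m": 3.5}, "clearance": {"gap_cm": 45}}
    return {**layout, **fixes[code]}


final, edits, tier1_ok = check_and_repair(
    {"chair_x_m": 3.9, "gap_cm": 30},
    [("clearance", 2, clearance), ("inside", 1, inside_room)],
    toy_editor,
)
print(final, edits, tier1_ok)
```

The constraints are deliberately passed out of order to show that the loop sorts by tier before evaluating.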

**Fig. 3. Overall Pipeline:** Given a functional prompt, FUNCTION2SCENE generates a 3D indoor scene layout through iterative evaluation and refinement based on functional constraints.

Results and Evaluation

Quantitative preferences and qualitative comparisons show our method outperforms existing baselines.

Function2Scene turns functional prompts into 3D layouts by mapping them to a constraint taxonomy and refining the result with an LLM‑driven loop.

Our method outperforms all baselines in functional layout preference.

Our layouts are preferred in 94.3% of all pairwise comparisons, and in 94.2% on average across the individual baselines (Table 3).

**Figure 4.** Qualitative comparisons of our method against various comparison conditions. Top two rows: baselines with original functional prompts; middle two rows: baselines with our parsed specifications; bottom two rows: ablations, from left to right: with parsed input and iterative refinement, with original prompt and no iterative refinement, with parsed input and no iterative refinement.

**Figure 5.** Functional scenes generated by our method, along with zoomed-in highlights. The input prompts are truncated due to space constraints. Please refer to the supplementary materials for all qualitative results, along with visualizations of all intermediary optimization steps.

Together, these quantitative preferences and qualitative visualizations confirm that our constraint‑driven refinement loop yields layouts that users deem more functional than those produced by prior LLM‑based systems.

Layout Representation and DSL

We define a JSON‑based Layout DSL and detail the 2AFC perceptual study.

To let the iterative refinement loop reason about rooms, we need a precise, machine‑readable layout description.

The DSL is a small JSON schema that lists every wall, floor, ceiling, opening, and piece of furniture together with its position, size, and orientation, so a program can validate constraints and render a 3‑D scene automatically.

How does the furniture orientation tag differ from the wall facing field?

The facing field is the absolute bearing of a surface’s outward normal (e.g., the direction a wall’s interior side points). The orientation tag describes an object’s symmetry class: “directional” objects need a front‑face bearing, “axial” objects only need a rotation around one axis, and “symmetric” objects ignore rotation entirely.

Define the wall:

Add a chair:

Validate: the chair’s centroid lies $0.5$ m from the wall, satisfying a “minimum clearance” constraint.

Render: the engine reads the JSON, places the wall at the given centroid, then orients the chair eastward according to its orientation tag.

The orientation tag lets the solver treat a chair’s front‑face as significant while ignoring rotation for symmetric objects like a ceiling light.
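A wall-and-chair scene of the kind walked through above might look like the following sketch, written as a Python dictionary and round-tripped through JSON. The key names (`walls`, `furniture`, `centroid`, `size`, `front`), the coordinates, and the 0.5 m minimum clearance are assumptions for illustration. Only the `facing` and `orientation` fields and the three symmetry classes are taken from the text; the actual DSL schema is the one shown in Fig. 6.

```python
import json

scene = {
    "walls": [
        {"id": "wall_west", "centroid": [0.0, 2.0, 1.5], "size": [0.1, 4.0, 3.0],
         "facing": "east"},
    ],
    "furniture": [
        {"id": "chair_1", "centroid": [0.5, 2.0, 0.45], "size": [0.5, 0.5, 0.9],
         "orientation": "directional", "front": "east"},
        {"id": "ceiling_light", "centroid": [2.0, 2.0, 2.9], "size": [0.3, 0.3, 0.1],
         "orientation": "symmetric"},
    ],
}


def needs_bearing(obj: dict) -> bool:
    """Only directional objects carry a meaningful front-face bearing."""
    return obj["orientation"] == "directional"


def clearance_from_wall_m(obj: dict, wall: dict) -> float:
    """Distance along x between centroids; the wall is assumed to run along y."""
    return abs(obj["centroid"][0] - wall["centroid"][0])


text = json.dumps(scene, indent=2)    # what would be written to disk
restored = json.loads(text)           # what a validator or renderer would read
wall = restored["walls"][0]
chair, light = restored["furniture"]
clearance_ok = clearance_from_wall_m(chair, wall) >= 0.5   # assumed minimum clearance
print(clearance_ok, needs_bearing(chair), needs_bearing(light))
```

Keeping the symmetry class explicit lets a validator skip rotation checks for objects such as the ceiling light.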

**Fig. 6.** Sample layout DSL file.

We evaluated layout quality with a two‑alternative forced‑choice (2AFC) study covering 30 distinct room scenes.

Each survey presented 30 pairs; five pairs were attention‑check items where our method’s layout was pitted against a random layout.

We created ten survey variants, each assigning a different baseline method to the 30 scenes; three participants completed each variant.

Participants were recruited on Prolific and were required to have normal vision, US/Canada residency, ≥100 prior submissions, and prior AI‑evaluation experience.
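A tally for one such survey could be sketched as follows. The response records are synthetic, the field names are hypothetical, and the split into 25 scored pairs plus 5 attention checks is one reading of the description above. The idea of screening participants on their attention-check answers is likewise an assumption, not a procedure stated in the source.

```python
def preference_rate(responses: list) -> float:
    """Share of non-attention-check trials in which our layout was chosen."""
    scored = [r for r in responses if not r["attention_check"]]
    if not scored:
        raise ValueError("no scored trials")
    return sum(r["chose_ours"] for r in scored) / len(scored)


def passed_attention_checks(responses: list) -> bool:
    """True when every attention-check pair was answered as expected."""
    checks = [r for r in responses if r["attention_check"]]
    return all(r["chose_ours"] for r in checks)


# Synthetic survey: 25 scored pairs plus 5 attention checks (made-up responses).
survey = (
    [{"attention_check": False, "chose_ours": i < 20} for i in range(25)]
    + [{"attention_check": True, "chose_ours": True} for _ in range(5)]
)
num_participants = 10 * 3   # ten variants, three participants each
print(preference_rate(survey), passed_attention_checks(survey), num_participants)
```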

Perceptual Study Setup

The ablation study shows how each pipeline stage impacts layout quality and user preference.

The ablation removes each pipeline component in turn and measures the resulting drop in participant preference in the 2AFC study.

Without the side‑by‑side interface, participants lose the visual reference needed to compare layouts, leading to a noticeable decline in consistent choices.

Skipping the LLM‑driven initialization sequence forces a single monolithic prompt; the generated rooms contain more structural errors, which participants penalize heavily.

Omitting the shell‑generation step (i.e., always assuming a rectangle) produces layouts that ignore L‑shaped recesses, causing mismatches with briefs that request alcoves.

When door and window placement is disabled, openings are missing or mis‑aligned; participants report a sharp drop in ergonomic scores because egress pathways become blocked.

Removing the human‑verification pause eliminates the opportunity to correct obvious geometry mistakes, resulting in a higher rate of hard‑constraint violations.

Disabling the functional‑constraint parser means soft zone rules are never enforced; layouts frequently place furniture in inappropriate zones, reducing activity‑support ratings.

Turning off Tier 1 hard rule S1 for floor objects allows objects to protrude beyond the outer boundary, which participants flag as severe structural flaws.

Similarly, disabling S1 for wall‑mounted objects creates floating fixtures that intersect walls, leading to a marked drop in spatial‑validity scores.

**Figure.** A comparison of two living room layouts for a fitness entrepreneur and choreographer, presented as a selection task based on a provided brief and persona.

**Fig. 8.** Perceptual study introduction.

Structural Baseline and Refinement

Ablating each constraint reveals the specific layout failures it prevents.

Removing S1 – the “no overlap” rule – immediately produces intersecting furniture footprints, causing structural inconsistencies and downstream placement failures.
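A footprint-overlap test of this kind can be sketched with axis-aligned rectangles. This is a simplification under stated assumptions: the `Footprint` type, the object names, and the coordinates are hypothetical, and rotated furniture would need a polygon intersection test instead.

```python
from itertools import combinations
from typing import NamedTuple


class Footprint(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a: Footprint, b: Footprint) -> bool:
    """True when two axis-aligned rectangles share interior area.

    Rectangles that only touch along an edge are not counted as overlapping.
    """
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def overlapping_pairs(footprints: dict) -> list:
    """List every pair of object names whose footprints intersect."""
    return [(m, n) for (m, a), (n, b) in combinations(footprints.items(), 2)
            if overlaps(a, b)]


room = {
    "sofa": Footprint(0.0, 0.0, 2.0, 0.9),
    "coffee_table": Footprint(1.5, 0.5, 2.5, 1.1),   # intrudes into the sofa footprint
    "desk": Footprint(2.0, 2.0, 3.4, 2.7),
}
print(overlapping_pairs(room))
```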

Without S2 – clear door swing arcs – objects intrude into door swing polygons, forcing the system to flag doors for conversion to sliding types or to relocate blocking items.

Dropping the floor‑contact rule lets objects float above the floor plane, breaking the assumption that Z = 0 for all floor‑based items and causing rendering artifacts.

Eliminating the unobstructed door‑approach constraint creates blocked entry corridors on one or both sides of a door, violating accessibility requirements and prompting path‑blocking warnings.

If wall‑mounted objects are not forced flush against their backing wall, they may protrude into the room volume, leading to collision checks failing and unrealistic wall penetrations.

Relaxing the ≤ 5° open‑threshold angle permits objects to tilt away from walls, resulting in misaligned placements that break the intended “flush‑against‑wall” aesthetic.

Allowing objects into the clear‑path zone of an open threshold creates invisible barriers in archways, reducing usable circulation width and triggering layout plausibility errors.

When wall‑mounted items are permitted to span windows or door openings, they obscure openings, violate the solid‑wall requirement, and cause the verification tools to flag illegal spans.

Disabling the window‑obstruction rule lets furniture block glazing, which the visual pipeline interprets as a blocked view and the system marks for manual review.

Omitting the fixture‑orientation constraint leaves use‑faces hidden from the entry sightline, breaking ergonomic expectations and causing the layout evaluator to report inaccessible fixtures.

Skipping the open‑floor facing rule permits sofas or chairs to point away from the central open area, reducing functional cohesion and triggering a proximity‑alignment failure.

Without the circulation‑passability check, main pathways may become too narrow for a standard adult, leading to dead‑end warnings in the layout plausibility tier.

Removing the sleeping‑side reachability rule can trap a guest against a wall, violating the “both sides reachable” requirement and flagging the sleeping surface for repositioning.

Eliminating the dead‑zone size constraint permits large unusable corners, which the evaluator marks as excessive dead space and suggests furniture reshuffling.

When desk‑cluster proximity is ignored, desk‑related items drift apart, breaking functional grouping and causing the system to label the cluster incoherent.

Dropping the desk‑storage capacity rule leads to overloaded drawers or shelves, which the verification step flags as insufficient for the designated load.

Without the desk‑alignment rule, chairs, monitors, and desks may face mismatched directions, producing a visually disjoint workspace and triggering alignment warnings.

Skipping the desk‑wall‑space expansion check leaves no record of available wall length for future storage, limiting design flexibility in later refinement stages.

Removing lounge‑cluster proximity enforcement scatters sofas, screens, and seating, breaking the cohesive movie‑viewing area and causing the layout scorer to downgrade plausibility.

If residual space after placing the lounge is not evaluated, awkward mid‑room strips remain unnoticed, reducing usable floor area and prompting manual cleanup.

Disabling lounge‑alignment allows mismatched facing between sofa, screen, and chairs, degrading visual harmony and triggering an alignment failure.

Omitting the lounge‑wall‑space expansion rule hides potential storage extensions, preventing the system from suggesting additional shelving where needed.

When furniture footprint proportionality is ignored, items become either too massive or too tiny for the room, leading to aesthetic and functional implausibility flags.

Skipping the lounge residual‑usability assessment leaves leftover gaps unexamined, which may later be flagged as wasted space during layout optimization.

Removing the lounge‑storage capacity check permits storage units that cannot hold their assigned items, causing overflow warnings in the verification stage.

Without the object‑footprint balance rule, a single oversized piece can dominate the room or a tiny decorative item can appear out of place, both of which are flagged as implausible.

Dropping the sofa‑length proportion rule yields sofas that either crowd the wall or leave excessive empty space, violating the proportionality requirement and triggering a size warning.

Eliminating the tier‑3 lounge‑wall‑space expansion check prevents the system from identifying wall segments that could accommodate extra storage when the primary units are insufficient.

When the desk‑zone floor‑area rule is removed, the free‑floor polygon may be too small for a seated adult, causing the activity‑support evaluator to reject the layout.

Skipping desk‑zone accessibility removes the guarantee of a clear path from the main circulation route, leading to blocked entry warnings.

Ignoring the permanent‑clear desk‑zone rule allows lounge or sleeping furniture to intrude during mode switches, resulting in cross‑mode collision errors.

Constraint Verification Tools

We detail the suite of verification tools that turn functional constraints into concrete checks.

Before a layout can be accepted, every functional constraint must be turned into a concrete, automatically checkable predicate.

Each functional requirement is mapped to a small, self‑contained routine that inspects the 3‑D scene and returns a Boolean or a structured verdict.

The example shows how geometric predicates quickly eliminate impossible placements, while LLM/VLM checks add higher‑level functional validation without manual coding.

Verification loop used during each refinement iteration.

**Table 1.** Evaluation tools setup. Each constraint is verified by one or more tools, color-coded by type: numeric/geometric tools compute quantitative measures directly from scene geometry, LLM query tools leverage language model reasoning over structured scene data, and VLM tools interpret rendered images. Tier indicates evaluation priority, where lower-tier constraints are verified first as prerequisites for higher-tier ones.

By chaining these tools, the system can automatically certify that a generated layout satisfies every functional specification without human inspection.
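One way to picture the mapping from constraints to tools is a small registry, sketched below. The `Verdict` fields, the registry layout, and the `ask` callback are hypothetical. The language-model tool is stubbed with a fixed answer rather than calling any real API, and the assignment of tools to S1 and A1 is illustrative rather than taken from Table 1.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    constraint: str
    tool_type: str     # "numeric", "llm", or "vlm"
    satisfied: bool
    detail: str


def numeric_boundary_tool(scene: dict) -> Verdict:
    """Geometric check: every footprint lies inside the room rectangle."""
    width, depth = scene["room_size_m"]
    outside = [name for name, (x0, y0, x1, y1) in scene["footprints"].items()
               if x0 < 0 or y0 < 0 or x1 > width or y1 > depth]
    return Verdict("S1", "numeric", not outside, f"outside: {outside}")


def make_llm_tool(constraint: str, ask: Callable[[str], bool]) -> Callable[[dict], Verdict]:
    """Wrap a caller-supplied yes/no model query (a placeholder, not a real API)."""
    def tool(scene: dict) -> Verdict:
        answer = ask(f"Does this scene satisfy {constraint}? {sorted(scene['footprints'])}")
        return Verdict(constraint, "llm", answer, "model judgement")
    return tool


def verify(scene: dict, registry: dict) -> dict:
    """Run every tool registered for each constraint and collect the verdicts."""
    return {code: [tool(scene) for tool in tools] for code, tools in registry.items()}


scene = {
    "room_size_m": (4.0, 4.0),
    "footprints": {"sofa": (0.0, 0.0, 2.0, 0.9), "desk": (3.0, 3.0, 4.4, 3.7)},
}
registry = {
    "S1": [numeric_boundary_tool],
    "A1": [make_llm_tool("A1", ask=lambda prompt: True)],   # stubbed model answer
}
report = verify(scene, registry)
failed = [code for code, verdicts in report.items()
          if not all(v.satisfied for v in verdicts)]
print(failed)
```

Returning a structured verdict rather than a bare Boolean gives the refinement step something concrete to act on, such as the name of the object that left the boundary.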

Conclusion

We wrap up Function2Scene’s achievements and outline paths for richer design workflows.

Function2Scene demonstrates that grounding layout generation in a formal functional taxonomy and an iterative LLM‑driven loop yields scenes that are both higher‑quality and more aligned with design intent than prior methods.

Future work should add a conversational interface to elicit detailed functional specifications from vague user requests, and enrich the verification pipeline with domain‑specific tools such as embodied simulation, lighting and acoustic estimators, or a semantic DSL for distance constraints; expanding the architectural scope to co‑optimize room shape, openings, and partitions would capture the full breadth of interior design.

Implementation Details

Implementation rules enforce clear zones, accessibility, and ergonomic constraints across all room modes.

The movie viewing zone must provide enough unobstructed floor area for two to four seated guests, and it must be reachable from the main circulation path without creating a bottleneck.

If the same zone doubles as an overnight sleeping area, the furniture that folds or moves (sofa‑bed, coffee table, occasional chairs) must find valid positions that respect the S1 and S2 constraints in both lounge and sleep configurations.

The overnight sleeping zone itself must contain a clear floor area for a single adult lying down, and it must be reachable both from the main circulation path and from the entry door.

When the room switches among office, movie, and sleep modes, every object that needs to be repositioned (chairs, tables, rugs, sofa‑bed) must be able to move to a new location that still satisfies S1, S2, and A1 constraints.

Rugs or mats required for sofa‑bed deployment must be rollable without forcing any furniture that sits on their edges to be moved.

All doors, drawers, and other openable panels must have a clear sweep zone in front of them, and every occasional seat and the sofa must have a clear front access zone for sitting and articulation.

The office chair at the desk requires a clear pull‑out zone behind it, and no companion object may be placed inside any other object’s access or articulation zone, both in the main layout and within the lounge cluster.

Every storage item (shelves, drawers, cabinets) must be reachable by a standard adult without a step stool, and the total number of storage units must satisfy the estimated storage load for each functional cluster.

The overall room layout must appear visually balanced from a top‑down view, and all seats, work surfaces, screens, and monitors must be sized and positioned for a standard adult’s body and comfortable viewing angles.

Each multi‑step activity (office work, movie watching, sleep deployment) should follow a logical spatial path with minimal backtracking, keeping the total path length reasonable for a 20 m² room.
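The path-length part of this rule can be approximated by summing straight-line hops between activity waypoints. This is a rough sketch: the waypoints, the 4 m × 5 m room shape, and the 9 m budget are assumptions, and a real check would route around furniture rather than measure straight lines.

```python
from math import dist


def path_length_m(waypoints: list) -> float:
    """Sum of straight-line hops between consecutive activity waypoints."""
    return sum(dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


# Hypothetical sleep-deployment sequence in a 4 m x 5 m (20 m^2) room:
# door -> storage -> sofa-bed.
sleep_deployment = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
length = path_length_m(sleep_deployment)
MAX_REASONABLE_M = 9.0   # assumed budget, not a value from the paper
print(length, length <= MAX_REASONABLE_M)
```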

No screen or sustained visual‑work surface may receive direct glare from windows or artificial lights, and priority activity zones must stay reasonably close to natural light sources.

Noise‑generating zones must be acoustically separated from quiet zones, and no object may intersect the clearance zone of HVAC vents, radiators, or other heat sources.
