TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
TIDE proactively discovers multiple hidden problems in documents or code by iteratively applying reusable reasoning templates.
How can LLM agents proactively discover multiple hidden problems in a context rather than waiting for explicit user requests?
Agents typically wait for explicit user requests, leaving consequential issues, such as conflicting project data or hidden code bugs, unnoticed in the background. TIDE addresses this by treating proactive assistance as an iterative discovery process: it uses reusable "thought templates" to recognize recurring problem patterns, and it conditions each search round on previously found issues to ensure broad coverage. Across personal workspaces and software repositories, this approach consistently outperforms single-pass agents in identifying and resolving multiple coexisting problems.
Paper Primer
The core mechanism hinges on two complementary moves: iterative discovery and thought templates. Iteration forces the agent to look beyond the most salient, obvious issues by conditioning each new search round on the cumulative state of already-discovered problems, while templates provide a library of distilled reasoning schemas that anchor predictions in recognizable problem classes rather than speculative inference.
TIDE significantly improves multi-problem discovery coverage compared to single-shot and parallel multi-agent baselines.
In personal workspace settings, TIDE consistently recovers four or more problems per instance, whereas baselines typically surface only one or two. TIDE achieves substantial gains across retrieval, identification, and resolution metrics, with iterative conditioning proving more effective than simply scaling the number of parallel agents.
The framework's effectiveness is robust across different model backbones, and the thought templates demonstrate strong cross-LLM transferability, meaning a template library built by one model can be effectively utilized by another.
Why is a single-pass approach insufficient for this task?
Single-pass prediction tends to anchor on the most salient cases, which then overshadow subtler but equally important problems. It also lacks a reusable prior on how contextual signals indicate specific problem classes, which leads to generic or speculative outputs.
How does TIDE ensure it doesn't just re-discover the same problems in every round?
Each iteration is explicitly conditioned on the cumulative set of problems already found, which forces the agent to redirect its attention toward new, undiscovered candidates in subsequent rounds.
The Reactive Agent Bottleneck
Agents react only to explicit requests, missing many hidden problems in the user’s context.
Current LLM agents are reactive: they wait for an explicit user request before acting. In real workflows, many problems remain hidden in documents, emails, and code, and their total number is unknown. This section frames the gap between explicit requests and latent workspace problems that TIDE aims to close.
A reactive agent only responds to user‑initiated prompts, so any issue not explicitly asked about is never addressed.
Why does reacting only to explicit requests limit an agent’s usefulness?
Because it assumes the user already knows every issue; hidden problems never surface, so the agent cannot help resolve anything the user hasn’t asked about.
**Figure 1.** Conceptual illustration of TIDE. (A) Reactive agents act only on explicit user requests, leaving (B) the many problems coexisting hidden across the user context untouched. (C) TIDE surfaces them by applying reusable thought templates over multiple rounds of iterative discovery conditioned on the cumulative state, returning per-task plans that identify, ground, and resolve each discovered problem.
Closing the request‑problem gap lets agents surface issues users never notice, expanding their utility.
The TIDE Framework
We introduce TIDE, a template‑guided iterative discovery loop that uncovers many hidden problems with higher fidelity.
TIDE repeatedly applies reusable thought templates to a document collection, conditioning each round on what has already been discovered, so the agent can surface many hidden problems rather than only those a single-shot pass would find.
How does TIDE differ from simply prompting the LLM multiple times?
Each TIDE round is conditioned on the cumulative set of previously discovered problems and on a fixed library of thought templates; naïve repeated prompting lacks this conditioning and therefore tends to return the same salient problems again.
A thought template is a reusable schema that captures the evidence pattern of a known problem class, letting the agent infer new instances without learning from scratch.
Why not let the LLM infer evidence patterns on the fly instead of using templates?
Without a template the model must invent how raw context signals constitute a problem, which often leads to generic or speculative claims; the template provides a concrete, vetted evidence flow that anchors the inference.
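As a rough illustration, a thought template could be represented as a small record that pairs a problem class with the contextual signals it attends to. The field names (`problem_class`, `signals`, `reasoning_steps`) and the keyword check in `matches` are assumptions made for this sketch, not the paper's actual schema; in TIDE the matching is done by the LLM, not by string search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThoughtTemplate:
    """Hypothetical schema for one reusable discovery template."""
    problem_class: str                # e.g. "conflicting document versions"
    signals: tuple[str, ...]          # contextual cues worth attending to
    reasoning_steps: tuple[str, ...]  # how the cues connect into a problem

    def matches(self, text: str) -> bool:
        # Toy stand-in for LLM judgment: every signal keyword must appear.
        lowered = text.lower()
        return all(signal in lowered for signal in self.signals)


version_conflict = ThoughtTemplate(
    problem_class="conflicting document versions",
    signals=("policy", "version"),
    reasoning_steps=(
        "Find documents that describe the same policy.",
        "Compare their stated versions or dates.",
        "Flag the pair if the versions disagree.",
    ),
)

# Toy documents, invented for this example.
docs = {
    "d1": "Travel policy, version 2, approved in March.",
    "d2": "Travel policy, version 3, draft.",
    "d3": "Team schedule for the spring release.",
}

# Documents whose text exhibits every signal of the template.
candidates = [doc_id for doc_id, text in docs.items()
              if version_conflict.matches(text)]
```

The point of the structure is that the evidence pattern is stored once and reused across instances, so the model does not have to work out from scratch which signals constitute a problem.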
Initialize the cumulative prediction set $\hat P(0) \gets \emptyset$.
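For each round $t = 1 \dots T$: invoke the LLM with $(D, \mathcal{T}, \hat P(t-1), k)$, where $D$ is the document collection, $\mathcal{T}$ is the template library (distinct from the round limit $T$), and $k$ is the per-round batch size, to obtain a batch $\Delta\hat P(t)$ of up to $k$ new triples.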
Update the cumulative set $\hat P(t) \gets \hat P(t-1) \cup \Delta\hat P(t)$.
If $\Delta\hat P(t) = \emptyset$ or $t = T$, stop; otherwise continue to the next round.
Return the final set $\hat P = \hat P(t^\ast)$, where $t^\ast \le T$ is the round at which the loop stopped.
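The loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `propose` stands in for the LLM call, the three-field `Problem` tuple is a guess at what a "triple" contains, and the duplicate filter is an extra guard added here, not something the steps above specify.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed triple layout: (supporting documents, problem, resolution plan).
Problem = Tuple[str, str, str]
Proposer = Callable[
    [Sequence[str], Sequence[str], Sequence[Problem], int], List[Problem]
]


def tide_discover(documents: Sequence[str], templates: Sequence[str],
                  propose: Proposer, max_rounds: int, k: int) -> List[Problem]:
    """Iterative discovery: each round is conditioned on what was found so far."""
    found: List[Problem] = []                 # cumulative set, initially empty
    for _ in range(max_rounds):               # rounds t = 1 .. T
        batch = propose(documents, templates, found, k)
        new: List[Problem] = []
        for problem in batch[:k]:             # accept at most k per round
            if problem not in found and problem not in new:
                new.append(problem)           # guard against repeats (assumption)
        if not new:                           # empty batch: stop early
            break
        found.extend(new)                     # cumulative update
    return found


# Toy proposer: returns the most salient problems not yet found.
RANKED: List[Problem] = [
    ("d1,d2", "different versions of the same policy", "reconcile versions"),
    ("d2,d3", "deadline conflicts with schedule", "confirm the correct date"),
    ("d1,d3", "missing sign-off signature", "request sign-off"),
]
calls: List[int] = []  # size of the cumulative set seen at each call


def toy_propose(documents, templates, found, k):
    calls.append(len(found))
    remaining = [p for p in RANKED if p not in found]
    return remaining[:k]


result = tide_discover(["d1", "d2", "d3"], ["version-conflict"],
                       toy_propose, max_rounds=5, k=2)
```

With `k=2`, the toy proposer is called three times: it returns two problems, then one, then none, at which point the loop stops before the round limit.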
Round 1: LLM scans $D$ with the template and emits two candidates: (i) “d₁ and d₂ contain different versions of the same policy” and (ii) “d₂ mentions a deadline that conflicts with d₃’s schedule.” Both are added to $\hat P(1)$.
Round 2: Conditioning on $\hat P(1)$, the LLM looks for problems not already covered. It finds a third candidate: “d₁ lacks the required sign‑off signature present in d₃.” This becomes $\Delta\hat P(2)$ and is added to $\hat P(2)$.
Since the LLM returns no further candidates in Round 3 (or the round limit is reached), the loop terminates with three discovered problems.
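Written as set updates, the walkthrough above corresponds to the following trace. The labels $p_1, p_2, p_3$ are shorthand introduced here for the three candidates in the order they were found, and the trace assumes the loop ends because Round 3 returns nothing, not because the round limit is hit.

$$
\begin{aligned}
\hat P(0) &= \emptyset \\
\Delta\hat P(1) &= \{p_1, p_2\}, & \hat P(1) &= \hat P(0) \cup \Delta\hat P(1) = \{p_1, p_2\} \\
\Delta\hat P(2) &= \{p_3\}, & \hat P(2) &= \hat P(1) \cup \Delta\hat P(2) = \{p_1, p_2, p_3\} \\
\Delta\hat P(3) &= \emptyset, & \hat P(3) &= \hat P(2)
\end{aligned}
$$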
The iterative process forces the model to move beyond the most obvious conflicts, uncovering less salient issues that a single‑shot prompt would miss.
Evaluation Design
Key experimental details and headline performance for TIDE on multi‑problem discovery.
TIDE (Ours) consistently outperforms the single‑agent baseline on the multi‑problem discovery task.
Across both personal‑workspace and software‑repository settings, TIDE achieves higher coverage and F1 scores for retrieval, identification, and resolution.
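The exact metric definitions are not reproduced in this summary. As a hedged illustration, the sketch below assumes that coverage is the fraction of gold items recovered (recall) and that F1 is the usual harmonic mean of precision and recall, with exact-match comparison of items. The function name and the document ids are invented for the example.

```python
from typing import Set, Tuple


def coverage_and_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Assumed definitions: coverage = recall over gold items; F1 is standard."""
    hits = len(predicted & gold)
    coverage = hits / len(gold) if gold else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    denom = precision + coverage
    f1 = 2 * precision * coverage / denom if denom else 0.0
    return coverage, f1


# Toy retrieval example with five gold supporting documents.
gold_docs = {"d1", "d2", "d3", "d4", "d5"}
narrow_prediction = {"d1", "d9"}                   # one hit, one miss
broad_prediction = {"d1", "d2", "d3", "d4", "d7"}  # four hits, one miss

narrow_cov, narrow_f1 = coverage_and_f1(narrow_prediction, gold_docs)
broad_cov, broad_f1 = coverage_and_f1(broad_prediction, gold_docs)
```

Reporting both numbers matters because coverage alone rewards over-prediction, while F1 also penalizes spurious items.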
The Single‑Shot baseline is a single LLM invocation that emits a list of all predicted problems without any iterative refinement.
How does the Single‑Shot baseline differ from the Multi‑Agent approach?
Single‑Shot uses one LLM call to produce all predictions at once, while Multi‑Agent runs several independent LLM agents in parallel, each generating its own set of predictions; the latter incurs higher compute but can explore diverse solution spaces.
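The structural difference between the two baselines can be sketched as follows. The function names and the toy `propose` stand-in are assumptions for illustration. The toy proposer is deterministic, so it does not model the sampling diversity that real parallel agents would show; the sketch only makes visible that neither baseline passes previously found problems into the next call.

```python
from typing import Callable, List, Sequence

Proposer = Callable[[Sequence[str], int], List[str]]


def single_shot(propose: Proposer, documents: Sequence[str], k: int) -> List[str]:
    """One call that emits every prediction at once."""
    return propose(documents, k)


def multi_agent(propose: Proposer, documents: Sequence[str],
                n_agents: int, k: int) -> List[str]:
    """Independent calls whose outputs are merged; no state is shared."""
    merged: List[str] = []
    for _ in range(n_agents):
        for problem in propose(documents, k):  # each agent sees only the documents
            if problem not in merged:
                merged.append(problem)
    return merged


# Toy proposer: always returns the k most salient problems.
SALIENCE_ORDER = ["policy version conflict", "deadline conflict",
                  "missing sign-off"]
call_count = 0


def toy_propose(documents: Sequence[str], k: int) -> List[str]:
    global call_count
    call_count += 1
    return SALIENCE_ORDER[:k]


docs = ["d1", "d2", "d3"]
one_call = single_shot(toy_propose, docs, k=2)
three_calls = multi_agent(toy_propose, docs, n_agents=3, k=2)
```

In this simplified setting the three parallel calls cost three times as much as the single call but merge to the same two problems, because nothing steers later calls away from what earlier ones already found.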
This table compares a "Single-Agent" approach against "TIDE (Ours)" across three stages (Retrieval, Identification, and Resolution), with the "Gold" annotations as the reference. In the example shown, TIDE retrieves all 5 supporting documents, identifies the specific sync conflict and the gating IT security issue, and resolves the problem by escalating to the correct manager with a concise summary. In contrast, the Single-Agent approach fails to retrieve the relevant documents, surfaces unrelated issues, and does not address the core problem.
Performance and Ablations
We quantify how TIDE outperforms reactive baselines across multi‑problem discovery.
The paper’s core claim is that reactive agents miss many latent issues, while TIDE iteratively discovers and resolves them using template‑guided loops. This section shows how that claim translates into concrete gains.
**Table 1.** Main results on the two evaluation settings: Personal Workspace and Software Repository. For each sub-task, we report Coverage (Cov.) and F1 over three independent runs; the best per-LLM results are in bold.
**Figure 2.** Multi-problem discovery on the Workspace setting with GPT. Left: discovered problems per instance. Right: coverage by gold count.
**Figure 4.** F1 results as a function of the per-instance LLM-call budget $k$ on the Workspace setting with GPT.
**Figure 5.** Per-run template citation frequency.
**Figure 6.** Per-iteration retrieval coverage (left) and precision (right) on the Workspace setting with GPT.
**Figure 7.** F1 scores as the template pool size grows on the repository setting with Claude.
**Table 3.** Template transferability on the Repository setting.
**Table 2.** Results on the Repository setting with GPT using raw few-shot demonstrations (ITER. + DEMOS).
**Table.** Comparison of bug fixing approaches for the *mlxtend* issue #393.
Related Work
We position proactive discovery within prior LLM‑agent and template‑based work.
LLM agents have been evaluated in task‑oriented settings such as document understanding, tool use, web interaction, and software engineering. Benchmarks measure whether agents can follow user instructions, navigate complex environments, and complete prescribed tasks.
These studies typically assume that a user request, issue description, or failing test already defines the goal, reducing the agent’s role to executing against a stated objective. This explicit‑request assumption limits the agent’s ability to discover problems that have not been articulated.
We consider the inverse setting: no request is provided, and the agent must first discover one or more latent problems from a broader context before taking action. This shift expands the problem space from a single, localized goal to multiple co‑existing issues.
Proactive agents aim to anticipate user needs and initiate assistance before an explicit request. One line of work uncovers user intent by asking clarification questions or bridging knowledge gaps, yet still anchors interaction on a user‑issued query.
A more recent strand studies when an agent should intervene, leveraging user activity signals to predict assistance opportunities and generate proactive suggestions. However, this literature remains focused on a single localized need at a time.
Improvements to LLM reasoning have traditionally relied on internal capabilities, such as eliciting intermediate steps or having the model critique its own outputs. These methods presuppose that the problem statement is already known.
Template‑based approaches externalize reusable reasoning patterns: Buffer‑of‑Thoughts caches prior traces, hierarchical template paths organize them, schema‑based abstractions support in‑context learning, self‑evolving memory stores strategies, graph‑based reuse connects thought fragments, and multi‑hop reasoning handles long‑context documents. All of these assume a given problem description and treat templates as solution schemas.
Our work repurposes such templates as discovery schemas that specify which contextual signals to attend to and how to connect them, enabling iterative expansion over multiple co‑existing problems rather than refining a single solution.
Conclusion and Limitations
Summarizes TIDE’s impact, outlines its limits, and addresses ethical considerations.
We presented TIDE, a framework that consistently outperforms single‑shot and multi‑agent baselines on retrieval, identification, and resolution across personal workspaces and software repositories. Iteration redirects capacity toward undiscovered problems, while thought templates anchor each prediction in a recognizable problem class. Together they recast proactive assistance as a multi‑step discovery process over context.
Templates are built once from a pool of solved cases and remain fixed at inference, which already proves effective and transfers across backbones. Updating the library online from agent interactions or augmenting the pool with automatically constructed cases are natural extensions. Iterative discovery trades a small bounded budget for broader coverage, a trade‑off our analyses show to be favorable against multi‑agent baselines at matched budgets.
TIDE is intended to help users surface hidden problems that would otherwise go unaddressed, but operating over real‑world documents and distilled templates can expose sensitive, biased, or undesirable content. We therefore recommend standard safeguards such as content filtering, bias detection, and human‑in‑the‑loop review during both template construction and deployment. These measures are consistent with best practices for responsibly deploying LLM‑based agents.