
When agent workflows fail, developers often assume that the problem lies with the reasoning capabilities of the underlying model. In fact, the limited information provided by the search interface is often the main limiting factor.
Researchers from several universities have proposed a technique called direct corpus interaction (DCI), which lets agents bypass embedding models entirely and search the raw corpus directly with standard command-line tools.
Limits of classical search
In classic retrieval pipelines such as RAG, documents are split into chunks, converted to vector representations (embeddings), and indexed offline in a vector database. When the AI system processes a query, the retriever searches the database and returns a ranked "top-k" list of document fragments matching the query. All evidence must pass through this ranking mechanism before any downstream reasoning can occur.
But modern agentic applications require more. "Dense retrieval is very useful for broad semantic recall, but when an agent must solve a multi-step task, it often needs to search for exact strings, numbers, versions, error codes, file paths, or sparse combinations of clues," the authors of the DCI paper told VentureBeat. "These long-tail details are where semantic similarity can be fragile."
Unlike static search, agents must dynamically revise their search plans after observing partial or localized evidence. Exact lexical constraints and multi-step hypothesis refinement are difficult to express with semantic retrievers. Because the retriever compresses access to the corpus into a single step, any critical evidence that the similarity search filters out cannot be recovered later, no matter how advanced the agent’s downstream reasoning capabilities are. As the authors explain, current search pipelines can become a bottleneck because "they decide very early on what the agent is allowed to see."
Direct corpus interaction
Direct access to the raw corpus also addresses a major problem in enterprise environments: stale data. An index is always a snapshot, and it takes considerable compute and time to build and maintain.
"In many enterprise settings, data is not a fixed set of documents. It is daily financial reports, live logs, tickets, code commits, configuration files, incident graphs and internal documents that keep changing," the authors said. DCI lets the agent reason over the current state of the workspace rather than yesterday’s vector index.
The agent runs in a terminal-like environment where its observations are raw tool outputs such as file paths, matching text ranges, and surrounding lines. The basic tools provided by DCI are few but highly expressive. Agents use commands like “find” and “glob” to navigate directory structures and find files. For exact matches, they use “grep” and “rg” to find specific keywords, regex patterns, and exact strings. When local validation is needed, “head”, “tail”, “sed”, “cat” and lightweight Python scripts allow the agent to look at the context surrounding the match or read specific file sections.
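The three kinds of operations described above (navigating files, exact matching, and local inspection) can be sketched in a few lines. This is a minimal illustration in Python, not the researchers' implementation: the function names `find_files`, `grep`, and `context` are hypothetical stand-ins for what `find`, `grep -n`, and `sed -n` would do in a real shell, and the sample corpus is invented for the example.

```python
import re
import tempfile
from pathlib import Path


def find_files(root: Path, pattern: str = "*") -> list[Path]:
    """Rough analogue of `find`: files under root whose names match a glob."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def grep(path: Path, regex: str) -> list[tuple[int, str]]:
    """Rough analogue of `grep -n`: (line_number, line) for every matching line."""
    compiled = re.compile(regex)
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return [(n, line) for n, line in enumerate(lines, start=1) if compiled.search(line)]


def context(path: Path, line_no: int, radius: int = 1) -> list[str]:
    """Rough analogue of `sed -n 'A,Bp'`: the lines surrounding a 1-based line number."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(line_no - 1 - radius, 0)
    return lines[start : line_no + radius]


# Hypothetical two-file corpus, created only for this illustration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "logs").mkdir()
    (root / "logs" / "app.log").write_text(
        "startup ok\nconnecting to db\nERROR code=E4012 timeout\nretrying\n"
    )
    (root / "notes.txt").write_text("nothing to see here\n")

    for log_file in find_files(root, "*.log"):
        for line_no, line in grep(log_file, r"code=E\d{4}"):
            print(log_file.name, line_no, line)
            print(context(log_file, line_no, radius=1))
```

The point of the sketch is that the agent's observations are raw matches and their surrounding lines, so every later step can build on exact evidence rather than on a ranked list of fragments.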
An agent can combine these tools via shell pipelines to execute complex search logic in a single step. It can chain commands to enforce strict lexical constraints, such as searching a file for one term and piping the output into a search for a second term. It can combine many weak clues across the corpus, for example by restricting the search to a specific file type, searching for a keyword like "report," and filtering for a year like "2024." It can also test a hypothesis immediately by checking the exact lines around a keyword match.
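To make the chaining idea concrete, here is a minimal Python sketch of the kind of logic a shell pipeline would express: restrict to a file type, keep files that mention one term, then keep only those that also mention a second. The helper names `files_matching` and `chained_search` and the sample files are assumptions made for illustration; in a real DCI session the agent would more likely pipe `grep` or `rg` commands together.

```python
import re
import tempfile
from pathlib import Path


def files_matching(paths: list[Path], regex: str) -> list[Path]:
    """Keep only files whose text matches the regex (similar in spirit to `grep -l`)."""
    compiled = re.compile(regex, re.IGNORECASE)
    return [
        p for p in paths
        if compiled.search(p.read_text(encoding="utf-8", errors="replace"))
    ]


def chained_search(root: Path, suffix: str, *regexes: str) -> list[Path]:
    """Filter by file suffix, then apply each regex in turn, like stages of a pipeline."""
    candidates = sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())
    for regex in regexes:
        candidates = files_matching(candidates, regex)
    return candidates


# Hypothetical corpus: only one .txt file mentions both "report" and the year 2024.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("Annual report for fiscal year 2024\n")
    (root / "b.txt").write_text("Annual report for fiscal year 2023\n")
    (root / "c.md").write_text("Draft report, 2024 edition\n")
    (root / "d.txt").write_text("Team memo, 2024\n")

    hits = chained_search(root, ".txt", r"report", r"\b2024\b")
    print([p.name for p in hits])
```

Each stage narrows the candidate set with a hard constraint, which is the behavior a top-k similarity ranking cannot guarantee.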
Instead of relying on embedding-based similarity search, DCI delegates semantic interpretation directly to the agent. An agent can formulate hypotheses, test precise lexical patterns, and extract detailed information that a traditional semantic retriever might miss.
The researchers propose two versions of this system. DCI-Agent-Lite is designed as a lightweight, low-cost setup built on the GPT-5.4 nano model and limited to raw terminal interaction such as bash commands and basic file reads. Because reading raw files can quickly fill the context window of a smaller model, this version relies on lightweight runtime context-management strategies to sustain exploration over long horizons.
DCI-Agent-CC is a higher-performance version designed for teams with larger compute budgets. It runs on Claude Code powered by Claude Sonnet 4.6. Claude Code provides stronger prompting, more robust tool orchestration, and superior built-in context management, improving agent stability during complex, multistage searches across heterogeneous datasets.
DCI in action
The researchers tested both versions of DCI on agentic search benchmarks such as BrowseComp-Plus, on knowledge-intensive QA with single-hop and multi-hop reasoning, and on information retrieval ranking tasks that require domain-specific reasoning and scientific fact-checking.
They compared DCI against three groups of baselines. The first includes open-weight search agents such as Search-R1 and custom agents equipped with frontier models such as GPT-5 and Claude Sonnet 4.6 combined with standard retrievers. The second includes classic sparse retrievers such as BM25 and dense retrievers such as OpenAI’s text-embedding-3-large and Qwen3-Embedding-8B. The third consists of high-performance reasoning-oriented rerankers such as ReasonRank-32B and Rank-R1.
DCI consistently outperformed the baselines on key metrics, according to the researchers. On the comprehensive BrowseComp-Plus benchmark, replacing the traditional Qwen3 semantic retriever with DCI on a Claude Sonnet 4.6 backbone increased accuracy from 69.0% to 80.0% while reducing API cost from $1,440 to $1,016. The return on investment for the lightweight agent was also notable. DCI-Agent-Lite with GPT-5.4 nano is competitive with OpenAI's o3 model using traditional search, while reducing costs by more than $600.
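For readers who want the BrowseComp-Plus comparison as deltas, the arithmetic is straightforward. The short Python snippet below uses only the four figures reported above; the variable names are chosen for illustration.

```python
# Figures as reported for the Claude Sonnet 4.6 backbone on BrowseComp-Plus.
baseline_accuracy = 69.0   # percent, with the Qwen3 semantic retriever
dci_accuracy = 80.0        # percent, with DCI
baseline_cost = 1440.0     # US dollars of API cost
dci_cost = 1016.0          # US dollars of API cost

accuracy_gain_points = dci_accuracy - baseline_accuracy
cost_savings = baseline_cost - dci_cost
cost_savings_pct = 100 * cost_savings / baseline_cost

print(f"Accuracy gain: {accuracy_gain_points:.1f} percentage points")
print(f"Cost savings: ${cost_savings:.0f} ({cost_savings_pct:.1f}% lower)")
```

In other words, the reported switch is an 11-point accuracy gain alongside a $424 reduction in API cost, roughly 29% lower than the baseline.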
According to the researchers, on multi-hop QA benchmarks DCI-Agent-CC achieved an average accuracy of 83.0%, beating the strongest open-weight search baseline by 30.7 points.
The data show that DCI has lower broad document recall than dense embedding models, but extracts significantly more value from a relevant document once it finds one.
"If an enterprise AI leader were to ask where DCI is most useful, I would point to tasks that require pinpoint evidence localization in a dynamic workspace: troubleshooting production incidents, searching large codebases, log analysis, compliance research, audit trails, or multi-document root cause analysis," the researchers said.
In one complex deep-search task, the agent had to identify a specific soccer match based on 12 interconnected clues, including exact attendance figures, yellow cards, and players' dates of birth. A traditional retriever would fail here, surfacing only short, disconnected fragments. The DCI agent instead searched the file directory, read specific lines from the 1990 England-Belgium match report to verify the exact number of substitutions, extracted a specific quote from an interview file, and checked two players’ exact dates of birth in the Wikipedia text files. By chaining these simple commands, DCI ensures that no evidence is permanently lost behind a flawed semantic search algorithm.
Limits and practical application of DCI
DCI has a clear operating envelope: it excels at search depth but struggles with search breadth. When the experimental corpus was expanded from 100,000 to 400,000 documents, the system's accuracy dropped significantly and the average number of tool calls rose. Although DCI is powerful once a promising document is found, the cost of locating that initial anchor document increases sharply as the candidate space grows.
DCI also has lower broad document recall than dense embedding models. It trades comprehensive recall for high-resolution, local precision. If an enterprise workflow strictly requires finding every relevant document in a massive database, DCI may not be the right tool.
Giving the agent expressive tools such as an unrestricted bash shell increases latency and computational cost because of the high volume of iterative tool calls required to complete a search. It also creates significant context-management and security challenges for IT departments.
"Tool calls can return large outputs; long trajectories can fill the context window; and raw terminal access requires sandboxing, permission controls, and careful engineering," the authors said. To control the context window, the researchers found that moderate truncation and compression help the agent sustain longer searches, while overly aggressive summarization discards useful evidence.
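One simple form of truncation is to keep the beginning and end of a long tool output and replace the middle with a marker. The Python sketch below illustrates that idea only; it is an assumption about what "moderate truncation" could look like, not the strategy the researchers implemented, and the function name `truncate_output` and its default limits are hypothetical.

```python
def truncate_output(text: str, max_lines: int = 40, head: int = 25) -> str:
    """Keep the first `head` lines and the last `max_lines - head` lines of a long output.

    The omitted middle is replaced by a single marker line so the agent knows
    that content was dropped and can re-read that range if it needs to.
    """
    if not 0 < head < max_lines:
        raise ValueError("head must be between 1 and max_lines - 1")

    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short outputs pass through unchanged

    tail = max_lines - head
    omitted = len(lines) - max_lines
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:head] + [marker] + lines[-tail:])


# Example: a 100-line output cut down to 10 kept lines plus one marker line.
long_output = "\n".join(f"line {i}" for i in range(100))
print(truncate_output(long_output, max_lines=10, head=6))
```

Keeping an explicit marker rather than silently dropping lines matters for the trade-off described above: the agent retains a record that evidence exists in the omitted range, which a lossy summary would hide.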
Given these operational realities, DCI is not intended as a wholesale replacement for existing vector infrastructure. Rather, it serves as a complement.
"In our opinion, the most practical near-term deployment pattern for orchestration engineers and data architects is a hybrid," the authors said. Semantic search can still provide high-recall candidate discovery when user intent is broad or not precisely specified. "DCI can then act as a precision and verification layer: the agent can search within retrieved documents, expand from them to neighboring files, check exact constraints, and combine weak signals across documents."
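The hybrid pattern can be sketched as two stages: a recall-oriented step that proposes candidates, followed by an exact-match step that verifies them. In the Python illustration below, `candidate_scores` is a deliberately crude word-overlap ranker standing in for a real semantic retriever, and `hybrid_search`, the document names, and the patterns are all hypothetical.

```python
import re


def candidate_scores(query: str, docs: dict[str, str]) -> list[str]:
    """Stand-in for a semantic retriever: rank documents by shared-word count."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for name, text in docs.items():
        overlap = len(query_words & set(re.findall(r"\w+", text.lower())))
        if overlap > 0:
            scored.append((overlap, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


def hybrid_search(
    query: str, docs: dict[str, str], exact_patterns: list[str], top_k: int = 3
) -> list[str]:
    """Stage 1: take top-k candidates. Stage 2: keep those matching every exact pattern."""
    candidates = candidate_scores(query, docs)[:top_k]
    return [
        name for name in candidates
        if all(re.search(pattern, docs[name]) for pattern in exact_patterns)
    ]


# Hypothetical corpus: two near-identical incidents that differ only in the error code.
docs = {
    "incident_a.log": "payment service timeout error code E4012 in version 2.3.1",
    "incident_b.log": "payment service timeout error code E4021 in version 2.3.1",
    "notes.md": "lunch menu for friday",
}
print(hybrid_search("payment service timeout error", docs, [r"\bE4012\b", r"2\.3\.1"]))
```

The two incident files are indistinguishable to the overlap ranker, which is the situation the authors describe: similarity finds the neighborhood, and exact checks pick out the one document that satisfies the constraints.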
The researchers have released the DCI code under the permissive MIT license.
"In the longer term, DCI changes the way we think about enterprise data. Data will not only need to be stored for humans or indexed for search engines; it must be organized for agents who can inspect, compare, investigate, track and verify," the authors conclude. "File names, timestamps, fixed identifiers, metadata, version history, and machine-readable structure become part of the search interface."





