File Editing Methods
The core approaches AI coding agents use to modify your files, from brute-force rewrites to surgical precision, hash anchors, and extension-hosted editors.
1. Whole File Replacement
High Token Cost · Most Reliable
What It Does
Completely overwrites the target file with new content. The agent provides the file path and the entire new content.
Why It Still Fails
In theory this is the most reliable method: no surgical precision is
required. In practice it breaks down due to platform encoding
mismatches: agents trained on Linux (LF line endings) sometimes write
files on Windows (CRLF) or vice versa, producing mixed or incorrect
line endings. Byte-order marks (BOMs), invisible whitespace, and charset
assumptions also cause silent corruption. When such writes fail, agents
often resort to hacky workarounds, such as shell commands like
sed or printf, that can introduce
additional errors rather than fix the root cause.
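The line-ending pitfall can be mitigated before writing: detect the target file's dominant EOL and re-apply it to the new content. A minimal sketch, with hypothetical helper names not taken from any particular agent:

```typescript
// Detect whether a file's existing content uses CRLF or LF.
export function detectEol(text: string): "\r\n" | "\n" {
  return text.includes("\r\n") ? "\r\n" : "\n";
}

// Normalize new content to LF, then re-apply the detected line ending,
// so a whole-file rewrite never silently flips a CRLF file to LF.
export function normalizeTo(text: string, eol: "\r\n" | "\n"): string {
  const lf = text.replace(/\r\n?/g, "\n"); // collapse CRLF and bare CR
  return eol === "\n" ? lf : lf.split("\n").join(eol);
}
```

An agent would call detectEol on the original file (when it exists) and pass the result to normalizeTo before writing the replacement content.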
How It Works
writeFile(path, entireNewContent)
Example (Cline's XML Format)
<write_to_file>
<path>src/utils/helper.js</path>
<content>
// Entire file content goes here
export function add(a, b) {
return a + b;
}
export function subtract(a, b) {
return a - b;
}
</content>
</write_to_file>
When to Use
- Creating new files: no existing content to match
- Scaffolding/boilerplate: generating entire templates
- Small files (< 100 lines): token cost is negligible
- Massive refactors (> 50% changed): more efficient than many edits
- Last-resort fallback: when surgical methods fail
The Danger of Full Rewrites
If the AI doesn't have the complete file in context (or forgets parts of it), it might accidentally delete existing code. This is why agents like Cline recommend reading the file first and using this method only as a fallback.
Who Uses It
All agents support this as a creation tool and fallback. MCP-SuperAssistant uses it as the primary method (the "minimalist" approach).
2. Search & Replace Blocks
Low Token Cost · Requires Exact Match
What It Does
Surgical edits using a "Find this exact text" and "Replace with this" pattern. The AI specifies a block of code to find and what to replace it with.
Format Variations
Different agents use different markers, but the concept is the same:
app.py
```python
<<<<<<< SEARCH
def hello():
    print("world")
=======
def hello():
    print("Hello, world!")
>>>>>>> REPLACE
```
Uses git-conflict-style markers; 5 to 9 repeated angle brackets are accepted.
<replace_in_file>
<path>src/app.py</path>
<diff>
------- SEARCH
def hello():
    print("world")
=======
def hello():
    print("Hello, world!")
+++++++ REPLACE
</diff>
</replace_in_file>
XML wrapper with dash/plus markers. Supports legacy
<<<< format too.
EditTool({
  filePath: "/abs/path/to/app.py",
  oldString: `def hello():
    print("world")`,
  newString: `def hello():
    print("Hello, world!")`,
  replaceAll: false
})
JSON/TypeScript-style arguments. replaceAll flag for
bulk replacements.
The Critical Rule
SEARCH Must Match Exactly
The search block must match the file content character-for-character, including whitespace, indentation, comments, and newlines. This is where most edits fail.
Why Models Get It Wrong
- Tab vs. spaces: The model outputs spaces, but the file uses tabs
- Trailing whitespace: Invisible spaces at line ends
- Hallucinated code: The model "remembers" code that's slightly different
- Stale context: The file changed since the model read it
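The exact-match contract fits in a few lines. This is a hypothetical sketch, not any agent's actual implementation: a miss fails loudly, and an ambiguous match is also an error unless replaceAll is requested:

```typescript
// Apply one search/replace edit under the exact-match rule.
export function replaceExact(
  file: string,
  oldString: string,
  newString: string,
  replaceAll = false
): string {
  const first = file.indexOf(oldString);
  if (first === -1) throw new Error("SEARCH block not found (check tabs/whitespace)");
  if (replaceAll) return file.split(oldString).join(newString);
  // Without replaceAll, a second occurrence makes the edit ambiguous
  const second = file.indexOf(oldString, first + 1);
  if (second !== -1) throw new Error("SEARCH block is ambiguous: multiple matches");
  return file.slice(0, first) + newString + file.slice(first + oldString.length);
}
```

Every failure mode in the list above (tabs vs. spaces, trailing whitespace, hallucinated code, stale context) surfaces here as the "not found" error.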
Who Uses It
Cline (primary tool), Aider (diff format), OpenCode (EditTool), Grok CLI (str_replace_editor)
3. Unified Diff / Patch Format
Very Token Efficient · Specialized Models
What It Does
Uses standard diff format (or custom variations) to
describe changes. Lines starting with - are removed,
lines with + are added, and unchanged context lines help
locate the edit.
Standard Unified Diff
--- app.py
+++ app.py
@@ -10,4 +10,6 @@
 def calculate_total(items):
-    return sum(item.price for item in items)
+    subtotal = sum(item.price for item in items)
+    tax = subtotal * 0.1
+    return subtotal + tax
 
 def main():
Codex's Custom Patch Format
Codex/Claude Code uses a custom patch syntax with special markers for multi-file operations:
*** Begin Patch
*** Add File: src/new_module.py
+"""New module docstring"""
+
+def new_function():
+    pass
*** Update File: src/app.py
@@ def calculate_total():
-    return sum(prices)
+    subtotal = sum(prices)
+    return subtotal * 1.1
*** Delete File: src/deprecated.py
*** End Patch
Why Custom Formats?
Advantages
- Extremely token efficient
- Multiple files in one patch
- Clear operation semantics (add/update/delete)
- Familiar to developers (like git)
Challenges
- High cognitive load for generic LLMs
- Easy to hallucinate line numbers
- Context lines must match exactly
- Best with models trained on diffs
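To see why "context lines must match exactly", here is a toy single-hunk applier. It is an illustration under simplifying assumptions (one hunk, exact line numbers, no fuzz factor), far simpler than real patch tools:

```typescript
// Apply one hunk whose lines are prefixed " " (context), "-" (remove),
// or "+" (add). Context and removal lines are verified against the file
// before anything is spliced in.
export function applyHunk(
  fileLines: string[],
  startLine: number, // 0-based index where the hunk begins
  hunk: string[]
): string[] {
  const out = fileLines.slice(0, startLine);
  let i = startLine;
  for (const h of hunk) {
    const mark = h[0];
    const text = h.slice(1);
    if (mark === " " || mark === "-") {
      if (fileLines[i] !== text) throw new Error(`context mismatch at line ${i + 1}`);
      if (mark === " ") out.push(text); // "-" consumes without emitting
      i++;
    } else if (mark === "+") {
      out.push(text); // insertion consumes nothing from the old file
    }
  }
  return out.concat(fileLines.slice(i));
}
```

A single hallucinated context line aborts the whole hunk, which is exactly the failure mode generic LLMs hit with this format.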
Who Uses It
Codex/Claude Code (custom patch format, primary method), Aider (standard udiff, optional format), Cline (V4A format for GPT-5 models only)
4. Line-Based / Anchor Matching
Fallback Strategy · Some False-Positive Risk
What It Does
When exact matching fails, use the first and last lines of a code block as "anchors" to locate the target. The middle content is verified with fuzzy matching or similarity scoring.
How It Works
function applyAnchorEdit(fileLines, searchLines, replaceLines) {
  const startAnchor = searchLines[0].trim();
  const endAnchor = searchLines[searchLines.length - 1].trim();

  // Find the start anchor's line index in the real file
  const startIndex = fileLines.findIndex(line =>
    line.trim() === startAnchor
  );

  // Find the end anchor's line index (strictly after the start index)
  const endIndex = fileLines.findIndex((line, idx) =>
    idx > startIndex && line.trim() === endAnchor
  );

  if (startIndex !== -1 && endIndex !== -1) {
    // Verify that the middle content is close enough to what the model
    // claimed (similarityScore is an assumed helper returning 0..1)
    const middle = fileLines.slice(startIndex + 1, endIndex);
    if (similarityScore(middle, searchLines.slice(1, -1)) > 0.5) {
      return [
        ...fileLines.slice(0, startIndex),
        ...replaceLines,
        ...fileLines.slice(endIndex + 1)
      ].join("\n");
    }
  }
  throw new Error("Anchors not found or content mismatch");
}
Why This Helps
AI models often get the first and last lines of a function/block correct but hallucinate minor differences in the middle (comments, whitespace, formatting). Anchor matching says: "If the boundaries match and the size is right, it's probably the target."
The Risk
False Positives
If the first and last lines aren't unique (e.g., multiple
functions starting with def process():), you might
edit the wrong block. Agents mitigate this with similarity
thresholds (OpenCode uses 50%) and size checks.
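One plausible scoring function for the middle lines, combining the size check with line-by-line comparison. This is an assumption for illustration; OpenCode's actual scorer may differ:

```typescript
// Fraction of positionally matching (trimmed) lines between what the
// file contains and what the model claimed; 0 on any size mismatch.
export function similarityScore(actual: string[], expected: string[]): number {
  if (actual.length !== expected.length) return 0; // size check doubles as a guard
  if (actual.length === 0) return 1; // both empty: anchors are adjacent
  let matching = 0;
  for (let i = 0; i < actual.length; i++) {
    // Trim so indentation drift doesn't count as a mismatch
    if (actual[i].trim() === expected[i].trim()) matching++;
  }
  return matching / actual.length;
}
```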
Who Uses It
Cline (Tier 3 fallback: "Block Anchor Fallback Match"), OpenCode (BlockAnchorReplacer + ContextAwareReplacer)
5. Multi-Edit / Atomic Operations
Efficient I/O · All-or-Nothing
What It Does
Apply multiple disjoint edits to a single file in one atomic operation. Read the file once, apply all changes in memory (handling offset shifts), write once.
Example (OpenCode's MultiEditTool)
MultiEditTool({
  filePath: "/abs/path/to/file.ts",
  edits: [
    {
      oldString: "const API_URL = 'http://localhost'",
      newString: "const API_URL = process.env.API_URL"
    },
    {
      oldString: "console.log('debug')",
      newString: "logger.debug('request received')"
    },
    {
      oldString: "// TODO: add auth",
      newString: "validateToken(req.headers.authorization)"
    }
  ]
})
Why Use Multi-Edit?
- Performance: One file read + one file write instead of N operations
- Atomicity: All edits succeed or none do (easier to rollback)
- Offset handling: The tool manages line number shifts as edits are applied
- Common use case: Renaming variables, updating imports, bulk find-replace
Complexity
Multi-edit tools must handle overlapping edits (what if two edits affect the same lines?) and offset shifts (if edit 1 adds 3 lines, edit 2's target line is now 3 lines later). OpenCode applies edits sequentially in memory to handle this.
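Sequential in-memory application can be sketched like this (a hypothetical helper; the real tools carry more validation). Because each edit searches the already-edited text, later edits never need manual offset bookkeeping:

```typescript
// Apply edits one after another against the evolving content.
// Read once before, write once after; a single miss aborts the batch.
export function applyMultiEdit(
  content: string,
  edits: { oldString: string; newString: string }[]
): string {
  return edits.reduce((text, { oldString, newString }) => {
    const at = text.indexOf(oldString);
    // All-or-nothing: one missing target aborts before any write happens
    if (at === -1) throw new Error(`edit target not found: ${oldString}`);
    return text.slice(0, at) + newString + text.slice(at + oldString.length);
  }, content);
}
```

Throwing before the write is what makes the operation atomic: the file on disk is untouched unless every edit in the batch succeeds.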
Who Uses It
OpenCode (dedicated MultiEditTool),
Cline (multiple SEARCH/REPLACE blocks in one
replace_in_file call)
6. Cryptographic Hash-Anchor Editing
Low Token Cost · Precise · Stateful
What It Does
Instead of having the LLM describe which code to find (a description
models often hallucinate), Dirac gives every line a unique, opaque
word anchor, like AppleBanana or RiverMountain.
The LLM references these anchors directly, and Dirac resolves them
with cryptographic certainty: no fuzzy matching, no
whitespace sensitivity, no "what if the model remembers wrong."
The Cryptographic Core: FNV-1a Hashing
Every line in a file gets hashed with FNV-1a (Fowler-Noll-Vo), a blazing-fast non-cryptographic hash that produces a 32-bit integer. The key property: any change at all, an extra space, a different letter, a missing comma, almost always produces a completely different hash with no similarity to the original. This is what guarantees that an anchor can't silently match stale or modified content.
export function contentHash(content: string): string {
let h = 2166136261 // FNV-1a offset basis
for (let i = 0; i < content.length; i++) {
h = Math.imul(h ^ content.charCodeAt(i), 16777619)
}
return (h >>> 0).toString(16).padStart(8, "0")
}
Why Words Instead of Raw Hashes?
Raw hex hashes like 4a7b2f01 are meaningless to LLMs β
they look like noise and models can't reliably reproduce them. Dirac
instead maps each hash to a two-word combination from
a dictionary file (.hash_anchors), producing
human-readable anchors like PlasticGrass,
HappyWhistle, or BlueVortex.
The mapping is stable across edits: if a line doesn't change, its FNV-1a hash stays the same, and it keeps the same word anchor forever. When a line does change, the old anchor is retired and a new random word pair is assigned.
private static getUniqueWord(usedWords: Set<string>, pool: string[]): string {
  while (true) {
    if (pool.length === 0) {
      AnchorStateManager.refill(usedWords, pool)
    }
    const word = pool.pop()!
    if (!usedWords.has(word)) return word
  }
}
// refill generates 10,000 two-word combinations from the dictionary:
// w1 = dict[random], w2 = dict[random], word = w1 + w2
// e.g., "Plastic" + "Grass" → "PlasticGrass"
How the LLM Sees Anchored Files
When Dirac reads a file, every line is prefixed with its word anchor
and a delimiter (by default :):
AppleBanana:import { ToolUse } from "@core/assistant-message"
RiverMountain:import { splitAnchor, stripHashes, getDelimiter } from "@utils/line-hashing"
CloudForest:import { AppliedEdit, Edit, FailedEdit, ResolvedEdit } from "./types"
GoldenFalcon:
DesertBloom:export class EditExecutor {
How the LLM Edits
The LLM doesn't copy code; it references anchor names:
{
  "path": "src/core/EditExecutor.ts",
  "edits": [
    {
      "anchor": "AppleBanana:import { ToolUse } from \"@core/assistant-message\"",
      "edit_type": "replace",
      "text": "import { ToolUse, ToolResult } from \"@core/assistant-message\""
    }
  ]
}
The anchor format is WordWord:actual_line_content. Dirac
verifies both: it looks up the word anchor to find
the line index, then checks that the provided content matches the
actual file line byte-for-byte. If either fails, the edit is rejected
with a precise error.
Triple Verification
1. The anchor word exists in the file. 2. The provided line content matches the actual content character-for-character. 3. The range is valid (start ≤ end). All three must pass before any edit is applied.
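The first two checks can be sketched as one lookup-and-compare step (hypothetical shapes; Dirac's real types and error messages differ):

```typescript
// Resolve a "WordWord:actual_line_content" reference to a line index,
// rejecting unknown anchors and stale quoted content.
export function verifyAnchor(
  anchoredLine: string,          // e.g. 'AppleBanana:import { ToolUse } ...'
  anchors: Map<string, number>,  // word anchor -> line index
  fileLines: string[]
): number {
  const sep = anchoredLine.indexOf(":");
  if (sep === -1) throw new Error("malformed anchor reference");
  const word = anchoredLine.slice(0, sep);
  const quoted = anchoredLine.slice(sep + 1);
  const idx = anchors.get(word);
  if (idx === undefined) throw new Error(`unknown anchor: ${word}`);     // check 1
  if (fileLines[idx] !== quoted) throw new Error("stale line content");  // check 2
  return idx; // check 3 (start <= end) is applied by the caller on ranges
}
```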
Myers Diff Reconciliation
The brilliance of this system: it survives edits. When the LLM
modifies lines, the AnchorStateManager reconciles the new
file content with the old anchors using a
Myers diff algorithm:
- Unchanged lines: Keep their exact word anchors (FNV-1a hash matches → same word)
- Changed lines: Old anchor retired, new word generated
- Inserted lines: Get new word anchors
- Deleted lines: Their words are freed for reuse
This means the LLM can read a file once, get anchors, and those anchors stay valid through multiple editing rounds, even if the file has been partially changed. Only lines that actually changed get new words.
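The reuse rule can be shown in a much-simplified sketch. This is an illustration, not Dirac's Myers-diff implementation: it keys purely on line-content hashes, so duplicate lines would share an anchor here, which the real system avoids:

```typescript
// FNV-1a line hash, as in Dirac's contentHash above.
function contentHash(content: string): string {
  let h = 2166136261; // FNV-1a offset basis
  for (let i = 0; i < content.length; i++) {
    h = Math.imul(h ^ content.charCodeAt(i), 16777619);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Rebuild the anchor map after an edit: unchanged lines keep their word
// (same hash -> same entry); changed or inserted lines mint a new word.
export function reconcileAnchors(
  oldAnchors: Map<string, string>, // content hash -> word anchor
  newLines: string[],
  nextWord: () => string           // stand-in for the dictionary-pair generator
): Map<string, string> {
  const next = new Map<string, string>();
  for (const line of newLines) {
    const h = contentHash(line);
    next.set(h, oldAnchors.get(h) ?? nextWord());
  }
  return next; // anchors for deleted lines simply drop out of the map
}
```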
Why This Beats Every Other Method
| Problem | Search/Replace | Unified Diff | Anchor Match | Hash-Anchor |
|---|---|---|---|---|
| Whitespace sensitivity | ❌ Fails | ❌ Fails | ✅ Tolerant | ✅ Irrelevant |
| Hallucinated content | ❌ Fails | ❌ Fails | ⚠️ Partial | ✅ Detected & rejected |
| Token cost (large files) | ✅ Low | ✅ Low | ✅ Low | ✅ Low |
| Duplicate line ambiguity | ❌ Wrong instance | ❌ Wrong hunk | ⚠️ Risk | ✅ Unique per line |
| Survives partial edits | ⚠️ Stale context | ⚠️ Offset shift | ❌ Anchors break | ✅ Myers diff reconcile |
Who Uses It
Dirac only. This is Dirac's core innovation, and Dirac is the only agent using word-anchored cryptographic line hashing. Combined with its AST-aware tools, Dirac achieves a 64.8% cost reduction over traditional search/replace agents.
Real-World Impact
That 64.8% saving comes from anchor lookups that succeed on the first try: no retries, no fuzzy fallbacks, no "file changed since you read it." The hash guarantees correctness; the words make it usable by LLMs.
Summary: Choosing the Right Method
| Scenario | Recommended Method | Why |
|---|---|---|
| Creating a new file | Whole File | No existing content to match |
| Changing one function | Search & Replace | Precise, low token cost |
| Multi-file refactor | Unified Diff / Patch | Token efficient, multiple files at once |
| Exact match failing | Anchor Matching | Handles whitespace/formatting differences |
| Renaming variable everywhere | Multi-Edit | Atomic, handles offset shifts |
| Surgical precision, multiple rounds | Hash-Anchor | Cryptographic certainty, survives partial edits |
| Everything else failed | Whole File | Nuclear option, always works |