
File Editing Methods

The core approaches AI coding agents use to modify your files, from brute-force rewrites to surgical precision, hash anchors, and extension-hosted editors.

1. Whole File Replacement

High Token Cost Most Reliable

What It Does

Completely overwrites the target file with new content. The agent provides the file path and the entire new content.

Why It Still Fails

In theory this is the most reliable method: no surgical precision required. In practice it breaks down due to platform encoding mismatches. Agents trained on Linux (LF line endings) sometimes write files on Windows (CRLF) or vice versa, producing mixed or incorrect line endings. Byte-order marks (BOMs), invisible whitespace, and charset assumptions also cause silent corruption. When these writes fail, agents often resort to hacky workarounds, such as shell commands like sed or printf, which can introduce additional errors rather than fix the root cause.
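
A common mitigation is to detect the existing file's line-ending convention before writing and normalize the model's output to match. A minimal sketch (the helper names are illustrative, not from any particular agent):

```typescript
// Detect the dominant line-ending style of the file that was read.
function detectEol(text: string): "\r\n" | "\n" {
  const crlf = (text.match(/\r\n/g) ?? []).length;
  const lf = (text.match(/(?<!\r)\n/g) ?? []).length; // bare LF only
  return crlf > lf ? "\r\n" : "\n";
}

// Normalize new content to one line-ending style before writeFile.
function normalizeEol(text: string, eol: "\r\n" | "\n"): string {
  return text.replace(/\r\n|\r|\n/g, eol);
}
```

An agent would call detectEol on the content it originally read, then run the model's replacement text through normalizeEol before writing, so a Linux-trained model can't silently introduce mixed endings on Windows.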

How It Works

Pseudo-code
writeFile(path, entireNewContent)

Example (Cline's XML Format)

XML
<write_to_file>
  <path>src/utils/helper.js</path>
  <content>
// Entire file content goes here
export function add(a, b) {
  return a + b;
}

export function subtract(a, b) {
  return a - b;
}
  </content>
</write_to_file>

When to Use

⚠️

The Danger of Full Rewrites

If the AI doesn't have the complete file in context (or forgets parts), it might accidentally delete existing code. This is why agents like Cline recommend reading the file first and using this method only as a fallback.

Who Uses It

All agents support this as a creation tool and fallback. MCP-SuperAssistant uses it as the primary method (the "minimalist" approach).

2. Search & Replace Blocks

Low Token Cost Requires Exact Match

What It Does

Surgical edits using a "Find this exact text" and "Replace with this" pattern. The AI specifies a block of code to find and what to replace it with.

Format Variations

Different agents use different markers, but the concept is the same:

Aider's SEARCH/REPLACE Format
app.py
```python
<<<<<<< SEARCH
def hello():
    print("world")
=======
def hello():
    print("Hello, world!")
>>>>>>> REPLACE
```

Uses git-conflict-style markers; Aider's parser accepts between 5 and 9 repeated marker characters.

Cline's SEARCH/REPLACE Format
<replace_in_file>
  <path>src/app.py</path>
  <diff>
------- SEARCH
def hello():
    print("world")
=======
def hello():
    print("Hello, world!")
+++++++ REPLACE
  </diff>
</replace_in_file>

XML wrapper with dash/plus markers. Supports legacy <<<< format too.

OpenCode's Edit Tool
EditTool({
  filePath: "/abs/path/to/app.py",
  oldString: `def hello():
    print("world")`,
  newString: `def hello():
    print("Hello, world!")`,
  replaceAll: false
})

JSON/TypeScript-style arguments. replaceAll flag for bulk replacements.

The Critical Rule

πŸ’‘

SEARCH Must Match Exactly

The search block must match the file content character-for-character, including whitespace, indentation, comments, and newlines. This is where most edits fail.
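
The exact-match rule can be expressed as a strict replace that rejects both missing and ambiguous matches. A simplified sketch of the idea, not any specific agent's implementation:

```typescript
// Strict search/replace: the SEARCH text must appear exactly once.
function exactReplace(file: string, oldStr: string, newStr: string): string {
  const first = file.indexOf(oldStr);
  if (first === -1) {
    // Character-for-character mismatch: whitespace, comments, anything.
    throw new Error("SEARCH block not found in file");
  }
  if (file.indexOf(oldStr, first + 1) !== -1) {
    // A second occurrence means we can't know which one the model meant.
    throw new Error("SEARCH block is ambiguous (multiple matches)");
  }
  return file.slice(0, first) + newStr + file.slice(first + oldStr.length);
}
```

The uniqueness check is why agents instruct models to include enough surrounding context in the SEARCH block to make it unambiguous.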

Why Models Get It Wrong

LLMs reconstruct code from memory rather than copying it byte-for-byte. While emitting a SEARCH block they silently normalize indentation, swap tabs for spaces, drop or reword comments, and "fix" minor style issues. Any of these drifts produces a character-level mismatch, and the edit is rejected.

Who Uses It

Cline (primary tool), Aider (diff format), OpenCode (EditTool), Grok CLI (str_replace_editor)

3. Unified Diff / Patch Format

Very Token Efficient Specialized Models

What It Does

Uses standard diff format (or custom variations) to describe changes. Lines starting with - are removed, lines with + are added, and unchanged context lines help locate the edit.

Standard Unified Diff

Unified Diff (udiff)
--- app.py
+++ app.py
@@ -10,4 +10,6 @@
 def calculate_total(items):
-    return sum(item.price for item in items)
+    subtotal = sum(item.price for item in items)
+    tax = subtotal * 0.1
+    return subtotal + tax
 
 def main():

Codex's Custom Patch Format

Codex and Claude Code use a custom patch syntax with special markers for multi-file operations:

Codex Patch Format
*** Begin Patch
*** Add File: src/new_module.py
+"""New module docstring"""
+
+def new_function():
+    pass

*** Update File: src/app.py
@@ def calculate_total():
-    return sum(prices)
+    subtotal = sum(prices)
+    return subtotal * 1.1

*** Delete File: src/deprecated.py
*** End Patch

Why Custom Formats?

Advantages

  • Extremely token efficient
  • Multiple files in one patch
  • Clear operation semantics (add/update/delete)
  • Familiar to developers (like git)

Challenges

  • High cognitive load for generic LLMs
  • Easy to hallucinate line numbers
  • Context lines must match exactly
  • Best with models trained on diffs

Who Uses It

Codex/Claude Code (custom patch format, primary method), Aider (standard udiff, optional format), Cline (V4A format for GPT-5 models only)

4. Line-Based / Anchor Matching

Fallback Strategy Some False Positive Risk

What It Does

When exact matching fails, use the first and last lines of a code block as "anchors" to locate the target. The middle content is verified with fuzzy matching or similarity scoring.

How It Works

Pseudo-code (from OpenCode/Cline analysis)
function applyAnchorEdit(fileLines, searchLines, replaceLines) {
    const startAnchor = searchLines[0].trim();
    const endAnchor = searchLines[searchLines.length - 1].trim();

    // Find start line index in real file
    const startIndex = fileLines.findIndex(line =>
        line.trim() === startAnchor
    );

    // Find end line index (after start index)
    const endIndex = fileLines.findIndex((line, idx) =>
        idx > startIndex && line.trim() === endAnchor
    );

    if (startIndex !== -1 && endIndex !== -1) {
        // Verify the middle content is similar enough to the search block.
        // similarityScore: any fuzzy metric, e.g. fraction of matching trimmed lines.
        const fileMiddle = fileLines.slice(startIndex + 1, endIndex);
        const searchMiddle = searchLines.slice(1, -1);
        if (similarityScore(fileMiddle, searchMiddle) > 0.5) {
            return [
                ...fileLines.slice(0, startIndex),
                ...replaceLines,
                ...fileLines.slice(endIndex + 1)
            ].join("\n");
        }
    }
    throw new Error("Anchors not found or content mismatch");
}

Why This Helps

AI models often get the first and last lines of a function/block correct but hallucinate minor differences in the middle (comments, whitespace, formatting). Anchor matching says: "If the boundaries match and the size is right, it's probably the target."
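
One simple similarity metric is line-set overlap after trimming whitespace. The 0.5 threshold mirrors the 50% figure mentioned for OpenCode, though the exact metric below is an illustrative guess, not OpenCode's actual scoring:

```typescript
// Fraction of search lines (trimmed) that also appear in the file's
// middle section; whitespace differences are ignored by design.
function similarityScore(fileMiddle: string[], searchMiddle: string[]): number {
  if (searchMiddle.length === 0) return 1; // nothing to verify
  const fileSet = new Set(fileMiddle.map(l => l.trim()));
  const hits = searchMiddle.filter(l => fileSet.has(l.trim())).length;
  return hits / searchMiddle.length;
}
```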

The Risk

⚠️

False Positives

If the first and last lines aren't unique (e.g., multiple functions starting with def process():), you might edit the wrong block. Agents mitigate this with similarity thresholds (OpenCode uses 50%) and size checks.

Who Uses It

Cline (Tier 3 fallback: "Block Anchor Fallback Match"), OpenCode (BlockAnchorReplacer + ContextAwareReplacer)

5. Multi-Edit / Atomic Operations

Efficient I/O All-or-Nothing

What It Does

Apply multiple disjoint edits to a single file in one atomic operation. Read the file once, apply all changes in memory (handling offset shifts), write once.

Example (OpenCode's MultiEditTool)

TypeScript
MultiEditTool({
    filePath: "/abs/path/to/file.ts",
    edits: [
        { 
            oldString: "const API_URL = 'http://localhost'",
            newString: "const API_URL = process.env.API_URL"
        },
        {
            oldString: "console.log('debug')",
            newString: "logger.debug('request received')"
        },
        {
            oldString: "// TODO: add auth",
            newString: "validateToken(req.headers.authorization)"
        }
    ]
})

Why Use Multi-Edit?

Complexity

Multi-edit tools must handle overlapping edits (what if two edits affect the same lines?) and offset shifts (if edit 1 adds 3 lines, edit 2's target line is now 3 lines later). OpenCode applies edits sequentially in memory to handle this.
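
Sequential in-memory application sidesteps offset bookkeeping entirely: each edit runs against the result of the previous one, so string positions are always current. A sketch of the approach (a hypothetical helper, not OpenCode's actual code):

```typescript
interface Edit { oldString: string; newString: string; }

// Apply edits one at a time against the evolving content; any failure
// aborts the whole batch, so the file on disk is never partially updated.
function applyMultiEdit(content: string, edits: Edit[]): string {
  let result = content;
  for (const [i, e] of edits.entries()) {
    const idx = result.indexOf(e.oldString);
    if (idx === -1) throw new Error(`edit ${i + 1} failed: oldString not found`);
    result = result.slice(0, idx) + e.newString + result.slice(idx + e.oldString.length);
  }
  return result; // caller writes this back in a single writeFile
}
```

Because edit 2 searches the already-edited string, an earlier edit that adds or removes lines never invalidates a later edit's position.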

Who Uses It

OpenCode (dedicated MultiEditTool), Cline (multiple SEARCH/REPLACE blocks in one replace_in_file call)

6. Cryptographic Hash-Anchor Editing

Low Token Cost Precise Stateful

What It Does

Instead of having the LLM describe which code to find (a step that invites hallucination), Dirac gives every line a unique, opaque word anchor, such as AppleBanana or RiverMountain. The LLM references these anchors directly, and Dirac resolves them deterministically. No fuzzy matching, no whitespace sensitivity, no "what if the model remembers wrong."

The Cryptographic Core: FNV-1a Hashing

Every line in a file is hashed with FNV-1a (Fowler-Noll-Vo), a blazing-fast non-cryptographic hash that produces a 32-bit integer. The key property: any change at all, an extra space, a different letter, a missing comma, almost always produces a completely different hash with no resemblance to the original. FNV-1a offers no cryptographic security guarantees, but 32-bit collisions between the lines of a single file are vanishingly rare in practice, and that is what makes anchors trustworthy.

FNV-1a Hash β€” dirac/src/utils/line-hashing.ts
export function contentHash(content: string): string {
    let h = 2166136261          // FNV-1a offset basis
    for (let i = 0; i < content.length; i++) {
        h = Math.imul(h ^ content.charCodeAt(i), 16777619)
    }
    return (h >>> 0).toString(16).padStart(8, "0")
}
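
The avalanche behaviour is easy to demonstrate: hashing the same line with a single trailing space added yields an unrelated value. The snippet repeats the contentHash implementation from above so it is self-contained:

```typescript
function contentHash(content: string): string {
    let h = 2166136261                                      // FNV-1a offset basis
    for (let i = 0; i < content.length; i++) {
        h = Math.imul(h ^ content.charCodeAt(i), 16777619)  // XOR byte, multiply by FNV prime
    }
    return (h >>> 0).toString(16).padStart(8, "0")
}

// One trailing space: the two hex strings share no visible structure.
const clean = contentHash("return subtotal + tax")
const dirty = contentHash("return subtotal + tax ")
console.log(clean, dirty, clean !== dirty)
```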

Why Words Instead of Raw Hashes?

Raw hex hashes like 4a7b2f01 are meaningless to LLMs: they look like noise, and models can't reliably reproduce them. Dirac instead maps each hash to a two-word combination from a dictionary file (.hash_anchors), producing human-readable anchors like PlasticGrass, HappyWhistle, or BlueVortex.

The mapping is stable across edits: if a line doesn't change, its FNV-1a hash stays the same, and it keeps the same word anchor forever. When a line does change, the old anchor is retired and a new random word pair is assigned.

Word Generation β€” dirac/src/utils/AnchorStateManager.ts
private static getUniqueWord(usedWords: Set<string>, pool: string[]): string {
    while (true) {
        if (pool.length === 0) {
            AnchorStateManager.refill(usedWords, pool)
        }
        const word = pool.pop()!
        if (!usedWords.has(word)) return word
    }
}

// refill generates 10,000 two-word combinations from dictionary:
// w1 = dict[random], w2 = dict[random], word = w1 + w2
// e.g., "Plastic" + "Grass" β†’ "PlasticGrass"

How the LLM Sees Anchored Files

When Dirac reads a file, every line is prefixed with its word anchor and a delimiter (by default :):

Anchored File Output
AppleBanana:import { ToolUse } from "@core/assistant-message"
RiverMountain:import { splitAnchor, stripHashes, getDelimiter } from "@utils/line-hashing"
CloudForest:import { AppliedEdit, Edit, FailedEdit, ResolvedEdit } from "./types"
GoldenFalcon:
DesertBloom:export class EditExecutor {

How the LLM Edits

The LLM doesn't copy code; it references anchor names:

Dirac Edit Format
{
  "path": "src/core/EditExecutor.ts",
  "edits": [
    {
      "anchor": "AppleBanana:import { ToolUse } from \"@core/assistant-message\"",
      "edit_type": "replace",
      "text": "import { ToolUse, ToolResult } from \"@core/assistant-message\""
    }
  ]
}

The anchor format is WordWord:actual_line_content. Dirac verifies both: it looks up the word anchor to find the line index, then checks that the provided content matches the actual file line byte-for-byte. If either fails, the edit is rejected with a precise error.

πŸ”’

Triple Verification

1. Anchor word exists in file → 2. Provided line content matches actual content character-for-character → 3. Range is valid (start ≤ end). All three must pass before any edit is applied.
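
The first two checks can be sketched as a single resolver (names here are illustrative; Dirac's actual implementation lives in its EditExecutor):

```typescript
// anchors maps each word anchor to its zero-based line index in the file.
function resolveAnchor(
    fileLines: string[],
    anchors: Map<string, number>,
    anchorSpec: string,  // "WordWord:actual_line_content"
): number {
    const sep = anchorSpec.indexOf(":");
    const word = anchorSpec.slice(0, sep);
    const claimed = anchorSpec.slice(sep + 1);

    const index = anchors.get(word);
    if (index === undefined) {              // check 1: anchor word exists
        throw new Error(`unknown anchor: ${word}`);
    }
    if (fileLines[index] !== claimed) {     // check 2: exact content match
        throw new Error(`content mismatch at ${word}`);
    }
    return index;  // check 3 (start <= end) is applied by the caller on resolved ranges
}
```

Rejection happens before any mutation, so a hallucinated anchor or misremembered line produces a precise error instead of a wrong edit.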

Myers Diff Reconciliation

The brilliance of this system is that it survives edits. When the LLM modifies lines, the AnchorStateManager reconciles the new file content with the old anchors using a Myers diff algorithm.

This means the LLM can read a file once, get anchors, and those anchors stay valid through multiple editing rounds β€” even if the file has been partially changed. Only lines that actually changed get new words.
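
The effect of reconciliation can be sketched without a full Myers implementation: lines whose content survives the edit keep their anchors, while new or changed lines mint fresh word pairs. This is a simplification; Dirac's AnchorStateManager uses a real Myers diff, which handles moved and duplicated lines more precisely:

```typescript
// oldAnchors[i] is the word anchor for oldLines[i]. Unchanged lines keep
// their anchor; changed or inserted lines receive a freshly minted one.
function reconcileAnchors(
    oldLines: string[], oldAnchors: string[],
    newLines: string[], freshAnchor: () => string,
): string[] {
    // Queue old anchors by line content so duplicate lines pair up in order.
    const pool = new Map<string, string[]>();
    oldLines.forEach((line, i) => {
        if (!pool.has(line)) pool.set(line, []);
        pool.get(line)!.push(oldAnchors[i]);
    });
    return newLines.map(line => pool.get(line)?.shift() ?? freshAnchor());
}
```

Editing one line out of three, for example, leaves the other two anchors intact, so the LLM's mental map of the file stays valid across rounds.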

Why This Beats Every Other Method

| Problem | Search/Replace | Unified Diff | Anchor Match | Hash-Anchor |
| --- | --- | --- | --- | --- |
| Whitespace sensitivity | ❌ Fails | ❌ Fails | ✅ Tolerant | ✅ Irrelevant |
| Hallucinated content | ❌ Fails | ❌ Fails | ⚠️ Partial | ✅ Detected & rejected |
| Token cost (large files) | ✅ Low | ✅ Low | ✅ Low | ✅ Low |
| Duplicate line ambiguity | ❌ Wrong instance | ❌ Wrong hunk | ⚠️ Risk | ✅ Unique per line |
| Survives partial edits | ⚠️ Stale context | ⚠️ Offset shift | ❌ Anchors break | ✅ Myers diff reconcile |

Who Uses It

Dirac, exclusively. This is Dirac's core innovation, and it is the only agent using word-anchored cryptographic line hashing. Combined with its AST-aware tools, Dirac achieves a 64.8% cost reduction over traditional search/replace agents.

πŸ“Š

Real-World Impact

64.8% of tokens saved represent anchor lookups that succeeded on the first try: no retries, no fuzzy fallbacks, no "file changed since you read it". The hash guarantees correctness; the words make it usable by LLMs.

Summary: Choosing the Right Method

| Scenario | Recommended Method | Why |
| --- | --- | --- |
| Creating a new file | Whole File | No existing content to match |
| Changing one function | Search & Replace | Precise, low token cost |
| Multi-file refactor | Unified Diff / Patch | Token efficient, multiple files at once |
| Exact match failing | Anchor Matching | Handles whitespace/formatting differences |
| Renaming a variable everywhere | Multi-Edit | Atomic, handles offset shifts |
| Surgical precision, multiple rounds | Hash-Anchor | Cryptographic certainty, survives partial edits |
| Everything else failed | Whole File | Nuclear option, always works |