Build Your Own AI File Editor
A practical guide with code examples. Implement fallback cascades, matching algorithms, and the verification loop that makes AI file editing robust.
The Golden Rule: Never Fail on First Try
LLMs are non-deterministic. They will hallucinate whitespace, forget context, and misquote code. Your file editing system must handle these variations gracefully.
The Fallback Ladder
Implement your editing logic as a waterfall. Each tier is a fallback for the previous one:
- Exact Match: Fast, safe, preferred
- Whitespace Flexible: Handles indentation differences
- Anchor Matching: Matches first/last lines, fuzzy middle
- Diff/Patch: Uses diff-match-patch libraries
- Full Overwrite: Nuclear option, always works
Alternative approach: Hash-Anchor Editing. To eliminate the fallback cascade entirely, use deterministic line hashing (see Methods → Hash-Anchor Editing). Assign every line a unique word anchor via FNV-1a hashing. The LLM references anchors directly instead of describing code to find. No fuzzy matching, no whitespace sensitivity, no retries. Implemented exclusively by Dirac.
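The hashing idea above can be sketched as follows. Note this is a minimal illustration, not Dirac's actual scheme: the word list, anchor width, and the way line number and content are combined are all assumptions.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash over raw bytes."""
    h = 0x811C9DC5  # FNV offset basis
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, wrap to 32 bits
    return h

# Illustrative word list; a real system would use a larger one
WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]

def line_anchor(line_no: int, line: str) -> str:
    """Map a (line number, content) pair to a short word anchor the LLM can quote."""
    h = fnv1a_32(f"{line_no}:{line}".encode())
    # Spell the low bits of the hash as words, 3 bits per word
    parts = []
    for _ in range(3):
        parts.append(WORDS[h & 0x7])
        h >>= 3
    return "-".join(parts)
```

Because the anchor is derived from both position and content, any edit to a line invalidates its anchor, so stale references fail loudly instead of matching the wrong code.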
Step 1: Define Your Tools
Provide your AI with two distinct tools: one for surgical edits, one for full rewrites.
Tool A: replace_in_file (Preferred)
```xml
<replace_in_file>
<path>src/app.py</path>
<diff>
<<<< SEARCH
def calculate_total(items):
    return sum(item.price for item in items)
====
def calculate_total(items):
    subtotal = sum(item.price for item in items)
    tax = subtotal * 0.1
    return subtotal + tax
>>>> REPLACE
</diff>
</replace_in_file>
```
Tool B: write_to_file (Fallback / Creation)
```xml
<write_to_file>
<path>src/utils.py</path>
<content>
def helper():
    """Helper function."""
    return True
</content>
</write_to_file>
```
Why XML Over JSON?
Code inside JSON strings requires escaping (`\"`, `\n`, `\\`). This is error-prone for LLMs and wastes tokens. XML lets you embed code content directly without escaping.
Step 2: Implement Matching Algorithms
Tier 1: Exact Match
The happy path. Fast and reliable when it works.
```python
def exact_match(file_content: str, search: str, replace: str) -> str | None:
    """Try exact string replacement."""
    if search in file_content:
        return file_content.replace(search, replace, 1)  # Replace first occurrence
    return None  # Failed, try next tier
```
Tier 2: Whitespace Flexible
Normalize whitespace before matching, but preserve original indentation in output.
```python
def whitespace_flexible_match(file_content: str, search: str, replace: str) -> str | None:
    """Match after normalizing whitespace."""
    file_lines = file_content.split('\n')
    search_lines = search.split('\n')
    # Trim each line for comparison
    search_trimmed = [line.strip() for line in search_lines]
    for i in range(len(file_lines) - len(search_lines) + 1):
        file_chunk = file_lines[i:i + len(search_lines)]
        file_trimmed = [line.strip() for line in file_chunk]
        if file_trimmed == search_trimmed:
            # Match found! Calculate indentation from original
            base_indent = len(file_lines[i]) - len(file_lines[i].lstrip())
            indent = ' ' * base_indent
            # Apply indentation to replacement
            replace_lines = replace.split('\n')
            indented_replace = [indent + line if line.strip() else line
                                for line in replace_lines]
            # Reconstruct file
            return '\n'.join(
                file_lines[:i] +
                indented_replace +
                file_lines[i + len(search_lines):]
            )
    return None  # Failed, try next tier
```
Tier 3: Anchor Matching
Match first and last lines as "anchors", fuzzy-match the middle. Based on OpenCode/Cline.
```python
def anchor_match(file_content: str, search: str, replace: str,
                 similarity_threshold: float = 0.5) -> str | None:
    """Match using first/last lines as anchors."""
    file_lines = file_content.split('\n')
    search_lines = search.split('\n')
    if len(search_lines) < 3:
        return None  # Need at least 3 lines for anchor matching
    start_anchor = search_lines[0].strip()
    end_anchor = search_lines[-1].strip()
    expected_length = len(search_lines)
    # Find start anchor
    for i, line in enumerate(file_lines):
        if line.strip() != start_anchor:
            continue
        # Look for end anchor within reasonable range
        for j in range(i + 2, min(i + expected_length * 2, len(file_lines))):
            if file_lines[j].strip() != end_anchor:
                continue
            # Found potential match. Verify middle similarity.
            file_middle = file_lines[i + 1:j]
            search_middle = search_lines[1:-1]
            if calculate_similarity(file_middle, search_middle) >= similarity_threshold:
                # Match confirmed! Replace the block.
                replace_lines = replace.split('\n')
                return '\n'.join(
                    file_lines[:i] +
                    replace_lines +
                    file_lines[j + 1:]
                )
    return None  # Failed, try next tier

def calculate_similarity(lines_a: list, lines_b: list) -> float:
    """Calculate similarity between two line lists (0.0 to 1.0)."""
    if not lines_a or not lines_b:
        return 0.0
    # Simple token-based Jaccard similarity
    tokens_a = set(' '.join(lines_a).split())
    tokens_b = set(' '.join(lines_b).split())
    intersection = len(tokens_a & tokens_b)
    union = len(tokens_a | tokens_b)
    return intersection / union if union > 0 else 0.0
```
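The Jaccard helper ignores line order entirely. If that causes false positives, the standard library's difflib offers an order-aware drop-in alternative:

```python
import difflib

def calculate_similarity_difflib(lines_a: list, lines_b: list) -> float:
    """Order-aware similarity between two line lists using stdlib difflib (0.0 to 1.0)."""
    return difflib.SequenceMatcher(
        None, "\n".join(lines_a), "\n".join(lines_b)
    ).ratio()
```

`SequenceMatcher.ratio()` rewards long contiguous runs of matching text, so reordered lines score lower than they would under Jaccard; it is also somewhat slower on large blocks.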
Tier 4: Diff-Match-Patch
Use Google's diff-match-patch library for fuzzy patching.
```python
from diff_match_patch import diff_match_patch

def dmp_match(file_content: str, search: str, replace: str) -> str | None:
    """Use diff-match-patch for fuzzy matching."""
    dmp = diff_match_patch()
    # Create a patch from search -> replace
    patches = dmp.patch_make(search, replace)
    # Try to apply it to the file
    result, success = dmp.patch_apply(patches, file_content)
    if all(success):
        return result
    return None  # Some patches failed
```
The Master Function
Chain all tiers together:
```python
class EditFailedError(Exception):
    """Raised when every matching tier fails."""

def apply_edit(file_content: str, search: str, replace: str) -> tuple[str, str]:
    """Apply edit with fallback cascade. Returns (new_content, method_used)."""
    # Tier 1: Exact match
    result = exact_match(file_content, search, replace)
    if result is not None:
        return result, "exact_match"
    # Tier 2: Whitespace flexible
    result = whitespace_flexible_match(file_content, search, replace)
    if result is not None:
        return result, "whitespace_flexible"
    # Tier 3: Anchor matching
    result = anchor_match(file_content, search, replace)
    if result is not None:
        return result, "anchor_match"
    # Tier 4: Diff-match-patch
    result = dmp_match(file_content, search, replace)
    if result is not None:
        return result, "diff_match_patch"
    # All tiers failed
    raise EditFailedError(
        f"Could not match search block. "
        f"Tried: exact, whitespace, anchor, dmp. "
        f"Search block:\n{search[:200]}..."
    )
```
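Returning `method_used` also enables telemetry: tracking how often each tier fires tells you when a prompt change or a new model starts drifting away from exact matches. A minimal sketch (the class and its names are illustrative):

```python
from collections import Counter

class EditTelemetry:
    """Track which fallback tier resolved each edit.

    A falling exact-match rate is a signal to tighten the system prompt
    or revisit model choice.
    """

    def __init__(self) -> None:
        self.methods: Counter = Counter()

    def record(self, method: str) -> None:
        """Record the method_used value returned by apply_edit."""
        self.methods[method] += 1

    def exact_rate(self) -> float:
        """Fraction of edits resolved by tier 1."""
        total = sum(self.methods.values())
        return self.methods["exact_match"] / total if total else 0.0
```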
Step 3: The Verification Loop
Don't assume an edit works. Validate it immediately using LSP or a linter.
```python
import subprocess
import json

def verify_edit(file_path: str) -> list[dict]:
    """Run linter and return any errors."""
    # Example: Use pylint for Python files
    if file_path.endswith('.py'):
        result = subprocess.run(
            ['pylint', '--output-format=json', file_path],
            capture_output=True, text=True
        )
        if result.stdout:
            return json.loads(result.stdout)
    # Example: Use eslint for JS/TS
    elif file_path.endswith(('.js', '.ts', '.jsx', '.tsx')):
        result = subprocess.run(
            ['eslint', '--format=json', file_path],
            capture_output=True, text=True
        )
        if result.stdout:
            data = json.loads(result.stdout)
            return data[0].get('messages', []) if data else []
    return []
```
```python
def apply_and_verify(file_path: str, search: str, replace: str) -> str:
    """Apply edit and return result with diagnostics."""
    # Read current file
    with open(file_path, 'r') as f:
        content = f.read()
    # Apply edit
    new_content, method = apply_edit(content, search, replace)
    # Write file
    with open(file_path, 'w') as f:
        f.write(new_content)
    # Verify
    errors = verify_edit(file_path)
    # Build response
    response = f"Edit applied successfully using {method}."
    if errors:
        response += "\n\n<file_diagnostics>\n"
        for err in errors:
            line = err.get('line', '?')
            msg = err.get('message', str(err))
            response += f"Line {line}: {msg}\n"
        response += "</file_diagnostics>"
    return response
```
When the AI sees syntax errors in the response, it can self-correct without waiting for the user to run the code.
Step 4: Design Good Error Messages
When an edit fails, give the AI enough context to self-correct.
````python
def format_edit_error(file_path: str, search: str, actual_content: str) -> str:
    """Format a helpful error message for the AI."""
    # Find similar content in the file
    search_first_line = search.split('\n')[0].strip()
    file_lines = actual_content.split('\n')
    similar_lines = []
    for i, line in enumerate(file_lines, 1):
        if search_first_line[:20] in line:
            # Found potential match location
            context_start = max(0, i - 3)
            context_end = min(len(file_lines), i + 5)
            similar_lines.append((i, file_lines[context_start:context_end]))
    error_msg = f"""
# SEARCH block failed to match!
The following SEARCH block was not found in {file_path}:
```
{search[:500]}{'...' if len(search) > 500 else ''}
```
"""
    if similar_lines:
        error_msg += "## Did you mean one of these sections?\n\n"
        for line_num, context in similar_lines[:3]:
            error_msg += f"### Near line {line_num}:\n```\n"
            error_msg += '\n'.join(context)
            error_msg += "\n```\n\n"
    else:
        error_msg += """
## No similar content found.
Consider:
1. Use `read_file` to get the current file contents
2. Check if the file path is correct
3. If the file has changed, your context may be stale
"""
    error_msg += """
## Remember:
- SEARCH must match EXACTLY (character-for-character)
- Include all whitespace, comments, and indentation
- Use 2-3 lines of context before and after for uniqueness
"""
    return error_msg
````
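The substring scan above misses near-misses such as a typo in the search block's first line. The standard library's `difflib.get_close_matches` handles those; a sketch (the function name is illustrative):

```python
import difflib

def suggest_similar_lines(search_first_line: str, file_content: str,
                          n: int = 3) -> list[tuple[int, str]]:
    """Return up to n (line_number, line) pairs most similar to the search's first line."""
    lines = file_content.split('\n')
    # Compare against stripped lines so indentation differences don't hurt the score
    matches = difflib.get_close_matches(
        search_first_line.strip(),
        [line.strip() for line in lines],
        n=n,
        cutoff=0.6,
    )
    return [(i + 1, line) for i, line in enumerate(lines) if line.strip() in matches]
```

This catches the common case where the model wrote `def fooo():` for a file containing `def foo():`, which a raw substring check would never find.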
Step 5: Write Your System Prompt
Here's a template based on the patterns we've seen in production agents:
You are an AI coding assistant with file editing capabilities.
## Available Tools
### replace_in_file
Make targeted edits to existing files using SEARCH/REPLACE blocks.
**Format:**
```xml
<replace_in_file>
<path>relative/path/to/file</path>
<diff>
<<<< SEARCH
[exact content to find]
====
[new content to replace with]
>>>> REPLACE
</diff>
</replace_in_file>
```
**Critical Rules:**
1. SEARCH must match EXACTLY (character-for-character, including whitespace)
2. Include 2-3 lines of context before/after for unique matching
3. Multiple SEARCH/REPLACE blocks must appear in file order
4. To delete code: Leave REPLACE section empty
5. To add code: Include surrounding context in SEARCH, add new lines in REPLACE
### write_to_file
Create new files or completely rewrite existing files.
**Format:**
```xml
<write_to_file>
<path>relative/path/to/file</path>
<content>
[entire file content]
</content>
</write_to_file>
```
**When to use:**
- Creating new files
- File is small (< 50 lines) and changing most of it
- replace_in_file has failed 3+ times
## Workflow
1. **Before editing:** Use read_file to get current content
2. **Prefer replace_in_file:** It's safer and more precise
3. **After editing:** Check the response for <file_diagnostics>
4. **If diagnostics show errors:** Fix them immediately
5. **If edit fails:** Re-read the file and try again with exact content
6. **After 3 failures:** Fall back to write_to_file
## Anti-Patterns
- NEVER use write_to_file on existing files without reading them first
- NEVER guess at file content; read it first
- NEVER use placeholder comments like "... rest of code ..."
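The "after 3 failures, fall back to write_to_file" rule can also be enforced by the host program rather than trusted to the model's counting. A minimal sketch with injected callbacks (the function and parameter names are illustrative; in practice `try_replace` would re-prompt the model with the error message from Step 4):

```python
from typing import Callable

def edit_with_retries(
    try_replace: Callable[[], bool],
    full_rewrite: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Attempt surgical edits; escalate to a full rewrite after repeated failures."""
    for attempt in range(1, max_attempts + 1):
        if try_replace():
            return f"replace_in_file succeeded on attempt {attempt}"
    # All surgical attempts failed: take the nuclear option
    full_rewrite()
    return "fell back to write_to_file"
```

Host-side enforcement guarantees the escalation policy holds even when the model ignores the workflow instructions in its prompt.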
Architecture Overview
1. Parse: extract path and search/replace content from AI output
2. Read: get actual file content for matching
3. Match: try exact → whitespace → anchor → dmp
4. Write: save changes to disk
5. Verify: check for syntax errors immediately
6. Respond: success with any errors, or failure with suggestions
Best Practices Checklist
✅ Do
- Implement multiple fallback tiers
- Verify edits with LSP/linter
- Return helpful error messages
- Use 1-indexed line numbers for AI
- Support both surgical and full-file edits
- Handle Unicode normalization (smart quotes, em-dashes)
- Show diff previews before writing
- Track edit history for undo
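For the Unicode normalization item above, a minimal sketch; the table covers only a few common typographic look-alikes that models sometimes emit in place of ASCII:

```python
# Illustrative subset; extend as you observe model output in practice
UNICODE_FIXES = {
    "\u2018": "'", "\u2019": "'",   # smart single quotes
    "\u201C": '"', "\u201D": '"',   # smart double quotes
    "\u2013": "-", "\u2014": "-",   # en/em dashes
    "\u00A0": " ",                  # non-breaking space
}

def normalize_unicode(text: str) -> str:
    """Replace typographic look-alikes with their ASCII equivalents."""
    return text.translate(str.maketrans(UNICODE_FIXES))
```

Run this on the SEARCH block (not the file) before matching, so a model-emitted smart quote can still match the plain quote actually in the source.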
❌ Don't
- Pass code inside JSON strings (escaping nightmare)
- Trust AI whitespace to be perfect
- Fail silently; always explain what went wrong
- Skip verification after edits
- Allow overwrites without reading first
- Use 0-indexed lines in AI prompts
- Ignore model-specific quirks
Final Summary: What We Learned
| Concept | Recommendation | Source Inspiration |
|---|---|---|
| Primary edit method | Search/Replace with fallbacks | Cline, Aider, OpenCode |
| Fallback depth | 4-5 tiers minimum | OpenCode (9), Aider (5) |
| Output format | XML or custom markers | Cline (XML), Codex (custom) |
| Error handling | Suggest similar content + recovery path | Aider's error templates |
| Verification | LSP/linter check after every edit | OpenCode's diagnostics |
| User approval | Diff preview with configurable auto-approve | Grok CLI's confirmation system |
| Multi-file edits | Patch format or sequential operations | Codex patch syntax |
| Unicode handling | Normalize smart quotes, dashes, whitespace | Codex's seek_sequence |
Ready to Build?
You now have the patterns, code examples, and best practices from the top AI coding agents. Go build something amazing.