Tools Used
LLM. Python. VSCode. Colab.
Techniques Used
Chunking. Prompt evaluations.
Access
Working solution here.
Upload any .docx file and see corrections in colour.
Corrects writing mistakes that slip past Microsoft Word. Effective in English, German, French, and Italian. Preserves original meaning, writing style, and document structure. Includes chunking and prompt evaluations.
Built with Claude 3.5 Sonnet via the Anthropic Python API.
The Problem
Surprisingly, Microsoft Word misses many subtle writing mistakes. Pasting paragraph by paragraph into an AI chat is slow, and the writing style is often altered. A better way was needed: upload a very large Word document and see all the needed corrections at once (in colour), with style and meaning preserved.
Figure 1 shows mistakes that go undetected by Microsoft Word but are detected and corrected by this solution. Unlike an AI chat, from the user's point of view the corrections appear to be made all in one go.

Correct Your Docs
Access the full working solution here to correct your own Word documents. For those of us with only a little coding know-how, it also explains how the tool was built using Claude Sonnet as a collaborator. Source code and README are available on GitHub.
The Solution
The solution was designed in collaboration with Claude, plus a bit of research on my part. For speed, I built it in VSCode with the Continue.dev extension for code generation (Claude 3.5 Sonnet). For easy use and access, I placed the solution in a Google Colab notebook.
Capabilities at a glance:
- Outperforms Microsoft Word at detecting and correcting writing mistakes.
- Supports large documents, tested up to 100,000 words (approx. 240 pages).
- Multi-language support: works confidently in English, German, French, or Italian.
- Preserves semantic meaning, writing style, and document structure.
- Comprehensive testing suite to ensure correction integrity.
With no prior knowledge of how to build this, I stated my project objectives to Claude. After an hour of conversation, I pieced together what I had learnt and drew the solution approach below (Figure 2).

Looking at Figure 2: in essence, the Word document is converted into a markdown file. I then break this file into small chunks. One by one, I send each chunk to Claude along with the instructional prompt asking for corrections. I receive each corrected chunk back and assemble them into a "processed" markdown file. Finally, to make the corrections easier to see, and in colour, I convert this file into an HTML file.
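To make that loop concrete, here is a minimal sketch in Python, assuming the document has already been converted to markdown. The chunk size, model name, and function names are illustrative choices of mine, not the notebook's actual code; PROMPT_TEMPLATE is the prompt shown later in this article.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def chunk_markdown(markdown_text: str, max_chars: int = 4000) -> list[str]:
    # Split on blank lines so chunks end at paragraph boundaries, keeping each
    # chunk roughly under max_chars (an oversized paragraph becomes its own chunk).
    chunks, current = [], ""
    for paragraph in markdown_text.split("\n\n"):
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = current + "\n\n" + paragraph if current else paragraph
    if current:
        chunks.append(current)
    return chunks

def correct_chunk(chunk: str) -> str:
    # One chunk in, one corrected chunk out.
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(the_markdown_chunk=chunk)}],
    )
    return response.content[0].text

def correct_document(markdown_text: str) -> str:
    # Reassemble the corrected chunks into the "processed" markdown file.
    return "\n\n".join(correct_chunk(c) for c in chunk_markdown(markdown_text))

Above: A simplified sketch of the chunk-correct-reassemble loop (illustrative names and chunk size).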
Of course, the actual solution implemented is a bit more sophisticated. I provided all my Python code to Claude and instructed it to create a solution diagram. The result is Figure 3 below: the actual solution architecture, derived and drawn from the solution code.
The technical highlights of the implementation include:
- A text chunking strategy for processing large documents.
- Integration with the Anthropic API (Claude 3.5 Sonnet) for intelligent corrections.
- Three-layer testing approach:
1) Prompt evaluation (see the sketch after this list).
2) Semantic preservation validation.
3) Code quality validation using test-driven development.
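As an illustration of layer 1, below is a minimal sketch of an automated prompt evaluation, assuming a small hand-labelled set of test chunks and the correct_chunk function from the earlier sketch. The cases and checks are illustrative; the notebook's actual test suite is more thorough.

# Illustrative evaluation cases: each pairs an input chunk with simple,
# checkable properties the corrected output must satisfy.
EVAL_CASES = [
    {
        "chunk": "## My Heading\n\nShe go to school yesterday.",
        "must_contain": ["## My Heading"],                   # structure preserved
        "must_not_contain": ["Here is the corrected text"],  # no commentary
    },
    {
        "chunk": "Das ist ein *guter* Satz mit ein Fehler.",
        "must_contain": ["*"],                               # formatting kept
        "must_not_contain": ["This is"],                     # language unchanged
    },
]

def evaluate_prompt() -> float:
    # Run every case through the correction pipeline and score the pass rate.
    passed = 0
    for case in EVAL_CASES:
        output = correct_chunk(case["chunk"])  # from the pipeline sketch above
        ok = all(s in output for s in case["must_contain"])
        ok = ok and not any(s in output for s in case["must_not_contain"])
        passed += ok
    return passed / len(EVAL_CASES)

print(f"Prompt pass rate: {evaluate_prompt():.0%}")

Above: An illustrative prompt-evaluation harness run against real test chunks.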

Not Perfect, but Aiming High
The more I learn about prompt engineering and how large language models work, the more imperfect my notebook appears. For instance, the driving prompt shown below should be enhanced. Prompt chaining would break the prompt into smaller steps, improving overall quality. And instead of repeatedly instructing Claude not to include its own comments, I could use prompt prefilling (sketched after the prompt). These insights become strikingly clear with hindsight and experience.
PROMPT_TEMPLATE = """
CRITICAL: PROVIDE ONLY THE CORRECTED TEXT WITHOUT ANY ADDITIONAL COMMENTARY.
Your task is to take the provided text and rewrite it into a clear, grammatically correct version
while preserving the original meaning as closely as possible. Correct any spelling mistakes,
punctuation errors, verb tense issues, word choice problems, and other grammatical mistakes.
MANDATORY INSTRUCTIONS:
1. Determine and use the same linguistic language as the original text (e.g., English, German)
2. Preserve all existing markdown formatting, including heading levels, paragraphs, and lists
3. Make necessary grammatical corrections, including spelling, punctuation, verb tense,
word choice, and other grammatical issues. Only make stylistic changes if essential for clarity
4. Mark corrections with markdown syntax, apply one of these choices only:
- For changed text use bold: e.g., **changed** and **multiple changed words**
- For new text use bold: **new words**
- For removed text use bold strikethrough: **~~removed words~~**
5. Maintain the original structure:
- Don't add new lines of text
- Don't include additional commentary at all
- Don't convert markdown elements to different types
6. For ambiguous corrections, choose the option that best preserves original meaning and style
7. Ensure consistency in corrections throughout the text
8. Return the corrected text in markdown syntax
9. DO NOT add any explanations, introductions, or conclusions to your response
FINAL REMINDER: Your output should consist SOLELY of the corrected text.
Do not include phrases like "Here is the corrected text" or any other form of commentary.
The text to be corrected is provided between the triple tildes (~~~):
~~~
{the_markdown_chunk}
~~~
REMEMBER: Provide ONLY the corrected text without any additional words or explanations."""
Above: Core prompt driving the solution.
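To show what prefilling could look like here, below is a minimal sketch, assuming the prompt were revised to ask for the output inside <corrected> tags. PREFILL_PROMPT is that hypothetical revised prompt, not part of the current solution; the client is created as in the pipeline sketch above.

import anthropic

client = anthropic.Anthropic()

def correct_chunk_prefilled(chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        messages=[
            {"role": "user",
             "content": PREFILL_PROMPT.format(the_markdown_chunk=chunk)},
            # Prefill: the reply continues from here, so a preamble such as
            # "Here is the corrected text" can never appear.
            {"role": "assistant", "content": "<corrected>"},
        ],
        stop_sequences=["</corrected>"],  # stop before the closing tag
    )
    return response.content[0].text.strip()

Above: Sketch of prompt prefilling replacing the repeated "no commentary" instructions (PREFILL_PROMPT is hypothetical).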
Conclusion
The most important lesson I learnt while building a product that leverages a large language model is to focus on the heart: automated prompt evaluation with real data. Although the solution does have this now, I could have implemented evaluations far sooner and saved a lot of time on manual prompt tweaking. It is tempting to concentrate on the whole product body, but without a healthy "LLM heart" there is little point.
For questions, see the FAQ.