How to Convert a Scanned PDF to Editable Text (Free OCR Workflow)
Table of Contents
- Scanned PDF vs Digital PDF: Which Do You Have?
- The Free 3-Step OCR Workflow
- Step 1: Convert PDF Pages to Images
- Step 2: Run OCR on Each Page
- Step 3: Clean Up the Extracted Text
- Getting Maximum OCR Accuracy from Scans
- Bonus: Screenshot-to-Text Works the Same Way
- Common Use Cases
- Frequently Asked Questions
A scanned PDF looks like a document but behaves like a photo album: you can't select, search, copy, or edit anything in it. This guide walks through the free browser-based workflow that turns those page images into editable text - no Adobe subscription, no desktop software.
Scanned PDF vs Digital PDF: Which Do You Have?
The 5-second test: open the PDF and try to select a sentence with your cursor.
- Text highlights? It's a digital PDF - the text already exists. Just copy it; no OCR needed.
- Nothing selects (or the whole page selects as one block)? It's a scanned PDF - each page is a picture, and the text layer doesn't exist until OCR creates it.
Scanned PDFs come from office scanners, phone "scan" apps, fax archives, and old document repositories. Everything below applies to them - and to any photographed document. If you're new to the technology itself, start with What Is OCR? How Optical Character Recognition Works.
The Free 3-Step OCR Workflow
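- Convert PDF pages to images: export each page of the scanned PDF as a JPG at 300 DPI.
- Run OCR on each page: extract the text from each page image with the correct document language set.
- Clean up the extracted text: fix recognition errors and restore formatting in Word, Google Docs, or any editor.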
Step 1: Convert PDF Pages to Images
OCR engines work on images, so the first move is getting each PDF page out as a picture at the right quality:
- Open the PDF to JPG converter and upload your scanned PDF.
- Set DPI to 300 - the OCR sweet spot. Higher (400-600) only helps for very small print; lower than 200 visibly hurts accuracy.
- Convert all pages, or a single page if you only need one section.
- Download the page images.
Password-protected PDFs need to be unlocked first - the converter will tell you if the file is protected rather than failing silently.
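To get a feel for what the DPI setting means in practice, here is a small sketch of the arithmetic: pixel dimensions are simply the page size in inches multiplied by the DPI. The function name `page_pixels` is made up for this illustration, and US Letter (8.5 x 11 inches) is just an assumed page size.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (200, 300, 600):
    width, height = page_pixels(8.5, 11, dpi)
    megapixels = width * height / 1e6
    print(f"{dpi} DPI -> {width} x {height} px ({megapixels:.1f} megapixels)")
```

Doubling the DPI quadruples the pixel count, which is why 600 DPI exports are much larger and slower to process than 300 DPI ones.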
Step 2: Run OCR on Each Page
- Open the OCR tool and upload a page image (you can also paste screenshots directly with Ctrl+V).
- Select the document language. This matters more than people expect - the engine uses language models to resolve ambiguous characters, and the right language setting can add several points of accuracy. 100+ languages are supported.
- Run the extraction and review the text output next to the original.
- Copy the text, then repeat for the remaining pages.
For multi-page documents, work in batches and paste each page into your target document as you go - it's much easier to keep page order straight than to fix it afterwards.
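If you save each page's text separately instead of pasting as you go, page order can still be restored afterwards. The sketch below assumes hypothetical filenames such as `scan-page-10.jpg`, where the last number in the name is the page number; the names `page_number` and `assemble` are illustrative, not part of any tool mentioned here.

```python
import re
from pathlib import Path


def page_number(filename: str) -> int:
    """Return the last run of digits in the filename stem, e.g. 'scan-page-10.jpg' -> 10."""
    digits = re.findall(r"\d+", Path(filename).stem)
    if not digits:
        raise ValueError(f"no page number found in {filename!r}")
    return int(digits[-1])


def assemble(pages: dict[str, str]) -> str:
    """Join per-page OCR text in numeric page order, separated by blank lines."""
    ordered = sorted(pages, key=page_number)
    return "\n\n".join(pages[name].strip() for name in ordered)


# Hypothetical OCR results keyed by the image they came from.
pages = {
    "scan-page-10.jpg": "Text of page ten.",
    "scan-page-2.jpg": "Text of page two.",
    "scan-page-1.jpg": "Text of page one.",
}
print(assemble(pages))
```

Sorting by the numeric value rather than the raw filename avoids the usual trap where page 10 sorts before page 2.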
Step 3: Clean Up the Extracted Text
Even at 99% character accuracy, a full page (~3,000 characters) leaves roughly 30 errors. The predictable ones:
- Character confusion: l/1/I, O/0, and rn read as m. Spell-check catches most of these.
- Broken line wraps: hard line breaks mid-sentence where the scan's lines ended. Find-and-replace single line breaks with spaces, keeping double breaks as paragraphs.
- Hyphenation: words split across lines ("docu- ment") need rejoining.
- Tables: OCR returns table contents as text lines; complex tables are usually faster to rebuild than to repair.
Verify numbers manually in anything financial or legal - a misread digit is the one OCR error spell-check will never catch.
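The line-wrap and hyphenation fixes above can also be scripted. This is a minimal sketch using Python's standard `re` module, not the method of any particular tool; the function names `clean_ocr_text` and `numbers_to_verify` are made up for this example. One caveat: the hyphen rule also merges genuine compounds that happen to break at a line end ("well-" / "known"), so the result still deserves a read-through.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Undo line-wrap artifacts in OCR output while keeping paragraph breaks."""
    # Normalize Windows line endings.
    text = raw.replace("\r\n", "\n")
    # Rejoin words split across lines: "docu-\nment" -> "document".
    # Caveat: this also merges real hyphenated compounds broken at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace single line breaks with spaces; leave blank-line paragraph breaks alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse any runs of spaces or tabs left behind.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


def numbers_to_verify(text: str) -> list[str]:
    """List every token containing a digit so it can be checked against the scan."""
    return re.findall(r"\S*\d\S*", text)


raw = "This docu-\nment was scanned\nin 1998.\n\nSecond paragraph\nstarts here."
cleaned = clean_ocr_text(raw)
print(cleaned)
print(numbers_to_verify(cleaned))
```

The second helper does not correct anything; it only collects the tokens that spell-check cannot validate, so they can be compared with the original page by eye.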
Getting Maximum OCR Accuracy from Scans
| Factor | What to do |
|---|---|
| Resolution | 300 DPI scans; rescan anything under 200 DPI if you can |
| Skew | Straighten tilted pages - even 3-5 degrees of rotation costs accuracy. The rotate tool fixes this in seconds |
| Contrast | Faded text on yellowed paper? Boost contrast with photo filters before OCR |
| Crop | Crop away dark scanner edges, hole punches, and margin notes that confuse the engine |
| Language | Always set the correct document language in the OCR tool |
More accuracy tactics in How to Extract Text from Images Using OCR.
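To illustrate what "boosting contrast" does to a faded scan, here is a sketch of a simple linear contrast stretch on grayscale values. This is one common technique, chosen for illustration; the photo filters mentioned in the table may well work differently. The function name `stretch_contrast` and the sample pixel values are hypothetical.

```python
def stretch_contrast(pixels: list[int]) -> list[int]:
    """Linearly rescale grayscale values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


# Hypothetical faded scan: "ink" around 90, "paper" around 200 on a 0-255 scale.
faded = [90, 120, 150, 180, 200]
print(stretch_contrast(faded))
```

After stretching, the gap between ink and paper spans the full 0-255 range, which generally makes character edges easier for an OCR engine to separate from the background.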
Bonus: Screenshot-to-Text Works the Same Way
The same OCR step works on any screenshot - error messages, slides from a webinar, text in an image someone sent you, content from apps that block copying. Skip the PDF conversion entirely: take the screenshot, paste it into the OCR tool with Ctrl+V, and copy the text out. For the full screenshot workflow, see Image OCR Online: Extract Text from Images, PDFs, and Screenshots.
Common Use Cases
- Digitizing contracts and records - make archived paperwork searchable.
- Reusing old reports - pull quotes and data out of legacy PDFs into new documents.
- Receipts and invoices - extract amounts and line items for expense tracking.
- Academic papers - quote scanned books and journal articles without retyping.
- Translations - extract source text before running it through a translator.
Frequently Asked Questions
Recap: PDF to JPG at 300 DPI, then OCR with the right language, then clean up. The whole round trip for a 10-page scan takes about five minutes. Browse all document tools for the rest of the PDF workflow.