PDF Text Extraction

Similar to Optical Character Recognition (OCR), PDF Text Extraction retrieves text from within document files, such as PDF journals and magazines. Making the typed text within them searchable supports the discovery of your materials.

With born-digital material, the original text is stored within the PDF document itself. Text extraction draws directly on this stored text, so the results are not subject to recognition errors, unlike other data capture services that can struggle with tricky or faded text. To use it, identify the PDF file you would like to process and then select the ‘process’ button.

Related Services


Digitisation - Prime content for PastView integration

Digitisation converts your physical content into digital format, priming it for successful integration into the PastView platform and opening up opportunities for online access and discovery. It gives you control over the organisation of your digitised items into collections that can be filtered into categories and subcategories. As industry leaders in the expert scanning and indexing of all material types, we offer unrivalled quality, whether your items are old, rare, precious or fragile.


Data Capture - Enhance your digital searchability

Digitisation places your physical content into a format perfect for capturing the valuable data contained within. Through a range of data capture services, typed and handwritten textual details from your archives can be captured, used in a variety of ways and seamlessly imported into your PastView platform for instant and comprehensive searchability.