DataHarvester is a powerful find-in-files tool with textual pattern-matching capabilities (regular expressions).
Using pattern searching, you can quickly locate email addresses, telephone numbers, website URLs, IP addresses, or any other pattern. You can also run ordinary plain-text searches.
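To illustrate the kind of pattern searching described above, here is a minimal Python sketch. It is not DataHarvester's own implementation or pattern syntax; the pattern set, the name `PATTERNS`, and the function `find_patterns` are assumptions made for illustration, and the expressions are deliberately simplified.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# Real-world email, IP, and URL matching usually needs stricter rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def find_patterns(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}


if __name__ == "__main__":
    sample = "Contact bob@example.com from 192.168.0.1 or see http://example.com/docs"
    for name, matches in find_patterns(sample).items():
        print(name, matches)
```

Note that the IPv4 pattern above accepts out-of-range octets such as 999; a production pattern would need to validate each octet.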
Search a single document or an entire folder structure comprising a multitude of document formats.
DataHarvester can search all text-based formats (including HTML, XML, TXT and so forth) as well as Adobe PDF (Portable Document Format).
Other proprietary formats, such as Microsoft Office files including Word, Excel, and PowerPoint, are available through the DataHarvester extensibility model. These formats can be seamlessly integrated with the DataHarvester software.
By default, DataHarvester searches documents within a Windows folder or network share. The extensibility model allows other document sources to be integrated, such as document management systems, whether hosted online or installed locally.
In the case of a document management system such as Documentum, the documents are extracted from the system, searched, and then removed. This caters for scenarios where documents need to be indexed for industry-specific patterns, such as an asset-number-to-document matrix.
Advanced searches can be configured by specifying a number of jobs and pattern searches. Once an advanced search is configured, it can be saved to disk and run ad hoc, as a command-line process, or from a scheduled task.
Search results can be saved to a file for further analysis. By default, DataHarvester allows results to be saved in CSV (comma-delimited) format, which can be opened in a text editor such as Notepad or a spreadsheet application such as Microsoft Excel, or used to populate a database.
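As a rough illustration of CSV output, the Python sketch below writes search hits to comma-delimited text. The column names (`file`, `line`, `pattern`, `match`) and the function `write_results` are assumptions for illustration; the actual layout of DataHarvester's CSV output is not specified here.

```python
import csv
import io

# Hypothetical column layout; DataHarvester's real CSV columns may differ.
FIELDS = ["file", "line", "pattern", "match"]


def write_results(rows, stream):
    """Write an iterable of result dicts to stream as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    hits = [
        {"file": "a.txt", "line": 3, "pattern": "email", "match": "bob@example.com"},
        {"file": "b.html", "line": 12, "pattern": "ipv4", "match": "192.168.0.1"},
    ]
    buffer = io.StringIO()
    write_results(hits, buffer)
    print(buffer.getvalue())
```

To write to disk instead of memory, open the target file with `newline=""` and pass the file object as `stream`, as the Python `csv` documentation recommends.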
DataHarvester uses processor concurrency to leverage the power of the host machine. This allows complex searches to run at maximum performance, reducing search time significantly, which is useful when an extensive document store needs to be searched.
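The general idea of searching many documents concurrently can be sketched in Python as follows. This is an assumption-laden illustration, not a description of how DataHarvester is built: the functions `search_file` and `search_folder` and the `workers` parameter are hypothetical, and a thread pool is used here for simplicity. In Python, threads mainly overlap file reads; a process pool would be the closer analogue for spreading pattern matching across processor cores.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def search_file(path, regex):
    """Return (path, matches) for one file; unreadable files yield no matches."""
    try:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return path, []
    return path, regex.findall(text)


def search_folder(root, pattern, workers=4):
    """Search every file under root concurrently; return {path: matches} for hits."""
    regex = re.compile(pattern)
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: search_file(p, regex), files)
        # Keep only files that produced at least one match.
        return {str(path): matches for path, matches in results if matches}
```

A call such as `search_folder("C:/Documents", r"\b(?:\d{1,3}\.){3}\d{1,3}\b")` would walk the folder tree and return the matching text per file; the path shown is only an example.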
For more information on eSensible and DataGate, please contact us.