The following image shows the completed ingestion pipeline, which the sections below describe in detail.
The pipeline includes the following steps:
File System Connector. The File System Connector retrieves data from a local or network file system. The connector produces a NiFi FlowFile to represent each file that is retrieved from the file system.
KeyView Extraction. This step extracts files from containers. For example, if a FlowFile represents a zip archive, KeyView extracts the contents of the archive.
KeyView Filtering. Filtering extracts the text from a file and adds it to the document content. The text can then be indexed into IDOL, which means that IDOL does not need to process the data in its original format.
Remove Document Part. This step removes the binary content or file reference from a FlowFile. Removing file references allows NiFi to delete temporary files.
Indexing. This step indexes the documents into an IDOL Content component.
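
To make the order of the steps concrete, the following is a minimal conceptual sketch in plain Python. It does not use NiFi, KeyView, or IDOL APIs: the `FlowFile` class, every function name, and the in-memory `index` list are hypothetical stand-ins chosen for illustration. It also simplifies the real behavior, for example by treating filtering as UTF-8 decoding and by passing on only the extracted children of a container.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile (not the real NiFi class)."""
    name: str
    binary: Optional[bytes] = None  # original file content
    text: str = ""                  # document content filled in by filtering


def extract(flowfile: FlowFile) -> List[FlowFile]:
    """Extraction step: expand a zip archive into one FlowFile per member.

    Simplification: only the extracted children are passed on.
    """
    data = flowfile.binary
    if data is not None and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return [
                FlowFile(name=f"{flowfile.name}/{member}", binary=archive.read(member))
                for member in archive.namelist()
                if not member.endswith("/")  # skip directory entries
            ]
    return [flowfile]


def filter_text(flowfile: FlowFile) -> FlowFile:
    """Filtering step: copy the file's text into the document content.

    Assumption: the file is UTF-8 text; real filtering handles many formats.
    """
    if flowfile.binary is not None:
        flowfile.text = flowfile.binary.decode("utf-8", errors="replace")
    return flowfile


def remove_document_part(flowfile: FlowFile) -> FlowFile:
    """Drop the binary content once the text has been extracted."""
    flowfile.binary = None
    return flowfile


def run_pipeline(files: Dict[str, bytes]) -> List[dict]:
    """Run the steps in order and return the documents that were 'indexed'."""
    index: List[dict] = []  # stand-in for an IDOL Content component
    for name, binary in files.items():  # stand-in for the File System Connector
        for child in extract(FlowFile(name=name, binary=binary)):
            document = remove_document_part(filter_text(child))
            index.append({"reference": document.name, "content": document.text})
    return index
```

In this sketch, passing a dictionary that maps file names to bytes into `run_pipeline` returns one entry per file, with each zip archive replaced by its members. The point is only the ordering: the binary part is removed after filtering has copied the text, so nothing that indexing needs is lost.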