Introduction to FlowFiles and Documents

The basic unit of data in Apache NiFi is the FlowFile. When you view your data flow in the NiFi web interface, you can see FlowFiles being queued and counted by your processors. IDOL NiFi Ingest components also operate on FlowFiles, so that they integrate with the Apache NiFi framework. For example, a connector creates FlowFiles, which are then processed by a KeyView processor or Media Analysis processor.

A FlowFile that represents an IDOL document has the following attributes.

Attribute Description
idol.type

The type of content contained in the body of the FlowFile.

The FlowFiles created by IDOL NiFi Ingest connectors have a type of document. This is a binary format that can include multiple parts, for example one part containing XML metadata and another containing binary content from a file. For more information about this format, see application/x.idol.doc.

IDOL NiFi Ingest processors also accept FlowFiles where this attribute has the following values, so that they can process FlowFiles created by other NiFi processors:

  • contentfile - The FlowFile body contains the binary content of a file.
  • contentfilename - The FlowFile body contains a full local file path in plain text.
  • content - The FlowFile body contains text content to use as the DRECONTENT of an IDOL document.
  • xmlmetadata - The FlowFile body contains XML metadata.

idol.reference

Equivalent to the DREREFERENCE for the document.

idol.reference.action

This attribute is usually assigned by a connector and indicates the indexing operation to perform, for example Add, Update, or Remove.

idol.src.connector.name

This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It specifies the display name for the type of connector that created the FlowFile, for example "File System Connector".

idol.src.processor.identifier

This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It specifies the ID of the NiFi processor that created the FlowFile.

idol.src.processor.name

This attribute is present if the FlowFile was created by an IDOL NiFi Ingest Connector. It specifies the display name of the NiFi processor, for example "My GetFileSystem Processor".

mime.type

The MIME type for the body of the FlowFile:

  • application/x.idol.doc when the idol.type is document.
  • application/octet-stream when the idol.type is contentfile.
  • text/plain when the idol.type is contentfilename or content.
  • text/xml when the idol.type is xmlmetadata.
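
The pairing between idol.type and mime.type listed above can be sketched as a simple lookup. The following Python is only an illustration: it models FlowFile attributes as a plain dictionary rather than using any NiFi API, and the names IDOL_TYPE_TO_MIME, expected_mime_type, and is_consistent are hypothetical helpers, not part of NiFi or IDOL NiFi Ingest.

```python
# Mapping from idol.type to the expected mime.type, taken from the list above.
IDOL_TYPE_TO_MIME = {
    "document": "application/x.idol.doc",
    "contentfile": "application/octet-stream",
    "contentfilename": "text/plain",
    "content": "text/plain",
    "xmlmetadata": "text/xml",
}


def expected_mime_type(idol_type):
    """Return the MIME type expected for an idol.type value, or None if unknown."""
    return IDOL_TYPE_TO_MIME.get(idol_type)


def is_consistent(attributes):
    """Check that mime.type matches idol.type.

    `attributes` is a plain dict standing in for FlowFile attributes
    (an assumption made for this sketch).
    """
    expected = expected_mime_type(attributes.get("idol.type"))
    return expected is not None and attributes.get("mime.type") == expected
```

A check like this could be useful when FlowFiles are produced by other NiFi processors and the two attributes are set by hand.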

application/x.idol.doc

The application/x.idol.doc format is a binary format that can include multiple parts. A single document can contain multiple contentfilename, contentfile, or content parts but no more than one xmlmetadata part.
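
The rule on part counts can be expressed as a small check. This sketch does not parse the binary application/x.idol.doc format; it assumes the part types of a document are already available as a list of strings. The function name validate_part_types is hypothetical, and reporting any other part type name as a problem is a choice made for this sketch rather than documented behavior.

```python
from collections import Counter

# The four part types described in this section.
KNOWN_PART_TYPES = {"contentfilename", "contentfile", "content", "xmlmetadata"}


def validate_part_types(part_types):
    """Return a list of problems; an empty list means the composition is allowed."""
    problems = []
    counts = Counter(part_types)
    for part_type in counts:
        # Assumption of this sketch: names outside the four listed types are flagged.
        if part_type not in KNOWN_PART_TYPES:
            problems.append(f"unknown part type: {part_type}")
    # A document can contain no more than one xmlmetadata part.
    if counts["xmlmetadata"] > 1:
        problems.append("more than one xmlmetadata part")
    return problems
```

For example, a list with one xmlmetadata part and two contentfile parts yields no problems, while a list with two xmlmetadata parts is reported.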

Part type Description
contentfilename

A document part that contains a file path.

When a document has a contentfilename part, the FlowFile can have an attribute named idol.doc.part.part_id.file.own, where part_id identifies the part. The attribute contains a Boolean value that indicates whether NiFi Ingest owns the file and can delete it when processing is complete.

To ensure that temporary files owned by NiFi Ingest are deleted, you can use a RemoveDocumentPart processor (as described in Remove Temporary Files).

contentfile

A document part that contains the binary content of a file.

When a document has a contentfile part, the FlowFile can have an attribute named idol.doc.part.part_id.file.name, where part_id identifies the part. The attribute contains a display name for the file.

content

A document part that contains one or more pages of text content.

xmlmetadata

A document part that contains XML metadata.
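
The per-part attributes described above follow the pattern idol.doc.part.part_id.suffix. The sketch below shows one way to build and read such attribute names, again with a plain dictionary standing in for FlowFile attributes. It makes several assumptions that the text does not confirm: that the Boolean value is stored as the text "true" or "false", that a missing file.own attribute means the file is not owned, and that part identifiers look like "0" and "1". All function names are hypothetical.

```python
def part_attribute_name(part_id, suffix):
    """Build an attribute name such as idol.doc.part.<part_id>.file.own."""
    return f"idol.doc.part.{part_id}.{suffix}"


def ingest_owns_file(attributes, part_id):
    """Return True if file.own for the part is set to true.

    Assumption: the Boolean is stored as text, and a missing attribute
    is treated as "not owned".
    """
    value = attributes.get(part_attribute_name(part_id, "file.own"), "false")
    return value.strip().lower() == "true"


def display_name(attributes, part_id):
    """Return the file.name display name for a contentfile part, or None."""
    return attributes.get(part_attribute_name(part_id, "file.name"))


# Example attributes with made-up part identifiers and values.
example = {
    "idol.doc.part.0.file.own": "true",
    "idol.doc.part.1.file.name": "report.pdf",
}
```

With the example dictionary, ingest_owns_file(example, "0") is true, and display_name(example, "1") returns the made-up name report.pdf.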
