You can configure IDOL Server to implement deduplication when indexing documents. This process prevents IDOL Server from storing multiple copies of the same document or document content. If IDOL Server determines that an incoming document matches an existing document, it replaces the existing document with the new document.
IDOL uses deduplication options to determine whether documents match. See Deduplication Options—KillDuplicates.
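The replacement behavior described above can be pictured with a small toy model. This is only an illustrative sketch, not how IDOL Server is implemented: the `index_document` function, the in-memory dictionary, and the choice to match on a single `reference` field are all assumptions made for the example. In IDOL Server, the actual match criteria come from the deduplication options.

```python
def index_document(index, doc, match_field="reference"):
    """Add doc to index, replacing any stored document with the same match_field value.

    Returns True if an existing document was replaced (a duplicate was found).
    """
    key = doc[match_field]
    replaced = key in index
    # The incoming document overwrites the existing one, so only one copy is kept.
    index[key] = doc
    return replaced


# Hypothetical documents: the second one matches the first on "reference".
index = {}
index_document(index, {"reference": "doc-1", "content": "first version"})
was_duplicate = index_document(index, {"reference": "doc-1", "content": "second version"})
```

After the second call, the toy index holds one document for `doc-1`, and it is the newer version.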
You can enable deduplication in one of three ways:
Enable deduplication for all indexing jobs by using the KillDuplicates configuration parameter in the [Server] section of the IDOL Server configuration file. See Enable Deduplication for all Index Jobs.
You can use the KillDuplicatesChecksumField configuration parameter with deduplication to prevent unnecessary updating of existing documents in IDOL Server. See Use KillDuplicatesChecksumField to Prevent Unnecessary Indexing.
You can also use the KillDuplicatesPreserveFields configuration parameter with deduplication to copy the specified IDX fields from an existing document to a newer version.
Enable deduplication for individual indexing jobs by using the KillDuplicates action parameter in the DREADD and DREADDDATA actions. See Enable Deduplication for Individual Index Jobs.
Use the KeepExisting action parameter with deduplication to discard the incoming document instead of replacing the existing document. This option reduces the indexing load. See Use KeepExisting to Minimize the Index Load.
Enable deduplication when indexing with Connector Framework Server (CFS) by setting the KillDuplicates configuration parameter for the connector. See Enable Deduplication for Connector Index Jobs.
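As an illustration of the per-job method, the following sketch builds a DREADD request that sets the KillDuplicates and KeepExisting action parameters. It rests on several assumptions that you should check against the IDOL Server Reference: the URL layout (file path directly after `?`, followed by the action parameters), the `REFERENCE` value for KillDuplicates, and the `true` value for KeepExisting. The host, index port, and file path are hypothetical, and `build_dreadd_url` is a helper invented for this example.

```python
from urllib.parse import quote, urlencode


def build_dreadd_url(host, index_port, idx_path,
                     kill_duplicates="REFERENCE", keep_existing=False):
    """Build a DREADD index action URL that requests deduplication for one job."""
    params = {"KillDuplicates": kill_duplicates}
    if keep_existing:
        # Ask the server to discard the incoming document
        # instead of replacing the existing one.
        params["KeepExisting"] = "true"
    # Assumed layout: the file path follows '?' directly, then the parameters.
    return f"http://{host}:{index_port}/DREADD?{quote(idx_path)}&{urlencode(params)}"


# Hypothetical host, port, and IDX file path.
url = build_dreadd_url("localhost", 9001, "/data/docs.idx", keep_existing=True)
print(url)
```

The helper only constructs the request string and does not send it, so you can inspect the parameters before running the action against a server.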
Some other IDOL Server parameters affect the behavior of the deduplication settings. See Deduplication Constraints.
You can deduplicate after indexing by using the DREDUPLICATE index action. See Locate Duplicate Documents.