You can locate duplicate documents in the data index after indexing has taken place by using the DREDUPLICATE index action. This action locates duplicates in a specified subset of the content, and then removes them, tags them by setting a field value, or moves them to another database.
http://IDOLhost:indexPort/DREDUPLICATE?ReferenceField=Field&DuplicateAction=Action
where:
IDOLhost | is the IP address or host name of the machine on which IDOL Server is installed.
indexPort | is the IDOL Server index port (specified as IndexPort in the [Server] section of the IDOL Server configuration file).
Field | is a ReferenceType field that IDOL Server uses as the initial test of whether two documents match.
Action | is the action to perform on a duplicate. The following options are available:
For example:
http://MyHost:20001/DREDUPLICATE?ReferenceField=DOCUMENT/DREREFERENCE&DuplicateAction=Database&Database=Duplicates
This action uses port 20001 to find duplicates in the IDOL Server that is located on the machine with the host name MyHost. IDOL Server uses the DREREFERENCE field to identify duplicate documents, and moves them to the Duplicates database.
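If you generate these requests from a script, it can help to build the query string programmatically rather than by hand. The following is a minimal sketch that only assembles the URL; the function name dreduplicate_url is hypothetical, and sending the request (for example with an HTTP client of your choice) is left out.

```python
from urllib.parse import urlencode


def dreduplicate_url(host: str, port: int, reference_field: str,
                     duplicate_action: str, **extra: str) -> str:
    """Build a DREDUPLICATE index action URL (hypothetical helper).

    Extra keyword arguments, such as Database="Duplicates", are appended
    as additional action parameters in the order given.
    """
    params = {"ReferenceField": reference_field,
              "DuplicateAction": duplicate_action}
    params.update(extra)
    # Keep "/" unescaped so field paths such as DOCUMENT/DREREFERENCE stay readable.
    query = urlencode(params, safe="/")
    return f"http://{host}:{port}/DREDUPLICATE?{query}"


url = dreduplicate_url("MyHost", 20001, "DOCUMENT/DREREFERENCE", "Database",
                       Database="Duplicates")
print(url)
```

Run as shown, this reproduces the Database example URL above. Values containing spaces or other reserved characters are percent-encoded by urlencode.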
http://MyHost:20001/DREDUPLICATE?ReferenceField=DOCUMENT/DREREFERENCE&DuplicateAction=Tag&TagField=DOCUMENT/DRETITLE&TagValue=Duplicate
In this example, IDOL Server first uses the DREREFERENCE field to identify the duplicate documents, and then sets the DRETITLE field of each duplicate to the value Duplicate.
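Each DuplicateAction value in the examples above comes with its own extra parameters (Database for Database, TagField and TagValue for Tag). The sketch below checks for those parameters before building the URL, so a missing one fails early instead of producing an incomplete request. The REQUIRED_EXTRA mapping is an assumption inferred only from the two examples in this section, not the full set of rules; check the IDOL Server Reference before relying on it.

```python
from urllib.parse import urlencode

# Assumption: required extra parameters per action, inferred from this
# section's examples only. Actions not listed here are not checked.
REQUIRED_EXTRA = {
    "Database": ("Database",),
    "Tag": ("TagField", "TagValue"),
}


def checked_dreduplicate_url(host: str, port: int, reference_field: str,
                             duplicate_action: str, **extra: str) -> str:
    """Build a DREDUPLICATE URL after checking action-specific parameters."""
    required = REQUIRED_EXTRA.get(duplicate_action, ())
    missing = [name for name in required if name not in extra]
    if missing:
        raise ValueError(
            f"DuplicateAction={duplicate_action} needs: {', '.join(missing)}")
    params = {"ReferenceField": reference_field,
              "DuplicateAction": duplicate_action, **extra}
    # safe="/" keeps field paths like DOCUMENT/DRETITLE unescaped.
    return f"http://{host}:{port}/DREDUPLICATE?{urlencode(params, safe='/')}"


url = checked_dreduplicate_url("MyHost", 20001, "DOCUMENT/DREREFERENCE", "Tag",
                               TagField="DOCUMENT/DRETITLE", TagValue="Duplicate")
print(url)
```

Omitting TagValue in the call above raises a ValueError that names the missing parameter.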
To prevent IDOL Server from indexing duplicate documents in the first place, use the KillDuplicates parameter with the DREADD and DREADDDATA index actions.
For details on the other parameters that are available for the DREDUPLICATE index action, refer to the IDOL Server Reference.