When a file format supports metadata, KeyView can extract and process that information. Metadata includes document information fields such as title, author, creation date, and file size. Depending on the file's format, metadata is referred to in a number of ways: for example, "summary information," "OLE summary information," "file information," and "document properties."
The metadata in mail formats (MSG and EML) and mail stores (PST, NSF, and MBX) is extracted differently from the metadata in other formats. For information on extracting metadata from these formats, see Extract Mail Metadata.
NOTE: KeyView can extract metadata from a document only if metadata is defined in the document, and if the document reader can extract metadata for the file format. The section Document Readers lists the file formats for which metadata can be extracted. KeyView does not generate metadata automatically from the document contents.
You can extract the metadata at the API level. The API extracts all valid metadata fields that exist in the file.
To extract metadata using the Java API
Set the input source using the setInputSource method.
Call the getSummaryInfo() method of the Export object to retrieve an object of the SummaryInfo class.
Use the methods of the SummaryInfo object to retrieve the metadata information.
The XmlTest sample program demonstrates how to extract metadata through the Java API.
SummaryInfo[] sinfo = objXmlExport.getSummaryInfo();
if (sinfo != null) {
    System.out.println("\nSummary info has been extracted.");
    fos_sum = new FileOutputStream(summaryOutFile);
    DataOutputStream dos_sum = new DataOutputStream(fos_sum);
    for (int i = 0; i < sinfo.length; i++) {
        // Skip entries that have no element name.
        if (sinfo[i].getElementName() != null) {
            dos_sum.writeBytes("Element name: " + sinfo[i].getElementName() + "\n");
            dos_sum.writeBytes("Element type: " + sinfo[i].getSumInfoType() + "\n");
            // Write a value only for fields that are marked as valid.
            if (sinfo[i].getIsValid()) {
                if (sinfo[i].isDateTimeType()) {
                    dos_sum.writeBytes("Date/time: ");
                    dos_sum.writeBytes(sinfo[i].getDateTime());
                } else {
                    byte[] data = sinfo[i].getData();
                    if (data != null) {
                        dos_sum.writeBytes("Element data: ");
                        dos_sum.write(data);
                    }
                }
            }
            dos_sum.writeBytes("\n\n");
        }
    }
    dos_sum.close();
    fos_sum.close();
}
sinfo = null;
The SummaryInfo class stores the metadata extraction results. After calling the XmlExport.getSummaryInfo() method, call the get methods provided by each instance of this class to extract metadata.
The following describes each get method:
When using a template file, KeyView recognizes two types of metadata: standard and non-standard. Standard metadata includes fields such as Title, Author, and Subject. The standard fields are enumerated from 1 to 41 in KVSumType in the header file kvtypes.h. Non-standard metadata includes any field not listed from 1 to 41 in KVSumType, such as user-defined fields (for example, custom property fields in Microsoft Word documents), or fields that are unique to a particular file type (for example, "Artist" or "Genre" fields in MP3 files). Enumerated types 42 and greater are reserved for non-standard metadata.
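The range rule above can be sketched in Java. This is a minimal illustration only: the class SumTypeRange and its methods are hypothetical helpers, not part of the KeyView API, and the only fact they encode is the 1 to 41 boundary stated in this section.

```java
/** Illustrative helper; not part of the KeyView API. */
final class SumTypeRange {
    // Last enumerated value of the standard range, as described in the text.
    static final int LAST_STANDARD = 41;

    /** Returns true for enumerated values 1 to 41 (standard metadata). */
    static boolean isStandard(int sumType) {
        return sumType >= 1 && sumType <= LAST_STANDARD;
    }

    /** Returns true for enumerated values 42 and greater (non-standard metadata). */
    static boolean isNonStandard(int sumType) {
        return sumType > LAST_STANDARD;
    }

    public static void main(String[] args) {
        int[] samples = {1, 41, 42};
        for (int t : samples) {
            System.out.println(t + " -> " + (isStandard(t) ? "standard" : "non-standard"));
        }
    }
}
```

A check of this kind might be useful when deciding which fields to route to different output locations.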
To extract metadata by using a template file
Insert metadata tokens in a member of the KVXMLTemplate structure in the template file. This defines the point at which the metadata appears in the XML output.
If you are using the $USERSUMMARY or $SUMMARY token, define the szUserSummary member of the KVXMLTemplate structure in the template file. This determines the markup and tokens generated when these metadata tokens are processed.
In your application, read the template file and write the data to the KVXMLTemplate structure.
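The last step can be sketched as follows. This sketch assumes that the template text holds one member per line in member=value form, as in the examples in this section; a real template file might also contain sections or comments that are not handled here. The class TemplateReader is hypothetical, and a plain map stands in for the KVXMLTemplate structure, because the call that populates the real structure is not shown in this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser; a Map stands in for the KVXMLTemplate structure. */
final class TemplateReader {
    /** Splits template text into member names and values. */
    static Map<String, String> parse(String templateText) {
        Map<String, String> members = new LinkedHashMap<>();
        for (String line : templateText.split("\\R")) {
            // Split on the first '=' only, because values can contain '=' themselves.
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank lines and lines without a member name
            }
            members.put(line.substring(0, eq).trim(), line.substring(eq + 1));
        }
        return members;
    }

    public static void main(String[] args) {
        String template = "szMainBottom=$USERSUMMARY\n"
                + "szUserSummary=<MetaData name=\"$NAME\" content=\"$CONTENT\" />\n";
        parse(template).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

To read the text from disk, the result of java.nio.file.Files.readString could be passed to parse.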
The following metadata tokens can be used in the template files:
The following markup displays the contents of the "Title" field at the top of the main XML file:
szMainTop=$SUMMARY01
In KVSumType, 01 is the enumerated value for the "Title" metadata field.
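If an application generates template lines, the numbered token could be built from the enumerated value. The sketch below assumes that the number is always written as two digits, as in $SUMMARY01, and it limits itself to the standard range of 1 to 41. Both points are assumptions drawn from the example above. The class SummaryTokens is a hypothetical helper, not part of the KeyView API.

```java
/** Illustrative helper; not part of the KeyView API. */
final class SummaryTokens {
    /** Builds a numbered token such as $SUMMARY01 (two-digit form assumed). */
    static String tokenFor(int sumType) {
        if (sumType < 1 || sumType > 41) {
            throw new IllegalArgumentException("Not a standard field value: " + sumType);
        }
        return "$SUMMARY" + String.format("%02d", sumType);
    }

    public static void main(String[] args) {
        // Builds the same template line as the example above.
        System.out.println("szMainTop=" + tokenFor(1));
    }
}
```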
The following markup extracts all standard fields, and includes them in the first heading level 1 XML block:
szFirstH1Start=$SUMMARY
szUserSummary=<MetaData name="$NAME" content="$CONTENT" />
This example extracts the field name ($NAME) and field content ($CONTENT) for standard metadata and includes it at the beginning of the first heading level 1 XML block.
The generated XML might look like this:
<MetaData name="CodePage" content="1252" />
<MetaData name="Title" content="My design document" />
<MetaData name="Subject" content="design specifications" />
<MetaData name="Author" content="John Doe" />
<MetaData name="Keywords" content="" />
<MetaData name="Comments" content="" />
<MetaData name="Template" content="Normal.dot" />
<MetaData name="LastAuthor" content="lchapman" />
<MetaData name="RevNumber" content="6" />
<MetaData name="EditTime" content="01/01/1601, 0:08" />
<MetaData name="LastPrinted" content="14/01/2002, 14:06" />
<MetaData name="Create_DTM" content="27/08/2003, 10:31" />
<MetaData name="LastSave_DTM" content="29/08/2003, 14:07" />
<MetaData name="PageCount" content="1" />
<MetaData name="WordCount" content="4062" />
<MetaData name="CharCount" content="23159" />
<MetaData name="AppName" content="Microsoft Word 9.0" />
<MetaData name="Security" content="0" />
<MetaData name="Category" content="software" />
<MetaData name="LineCount" content="192" />
<MetaData name="ParCount" content="46" />
<MetaData name="ScaleCrop" content="FALSE" />
<MetaData name="Manager" content="" />
<MetaData name="Company" content="Autonomy" />
<MetaData name="LinksDirty" content="FALSE" />
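The way the szUserSummary markup is repeated for each field can be illustrated with a small stand-alone sketch. This is not KeyView's implementation. The class UserSummaryExpansion is hypothetical, it performs plain text substitution of $NAME and $CONTENT, and it does not escape XML special characters in the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative token substitution; not KeyView's implementation. */
final class UserSummaryExpansion {
    /** Emits one copy of the szUserSummary markup per metadata field. */
    static String expand(String userSummary, Map<String, String> fields) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // String.replace treats "$" literally, unlike replaceAll.
            out.append(userSummary
                    .replace("$NAME", field.getKey())
                    .replace("$CONTENT", field.getValue()))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Title", "My design document");
        fields.put("Author", "John Doe");
        System.out.print(expand("<MetaData name=\"$NAME\" content=\"$CONTENT\" />", fields));
    }
}
```

Run with the two sample fields, the sketch prints one MetaData element for Title and one for Author, in insertion order.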
The following markup extracts non-standard fields, and includes them at the bottom of the main XML file:
szMainBottom=$USERSUMMARY
szUserSummary=<MetaData name="$NAME" content="$CONTENT" />
This example extracts the field name ($NAME) and field content ($CONTENT) for non-standard metadata from a document, and includes it at the bottom of the main XML file.
The generated XML might look like this:
<MetaData name="Telephone number" content="444-111-2222" />
<MetaData name="Recorded date" content="07/03/2003, 23:00" />
<MetaData name="Source" content="TRUE" />
<MetaData name="my property" content="reserved" />
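A downstream application might read such elements back into name and content pairs. The sketch below uses the standard Java XML parser and assumes that each element is well-formed and closed with />, as the szUserSummary markup defines it. The class MetaDataReader and the wrapping root element are illustrative assumptions, not part of KeyView.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative reader for MetaData elements; not part of the KeyView API. */
final class MetaDataReader {
    /** Parses a run of MetaData elements into name and content pairs. */
    static Map<String, String> read(String fragment) {
        try {
            // Wrap the fragment so that the parser sees a single root element.
            String wrapped = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("MetaData");
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                fields.put(element.getAttribute("name"), element.getAttribute("content"));
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("MetaData markup is not well-formed", ex);
        }
    }

    public static void main(String[] args) {
        String fragment = "<MetaData name=\"Source\" content=\"TRUE\" />\n"
                + "<MetaData name=\"my property\" content=\"reserved\" />";
        read(fragment).forEach((name, content) -> System.out.println(name + " = " + content));
    }
}
```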