Textractor API textractor-716 (20091105163204)

textractor.mg4j.document
Enum AbstractTextractorDocumentFactory.MetadataKeys

java.lang.Object
  extended by java.lang.Enum<AbstractTextractorDocumentFactory.MetadataKeys>
      extended by textractor.mg4j.document.AbstractTextractorDocumentFactory.MetadataKeys
All Implemented Interfaces:
Serializable, Comparable<AbstractTextractorDocumentFactory.MetadataKeys>
Enclosing class:
AbstractTextractorDocumentFactory

public static enum AbstractTextractorDocumentFactory.MetadataKeys
extends Enum<AbstractTextractorDocumentFactory.MetadataKeys>

Case-insensitive keys for metadata passed to DocumentFactory.getDocument(java.io.InputStream, it.unimi.dsi.fastutil.objects.Reference2ObjectMap).

The keys in this class are general-purpose keys that are meaningful for most factories. Specific factory implementations may interpret additional keys; in that case, the DocumentSequence that uses the factory is responsible for supplying data for those keys.
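The following is a minimal Java sketch of how a caller might assemble a metadata map keyed by these constants. To keep it compilable on its own, it uses a hypothetical stand-in enum that mirrors the four documented constants, and a java.util.IdentityHashMap in place of the fastutil Reference2ObjectMap that DocumentFactory.getDocument expects. Both maps compare keys by reference. The values and their types (Boolean, Integer, String) are illustrative assumptions, not documented requirements.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class MetadataExample {
    // Hypothetical stand-in mirroring AbstractTextractorDocumentFactory.MetadataKeys.
    enum MetadataKeys {
        PARENTHESESAREWORDS,
        MINIMUM_DASH_SPLIT_LENGTH,
        OTHER_CHARACTER_DELIMITERS,
        MAXIMUM_LENGTH_CONSERVE_CASE
    }

    // Builds a reference-keyed metadata map. IdentityHashMap compares keys with ==,
    // as fastutil's Reference2ObjectOpenHashMap does; real code would pass the
    // fastutil map to DocumentFactory.getDocument(InputStream, Reference2ObjectMap).
    static Map<Enum<?>, Object> buildMetadata() {
        Map<Enum<?>, Object> metadata = new IdentityHashMap<>();
        // All values and value types below are illustrative assumptions.
        metadata.put(MetadataKeys.PARENTHESESAREWORDS, Boolean.TRUE);
        metadata.put(MetadataKeys.MINIMUM_DASH_SPLIT_LENGTH, 4);
        metadata.put(MetadataKeys.OTHER_CHARACTER_DELIMITERS, ";!");
        metadata.put(MetadataKeys.MAXIMUM_LENGTH_CONSERVE_CASE, 3);
        return metadata;
    }
}
```

Keys that a factory does not recognize can simply be left out of the map. As noted above, supplying data for factory-specific keys is up to the DocumentSequence.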


Enum Constant Summary
MAXIMUM_LENGTH_CONSERVE_CASE
          The maximum number of characters a term may have and still be guaranteed not to be downcased.
MINIMUM_DASH_SPLIT_LENGTH
          Minimum index at which a word can be split at a dash.
OTHER_CHARACTER_DELIMITERS
          Other characters that the reader should consider as standalone words.
PARENTHESESAREWORDS
          Indicates that parenthesis characters are treated as words.
 
Method Summary
static AbstractTextractorDocumentFactory.MetadataKeys valueOf(String name)
          Returns the enum constant of this type with the specified name.
static AbstractTextractorDocumentFactory.MetadataKeys[] values()
          Returns an array containing the constants of this enum type, in the order they are declared.
 
Methods inherited from class java.lang.Enum
clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Enum Constant Detail

PARENTHESESAREWORDS

public static final AbstractTextractorDocumentFactory.MetadataKeys PARENTHESESAREWORDS
Indicates that parenthesis characters are treated as words.


MINIMUM_DASH_SPLIT_LENGTH

public static final AbstractTextractorDocumentFactory.MetadataKeys MINIMUM_DASH_SPLIT_LENGTH
Minimum index at which a word can be split at a dash.


OTHER_CHARACTER_DELIMITERS

public static final AbstractTextractorDocumentFactory.MetadataKeys OTHER_CHARACTER_DELIMITERS
Other characters that the reader should consider as standalone words. When one of these characters is encountered, the reader ends the current word, if any, and returns the character as a word by itself. For instance, if OTHER_CHARACTER_DELIMITERS=";!", parsing "a; !" returns the words "a", ";", and "!", followed respectively by the nonWords "", " ", and "".
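The following is a minimal Java sketch of the splitting rule described above. It is not Textractor's reader. The class DelimiterTokenizer and its tokenize method are hypothetical, and the sketch assumes that ordinary words are runs of letters or digits and that every other non-delimiter character belongs to a nonWord. Each returned pair holds a word and the nonWord that follows it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the OTHER_CHARACTER_DELIMITERS rule; not Textractor code.
final class DelimiterTokenizer {
    // Returns {word, nonWord} pairs, where nonWord is the text following the word.
    static List<String[]> tokenize(String text, String delimiters) {
        List<String[]> pairs = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            StringBuilder word = new StringBuilder();
            if (delimiters.indexOf(text.charAt(i)) >= 0) {
                // A delimiter character is a word by itself.
                word.append(text.charAt(i++));
            } else {
                // Assumption: ordinary words are runs of letters or digits,
                // and a delimiter ends the current word.
                while (i < n && Character.isLetterOrDigit(text.charAt(i))
                        && delimiters.indexOf(text.charAt(i)) < 0) {
                    word.append(text.charAt(i++));
                }
            }
            // Everything up to the next letter, digit, or delimiter is the nonWord.
            StringBuilder nonWord = new StringBuilder();
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))
                    && delimiters.indexOf(text.charAt(i)) < 0) {
                nonWord.append(text.charAt(i++));
            }
            pairs.add(new String[] {word.toString(), nonWord.toString()});
        }
        return pairs;
    }
}
```

Calling tokenize("a; !", ";!") yields the pairs ("a", ""), (";", " "), and ("!", ""), which matches the example above.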


MAXIMUM_LENGTH_CONSERVE_CASE

public static final AbstractTextractorDocumentFactory.MetadataKeys MAXIMUM_LENGTH_CONSERVE_CASE
The maximum number of characters a term may have and still be guaranteed not to be downcased.
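The following is a minimal Java sketch of how a factory might apply MAXIMUM_LENGTH_CONSERVE_CASE when it normalizes terms. The class and method names are hypothetical. The description above only guarantees that terms up to the threshold keep their case. This sketch additionally assumes that longer terms are always lowercased.

```java
import java.util.Locale;

// Hypothetical helper; not part of the Textractor API.
final class CaseConservingNormalizer {
    // Terms of at most maxLengthConserveCase characters keep their case.
    // Assumption: longer terms are always lowercased.
    static String normalize(String term, int maxLengthConserveCase) {
        if (term.length() <= maxLengthConserveCase) {
            return term; // short terms such as acronyms are preserved
        }
        return term.toLowerCase(Locale.ROOT);
    }
}
```

With a threshold of 3, normalize("DNA", 3) keeps "DNA", while normalize("Protein", 3) returns "protein". A small threshold like this preserves short, case-sensitive tokens such as acronyms while still folding ordinary words.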

Method Detail

values

public static AbstractTextractorDocumentFactory.MetadataKeys[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
for (AbstractTextractorDocumentFactory.MetadataKeys c : AbstractTextractorDocumentFactory.MetadataKeys.values())
    System.out.println(c);

Returns:
an array containing the constants of this enum type, in the order they are declared

valueOf

public static AbstractTextractorDocumentFactory.MetadataKeys valueOf(String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)

Parameters:
name - the name of the enum constant to be returned.
Returns:
the enum constant with the specified name
Throws:
IllegalArgumentException - if this enum type has no constant with the specified name
NullPointerException - if the argument is null
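valueOf requires an exact, case-sensitive match, so a name such as "parenthesesarewords" or " PARENTHESESAREWORDS" is rejected. The following is a minimal Java sketch for resolving a key name regardless of case. It uses a hypothetical stand-in enum with the documented constant names and a hypothetical lookupIgnoreCase helper that scans values(). It does not describe how Textractor itself resolves keys.

```java
final class MetadataKeyLookup {
    // Hypothetical stand-in with the same constant names as the documented enum.
    enum MetadataKeys {
        PARENTHESESAREWORDS,
        MINIMUM_DASH_SPLIT_LENGTH,
        OTHER_CHARACTER_DELIMITERS,
        MAXIMUM_LENGTH_CONSERVE_CASE
    }

    // Hypothetical helper: compares names ignoring case; returns null if none match.
    static MetadataKeys lookupIgnoreCase(String name) {
        for (MetadataKeys key : MetadataKeys.values()) {
            if (key.name().equalsIgnoreCase(name)) {
                return key;
            }
        }
        return null;
    }

    // Returns true if valueOf rejects the (non-null) name with IllegalArgumentException.
    static boolean valueOfRejects(String name) {
        try {
            MetadataKeys.valueOf(name);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Passing null to valueOf still throws NullPointerException, as documented above. The helper returns null for a null name because equalsIgnoreCase(null) is false.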


Copyright © 2003-2008 Institute for Computational Biomedicine, All Rights Reserved.