Textractor API textractor-720 (20091120123250)

textractor.mg4j.document
Class SfnDocumentFactory

java.lang.Object
  extended by it.unimi.dsi.mg4j.document.AbstractDocumentFactory
      extended by it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory
          extended by textractor.mg4j.document.AbstractTextractorDocumentFactory
              extended by textractor.mg4j.document.SfnDocumentFactory
All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable

public final class SfnDocumentFactory
extends AbstractTextractorDocumentFactory

A factory that can produce MG4J documents from SfnArticles and their associated Sentences.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class textractor.mg4j.document.AbstractTextractorDocumentFactory
AbstractTextractorDocumentFactory.MetadataKeys
 
Nested classes/interfaces inherited from interface it.unimi.dsi.mg4j.document.DocumentFactory
DocumentFactory.FieldType
 
Field Summary
 
Fields inherited from class textractor.mg4j.document.AbstractTextractorDocumentFactory
DEFAULT_MAXIMUM_LENGTH_CONSERVE_CASE, DEFAULT_MINIMUM_DASH_SPLIT_LENGTH, DEFAULT_OTHER_CHARACTER_DELIMITERS, DEFAULT_WORD_READER_CLASS, fields
 
Fields inherited from class it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
 
Constructor Summary
SfnDocumentFactory()
          Constructs a new factory with the default configuration.
SfnDocumentFactory(Properties properties)
          Constructs a new factory configured from the given properties.
SfnDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
          Constructs a new factory configured from the given default metadata.
SfnDocumentFactory(String[] property)
          Constructs a new factory configured from the given array of property strings.
 
Method Summary
 TextractorDocumentFactory copy()
          Creates a copy of this factory.
 Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata)
          Returns the document obtained by parsing the given byte stream.
 
Methods inherited from class textractor.mg4j.document.AbstractTextractorDocumentFactory
fieldIndex, fieldName, fieldType, getDocumentNoContent, getFieldInfoList, numberOfFields, parseProperty
 
Methods inherited from class it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey
 
Methods inherited from class it.unimi.dsi.mg4j.document.AbstractDocumentFactory
ensureFieldIndex, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SfnDocumentFactory

public SfnDocumentFactory()
                   throws ConfigurationException,
                          ClassNotFoundException,
                          IllegalAccessException,
                          InstantiationException
Constructs a new factory with the default configuration.

Throws:
ConfigurationException - if there is a problem with the configuration of the factory.
ClassNotFoundException - if the specified WordReader cannot be found.
IllegalAccessException - if the factory is unable to create an instance of the specified WordReader.
InstantiationException - if the factory is unable to create an instance of the specified WordReader.

SfnDocumentFactory

public SfnDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
                   throws ConfigurationException,
                          ClassNotFoundException,
                          IllegalAccessException,
                          InstantiationException
Constructs a new factory configured from the given default metadata.

Parameters:
defaultMetadata - default metadata used to configure this factory
Throws:
ConfigurationException - if there is a problem with the configuration of the factory.
ClassNotFoundException - if the specified WordReader cannot be found.
IllegalAccessException - if the factory is unable to create an instance of the specified WordReader.
InstantiationException - if the factory is unable to create an instance of the specified WordReader.

SfnDocumentFactory

public SfnDocumentFactory(Properties properties)
                   throws ConfigurationException,
                          ClassNotFoundException,
                          IllegalAccessException,
                          InstantiationException
Constructs a new factory configured from the given properties.

Parameters:
properties - properties used to configure this factory
Throws:
ConfigurationException - if there is a problem with the configuration of the factory.
ClassNotFoundException - if the specified WordReader cannot be found.
IllegalAccessException - if the factory is unable to create an instance of the specified WordReader.
InstantiationException - if the factory is unable to create an instance of the specified WordReader.

SfnDocumentFactory

public SfnDocumentFactory(String[] property)
                   throws ConfigurationException,
                          ClassNotFoundException,
                          IllegalAccessException,
                          InstantiationException
Constructs a new factory configured from the given array of property strings.

Parameters:
property - an array of property strings used to configure this factory
Throws:
ConfigurationException - if there is a problem with the configuration of the factory.
ClassNotFoundException - if the specified WordReader cannot be found.
IllegalAccessException - if the factory is unable to create an instance of the specified WordReader.
InstantiationException - if the factory is unable to create an instance of the specified WordReader.
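
All four constructors declare the same three reflection-related exceptions, each tied to the configured WordReader. The sketch below illustrates where such exceptions typically come from when a class is loaded by name. It is an assumption about the general mechanism, not the factory's actual code: WordReaderLoadingSketch, instantiate and describe are hypothetical names, and only standard java.lang reflection calls are used.

```java
import java.lang.reflect.InvocationTargetException;

class WordReaderLoadingSketch {
    /** Instantiates a class by name through its no-argument constructor. */
    static Object instantiate(String className)
            throws ClassNotFoundException, IllegalAccessException, InstantiationException {
        // Throws ClassNotFoundException if no such class is on the classpath.
        Class<?> type = Class.forName(className);
        try {
            // Throws IllegalAccessException or InstantiationException if the
            // constructor is inaccessible or the class is abstract.
            return type.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException | InvocationTargetException e) {
            // Fold the remaining reflective failures into InstantiationException
            // so callers see only the three exception types listed above.
            InstantiationException wrapped = new InstantiationException(className);
            wrapped.initCause(e);
            throw wrapped;
        }
    }

    /** Reports the outcome as text instead of propagating the exception. */
    static String describe(String className) {
        try {
            return "loaded " + instantiate(className).getClass().getName();
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (IllegalAccessException | InstantiationException e) {
            return "cannot instantiate";
        }
    }
}
```

A caller constructing the factory can handle the three exceptions the same way as describe does, distinguishing a misspelled WordReader class name from a class that exists but cannot be instantiated.
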

Method Detail

copy

public TextractorDocumentFactory copy()
Creates a copy of this factory.

Returns:
a copy of this factory.
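
Because the class implements FlyweightPrototype, copy() is presumably meant to give each thread its own factory instance while configuration is shared. The sketch below illustrates that usage pattern with a hypothetical stand-in, StubFactory, rather than the real SfnDocumentFactory; its fields and its process method are assumptions made only for this example.

```java
import java.util.Locale;

class FlyweightCopySketch {
    /** Hypothetical stand-in for a factory that exposes copy(). */
    static final class StubFactory {
        private final String wordReaderClass;                      // immutable, safe to share
        private final StringBuilder scratch = new StringBuilder(); // mutable, one per instance

        StubFactory(String wordReaderClass) {
            this.wordReaderClass = wordReaderClass;
        }

        /** Returns a new instance that shares configuration but not scratch state. */
        StubFactory copy() {
            return new StubFactory(wordReaderClass);
        }

        /** Uses the per-instance buffer, so it is not safe to call from two threads. */
        String process(String text) {
            scratch.setLength(0);
            scratch.append(text.toLowerCase(Locale.ROOT));
            return scratch.toString();
        }
    }

    /** Processes each input on its own thread, giving every thread a private copy. */
    static String[] processInParallel(StubFactory prototype, String[] inputs) {
        String[] results = new String[inputs.length];
        Thread[] workers = new Thread[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            final int slot = i;
            final StubFactory own = prototype.copy(); // never share the prototype itself
            workers[i] = new Thread(() -> results[slot] = own.process(inputs[slot]));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes each worker's result visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }
}
```

The design point is that the prototype is only ever copied, never used concurrently, so per-instance state cannot be corrupted by another thread.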

getDocument

public Document getDocument(InputStream rawContent,
                            Reference2ObjectMap<Enum<?>,Object> metadata)
Returns the document obtained by parsing the given byte stream.

Parameters:
rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is a responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys defined in PropertyBasedDocumentFactory) to various kinds of objects.
Returns:
the document obtained by parsing the given byte stream.
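
The contract on rawContent (read it, but leave closing to the DocumentCollection) can be illustrated with a small self-contained sketch. Everything here is a hypothetical stand-in: parseWithoutClosing plays the role of a getDocument-style method, and TrackingStream exists only to observe whether close() was called. UTF-8 decoding is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class NonClosingReadSketch {
    /** Stream that records whether close() was called (for demonstration only). */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the stream to the end without closing it; the caller owns the stream. */
    static String parseWithoutClosing(InputStream rawContent) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = rawContent.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Deliberately no try-with-resources and no rawContent.close() here.
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In particular, wrapping rawContent in a try-with-resources block inside the parsing method would violate the contract, since that closes the stream on exit.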

Copyright © 2003-2008 Institute for Computational Biomedicine, All Rights Reserved.