Textractor API textractor-720 (20091120123250)

textractor.datamodel
Class Sentence

java.lang.Object
  extended by textractor.datamodel.TextractorDocument
      extended by textractor.datamodel.Sentence
All Implemented Interfaces:
Serializable

public final class Sentence
extends TextractorDocument

See Also:
Serialized Form

Field Summary
static String REFERENCE_TAG
           
static String REFERENCE_TAG_HTML
           
static int TEXT_MAX_LENGTH
           
 
Fields inherited from class textractor.datamodel.TextractorDocument
ABSTRACT_SECTION, documentNumber, OTHER_SECTION, REFERENCE_SECTION
 
Constructor Summary
Sentence()
           
Sentence(Article article)
           
Sentence(Article article, String text)
           
 
Method Summary
 Article getArticle()
           
 Reference2ObjectMap<Enum<?>,Object> getMetaData()
          Get the metadata for this document.
 List<Integer> getPositions()
          Get the positions of words in the sentence text relative to the original document source.
 String[] getPotentialMutations()
          Obtains the terms that look like mutations in this sentence.
 MutableString getSpaceDelimitedProcessedTerms(DocumentIndexManager docmanager)
          Returns the text of this sentence in a format where words have been processed and are delimited by a single space character.
static MutableString getSpaceDelimitedProcessedTerms(DocumentIndexManager docmanager, String spaceDelimitedTerms)
          Processes each term with the termProcessor and returns the result.
 MutableString getSpaceDelimitedTerms(DocumentIndexManager docmanager)
          Returns the text of this sentence in a format where words are delimited by a single space character.
 int getTermNumber()
          Returns the number of terms in this sentence.
 String getText()
          Get the text contained in this document.
 boolean hasPositons()
          Does this sentence have position information associated with the text?
 boolean isMaybeProteinMutation()
           
 boolean isMaybeProteinName()
           
 void setArticle(Article article)
           
 void setMaybeProteinMutation(boolean maybeProteinMutation)
           
 void setMaybeProteinName(boolean maybeProteinName)
           
 void setPositions(List<Integer> positions)
          Set the positions of words in the sentence text relative to the original document source.
 void setPotentialMutations(String[] potentialMutations)
          Stores the terms that look like mutations in this sentence.
 void setText(String newText)
           
 
Methods inherited from class textractor.datamodel.TextractorDocument
getDocumentNumber, getDocumentSection, setDocumentNumber, setDocumentSection
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

REFERENCE_TAG

public static final String REFERENCE_TAG
See Also:
Constant Field Values

REFERENCE_TAG_HTML

public static final String REFERENCE_TAG_HTML
See Also:
Constant Field Values

TEXT_MAX_LENGTH

public static final int TEXT_MAX_LENGTH
See Also:
Constant Field Values
Constructor Detail

Sentence

public Sentence()

Sentence

public Sentence(Article article)

Sentence

public Sentence(Article article,
                String text)
Method Detail

getPotentialMutations

public String[] getPotentialMutations()
Obtains the terms that look like mutations in this sentence.

Returns:
An array in which each element is a term that is a potential mutation.

setPotentialMutations

public void setPotentialMutations(String[] potentialMutations)
Stores the terms that look like mutations in this sentence.

Parameters:
potentialMutations - An array in which each element is a term that is a potential mutation.
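This page does not say how Textractor decides that a term "looks like" a mutation. Purely as an illustration, the sketch below builds such an array using one common notation for protein point mutations (wild-type residue, position, mutant residue, as in R175H). The pattern and the names MutationTerms and potentialMutations are hypothetical, not Textractor's detection logic; the resulting array is the kind of value one would pass to setPotentialMutations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MutationTerms {
    private MutationTerms() { }

    // Hypothetical heuristic: one-letter amino-acid code, a position, another code (e.g. R175H).
    private static final Pattern POINT_MUTATION =
            Pattern.compile("[ACDEFGHIKLMNPQRSTVWY][1-9]\\d*[ACDEFGHIKLMNPQRSTVWY]");

    /** Returns the single-space-delimited terms that match the pattern, in sentence order. */
    static String[] potentialMutations(String spaceDelimitedTerms) {
        List<String> found = new ArrayList<>();
        for (String term : spaceDelimitedTerms.split(" ")) {
            if (POINT_MUTATION.matcher(term).matches()) {
                found.add(term);
            }
        }
        return found.toArray(new String[0]);
    }
}
```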

isMaybeProteinMutation

public boolean isMaybeProteinMutation()

setMaybeProteinMutation

public void setMaybeProteinMutation(boolean maybeProteinMutation)

isMaybeProteinName

public boolean isMaybeProteinName()

setMaybeProteinName

public void setMaybeProteinName(boolean maybeProteinName)

getText

public String getText()
Description copied from class: TextractorDocument
Get the text contained in this document.

Specified by:
getText in class TextractorDocument
Returns:
A string of text

setText

public void setText(String newText)

getArticle

public Article getArticle()

setArticle

public void setArticle(Article article)

getPositions

public List<Integer> getPositions()
Get the positions of words in the sentence text relative to the original document source.

Returns:
a list of positions, or null if position data is not available

setPositions

public void setPositions(List<Integer> positions)
Set the positions of words in the sentence text relative to the original document source.

Parameters:
positions - a list of positions

hasPositons

public boolean hasPositons()
Does this sentence have position information associated with the text?

Returns:
true if position data is available
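Because getPositions() may return null, callers should check hasPositons() (spelled this way in the API) before using the list. The sketch below shows that guard through a hypothetical PositionedText interface that declares only three methods documented on this page, so it compiles without Textractor on the classpath. It also assumes, which this page does not state, that the list holds one entry per whitespace-separated word of getText(), in word order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring three methods of Sentence.
interface PositionedText {
    String getText();
    List<Integer> getPositions();
    boolean hasPositons();
}

final class PositionLookup {
    private PositionLookup() { }

    /**
     * Pairs each word with its position in the original source, e.g. "p53@120".
     * Returns an empty list when no position data is attached.
     * Assumes one position per whitespace-separated word.
     */
    static List<String> wordsWithPositions(PositionedText sentence) {
        List<String> result = new ArrayList<>();
        if (!sentence.hasPositons()) {
            return result; // getPositions() may be null here
        }
        String[] words = sentence.getText().trim().split("\\s+");
        List<Integer> positions = sentence.getPositions();
        int n = Math.min(words.length, positions.size());
        for (int i = 0; i < n; i++) {
            result.add(words[i] + "@" + positions.get(i));
        }
        return result;
    }
}
```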

getTermNumber

public int getTermNumber()
Returns the number of terms in this sentence.

Returns:
Number of terms in this sentence if getSpaceDelimitedTerms or getSpaceDelimitedProcessedTerms was previously called, or -1 if the number has never been calculated.
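The -1 return value acts as a "not yet calculated" sentinel that is filled in as a side effect of producing the space-delimited text. A minimal sketch of that pattern follows; the class CachedTermCount and its members are hypothetical stand-ins, and splitting on runs of whitespace is a simplification, not the MG4J tokenization the real class uses.

```java
// Hypothetical stand-in illustrating the -1 sentinel documented for getTermNumber().
final class CachedTermCount {
    private final String text;
    private int termNumber = -1; // -1 means "never calculated"

    CachedTermCount(String text) {
        this.text = text;
    }

    /** Returns the term count, or -1 if spaceDelimited() has not been called yet. */
    int getTermNumber() {
        return termNumber;
    }

    /** Joins the terms with single spaces and, as a side effect, records their count. */
    String spaceDelimited() {
        String trimmed = text.trim();
        // Simplification: split on whitespace runs rather than with MG4J's word reader.
        String[] terms = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        termNumber = terms.length;
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        CachedTermCount sentence = new CachedTermCount("  p53   binds DNA ");
        System.out.println(sentence.getTermNumber()); // -1
        System.out.println(sentence.spaceDelimited()); // p53 binds DNA
        System.out.println(sentence.getTermNumber()); // 3
    }
}
```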

getSpaceDelimitedTerms

public MutableString getSpaceDelimitedTerms(DocumentIndexManager docmanager)
Returns the text of this sentence in a format where words are delimited by a single space character. The sentence is split into terms with the same algorithm that MG4J uses, so word positions can be computed directly by splitting the result on spaces, for example with a StringTokenizer. As a side effect, this method calculates and stores the number of terms in this sentence, which is then available through getTermNumber().

This method does not process the terms: each term is returned with the capitalization that it had in the input text.

Parameters:
docmanager - DocumentIndexManager used to process terms
Returns:
the text of this sentence in a format where words are delimited by a space character, or null if getText() returns null.
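Because the result separates terms with exactly one space, the position of a term is simply its index in the space-split sequence. The following sketch assumes a string obtained from getSpaceDelimitedTerms (a hard-coded literal here; MutableString is a CharSequence) and uses java.util.StringTokenizer, as the description suggests, to find the term at a given word position and to count the terms. The class TermPositions and its methods are hypothetical helpers, not part of the Textractor API.

```java
import java.util.StringTokenizer;

// Hypothetical helper for any string whose terms are separated by single spaces,
// such as the result of Sentence.getSpaceDelimitedTerms().
final class TermPositions {
    private TermPositions() { }

    /** Returns the term at the given 0-based word position, or null if out of range. */
    static String termAt(CharSequence spaceDelimited, int position) {
        StringTokenizer tokens = new StringTokenizer(spaceDelimited.toString(), " ");
        int index = 0;
        while (tokens.hasMoreTokens()) {
            String term = tokens.nextToken();
            if (index == position) {
                return term;
            }
            index++;
        }
        return null;
    }

    /** Counts the terms; this is the number getTermNumber() is documented to report afterwards. */
    static int countTerms(CharSequence spaceDelimited) {
        return new StringTokenizer(spaceDelimited.toString(), " ").countTokens();
    }

    public static void main(String[] args) {
        String terms = "The R175H mutation inactivates p53";
        System.out.println(termAt(terms, 1));  // R175H
        System.out.println(countTerms(terms)); // 5
    }
}
```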

getSpaceDelimitedProcessedTerms

public MutableString getSpaceDelimitedProcessedTerms(DocumentIndexManager docmanager)
Returns the text of this sentence in a format where words have been processed and are delimited by a single space character. The sentence is split into terms with the same algorithm that MG4J uses, so word positions can be computed directly by splitting the result on spaces, for example with a StringTokenizer. As a side effect, this method calculates and stores the number of terms in this sentence, which is then available through getTermNumber().

Parameters:
docmanager - DocumentIndexManager used to process terms
Returns:
the text of this sentence in a format where words are delimited by a single space character, or null if getText() returns null.
See Also:
TermProcessor (implementations of this interface are used to process terms to construct the result)

getSpaceDelimitedProcessedTerms

public static MutableString getSpaceDelimitedProcessedTerms(DocumentIndexManager docmanager,
                                                            String spaceDelimitedTerms)
Processes each term with the termProcessor and returns the result.

Parameters:
docmanager - DocumentIndexManager used to process terms
spaceDelimitedTerms - Input string, where terms are delimited by single space characters.
Returns:
Input term sequence where terms are substituted by their processed term equivalent.
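The static variant rewrites an already space-delimited string term by term, keeping one output term per input term so positions are preserved. The sketch below reproduces that contract in plain Java: a java.util.function.UnaryOperator&lt;String&gt; stands in for the MG4J term processor that the real method obtains through the DocumentIndexManager, and lower-casing is only an example processor. The names ProcessTerms and processEach are hypothetical; the real method returns an MG4J MutableString rather than a String.

```java
import java.util.function.UnaryOperator;

final class ProcessTerms {
    private ProcessTerms() { }

    /**
     * Applies the processor to each single-space-delimited term and rejoins the
     * results with single spaces, so term i of the output corresponds to term i of the input.
     */
    static String processEach(String spaceDelimitedTerms, UnaryOperator<String> processor) {
        String[] terms = spaceDelimitedTerms.split(" ", -1);
        String[] processed = new String[terms.length];
        for (int i = 0; i < terms.length; i++) {
            processed[i] = processor.apply(terms[i]);
        }
        return String.join(" ", processed);
    }

    public static void main(String[] args) {
        UnaryOperator<String> lowerCase = String::toLowerCase; // example processor only
        System.out.println(processEach("The R175H Mutation", lowerCase)); // the r175h mutation
    }
}
```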

getMetaData

public Reference2ObjectMap<Enum<?>,Object> getMetaData()
Get the metadata for this document.

Overrides:
getMetaData in class TextractorDocument
Returns:
map containing metadata
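The metadata map is keyed by enum constants and, being a fastutil Reference2ObjectMap, compares keys by reference rather than with equals(). Enum constants are singletons, so identity comparison is safe and lets constants from different enum types share one map. The sketch below imitates this with java.util.IdentityHashMap so that it runs without fastutil; the SentenceMeta and ArticleMeta enums, their constants and the stored values are hypothetical examples, not keys that Textractor defines.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical metadata keys; Textractor's real keys are not listed on this page.
enum SentenceMeta { SECTION_TITLE }
enum ArticleMeta { PUBLICATION_YEAR }

final class MetaDataDemo {
    private MetaDataDemo() { }

    /** Builds a map keyed by constants of two unrelated enum types. */
    static Map<Enum<?>, Object> sampleMetaData() {
        // IdentityHashMap compares keys with ==, like fastutil's Reference2ObjectMap.
        Map<Enum<?>, Object> metaData = new IdentityHashMap<>();
        metaData.put(SentenceMeta.SECTION_TITLE, "Results");
        metaData.put(ArticleMeta.PUBLICATION_YEAR, 2005);
        return metaData;
    }

    public static void main(String[] args) {
        Map<Enum<?>, Object> metaData = sampleMetaData();
        // Values are stored as Object, so callers cast to the expected type.
        int year = (Integer) metaData.get(ArticleMeta.PUBLICATION_YEAR);
        System.out.println(year); // 2005
    }
}
```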


Copyright © 2003-2008 Institute for Computational Biomedicine, All Rights Reserved.