Textractor API textractor-720 (20091120123250)
java.lang.Object
  textractor.datamodel.FeatureCreationParameters
    textractor.datamodel.SingleBagOfWordFeatureCreationParameters
public final class SingleBagOfWordFeatureCreationParameters
extends FeatureCreationParameters
Holds parameters to calculate features with the bag-of-words approach. In the bag-of-words approach, a window is centered around a word of interest (the position of this word is encoded in TextFragmentAnnotation#getWordPosition). The size of the window is given by the windowSize of this class. Each word or term in a set of terms is used in turn to perform a boolean test on the text within the window around the word of interest. The results of these tests provide the features (each feature is 0 or 1).

As an example, consider the following window, centered around the term A23P:

The mutant A23P exhibits a strong phenotype.

If the window size is 2 and the terms considered for calculating the bag of words are "the", "mutant", and "strong" (in this order), then the features calculated for the text within the window will be:

1 1 0

The first 1 means that the word "the" occurs in the window around the word of interest; the second 1 means that the term "mutant" occurs. The last feature is 0 because the word "strong" does not occur within the window of size 2 centered on the word of interest.

Note that the order of the terms is significant: the features must be calculated in exactly the same order when exporting the training set and when performing predictions on the test set. If the order is not maintained, the behavior of prediction engines (such as SVMs) is undefined.

User: Fabien Campagne Date: Jan 19, 2004 Time: 11:42:11 AM
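The following is a minimal sketch of the feature calculation in the example above, not Textractor's implementation. It assumes the text is already split into words, that the window size counts words on each side of the word of interest, that the word of interest itself is not tested, and that matching ignores case (so "The" matches "the"). The class BagOfWordsSketch and its features method are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch of the bag-of-words features described above.
// Assumptions (not taken from the Textractor source): the window spans
// windowSize words on each side of the word of interest, the word of
// interest itself is not tested, and matching ignores case.
final class BagOfWordsSketch {

    static int[] features(String[] words, int position, int windowSize, String[] terms) {
        // Clamp the window to the bounds of the text.
        int min = Math.max(0, position - windowSize);
        int max = Math.min(words.length - 1, position + windowSize);
        int[] result = new int[terms.length];
        // One feature per term, emitted in the order of the terms array;
        // this order must be identical for training and prediction.
        for (int t = 0; t < terms.length; t++) {
            String term = terms[t].toLowerCase(Locale.ROOT);
            for (int i = min; i <= max; i++) {
                if (i != position && words[i].toLowerCase(Locale.ROOT).equals(term)) {
                    result[t] = 1;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"The", "mutant", "A23P", "exhibits", "a", "strong", "phenotype"};
        String[] terms = {"the", "mutant", "strong"};
        // A23P is at index 2; with a window size of 2 the window is words[0..4].
        System.out.println(Arrays.toString(features(words, 2, 2, terms))); // [1, 1, 0]
    }
}
```

Running main reproduces the 1 1 0 vector from the example; with a window size of 3 the window reaches "strong" and the last feature becomes 1.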
| Field Summary | |
|---|---|
| static int | LOCATION_CENTERED_ON_WORD: The window is centered on the reference word. |
| static int | LOCATION_LEFT_OF_WORD: The window is to the left of the reference word. |
| static int | LOCATION_RIGHT_OF_WORD: The window is to the right of the reference word. |
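The three location constants can be illustrated with a minimal sketch that computes the inclusive word-index bounds of the window for each location. Everything here is an assumption: the constant values (0, 1, 2), the class WindowLocationSketch and its bounds method are hypothetical, and the real calculateMinWindowIndex and calculateMaxWindowWidth may compute the window differently.

```java
// Hypothetical sketch: inclusive word-index bounds of the window for each
// location. The constant values and the exact bounds are assumptions for
// illustration; Textractor's real constants and window logic may differ.
final class WindowLocationSketch {
    static final int LOCATION_CENTERED_ON_WORD = 0;
    static final int LOCATION_LEFT_OF_WORD = 1;
    static final int LOCATION_RIGHT_OF_WORD = 2;

    /**
     * Returns {min, max}, the inclusive word indices of the window.
     * position is the index of the first word of the reference term and
     * termLength its length in words (1 for a single word). An empty
     * window is returned as min > max.
     */
    static int[] bounds(int location, int position, int termLength, int windowSize, int numWords) {
        int termEnd = position + termLength - 1; // last word of the reference term
        int min;
        int max;
        switch (location) {
            case LOCATION_LEFT_OF_WORD:
                min = position - windowSize;
                max = position - 1;
                break;
            case LOCATION_RIGHT_OF_WORD:
                min = termEnd + 1;
                max = termEnd + windowSize;
                break;
            case LOCATION_CENTERED_ON_WORD:
                min = position - windowSize;
                max = termEnd + windowSize;
                break;
            default:
                throw new IllegalArgumentException("unknown location: " + location);
        }
        // Clamp to the text so callers never index outside the word array.
        return new int[] {Math.max(0, min), Math.min(numWords - 1, max)};
    }

    public static void main(String[] args) {
        // "The mutant A23P exhibits a strong phenotype.": A23P at index 2, 7 words.
        for (int loc = 0; loc <= 2; loc++) {
            int[] b = bounds(loc, 2, 1, 2, 7);
            System.out.println(loc + ": [" + b[0] + ", " + b[1] + "]");
        }
        // prints 0: [0, 4], then 1: [0, 1], then 2: [3, 4]
    }
}
```

Under these assumptions a centered window of size 2 covers the same words as the class-level example, while the left and right windows each cover only one side.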
| Fields inherited from class textractor.datamodel.FeatureCreationParameters |
|---|
| windowSize |
| Constructor Summary | |
|---|---|
| SingleBagOfWordFeatureCreationParameters() | Constructs default parameters. |
| Method Summary | |
|---|---|
| void | addWordsToTerms(int[] indexedTerms, int wordOfInterestPosition, int termLength, int[] excludedPositions): Finds the terms that occur in the window and adds them to the non-redundant termsInWindows. |
| int | calculateMaxWindowWidth(int wordOfInterestPosition, int termLength, int numTerms) |
| int | calculateMinWindowIndex(int wordOfInterestPosition) |
| void | clearTerms(): Clears the terms stored in these parameters. |
| int | getFirstFeatureNumber() |
| int | getLastFeatureNumber() |
| String[] | getTerms(): Returns the terms for which features should be calculated. |
| IntArrayList | getTermsInWindows() |
| int | getWindowLocation(): Returns the window location with respect to the reference word. |
| void | removeTerm(String term, int indexedTerm) |
| void | setFirstFeatureNumber(int firstFeatureNumber): Sets the number of the first feature that this exporter will generate. |
| void | setTerms(DocumentIndexManager docmanager): Converts indexed terms to their string representation. |
| void | setTermsInWindows(IntArrayList termsInWindows) |
| void | setWindowLocation(int windowLocation) |
| void | setWindowSize(int windowSize): Sets the size of the window. |
| void | updateIndex(DocumentIndexManager docmanager): Updates the indices of termsInWindows in SingleBagOfWordFeatureCreationParameters to match the current index. |
| boolean | windowContainsTerm(int[] annotationIndexedTerms, int term, int windowCenter, int termLength, int minIndex, int maxIndex) |
| Methods inherited from class textractor.datamodel.FeatureCreationParameters |
|---|
| getParameterNumber, getWindowSize, positionIsExcluded, setParameterNumber |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int LOCATION_RIGHT_OF_WORD
public static final int LOCATION_LEFT_OF_WORD
public static final int LOCATION_CENTERED_ON_WORD
| Constructor Detail |
|---|
public SingleBagOfWordFeatureCreationParameters()
| Method Detail |
|---|
public int getFirstFeatureNumber()
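public void setFirstFeatureNumber(int firstFeatureNumber)
Sets the number of the first feature that this exporter will generate.
Parameters: firstFeatureNumber - the number of the first feature
public int getLastFeatureNumber()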
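A first feature number makes sense if several feature-creation parameter sets are exported side by side and each owns a contiguous block of feature numbers. The sketch below assumes exactly that (one feature per term, numbered consecutively from firstFeatureNumber, so the last number is first + number of terms - 1) and writes the non-zero features as sparse "number:value" pairs, the style used by SVM input formats such as SVMlight. FeatureNumberingSketch and its methods are hypothetical.

```java
import java.util.StringJoiner;

// Hypothetical sketch of consecutive feature numbering. Assumption: this
// parameter set owns one feature per term, numbered from firstFeatureNumber,
// so several parameter sets can be chained without overlapping numbers.
final class FeatureNumberingSketch {

    static int lastFeatureNumber(int firstFeatureNumber, int numTerms) {
        return firstFeatureNumber + numTerms - 1;
    }

    /** Formats the non-zero features as sparse "number:value" pairs. */
    static String toSparse(int firstFeatureNumber, int[] features) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = 0; i < features.length; i++) {
            if (features[i] != 0) {
                joiner.add((firstFeatureNumber + i) + ":" + features[i]);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        int first = 1;
        int[] features = {1, 1, 0}; // "the", "mutant", "strong"
        System.out.println(toSparse(first, features)); // 1:1 2:1
        // A second parameter set would start right after this one's last feature.
        System.out.println(lastFeatureNumber(first, features.length) + 1); // 4
    }
}
```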
public int getWindowLocation()
public void setWindowLocation(int windowLocation)
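public String[] getTerms()
Returns the terms for which features should be calculated.
getTerms in class FeatureCreationParameters
public void updateIndex(DocumentIndexManager docmanager)
Description copied from class: FeatureCreationParameters
Updates the indices of termsInWindows to match the current index.
updateIndex in class FeatureCreationParameters
public void clearTerms()
Description copied from class: FeatureCreationParameters
Clears the terms stored in these parameters.
clearTerms in class FeatureCreationParameters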
public void removeTerm(String term,
int indexedTerm)
removeTerm in class FeatureCreationParameters
public void setTerms(DocumentIndexManager docmanager)
Converts indexed terms to their string representation.
setTerms in class FeatureCreationParameters
Parameters: docmanager - the index manager
public void setTermsInWindows(IntArrayList termsInWindows)
public IntArrayList getTermsInWindows()
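setTerms fixes the order of the string terms, which matters because features must be produced in the same order for training and prediction. The sketch below illustrates that idea under stated assumptions: a Map&lt;Integer, String&gt; stands in for DocumentIndexManager, a List&lt;Integer&gt; stands in for the IntArrayList of termsInWindows, and the class TermsSketch with its addIndexedTerm method is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of setTerms/removeTerm. A Map<Integer, String> stands
// in for DocumentIndexManager and a List<Integer> for the IntArrayList of
// termsInWindows; neither mirrors the real Textractor classes.
final class TermsSketch {
    private final List<Integer> termsInWindows = new ArrayList<>();
    private String[] terms = new String[0];

    void addIndexedTerm(int indexedTerm) {
        if (!termsInWindows.contains(indexedTerm)) { // keep the list non-redundant
            termsInWindows.add(indexedTerm);
        }
    }

    /** Converts indexed terms to strings, preserving their order. */
    void setTerms(Map<Integer, String> index) {
        terms = new String[termsInWindows.size()];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = index.get(termsInWindows.get(i));
        }
    }

    /** Removes a term; the remaining terms keep their relative order. */
    void removeTerm(String term, int indexedTerm) {
        termsInWindows.remove(Integer.valueOf(indexedTerm)); // remove by value, not position
        List<String> kept = new ArrayList<>(Arrays.asList(terms));
        kept.remove(term);
        terms = kept.toArray(new String[0]);
    }

    String[] getTerms() {
        return terms.clone();
    }

    public static void main(String[] args) {
        TermsSketch params = new TermsSketch();
        for (int indexed : new int[] {7, 3, 7, 5}) {
            params.addIndexedTerm(indexed); // the duplicate 7 is ignored
        }
        params.setTerms(Map.of(7, "the", 3, "mutant", 5, "strong"));
        System.out.println(Arrays.toString(params.getTerms())); // [the, mutant, strong]
        params.removeTerm("mutant", 3);
        System.out.println(Arrays.toString(params.getTerms())); // [the, strong]
    }
}
```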
public void setWindowSize(int windowSize)
Description copied from class: FeatureCreationParameters
Sets the size of the window.
setWindowSize in class FeatureCreationParameters
Parameters: windowSize - size of the window (in words) on each side of the reference word. The total window width will be double this value (minus 1 if the reference word is ignored).
public void addWordsToTerms(int[] indexedTerms,
int wordOfInterestPosition,
int termLength,
int[] excludedPositions)
Finds the terms that occur in the window and adds them to the non-redundant termsInWindows.
Parameters: indexedTerms, wordOfInterestPosition, termLength, excludedPositions
public int calculateMinWindowIndex(int wordOfInterestPosition)
public int calculateMaxWindowWidth(int wordOfInterestPosition,
int termLength,
int numTerms)
public boolean windowContainsTerm(int[] annotationIndexedTerms,
int term,
int windowCenter,
int termLength,
int minIndex,
int maxIndex)
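The parameter names of windowContainsTerm and addWordsToTerms suggest a scan over the indexed words of an annotation. The sketch below is a guess at that behavior, not the documented one: it assumes minIndex and maxIndex are inclusive word indices, that the reference term occupies windowCenter through windowCenter + termLength - 1 and is never counted, and that excludedPositions lists word indices to skip. Unlike the real addWordsToTerms, the hypothetical WindowScanSketch version takes the window size and the target list as explicit arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of windowContainsTerm and addWordsToTerms. Parameter
// meanings (inclusive minIndex/maxIndex, the reference term occupying
// windowCenter..windowCenter+termLength-1, excludedPositions holding word
// indices to skip) are assumptions, not documented Textractor behavior.
final class WindowScanSketch {

    static boolean windowContainsTerm(int[] indexedTerms, int term, int windowCenter,
                                      int termLength, int minIndex, int maxIndex) {
        int from = Math.max(0, minIndex);
        int to = Math.min(indexedTerms.length - 1, maxIndex);
        for (int i = from; i <= to; i++) {
            boolean insideReference = i >= windowCenter && i < windowCenter + termLength;
            if (!insideReference && indexedTerms[i] == term) {
                return true;
            }
        }
        return false;
    }

    /** Adds each distinct term of the centered window to termsInWindows. */
    static void addWordsToTerms(int[] indexedTerms, int position, int termLength,
                                int windowSize, int[] excludedPositions,
                                List<Integer> termsInWindows) {
        int from = Math.max(0, position - windowSize);
        int to = Math.min(indexedTerms.length - 1, position + termLength - 1 + windowSize);
        for (int i = from; i <= to; i++) {
            boolean insideReference = i >= position && i < position + termLength;
            if (insideReference || contains(excludedPositions, i)) {
                continue;
            }
            Integer term = indexedTerms[i];
            if (!termsInWindows.contains(term)) { // keep the list non-redundant
                termsInWindows.add(term);
            }
        }
    }

    private static boolean contains(int[] values, int value) {
        for (int v : values) {
            if (v == value) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Indexed form of "the mutant A23P exhibits a strong phenotype",
        // using made-up term indices 0..6.
        int[] text = {0, 1, 2, 3, 4, 5, 6};
        System.out.println(windowContainsTerm(text, 0, 2, 1, 0, 4)); // true ("the")
        System.out.println(windowContainsTerm(text, 5, 2, 1, 0, 4)); // false ("strong")
        List<Integer> termsInWindows = new ArrayList<>();
        addWordsToTerms(text, 2, 1, 2, new int[] {4}, termsInWindows);
        System.out.println(termsInWindows); // [0, 1, 3]
    }
}
```

Collecting terms into a non-redundant list in first-seen order keeps the term order stable, which is what the class description requires for consistent training and prediction.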