Textractor API textractor-720 (20091120123250)
java.lang.Object
textractor.sentence.AbstractSentenceProcessor
textractor.chain.AbstractSentenceConsumer
textractor.chain.indexer.Indexer
public final class Indexer
A Command that creates an index with mg4j.
| Field Summary | |
|---|---|
| static int | DOCUMENT_SEPARATOR |
| Fields inherited from class textractor.chain.AbstractSentenceConsumer |
|---|
| productionCompleted, textractorContext |

| Fields inherited from interface org.apache.commons.chain.Command |
|---|
| CONTINUE_PROCESSING, PROCESSING_COMPLETE |
| Constructor Summary | |
|---|---|
| Indexer() | Create a new indexer Command. |
| Method Summary | | |
|---|---|---|
| static Object | assureFieldType(TextractorFieldInfo fieldInfo, Object value) | Assure that value is appropriate for fieldInfo.type. |
| void | consume(Article article, Collection<Sentence> sentences) | Process sentences along with their associated article into individual sentences so that the IndexBuilder process can read them properly. |
| static String | fieldErrorString(TextractorFieldInfo fieldInfo, Object value) | The value is not valid for the fieldInfo.type. |
| String | getBasename() | |
| int | getCombineBufferSize() | |
| int | getDocumentsPerBatch() | Gets the number of documents per batch. |
| int | getHeight() | |
| int | getIndexingQueueSize() | Get the size of the indexing queue. |
| int | getMinimumDashSplitLength() | |
| int | getNumberOfArticlesProcessed() | Get the number of articles processed so far. |
| int | getNumberOfSentencesProcessed() | Get the number of sentences processed so far. |
| int | getPasteBufferSize() | |
| Map<CompressionFlags.Component,CompressionFlags.Coding> | getPayloadWriterFlags() | |
| int | getQuantum() | |
| int | getScanBufferSize() | |
| int | getSkipBufferSize() | |
| Map<CompressionFlags.Component,CompressionFlags.Coding> | getStandardWriterFlags() | |
| String | getTermProcessorClass() | |
| TextractorWordReader | getWordReader() | |
| String | getZipDocumentCollectionName() | |
| boolean | isLowercaseIndex() | |
| boolean | isParenthesesAreWords() | |
| boolean | isSkips() | |
| boolean | okToComplete() | Indicate that all processing is complete and it's ok to terminate. |
| void | setBasename(String name) | |
| void | setCombineBufferSize(int size) | |
| void | setDocumentFactoryClass(String factoryClassName) | |
| void | setDocumentsPerBatch(int numberOfDocuments) | Sets the number of documents per batch. |
| void | setHeight(int height) | |
| void | setIndexConfigurationFile(String indexConfigurationFileVal) | |
| void | setIndexingQueueSize(int size) | Set the size of the indexing queue. |
| void | setLowercaseIndex(boolean lowercaseIndex) | |
| void | setMinimumDashSplitLength(int minimumDashSplitLength) | |
| void | setParenthesesAreWords(boolean parenthesesAreWords) | Choose if parentheses should be indexed as words. |
| void | setPasteBufferSize(int size) | |
| void | setPayloadWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> flags) | |
| void | setQuantum(int quantum) | |
| void | setScanBufferSize(int size) | |
| void | setSkips(boolean skips) | |
| void | setStandardWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> flags) | |
| void | setTermProcessorClass(String termProcessorClass) | |
| void | setWordReader(TextractorWordReader wordReader) | |
| void | setWordReaderClass(String wordReader) | |
| void | setZipDocumentCollectionName(String zipDocumentCollectionName) | |
| void | size(int size) | |
| protected Properties | storeProperties(String filename) | Stores properties of this class into the given filename. |
| Methods inherited from class textractor.chain.AbstractSentenceConsumer |
|---|
| call, execute, processingComplete |

| Methods inherited from class textractor.sentence.AbstractSentenceProcessor |
|---|
| addSentenceProcessedListener, addSentenceProcessingCompleteListener, fireSentenceProcessedEvent, fireSentenceProcessingCompleteEvent, removeSentenceProcessedListener, removeSentenceProcessingCompleteListener |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface textractor.sentence.SentenceProcessor |
|---|
| addSentenceProcessedListener, addSentenceProcessingCompleteListener, removeSentenceProcessedListener, removeSentenceProcessingCompleteListener |
| Field Detail |
|---|
public static final int DOCUMENT_SEPARATOR
| Constructor Detail |
|---|
public Indexer()
throws IllegalAccessException,
InstantiationException,
ClassNotFoundException
Create a new indexer Command.
Throws:
IllegalAccessException - error creating object
InstantiationException - error creating object
ClassNotFoundException - error creating object
| Method Detail |
|---|
public void consume(Article article, Collection<Sentence> sentences)
Process sentences along with their associated article into individual sentences so that the IndexBuilder process can read them properly.
Parameters:
article - The article associated with the sentenceQueue.
sentences - A collection of Sentences to process.
protected Properties storeProperties(String filename) throws ConfigurationException
Stores properties of this class into the given filename.
Parameters:
filename - Name of the file to store the properties in
Throws:
ConfigurationException - if the file cannot be written properly
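As an illustration of the store-and-return pattern described for storeProperties, here is a minimal self-contained sketch. It assumes the returned Properties is java.util.Properties (this page does not show the import), uses hypothetical keys and values, and reports write failures as UncheckedIOException rather than ConfigurationException so that it runs without Textractor or Commons Configuration on the classpath. StorePropertiesSketch and loadProperties are illustrative names, not part of the Textractor API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class StorePropertiesSketch {
    /** Writes a few hypothetical settings to filename and returns what was written. */
    static Properties storeProperties(String filename) {
        Properties properties = new Properties();
        // Hypothetical keys and values; the real Indexer reads them from its own fields.
        properties.setProperty("basename", "index");
        properties.setProperty("documentsPerBatch", Integer.toString(1000));
        properties.setProperty("lowercaseIndex", Boolean.toString(true));
        try (Writer writer = Files.newBufferedWriter(Paths.get(filename), StandardCharsets.UTF_8)) {
            properties.store(writer, "Indexer settings (sketch)");
        } catch (IOException e) {
            throw new UncheckedIOException("cannot write " + filename, e);
        }
        return properties;
    }

    /** Reads the file back, which lets a caller check that the round trip is lossless. */
    static Properties loadProperties(String filename) {
        Properties properties = new Properties();
        try (Reader reader = Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            properties.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read " + filename, e);
        }
        return properties;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("indexer-sketch", ".properties");
        Properties written = storeProperties(file.toString());
        System.out.println(written.equals(loadProperties(file.toString()))); // prints true
        Files.delete(file);
    }
}
```

Returning the Properties object lets the caller inspect or log exactly what was persisted without re-reading the file.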
public static Object assureFieldType(TextractorFieldInfo fieldInfo, Object value)
Assure that value is appropriate for fieldInfo.type.
Parameters:
fieldInfo - the field we are indexing the value for
value - the value for the field
Throws:
IllegalArgumentException - the value is not appropriate for fieldInfo.type
public static String fieldErrorString(TextractorFieldInfo fieldInfo, Object value)
The value is not valid for the fieldInfo.type.
Parameters:
fieldInfo - the field which is in error
value - the value which is invalid
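The pairing of assureFieldType and fieldErrorString suggests a validate-or-throw helper backed by a message builder. The sketch below shows that pattern in isolation. FieldType and FieldInfo are hypothetical stand-ins for TextractorFieldInfo and its type values, which this page does not list, and the message wording is invented for the example.

```java
public class FieldTypeCheckSketch {
    /** Hypothetical field types, each tied to the Java class its values must have. */
    enum FieldType {
        TEXT(String.class),
        INTEGER(Integer.class);

        final Class<?> valueClass;

        FieldType(Class<?> valueClass) {
            this.valueClass = valueClass;
        }
    }

    /** Hypothetical stand-in for TextractorFieldInfo: a field name plus its type. */
    static final class FieldInfo {
        final String name;
        final FieldType type;

        FieldInfo(String name, FieldType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Builds the message used when value does not fit fieldInfo.type. */
    static String fieldErrorString(FieldInfo fieldInfo, Object value) {
        String actual = (value == null) ? "null" : value.getClass().getName();
        return "Field " + fieldInfo.name + " expects " + fieldInfo.type + " but got " + actual;
    }

    /** Returns value unchanged if it fits fieldInfo.type, otherwise throws. */
    static Object assureFieldType(FieldInfo fieldInfo, Object value) {
        // isInstance is false for null, so null values are rejected as well.
        if (!fieldInfo.type.valueClass.isInstance(value)) {
            throw new IllegalArgumentException(fieldErrorString(fieldInfo, value));
        }
        return value;
    }

    public static void main(String[] args) {
        FieldInfo title = new FieldInfo("title", FieldType.TEXT);
        System.out.println(assureFieldType(title, "A sentence")); // prints A sentence
        try {
            assureFieldType(title, 42);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the message in its own method means the same text can be logged or reported without throwing.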
public void setParenthesesAreWords(boolean parenthesesAreWords)
Choose if parentheses should be indexed as words.
Parameters:
parenthesesAreWords - True if the index should be built with parentheses indexed as words.
public String getBasename()
public void setBasename(String name)
Parameters:
name - the basename to set
public int getDocumentsPerBatch()
Gets the number of documents per batch, that is, the number of documents Scan will attempt to add to each batch.
public void setDocumentsPerBatch(int numberOfDocuments)
Sets the number of documents per batch.
Parameters:
numberOfDocuments - the number of documents Scan will attempt to add to each batch.
public int getMinimumDashSplitLength()
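To make the documentsPerBatch setting concrete, the sketch below splits a list of documents into batches of at most a given size. It only illustrates what a per-batch limit means; it is not how Scan itself is implemented, and DocumentBatchSketch and toBatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocumentBatchSketch {
    /** Splits documents into consecutive batches of at most documentsPerBatch items. */
    static <T> List<List<T>> toBatches(List<T> documents, int documentsPerBatch) {
        if (documentsPerBatch <= 0) {
            throw new IllegalArgumentException("documentsPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += documentsPerBatch) {
            int end = Math.min(start + documentsPerBatch, documents.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        System.out.println(toBatches(documents, 2)); // prints [[d1, d2], [d3, d4], [d5]]
    }
}
```

The last batch may be smaller than the limit, which matches the wording "will attempt to add to each batch".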
public void setMinimumDashSplitLength(int minimumDashSplitLength)
Parameters:
minimumDashSplitLength - the minimumDashSplitLength to set
public TextractorWordReader getWordReader()
public void setWordReader(TextractorWordReader wordReader)
public void setWordReaderClass(String wordReader) throws ClassNotFoundException, IllegalAccessException, InstantiationException
Throws:
ClassNotFoundException
IllegalAccessException
InstantiationException
public String getZipDocumentCollectionName()
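setWordReaderClass(String) declares ClassNotFoundException, IllegalAccessException, and InstantiationException, which are exactly the checked exceptions raised by Class.forName followed by Class.newInstance. Assuming the class name is turned into a reader that way (this page does not say), the mechanism can be sketched as follows. WordSource and WhitespaceWordSource are hypothetical stand-ins for TextractorWordReader and its implementations.

```java
public class ClassNameInstantiationSketch {
    /** Hypothetical stand-in for the word reader type. */
    public interface WordSource {
        String describe();
    }

    /** Hypothetical implementation with the public no-argument constructor that reflection needs. */
    public static class WhitespaceWordSource implements WordSource {
        @Override
        public String describe() {
            return "whitespace";
        }
    }

    /** Loads the named class and creates an instance through its no-argument constructor. */
    @SuppressWarnings("deprecation") // Class.newInstance is deprecated since Java 9
    static WordSource newWordSource(String className)
            throws ClassNotFoundException, IllegalAccessException, InstantiationException {
        // asSubclass throws ClassCastException if the class does not implement WordSource.
        Class<? extends WordSource> clazz = Class.forName(className).asSubclass(WordSource.class);
        return clazz.newInstance();
    }

    public static void main(String[] args) throws Exception {
        WordSource source = newWordSource(WhitespaceWordSource.class.getName());
        System.out.println(source.describe()); // prints whitespace
    }
}
```

Newer code would normally call getDeclaredConstructor().newInstance() instead, at the cost of declaring NoSuchMethodException and InvocationTargetException as well.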
public void setZipDocumentCollectionName(String zipDocumentCollectionName)
Parameters:
zipDocumentCollectionName - the zipDocumentCollectionName to set
public boolean isParenthesesAreWords()
public int getHeight()
public void setHeight(int height)
Parameters:
height - the height to set
public int getQuantum()
public void setQuantum(int quantum)
Parameters:
quantum - the quantum to set
public boolean isSkips()
public void setSkips(boolean skips)
Parameters:
skips - the skips to set
public boolean isLowercaseIndex()
public void setLowercaseIndex(boolean lowercaseIndex)
public Map<CompressionFlags.Component,CompressionFlags.Coding> getStandardWriterFlags()
public void setStandardWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> flags)
public Map<CompressionFlags.Component,CompressionFlags.Coding> getPayloadWriterFlags()
public void setPayloadWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> flags)
public String getTermProcessorClass()
public void setTermProcessorClass(String termProcessorClass)
public void setDocumentFactoryClass(String factoryClassName)
public void setIndexConfigurationFile(String indexConfigurationFileVal)
public int getNumberOfArticlesProcessed()
Get the number of articles processed so far.
public int getNumberOfSentencesProcessed()
Get the number of sentences processed so far.
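public boolean okToComplete()
Indicate that all processing is complete and it's ok to terminate. The default behavior sends a SentenceProcessingCompleteEvent. Be aware of this and send the event if you override the default behavior.
Overrides:
okToComplete in class AbstractSentenceConsumer
public int getCombineBufferSize()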
public void setCombineBufferSize(int size)
Parameters:
size - the combineBufferSize to set
public int getPasteBufferSize()
public void setPasteBufferSize(int size)
Parameters:
size - the pasteBufferSize to set
public int getScanBufferSize()
public void setScanBufferSize(int size)
Parameters:
size - the scanBufferSize to set
public int getSkipBufferSize()
public void size(int size)
Parameters:
size - the skipBufferSize to set
public int getIndexingQueueSize()
Get the size of the indexing queue.
public void setIndexingQueueSize(int size)
Set the size of the indexing queue.
Parameters:
size - The size of the queue.
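This page does not describe how the indexing queue is implemented. Assuming it is a bounded producer/consumer queue whose capacity is the value passed to setIndexingQueueSize(int), a minimal self-contained sketch of the idea looks like this. IndexingQueueSketch, indexAll, and the END_OF_INPUT sentinel are hypothetical names, not Textractor API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexingQueueSketch {
    // Sentinel compared by identity, so no real sentence can be mistaken for it.
    private static final String END_OF_INPUT = new String("END_OF_INPUT");

    /** Pushes sentences through a queue of the given capacity and returns how many were consumed. */
    static int indexAll(List<String> sentences, int queueSize) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger indexed = new AtomicInteger();

        Thread indexerThread = new Thread(() -> {
            try {
                for (String s = queue.take(); s != END_OF_INPUT; s = queue.take()) {
                    indexed.incrementAndGet(); // stand-in for the real indexing work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexerThread.start();

        try {
            for (String sentence : sentences) {
                queue.put(sentence); // blocks while the queue is full
            }
            queue.put(END_OF_INPUT);
            indexerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        System.out.println(indexAll(Arrays.asList("s1", "s2", "s3"), 2)); // prints 3
    }
}
```

In a design like this, a small queue bounds memory use because the producer blocks when the indexer falls behind, while a larger queue smooths out bursts.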