Textractor API textractor-720 (20091120123250)
java.lang.Object
  textractor.sentence.AbstractSentenceProcessor
    textractor.chain.AbstractSentenceProducer
      textractor.chain.loader.AbstractFileLoader

public abstract class AbstractFileLoader
extends AbstractSentenceProducer
Abstract base class for loading articles from files and processing them into
sentences, which can then be further processed, indexed, or stored in a
database via an appropriate SentenceConsumer or SentenceProcessor.
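
The subclassing pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not Textractor code: `MiniFileLoader`, `RecordingLoader`, `processAll`, and `MiniFileLoaderDemo` are hypothetical stand-ins, and only the abstract `processFilename(String)` hook mirrors the signature documented on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for AbstractFileLoader. Only the shape (a final driver
// method calling an abstract per-file hook) follows the documentation; the
// method bodies are assumptions.
abstract class MiniFileLoader {
    /** Subclasses decide how a single file is turned into sentences. */
    public abstract void processFilename(String filename) throws IOException;

    /** Final driver: hands each name to the subclass hook, in order. */
    public final void processAll(List<String> filenames) throws IOException {
        for (String filename : filenames) {
            processFilename(filename);
        }
    }
}

// Hypothetical subclass that only records which files it was asked to load.
class RecordingLoader extends MiniFileLoader {
    final List<String> seen = new ArrayList<>();

    @Override
    public void processFilename(String filename) {
        seen.add(filename);
    }
}

class MiniFileLoaderDemo {
    /** Runs the recording loader over the given names and returns what it saw. */
    static List<String> run(List<String> filenames) {
        RecordingLoader loader = new RecordingLoader();
        try {
            loader.processAll(filenames);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loader.seen;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a.txt", "b.txt")));
    }
}
```

A real subclass would extend AbstractFileLoader and implement `processFilename` to read the file and hand its text to the sentence pipeline; that part is omitted because it depends on Textractor classes not documented on this page.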
| Field Summary | |
|---|---|
| protected boolean | appendSentencesInOneDocument - Indicates that each article should produce a single sentence with all the article text. |
| protected int | currentIteration - The iteration currently in progress (should start at 1). |
| protected int | numberOfIterations - The number of times to iterate over a file or directory. |
| protected static String | PARAGRAPH_BOUNDARY_TAG - Separator to use between paragraphs in a document. |
| protected String | paragraphBoundary - This string will be placed between paragraphs. |
| protected static String | SENTENCE_BOUNDARY_TAG - Separator to use when creating sentences in a single document. |
| protected String | sentenceBoundary - This string will be placed between sentences. |
| Fields inherited from class textractor.chain.AbstractSentenceProducer |
|---|
| commands, frozen |
| Fields inherited from interface org.apache.commons.chain.Command |
|---|
| CONTINUE_PROCESSING, PROCESSING_COMPLETE |
| Constructor Summary | |
|---|---|
| AbstractFileLoader() | Create a new SentenceProducer that loads sentences from files. |
| Method Summary | |
|---|---|
| protected void | beginIteration(int iteration) - Called at the start of an iteration. |
| Boolean | call() - Thread that will process a file, directory, or list of files. |
| protected void | endIteration(int iteration) - Called at the end of an iteration. |
| String | getDirectory() - Get the name of the directory to process. |
| List<String> | getExtensionList() - Get the filename extensions used to filter which files to process. |
| String | getExtensions() - Get the filename extensions used to filter which files to process. |
| String | getFile() - Get the name of the file to process. |
| String | getList() - Get the name of the file containing the list of files to process. |
| int | getNumberOfSentencesProcessed() - Get the number of sentences processed so far. |
| String | getParagraphBoundaryTag() - Get the string that will be placed between paragraphs. |
| String | getProcessedFileLog() - Get the name (if any) of the file to which the names of processed files are written. |
| String | getSentenceBoundary() - Get the string that will be placed between sentences. |
| boolean | isAppendSentencesInOneDocument() |
| boolean | isRecursive() - Indicates whether subdirectories should be processed as well as files. |
| static String | padWithSpaces(String stringToPad) |
| void | processDirectory(String directory) - Processes all the files in a given directory. |
| void | processDirectory(String directory, FilenameFilter filter) - Processes all the files that match a filter in a given directory. |
| void | processDirectory(String directory, List<String> extensions) - Processes all the files in a given directory that have one of the given extensions. |
| void | processFileList(String list) - Process a list of filenames. |
| abstract void | processFilename(String filename) - Process a single file designated by name. |
| void | processingComplete(SentenceProcessingCompleteEvent event) - Called when sentence processing is complete. |
| Sentence | produce(Article article, CharSequence text) - Produce a new Sentence. |
| void | setAppendSentencesInOneDocument(boolean value) |
| void | setDirectory(String name) - Set the name of the directory to process. |
| void | setExtensions(String extensionString) - Set the filename extensions used to filter which files to process. |
| void | setFile(String name) - Set the name of the file to process. |
| void | setList(String name) - Set the name of the file containing the list of files to process. |
| void | setParagraphBoundary(String boundaryString) - Set the string that will be placed between paragraphs. |
| void | setProcessedFileLog(String name) - Set the name of the file to which the names of processed files are written. |
| void | setRecursive(boolean value) - Indicate whether subdirectories should be processed as well as files. |
| void | setSentenceBoundary(String boundaryString) - Set the string that will be placed between sentences. |
| Methods inherited from class textractor.chain.AbstractSentenceProducer |
|---|
| addCommand, execute, getWorkQueueSize, produce, setWorkQueueSize |
| Methods inherited from class textractor.sentence.AbstractSentenceProcessor |
|---|
| addSentenceProcessedListener, addSentenceProcessingCompleteListener, fireSentenceProcessedEvent, fireSentenceProcessingCompleteEvent, removeSentenceProcessedListener, removeSentenceProcessingCompleteListener |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface textractor.sentence.SentenceProcessor |
|---|
| addSentenceProcessedListener, addSentenceProcessingCompleteListener, getNumberOfArticlesProcessed, removeSentenceProcessedListener, removeSentenceProcessingCompleteListener |
| Field Detail |
|---|
protected int numberOfIterations
protected int currentIteration
protected boolean appendSentencesInOneDocument
protected static final String SENTENCE_BOUNDARY_TAG
protected static final String PARAGRAPH_BOUNDARY_TAG
protected String sentenceBoundary
protected String paragraphBoundary
| Constructor Detail |
|---|
public AbstractFileLoader()
Create a new SentenceProducer that loads sentences from files.
| Method Detail |
|---|
public final void processDirectory(String directory)
throws IOException
directory - The name of the directory to process.
IOException - if there is a problem reading the directory or processing the files in the directory.
public final void processDirectory(String directory,
List<String> extensions)
throws IOException
extensions - The file extensions to allow; must not be null.
directory - The name of the directory to process.
IOException - if there is a problem reading the directory or processing the files in the directory.
public final void processDirectory(String directory,
FilenameFilter filter)
throws IOException
directory - The name of the directory to process.
filter - Filename filter
IOException - if there is a problem reading the directory or processing the files in the directory.
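
As an illustration of the `filter` argument, a `FilenameFilter` that accepts files by extension might look like the sketch below. `ExtensionFilterDemo` and `forExtensions` are hypothetical names, and matching on "." plus the extension is an assumption; this page does not state how AbstractFileLoader itself compares extensions.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
import java.util.List;

class ExtensionFilterDemo {
    /** Builds a filter accepting names that end with "." + one of the extensions (assumed rule). */
    static FilenameFilter forExtensions(final List<String> extensions) {
        return new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                for (String ext : extensions) {
                    if (name.endsWith("." + ext)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        FilenameFilter filter = forExtensions(Arrays.asList("txt", "xml"));
        File dir = new File(args.length > 0 ? args[0] : ".");
        // File.list returns null when dir is not a readable directory.
        String[] names = dir.list(filter);
        if (names != null) {
            for (String name : names) {
                System.out.println(name);
            }
        }
    }
}
```

A filter built this way could presumably be passed as the second argument of `processDirectory(String, FilenameFilter)`.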
public final void processFileList(String list)
throws IOException
list - The name of the file containing the list of files to process.
IOException - if there is a problem reading the list or
processing the files in the list
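
The format of the list file is not described on this page. The sketch below assumes one filename per line, with blank lines ignored and surrounding whitespace trimmed; `FileListDemo` and `readNames` are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FileListDemo {
    /** Reads one filename per line, skipping blank lines (assumed list format). */
    static List<String> readNames(BufferedReader reader) {
        List<String> names = new ArrayList<>();
        reader.lines().forEach(line -> {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        });
        return names;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a list file on disk.
        BufferedReader reader = new BufferedReader(new StringReader("a.txt\n\nb.txt\n"));
        System.out.println(readNames(reader));
    }
}
```

Each name returned this way would then be handed to `processFilename(String)`.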
public abstract void processFilename(String filename)
throws IOException
filename - The name of the file to process.
IOException - if there is a problem reading the file.
public final Boolean call()
throws Exception
IOException - if there is a problem processing the file(s).
Exception
public final Sentence produce(Article article,
CharSequence text)
article - The article the sentence will be associated with.
text - The text to be used for the sentence.
public int getNumberOfSentencesProcessed()
public final String getDirectory()
public final void setDirectory(String name)
name - the name of the directory to process
public final String getFile()
public final void setFile(String name)
name - the name of the file to process
public final String getList()
public final void setList(String name)
name - the name of the file to process
public boolean isRecursive()
public void setRecursive(boolean value)
value - true if subdirectories should be processed
public final String getExtensions()
public final void setExtensions(String extensionString)
extensionString - A comma-separated list of filename extensions
public final List<String> getExtensionList()
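
To illustrate how a comma-separated extension string relates to the list form returned by `getExtensionList()`, here is a minimal sketch. `ExtensionListDemo` and `parse` are hypothetical names; trimming whitespace and dropping empty entries are assumptions, not documented behavior of `setExtensions`.

```java
import java.util.ArrayList;
import java.util.List;

class ExtensionListDemo {
    /** Splits a comma-separated extension string into a list (assumed parsing rules). */
    static List<String> parse(String extensionString) {
        List<String> result = new ArrayList<>();
        if (extensionString == null) {
            return result;
        }
        for (String part : extensionString.split(",")) {
            String ext = part.trim();
            if (!ext.isEmpty()) {
                result.add(ext);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("txt, xml,,html"));
    }
}
```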
public final String getProcessedFileLog()
public final void setProcessedFileLog(String name)
name - The name of the file to write to.
protected void beginIteration(int iteration)
iteration - The iteration that is about to start
protected void endIteration(int iteration)
iteration - The iteration that just completed
public void processingComplete(SentenceProcessingCompleteEvent event)
processingComplete in interface SentenceProcessingCompleteListener
processingComplete in class AbstractSentenceProducer
event - A SentenceProcessingCompleteEvent object describing the event source.
public boolean isAppendSentencesInOneDocument()
public void setAppendSentencesInOneDocument(boolean value)
public String getParagraphBoundaryTag()
public void setParagraphBoundary(String boundaryString)
boundaryString - the String that will be placed between paragraphs
public String getSentenceBoundary()
public void setSentenceBoundary(String boundaryString)
boundaryString - the String that should be placed between sentences
public static String padWithSpaces(String stringToPad)