Textractor API textractor-720 (20091120123250)
java.lang.Object
  textractor.mg4j.docstore.DocumentStoreWriter
public final class DocumentStoreWriter
Provides for writing a document store. A DocumentStore provides random access to documents in a corpus; this class lets you write documents to the store. After constructing the writer, you must call appendDocument() in sequence for each document that is to be written to the store. Writing must be done sequentially: it is not possible to insert a document between documents already written. Gaps in the document indices are supported, however. For instance, if document 0 is submitted, followed by document 4, documents 1, 2 and 3 are created with empty content, so that random access to documents after the gap works as expected.
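The gap-filling behaviour described above can be illustrated with a minimal in-memory sketch. This is not the Textractor implementation: the class `GapFillingStoreSketch`, its `getDocument` accessor, and the choice to return the running document count from `appendDocument` are all assumptions made for illustration, and whether the real `writtenDocuments()` counts the empty placeholders is not stated in this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a sequential document store writer. */
public class GapFillingStoreSketch {
    private final List<int[]> documents = new ArrayList<>();

    /**
     * Appends tokens at documentIndex. Skipped indices are padded with
     * empty documents; going backwards is rejected.
     * Returns the number of documents stored so far (an assumption).
     */
    public int appendDocument(int documentIndex, int[] tokens) {
        if (documentIndex < documents.size()) {
            throw new IllegalArgumentException(
                "documents must be appended in increasing index order");
        }
        while (documents.size() < documentIndex) {
            documents.add(new int[0]); // empty placeholder for a gap
        }
        documents.add(tokens.clone());
        return documents.size();
    }

    /** Number of stored documents, placeholders included (an assumption). */
    public int writtenDocuments() {
        return documents.size();
    }

    /** Random access by document index (hypothetical reader side). */
    public int[] getDocument(int documentIndex) {
        return documents.get(documentIndex);
    }

    public static void main(String[] args) {
        GapFillingStoreSketch store = new GapFillingStoreSketch();
        store.appendDocument(0, new int[] {5, 9, 2});
        store.appendDocument(4, new int[] {7, 7});
        System.out.println(store.writtenDocuments());    // 5
        System.out.println(store.getDocument(2).length); // 0
        System.out.println(store.getDocument(4).length); // 2
    }
}
```

Padding the gap with empty entries keeps the position of a document equal to its index, which is what makes lookups after the gap land on the right document.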
A significant storage compression advantage is realized if the writer is optimized before documents are appended (an optimized writer may require 50% less storage for the same content). Optimization leverages the term frequency data found in the inverted index: more frequent terms are coded with shorter bit streams.
User: Fabien Campagne Date: Oct 29, 2005 Time: 11:57:37 AM
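To see why ordering terms by frequency can shrink the store, the sketch below remaps term ids so that the most frequent term gets the smallest index, then measures the encoded size with a variable-length integer code. The page does not say which code the writer uses; Elias gamma is swapped in here purely as an example of a code where small numbers take fewer bits. The class `FrequencyOrderingSketch` and all of its methods are hypothetical, and frequencies are counted from the tokens themselves rather than read from an inverted index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of frequency-based term reordering. */
public class FrequencyOrderingSketch {

    /** Length in bits of the Elias gamma code of a positive integer n. */
    public static int gammaLength(int n) {
        int bits = 32 - Integer.numberOfLeadingZeros(n); // floor(log2 n) + 1
        return 2 * bits - 1;
    }

    /** Counts how often each term id occurs. */
    public static Map<Integer, Integer> countFrequencies(int[] tokens) {
        Map<Integer, Integer> frequencies = new HashMap<>();
        for (int t : tokens) {
            frequencies.merge(t, 1, Integer::sum);
        }
        return frequencies;
    }

    /** Maps each term id to a small index, 0 for the most frequent term. */
    public static Map<Integer, Integer> rankByFrequency(Map<Integer, Integer> frequencies) {
        List<Integer> terms = new ArrayList<>(frequencies.keySet());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(frequencies.get(b), frequencies.get(a));
            return byFreq != 0 ? byFreq : Integer.compare(a, b); // stable tie-break
        });
        Map<Integer, Integer> rank = new HashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            rank.put(terms.get(i), i);
        }
        return rank;
    }

    /** Total gamma-coded size; a null remap means term ids are coded as-is. */
    public static long encodedBits(int[] tokens, Map<Integer, Integer> remap) {
        long total = 0;
        for (int t : tokens) {
            int code = (remap == null ? t : remap.get(t)) + 1; // gamma needs n >= 1
            total += gammaLength(code);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] tokens = {900, 900, 900, 3, 900};
        Map<Integer, Integer> remap = rankByFrequency(countFrequencies(tokens));
        System.out.println("unoptimized bits: " + encodedBits(tokens, null));  // 81
        System.out.println("optimized bits: " + encodedBits(tokens, remap));   // 7
    }
}
```

The toy numbers exaggerate the gain; the point is only that the saving depends on frequent terms receiving small indices before any document is encoded, which is consistent with the requirement to optimize before appending.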
| Field Summary | |
|---|---|
| static int | DOCUMENT_NOT_FOUND |
| Constructor Summary | |
|---|---|
| DocumentStoreWriter(DocumentIndexManager docmanager) | Initialize a document store writer for the "text" index. |
| DocumentStoreWriter(IndexDetails indexDetailsVal, boolean writePmidsVal, boolean writePositionsVal) | Initialize a document store writer. |
| Method Summary | |
|---|---|
| void | addDocumentPMID(long documentNumber, long pmid): Add a pmid for a document number. |
| int | appendDocument(int documentIndex, int[] tokens): Append a document to this docstore at the given document number. |
| int | appendPositions(List<IntRange> ranges): Append position information to the document store. |
| void | close(): Close. |
| protected void | finalize(): Finalize (close). |
| void | flush(): Flush the output files. |
| static String | getDocumentDataFilename(String basename): Create the document data filename based on the index basename. |
| static String | getOffsetFilename(String basename): Create the document offsets filename based on the index basename. |
| static String | getPMIDMapFilename(String basename): Get the pmid map filename based on the basename. |
| static String | getPositionFilename(String basename): The positions data filename based on the index basename. |
| static String | getPositionOffsetFilename(String basename): The positions offsets filename based on the index basename. |
| static String | getSmallIndexToTermFilename(String basename): Get the small index to term filename based on the basename. |
| boolean | getWritePositions(): Get the value of writePositions. |
| void | optimizeTermOrdering(): Optimize the term ordering. |
| void | writePMIDs(): Write document PMID info to the pmid map file. |
| int | writtenDocuments(): The number of documents written by this writer. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int DOCUMENT_NOT_FOUND
| Constructor Detail |
|---|
public DocumentStoreWriter(DocumentIndexManager docmanager)
throws FileNotFoundException
docmanager - the document index manager
FileNotFoundException - When docstore files cannot be written with
the given basename.
public DocumentStoreWriter(IndexDetails indexDetailsVal,
boolean writePmidsVal,
boolean writePositionsVal)
throws FileNotFoundException
indexDetailsVal - the document index to write the documents for
writePmidsVal - if true, the pmids file will be written
writePositionsVal - if true, the positions data will be written
FileNotFoundException - When docstore files cannot be written with
the given basename.
| Method Detail |
|---|
public static String getDocumentDataFilename(String basename)
basename - the index basename
public static String getOffsetFilename(String basename)
basename - the index basename
public static String getPositionFilename(String basename)
basename - the index basename
public static String getPositionOffsetFilename(String basename)
basename - the index basename
public int appendDocument(int documentIndex,
int[] tokens)
throws IOException
documentIndex - the document number
tokens - the document
IOException - error appending a document
public int appendPositions(List<IntRange> ranges)
throws IOException
ranges - The ranges for each term in the document
IOException - If the positions cannot be written to
public void flush()
throws IOException
IOException - error flushing
public boolean getWritePositions()
protected void finalize()
throws Throwable
Overrides: finalize in class Object
Throwable - error finalizing
public void close()
throws IOException
Specified by: close in interface Closeable
IOException - error closing
public void writePMIDs()
throws IOException
IOException - if the file cannot be written or created
public static String getPMIDMapFilename(String basename)
basename - the basename to create the filename for
public void optimizeTermOrdering()
throws IOException
IOException - error optimizing
public static String getSmallIndexToTermFilename(String basename)
basename - the basename to create the filename for
public int writtenDocuments()
public void addDocumentPMID(long documentNumber,
long pmid)
documentNumber - the document number
pmid - the pmid for documentNumber