Textractor API textractor-720 (20091120123250)

textractor.mg4j.docstore
Class DocumentStoreReader

java.lang.Object
  extended by textractor.mg4j.docstore.DocumentStoreReader
All Implemented Interfaces:
Closeable

public final class DocumentStoreReader
extends Object
implements Closeable

Access a DocumentStore on disk. Author: Fabien Campagne. Created: Oct 29, 2005, 5:17:17 PM.


Field Summary
static int DOCUMENT_NOT_FOUND
          Value (-1) returned by getDocumentNumber(long) when no document matches the given PMID.
 
Constructor Summary
DocumentStoreReader(DocumentIndexManager docmanager)
          Initialize a document store reader for the "text" index.
DocumentStoreReader(IndexDetails indexDetailsVal, boolean readPmidsVal)
          Initialize a document store reader for the index specified by indexDetailsVal.
DocumentStoreReader(IndexDetails indexDetailsVal, int maxReads, boolean readPmidsVal)
          Initialize a document store reader for the index specified by indexDetailsVal.
 
Method Summary
 void close()
          Closes this reader and releases any system resources associated with it.
 MutableString document(int documentIndex)
          Retrieves a document from this document store.
 int document(int documentIndex, List<Integer> result)
          Retrieves a document from this document store.
 void document(int documentIndex, MutableString result)
          Retrieves a document from this document store.
 int frequencies(int documentIndex, int[] termFrequencies, int[] numDocForTerm)
          Get frequencies.
 int getDocumentNumber(long pmid)
          Retrieve the document number that corresponds to a given PMID.
 int getNumberOfDocuments()
          Get the total number of documents in the index associated with this document store.
 int getNumberOfTerms()
          Get the total number of terms in the index associated with this document store.
 long getPMID(int documentNumber)
          Retrieve the PMID that corresponds to a given document number.
 int[] getTermPermutation()
          Get term permutation.
 boolean isPositionsAvailable()
           Does this document store contain position information?
 List<IntRange> positions(int documentIndex)
           Get the ranges of positions that the terms of a document occupied in the original document source.
 void readPMIDs()
          Read PMID information.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DOCUMENT_NOT_FOUND

public static final int DOCUMENT_NOT_FOUND
Value (-1) returned by getDocumentNumber(long) when no document matches the given PMID.
See Also:
Constant Field Values
Constructor Detail

DocumentStoreReader

public DocumentStoreReader(DocumentIndexManager docmanager)
                    throws IOException
Initialize a document store reader for the "text" index. This constructor also reads the PMID values.

Parameters:
docmanager - the document index manager
Throws:
IOException - if the document store files cannot be read from the index basename.

DocumentStoreReader

public DocumentStoreReader(IndexDetails indexDetailsVal,
                           boolean readPmidsVal)
                    throws IOException
Initialize a document store reader for the index specified by indexDetailsVal. PMIDs need to be read (readPmidsVal set to true) only when reading from the "text" index.

Parameters:
indexDetailsVal - the document index to read the documents for
readPmidsVal - if true, the pmids file will be read
Throws:
IOException - if the document store files cannot be read from the index basename.

DocumentStoreReader

public DocumentStoreReader(IndexDetails indexDetailsVal,
                           int maxReads,
                           boolean readPmidsVal)
                    throws IOException
Initialize a document store reader for the index specified by indexDetailsVal. PMIDs need to be read (readPmidsVal set to true) only when reading from the "text" index.

Parameters:
indexDetailsVal - the document index to read the documents for
maxReads - xx
readPmidsVal - if true, the pmids file will be read
Throws:
IOException - if the document store files cannot be read from the index basename.
Method Detail

readPMIDs

public void readPMIDs()
               throws IOException
Read PMID information. This operation is optional, but it must be performed before getPMID(int) or getDocumentNumber(long) can be used.

Throws:
IOException - if the pmid map file cannot be read
See Also:
getPMID(int)

getPMID

public long getPMID(int documentNumber)
Retrieve the PMID that corresponds to a given document number. This method requires that PMID information has been read.

Parameters:
documentNumber - Document for which the PMID is sought
Returns:
PMID of the article corresponding to the document number
See Also:
readPMIDs()

getDocumentNumber

public int getDocumentNumber(long pmid)
Retrieve the document number that corresponds to a given PMID. This method requires that PMID information has been read. Note also that this is an O(n) operation where n is the number of documents in the store.

Parameters:
pmid - PMID for which the document number is sought
Returns:
document number of the article corresponding to the PMID, or DOCUMENT_NOT_FOUND (-1) if not found.
See Also:
readPMIDs()
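The PMID mapping described above can be pictured with a small sketch. It assumes (the API does not say so) that the PMID table loaded by readPMIDs() behaves like an array indexed by document number; PmidLookup and reverseIndex() are hypothetical names. The linear scan mirrors the O(n) cost and the DOCUMENT_NOT_FOUND (-1) result documented for getDocumentNumber(long).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the PMID table loaded by readPMIDs().
 * Assumes pmids[d] holds the PMID of document number d.
 */
public class PmidLookup {
    /** Mirrors DocumentStoreReader.DOCUMENT_NOT_FOUND (-1). */
    public static final int DOCUMENT_NOT_FOUND = -1;

    private final long[] pmids;

    public PmidLookup(long[] pmids) {
        this.pmids = pmids;
    }

    /** Document number to PMID: a direct array access. */
    public long getPMID(int documentNumber) {
        return pmids[documentNumber];
    }

    /** PMID to document number: a linear scan, O(n) in the number of documents. */
    public int getDocumentNumber(long pmid) {
        for (int d = 0; d < pmids.length; d++) {
            if (pmids[d] == pmid) {
                return d;
            }
        }
        return DOCUMENT_NOT_FOUND;
    }

    /** Builds a reverse index once so that repeated lookups cost O(1) on average. */
    public Map<Long, Integer> reverseIndex() {
        Map<Long, Integer> index = new HashMap<>();
        for (int d = 0; d < pmids.length; d++) {
            index.putIfAbsent(pmids[d], d); // keep the first match, as the scan does
        }
        return index;
    }

    public static void main(String[] args) {
        PmidLookup lookup = new PmidLookup(new long[] {16000001L, 16000042L, 16000100L});
        System.out.println(lookup.getPMID(1));                   // 16000042
        System.out.println(lookup.getDocumentNumber(16000100L)); // 2
        System.out.println(lookup.getDocumentNumber(99L));       // -1
    }
}
```

When many PMIDs have to be resolved, building the reverse map once trades O(n) extra memory for average constant-time lookups instead of one O(n) scan per PMID.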

document

public MutableString document(int documentIndex)
                       throws IOException
Retrieves a document from this document store.

Parameters:
documentIndex - the document index number to read
Returns:
The text of this document.
Throws:
IOException - error reading the data

document

public void document(int documentIndex,
                     MutableString result)
              throws IOException
Retrieves a document from this document store.

Parameters:
documentIndex - the document index number to read
result - Text of the document will be appended to this MutableString.
Throws:
IOException - error reading the data
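Because document(int, MutableString) appends to result rather than overwriting it, a caller that reuses one buffer across documents has to clear it between reads. The sketch below is hypothetical: AppendingStore stands in for the reader, and StringBuilder stands in for MutableString.

```java
import java.util.List;

/**
 * Hypothetical in-memory stand-in illustrating the append contract of
 * document(int, MutableString); StringBuilder replaces MutableString here.
 */
public class AppendingStore {
    private final List<String> texts;

    public AppendingStore(List<String> texts) {
        this.texts = texts;
    }

    /** Appends the text of document documentIndex to result; does not clear result. */
    public void document(int documentIndex, StringBuilder result) {
        result.append(texts.get(documentIndex));
    }

    /** Reuses one buffer for every document, clearing it before each read. */
    public static int totalLength(AppendingStore store, int numberOfDocuments) {
        StringBuilder buffer = new StringBuilder();
        int total = 0;
        for (int d = 0; d < numberOfDocuments; d++) {
            buffer.setLength(0); // without this, texts would accumulate in the buffer
            store.document(d, buffer);
            total += buffer.length();
        }
        return total;
    }

    public static void main(String[] args) {
        AppendingStore store = new AppendingStore(List.of("alpha", "beta"));
        System.out.println(totalLength(store, 2)); // 9
    }
}
```

Reusing a single buffer avoids allocating a new string object per document, which is the usual reason to prefer this overload over document(int).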

document

public int document(int documentIndex,
                    List<Integer> result)
             throws IOException
Retrieves a document from this document store. The document is represented as a list of integers, where each integer codes for a word. The word coding is given by the document manager this document store is associated with.

Parameters:
documentIndex - Index of the document to retrieve.
result - Words of the document will be appended to this list.
Returns:
apparently always 0; the return value carries no documented meaning.
Throws:
IOException - error reading the data
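The integer-coded form can be turned back into words with a lookup table. In this sketch the vocabulary array is a hypothetical stand-in for the word coding held by the document manager, assumed to map each code to its word by array index; the real mapping may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of decoding the integer-coded form returned by
 * document(int, List<Integer>). The vocabulary is a hypothetical stand-in
 * for the document manager's word coding.
 */
public class WordDecoder {
    private final String[] vocabulary; // vocabulary[code] is the word for that code (assumed)

    public WordDecoder(String[] vocabulary) {
        this.vocabulary = vocabulary;
    }

    /** Maps each word code to its word, preserving document order. */
    public List<String> decode(List<Integer> codes) {
        List<String> words = new ArrayList<>(codes.size());
        for (int code : codes) {
            words.add(vocabulary[code]);
        }
        return words;
    }

    public static void main(String[] args) {
        WordDecoder decoder = new WordDecoder(new String[] {"protein", "binds", "the", "receptor"});
        // Like document(int, List<Integer>), the caller fills its own list of codes.
        List<Integer> codes = new ArrayList<>();
        codes.add(2);
        codes.add(0);
        codes.add(1);
        codes.add(2);
        codes.add(3);
        System.out.println(String.join(" ", decoder.decode(codes))); // the protein binds the receptor
    }
}
```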

getTermPermutation

public int[] getTermPermutation()
Get term permutation.


frequencies

public int frequencies(int documentIndex,
                       int[] termFrequencies,
                       int[] numDocForTerm)
                throws IOException
Get frequencies.

Throws:
IOException - error reading the data

positions

public List<IntRange> positions(int documentIndex)
                         throws IOException
Get the ranges of positions that the terms of a document occupied in the original document source.

Parameters:
documentIndex - index of the document to get the positions for
Returns:
The list of position ranges, or null if this document store does not contain position information.
Throws:
IOException - if there is a problem reading the positions
See Also:
isPositionsAvailable()
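A caller of positions(int) should handle the null result; under the assumptions of this sketch, each range can then be used to recover a term's surface form from the original text. The sketch assumes the ranges are character offsets inclusive at both ends, which the API does not state; Range is a hypothetical stand-in for IntRange.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of consuming the result of positions(int). Assumes each range holds
 * inclusive start/end character offsets into the original document text.
 */
public class PositionsExample {
    /** Hypothetical stand-in for IntRange: both bounds inclusive. */
    public static final class Range {
        final int min;
        final int max;

        public Range(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Recovers the original surface form of each term. Returns an empty list
     * when positions is null, i.e. when the store has no position information.
     */
    public static List<String> surfaceForms(String originalText, List<Range> positions) {
        List<String> forms = new ArrayList<>();
        if (positions == null) {
            return forms; // isPositionsAvailable() would have returned false
        }
        for (Range r : positions) {
            forms.add(originalText.substring(r.min, r.max + 1)); // +1: substring's end is exclusive
        }
        return forms;
    }

    public static void main(String[] args) {
        String text = "p53 binds DNA";
        List<Range> positions = List.of(new Range(0, 2), new Range(4, 8), new Range(10, 12));
        System.out.println(surfaceForms(text, positions)); // [p53, binds, DNA]
    }
}
```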

isPositionsAvailable

public boolean isPositionsAvailable()
Does this document store contain position information?

Returns:
true if the document store has position information

getNumberOfTerms

public int getNumberOfTerms()
Get the total number of terms in the index associated with this document store.

Returns:
the number of terms

getNumberOfDocuments

public int getNumberOfDocuments()
Get the total number of documents in the index associated with this document store.

Returns:
the number of documents

close

public void close()
           throws IOException
Closes this reader and releases any system resources associated with it. If the reader is already closed then invoking this method has no effect.

Specified by:
close in interface Closeable
Throws:
IOException - if an I/O error occurs
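The close() contract above (release resources once; later calls have no effect) pairs naturally with try-with-resources. ClosingReader is a hypothetical in-memory reader; making document(int) throw IOException after close is an assumption of this sketch, not documented behavior of DocumentStoreReader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical in-memory reader illustrating an idempotent close():
 * resources are released once, and later calls have no effect.
 */
public class ClosingReader implements Closeable {
    private final List<String> documents;
    private boolean closed;   // true once resources have been released
    private int releaseCount; // counts actual releases, for illustration only

    public ClosingReader(List<String> documents) {
        this.documents = documents;
    }

    public int getNumberOfDocuments() {
        return documents.size();
    }

    public String document(int documentIndex) throws IOException {
        if (closed) {
            throw new IOException("reader is closed");
        }
        return documents.get(documentIndex);
    }

    @Override
    public void close() {
        if (closed) {
            return; // already closed: no effect
        }
        closed = true;
        releaseCount++; // a file-backed reader would close its streams here
    }

    public int releaseCount() {
        return releaseCount;
    }

    public static void main(String[] args) throws IOException {
        ClosingReader reader = new ClosingReader(List.of("first", "second"));
        // try-with-resources calls close() even if reading throws
        try (ClosingReader r = reader) {
            for (int d = 0; d < r.getNumberOfDocuments(); d++) {
                System.out.println(r.document(d));
            }
        }
        reader.close();                            // second call has no effect
        System.out.println(reader.releaseCount()); // 1
    }
}
```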


Copyright © 2003-2008 Institute for Computational Biomedicine, All Rights Reserved.