Textractor API textractor-720 (20091120123250)

textractor.mg4j.document
Class DocStoreDocumentCollection

java.lang.Object
  extended by it.unimi.dsi.mg4j.document.AbstractDocumentSequence
      extended by it.unimi.dsi.mg4j.document.AbstractDocumentCollection
          extended by textractor.mg4j.document.DocStoreDocumentCollection
All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, Serializable

public final class DocStoreDocumentCollection
extends AbstractDocumentCollection
implements Serializable

The DocStore document collection for all TEXT indices.

See Also:
Serialized Form

Field Summary
 
Fields inherited from interface it.unimi.dsi.mg4j.document.DocumentCollection
DEFAULT_EXTENSION
 
Constructor Summary
DocStoreDocumentCollection(DocumentIndexManager docmanagerVal)
          Creates a document collection based on the documents in the document store associated with a full text index.
 
Method Summary
 void close()
          Closes this stream and releases any system resources associated with it.
 DocumentCollection copy()
          Make a copy of this DocStoreDocumentCollection.
 Document document(int index)
          Obtain the Document for document number index within the "text" index.
 Document document(String indexAlias, int index)
          Obtain the Document for document number index within the indexAlias index.
 Document documentNoContent(int index)
          Obtain the Document for document number index within the "text" index, but do not retrieve the document content.
 Document documentNoContent(String indexAlias, int index)
          Obtain the Document for document number index within the indexAlias index, but do not retrieve the document content.
 String doi(int index)
          Obtain the DOI for document number index within the "text" index.
 DocumentFactory factory()
          Return the DocumentFactory associated with this DocStoreDocumentCollection.
 DocumentStoreReader getDocumentStoreReader()
          Returns the current DocumentStoreReader for the "text" alias.
 DocumentStoreReader getDocumentStoreReader(String indexAlias)
          Returns the current DocumentStoreReader for the given index alias.
static void main(String[] arg)
           
 Reference2ObjectMap<Enum<?>,Object> metadata(int index)
          Creates metadata for the document at index within the "text" index.
 Reference2ObjectMap<Enum<?>,Object> metadata(String indexAlias, int index)
          Creates metadata for the document at index within the indexAlias index.
 int size()
          Returns the number of documents in the document store / full text index.
 InputStream stream(int index)
          Obtain the stream for document number index within the "text" index.
 InputStream stream(String indexAlias, int index)
          Obtain the stream for document number index within the indexAlias index.
 
Methods inherited from class it.unimi.dsi.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, iterator, printAllDocuments, toString
 
Methods inherited from class it.unimi.dsi.mg4j.document.AbstractDocumentSequence
finalize
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DocStoreDocumentCollection

public DocStoreDocumentCollection(DocumentIndexManager docmanagerVal)
                           throws IOException
Creates a document collection based on the documents in the document store associated with a full text index.

Parameters:
docmanagerVal - Manager for the full text index associated with the document store.
Throws:
IOException - if an error occurs opening the underlying files
Method Detail

getDocumentStoreReader

public DocumentStoreReader getDocumentStoreReader()
Returns the current DocumentStoreReader for the "text" alias.

Returns:
the current DocumentStoreReader for the "text" alias

getDocumentStoreReader

public DocumentStoreReader getDocumentStoreReader(String indexAlias)
Returns the current DocumentStoreReader for the given index alias.

Parameters:
indexAlias - the index alias to obtain the document store reader for.
Returns:
the current DocumentStoreReader for indexAlias

size

public int size()
Returns the number of documents in the document store / full text index.

Specified by:
size in interface DocumentCollection
Returns:
Number of documents (sentences) in the database.

document

public Document document(int index)
                  throws IOException
Obtain the Document for document number index within the "text" index.

Specified by:
document in interface DocumentCollection
Parameters:
index - the document number
Returns:
the Document
Throws:
IOException - error retrieving the document
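
A minimal sketch of walking a collection with size() and document(int). It is not Textractor code: IndexedCollection and inMemory are hypothetical stand-ins that mirror only the two signatures documented here, so the loop can run without the Textractor and MG4J jars. It also assumes document numbers run from 0 (inclusive) to size() (exclusive).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CollectionWalk {
    /** Hypothetical stand-in mirroring size() and document(int) as documented above. */
    public interface IndexedCollection<D> {
        int size();
        D document(int index) throws IOException;
    }

    /** Visits documents 0 .. size()-1 in order (assumed numbering) and collects them. */
    public static <D> List<D> readAll(IndexedCollection<D> collection) throws IOException {
        final List<D> result = new ArrayList<D>();
        for (int i = 0; i < collection.size(); i++) {
            result.add(collection.document(i));
        }
        return result;
    }

    /** In-memory collection, present only to make the sketch runnable. */
    public static IndexedCollection<String> inMemory(final String... sentences) {
        return new IndexedCollection<String>() {
            @Override
            public int size() {
                return sentences.length;
            }

            @Override
            public String document(int index) throws IOException {
                if (index < 0 || index >= sentences.length) {
                    throw new IOException("no such document: " + index);
                }
                return sentences[index];
            }
        };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(inMemory("First sentence.", "Second sentence.")));
    }
}
```

With the real class, the same loop shape would call document(i) on a DocStoreDocumentCollection and handle the IOException it declares.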

document

public Document document(String indexAlias,
                         int index)
                  throws IOException
Obtain the Document for document number index within the indexAlias index.

Parameters:
indexAlias - the indexAlias from which to retrieve the document
index - the document number
Returns:
the Document
Throws:
IOException - error retrieving the document

documentNoContent

public Document documentNoContent(int index)
                           throws IOException
Obtain the Document for document number index within the "text" index, but do not retrieve the document content.

Parameters:
index - the document number
Returns:
the Document (but without the content)
Throws:
IOException - error retrieving the document

documentNoContent

public Document documentNoContent(String indexAlias,
                                  int index)
                           throws IOException
Obtain the Document for document number index within the indexAlias index, but do not retrieve the document content.

Parameters:
indexAlias - the indexAlias from which to retrieve the document
index - the document number
Returns:
the Document (but without the content)
Throws:
IOException - error retrieving the document

doi

public String doi(int index)
           throws IOException
Obtain the DOI for document number index within the "text" index.

Parameters:
index - the document number
Returns:
the DOI
Throws:
IOException - error retrieving the document

stream

public InputStream stream(int index)
                   throws IOException
Obtain the stream for document number index within the "text" index.

Specified by:
stream in interface DocumentCollection
Parameters:
index - the document number
Returns:
the stream for the document
Throws:
IOException - error retrieving the document

stream

public InputStream stream(String indexAlias,
                          int index)
                   throws IOException
Obtain the stream for document number index within the indexAlias index.

Parameters:
indexAlias - the indexAlias from which to retrieve the stream
index - the document number
Returns:
the stream for the document
Throws:
IOException - error retrieving the document
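
Because stream(...) returns a plain java.io.InputStream, its content can be read with standard library calls only. The sketch below shows one way to drain such a stream into a String. Two assumptions are not stated on this page: that the bytes are UTF-8 text, and that the caller is responsible for closing the returned stream. StreamText and readFully are hypothetical names, and a ByteArrayInputStream stands in for the stream a collection would return.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamText {
    /** Reads the stream to its end, decodes it as UTF-8 (assumption), and always closes it. */
    public static String readFully(InputStream in) throws IOException {
        try {
            final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            final byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the InputStream that stream(index) would return.
        final InputStream in =
                new ByteArrayInputStream("A sentence.".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFully(in));
    }
}
```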

metadata

public Reference2ObjectMap<Enum<?>,Object> metadata(int index)
                                             throws IOException
Creates metadata for the document at index within the "text" index.

Specified by:
metadata in interface DocumentCollection
Parameters:
index - a document index.
Returns:
the metadata for the document index.
Throws:
IOException - if an error occurs retrieving the metadata

metadata

public Reference2ObjectMap<Enum<?>,Object> metadata(String indexAlias,
                                                    int index)
                                             throws IOException
Creates metadata for the document at index within the indexAlias index.

Parameters:
indexAlias - the indexAlias for which to create metadata
index - a document index.
Returns:
the metadata for the document index.
Throws:
IOException - if an error occurs retrieving the metadata

copy

public DocumentCollection copy()
Make a copy of this DocStoreDocumentCollection.

Specified by:
copy in interface DocumentCollection
Specified by:
copy in interface FlyweightPrototype<DocumentCollection>
Returns:
the copy

factory

public DocumentFactory factory()
Return the DocumentFactory associated with this DocStoreDocumentCollection.

Specified by:
factory in interface DocumentSequence
Returns:
the DocumentFactory

close

public void close()
           throws IOException
Closes this stream and releases any system resources associated with it. If the stream is already closed then invoking this method has no effect.

Specified by:
close in interface DocumentSequence
Specified by:
close in interface Closeable
Overrides:
close in class AbstractDocumentSequence
Throws:
IOException - if an I/O error occurs
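
Since the class implements Closeable, it can be managed with try-with-resources (Java 7 and later). The sketch below illustrates that pattern together with the "already closed, no effect" contract described above. Resource is a hypothetical stand-in, not the real collection, and its release counter exists only so the behaviour can be observed.

```java
import java.io.Closeable;

class CloseSketch {
    /** Hypothetical stand-in: a Closeable whose close() is idempotent. */
    public static final class Resource implements Closeable {
        private boolean closed;
        private int releases;

        @Override
        public void close() {
            if (closed) {
                return; // already closed: invoking close() again has no effect
            }
            closed = true;
            releases++; // stands in for releasing files, readers and so on
        }

        public boolean isClosed() {
            return closed;
        }

        public int releases() {
            return releases;
        }
    }

    /** Opens a resource, lets try-with-resources close it, and returns it for inspection. */
    public static Resource useAndClose() {
        final Resource resource = new Resource();
        try (Resource r = resource) {
            // Work with r here; close() runs automatically when the block exits.
        }
        return resource;
    }

    public static void main(String[] args) {
        final Resource r = useAndClose();
        r.close(); // second close is a no-op
        System.out.println(r.isClosed() + " " + r.releases()); // prints "true 1"
    }
}
```

The real close() declares IOException, so a try-with-resources block around a DocStoreDocumentCollection would also need to catch or propagate that exception.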

main

public static void main(String[] arg)
                 throws IOException,
                        com.martiansoftware.jsap.JSAPException,
                        InstantiationException,
                        IllegalAccessException,
                        InvocationTargetException,
                        NoSuchMethodException,
                        ConfigurationException,
                        ClassNotFoundException,
                        URISyntaxException
Throws:
IOException
com.martiansoftware.jsap.JSAPException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
ConfigurationException
ClassNotFoundException
URISyntaxException

Copyright © 2003-2008 Institute for Computational Biomedicine, All Rights Reserved.