Twease

twease.passages
Interface PassageTokenizer

All Known Implementing Classes:
BoundaryPassageTokenizer, FixedWindowPassageTokenizer, FixedWindowWithBoundariesPassageTokenizer, OverlappingFixedWindowPassageTokenizer, OverlappingFixedWindowWithBoundariesPassageTokenizer, Window100WithBoundariesPassageTokenizer, Window300WithBoundariesPassageTokenizer

public interface PassageTokenizer

Implementations of this interface process term match information to produce distinct passage limits. Each limit is returned as an Interval that spans the passage in the document.

Author:
Fabien Campagne (Date: Feb 16, 2007; Time: 3:26:38 PM)

Method Summary
 Iterator<it.unimi.dsi.mg4j.search.Interval> iterator()
          Returns an iterator over passage limits in the current document.
 int passageLength(it.unimi.dsi.mg4j.search.Interval intervalInTMI)
          Returns the length in words for this passage.
 void reset(TermMatchInfo termMatchInfo)
          Associates the tokenizer with the term match information for a new document.
 

Method Detail

reset

void reset(TermMatchInfo termMatchInfo)
Associates the tokenizer with the term match information for a new document.

Parameters:
termMatchInfo - term match information for a new document
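
The reset-then-iterate life cycle can be sketched as follows. This is only an illustration under stated assumptions: SimplePassageTokenizer and ToyWindowTokenizer are hypothetical names, a plain int[] of sorted match positions stands in for TermMatchInfo, and an int[] pair {leftIndex, rightIndex} stands in for the MG4J Interval. The grouping rule is a toy and is not claimed to match any of the implementing classes listed above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified analogue of PassageTokenizer (not the Twease API). */
interface SimplePassageTokenizer {
    /** Points the tokenizer at the sorted match positions of a new document. */
    void reset(int[] matchPositions);

    /** Each element is {leftIndex, rightIndex}, indices into matchPositions. */
    Iterator<int[]> iterator();
}

/** Toy rule: a match at least windowSize words after the passage start opens a new passage. */
final class ToyWindowTokenizer implements SimplePassageTokenizer {
    private final int windowSize;
    private int[] positions = new int[0];

    ToyWindowTokenizer(int windowSize) {
        this.windowSize = windowSize;
    }

    @Override
    public void reset(int[] matchPositions) {
        // Replaces the state of the previous document; the instance is reused.
        this.positions = matchPositions;
    }

    @Override
    public Iterator<int[]> iterator() {
        List<int[]> passages = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= positions.length; i++) {
            boolean atEnd = (i == positions.length);
            if (atEnd || positions[i] - positions[start] >= windowSize) {
                passages.add(new int[] { start, i - 1 });
                start = i;
            }
        }
        return passages.iterator();
    }
}
```

With this sketch, one ToyWindowTokenizer instance can serve many documents: call reset with the match positions of each new document, then consume iterator() before moving on to the next one.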

iterator

Iterator<it.unimi.dsi.mg4j.search.Interval> iterator()
Returns an iterator over passage limits in the current document. Note that each Interval holds indices into the positions stored in the TermMatchInfo object, NOT the actual term positions. This is done to simplify scoring.

Returns:
an iterator over passage limits in the current document.
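
The distinction between indices and term positions can be illustrated with a small sketch. Everything here is an assumption made for illustration: IndexInterval is a hypothetical stand-in for it.unimi.dsi.mg4j.search.Interval, and a plain int[] stands in for the positions stored in TermMatchInfo, whose accessors are not documented on this page.

```java
/** Hypothetical stand-in for the MG4J Interval: an inclusive range [left..right]. */
final class IndexInterval {
    final int left;
    final int right;

    IndexInterval(int left, int right) {
        this.left = left;
        this.right = right;
    }
}

final class PassageIndexDemo {
    /**
     * Maps an interval of indices to the term positions it refers to.
     * matchPositions is an assumed stand-in for the positions held by TermMatchInfo.
     */
    static int[] resolve(int[] matchPositions, IndexInterval interval) {
        return new int[] { matchPositions[interval.left], matchPositions[interval.right] };
    }
}
```

For stored positions {4, 9, 10, 57, 61}, the interval [1..3] refers to the second through fourth matches, so resolve returns the term positions 9 and 57 rather than 1 and 3.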

passageLength

int passageLength(it.unimi.dsi.mg4j.search.Interval intervalInTMI)
Returns the length in words for this passage. This method exists so that fixed-length window tokenizers can return their windowSize, irrespective of how many words are included in the interval.

Parameters:
intervalInTMI - Interval for the passage in the TermMatchInfo object.
Returns:
The window size for fixed-length window tokenizers, or the actual length of this passage for variable-length tokenizers.
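
The two behaviours described for passageLength can be contrasted in a short sketch. The class and method names are hypothetical, an int[] of sorted match positions stands in for TermMatchInfo, and the variable-length formula (last position minus first position, plus one) is an assumption about what "actual length" means, not something this page confirms.

```java
final class PassageLengthSketch {
    /** Fixed-window style: the configured window size, whatever the interval covers. */
    static int fixedWindowLength(int windowSize, int leftIndex, int rightIndex) {
        return windowSize;
    }

    /**
     * Variable-length style (assumed definition): the number of words from the
     * first to the last matched position in the interval, both inclusive.
     */
    static int variableLength(int[] matchPositions, int leftIndex, int rightIndex) {
        return matchPositions[rightIndex] - matchPositions[leftIndex] + 1;
    }
}
```

With positions {4, 9, 10, 57, 61} and the index interval [1..2], the fixed-window variant with a window of 100 reports 100, while the variable-length variant reports 2 (positions 9 and 10).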

Copyright © 2005-2009 Institute for Computational Biomedicine, All Rights Reserved.