Qizx/open API

net.axyana.qizxopen.util
Interface WordSifter

All Known Implementing Classes:
DefaultWordSifter

public interface WordSifter

Analyzes text chunks to extract and normalize words. Used in full-text indexing and search.

To parse words, first initialize the sifter by calling start on a text chunk, then call nextWord() repeatedly until it returns null, which signals that the last word has been parsed.
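
The start/nextWord protocol described above can be sketched as follows. This is a minimal illustration, not the library's code: MiniSifter is a hypothetical stand-in that mirrors only the two method signatures listed on this page, and ToySifter is an assumed toy implementation (runs of letters or digits, lowercased), not DefaultWordSifter.

```java
import java.util.ArrayList;
import java.util.List;

public class SifterLoopSketch {
    /** Hypothetical stand-in mirroring the start and nextWord signatures of WordSifter. */
    interface MiniSifter {
        void start(char[] text, int length);
        char[] nextWord();
    }

    /** Toy implementation (assumption): a word is a run of letters or digits, lowercased. */
    static class ToySifter implements MiniSifter {
        private char[] text;
        private int length;
        private int pos;

        public void start(char[] text, int length) {
            this.text = text;
            this.length = length;
            this.pos = 0;
        }

        public char[] nextWord() {
            // Skip characters that cannot belong to a word.
            while (pos < length && !Character.isLetterOrDigit(text[pos])) pos++;
            if (pos >= length) return null; // no more words
            int begin = pos;
            while (pos < length && Character.isLetterOrDigit(text[pos])) pos++;
            // A new array is allocated for each word, as nextWord requires.
            char[] word = new char[pos - begin];
            for (int i = 0; i < word.length; i++) {
                word[i] = Character.toLowerCase(text[begin + i]);
            }
            return word;
        }
    }

    /** Drives the sifter: one start call, then nextWord until it returns null. */
    static List<String> words(MiniSifter sifter, String chunk) {
        char[] text = chunk.toCharArray();
        sifter.start(text, text.length);
        List<String> out = new ArrayList<>();
        for (char[] w = sifter.nextWord(); w != null; w = sifter.nextWord()) {
            out.add(new String(w));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [full, text, indexing, and, search]
        System.out.println(words(new ToySifter(), "Full-Text indexing, and SEARCH."));
    }
}
```

The same sifter instance can be reused for another chunk by calling start again, since start resets the position.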


Method Summary
 char charAt(int ahead)
          Returns the character at current position + ahead, or 0 if after end.
 boolean isWordPart(char c)
          Returns true if the char can be part of a word.
 boolean isWordStart(char c)
          Returns true if the char can be at start of a word.
 char mapChar(char c)
          Normalizes a character belonging to a word.
 char nextChar()
          Moves to the next character and returns it, or returns 0 if at end.
 char[] nextWord()
          Gets the next normalized word, or null if no more words.
 void start(char[] text, int length)
          Starts the analysis of a new text chunk.
 char wildcardSeveral()
          Returns the wildcard character which matches several characters.
 char wildcardSingle()
          Returns the wildcard character which matches a single character.
 int wordLength()
          Returns the original length of the last word returned by nextWord.
 int wordOffset()
          Returns the offset of the last word returned by nextWord.
 

Method Detail

start

public void start(char[] text,
                  int length)
Starts the analysis of a new text chunk.


nextWord

public char[] nextWord()
Gets the next normalized word, or null if no more words. Must return a new char array for each word.


wordOffset

public int wordOffset()
Returns the offset of the last word returned by nextWord.


wordLength

public int wordLength()
Returns the original length, in the source text, of the last word returned by nextWord. This is most often equal to the length of the returned token, but may differ when normalization changes the number of characters.
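
As an illustration of why wordOffset and wordLength refer to the original text rather than to the normalized token, the sketch below highlights matching words in the source chunk. Everything here is hypothetical: ToySifter is an assumed implementation that drops apostrophes during normalization (so "don't" becomes "dont"), and highlight is a helper invented for this example.

```java
public class WordSpanSketch {
    /** Toy sifter (assumption): letters and apostrophes form a word; apostrophes are dropped. */
    static class ToySifter {
        private char[] text;
        private int length;
        private int pos;
        private int offset;     // start of the last word in the original text
        private int origLength; // length of the last word in the original text

        void start(char[] text, int length) {
            this.text = text;
            this.length = length;
            this.pos = 0;
        }

        char[] nextWord() {
            while (pos < length && !Character.isLetter(text[pos])) pos++;
            if (pos >= length) return null;
            offset = pos;
            StringBuilder token = new StringBuilder();
            while (pos < length && (Character.isLetter(text[pos]) || text[pos] == '\'')) {
                if (text[pos] != '\'') token.append(Character.toLowerCase(text[pos]));
                pos++;
            }
            origLength = pos - offset;
            return token.toString().toCharArray();
        }

        int wordOffset() { return offset; }
        int wordLength() { return origLength; }
    }

    /** Wraps each word whose normalized form equals target in [brackets]. */
    static String highlight(String chunk, String target) {
        char[] text = chunk.toCharArray();
        ToySifter sifter = new ToySifter();
        sifter.start(text, text.length);
        StringBuilder out = new StringBuilder();
        int copied = 0; // everything before this index is already in out
        for (char[] w = sifter.nextWord(); w != null; w = sifter.nextWord()) {
            if (!new String(w).equals(target)) continue;
            int begin = sifter.wordOffset();
            int end = begin + sifter.wordLength(); // original span, not token length
            out.append(chunk, copied, begin).append('[').append(chunk, begin, end).append(']');
            copied = end;
        }
        return out.append(chunk, copied, chunk.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("I don't know", "dont")); // prints I [don't] know
    }
}
```

Here the token "dont" has four characters while wordLength reports five, so using the token length would cut the highlight short.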


charAt

public char charAt(int ahead)
Returns the character at current position + ahead, or 0 if after end.


nextChar

public char nextChar()
Moves to the next character and returns it, or returns 0 if at end.


mapChar

public char mapChar(char c)
Normalizes a character belonging to a word.


isWordStart

public boolean isWordStart(char c)
Returns true if the char can be at start of a word.


isWordPart

public boolean isWordPart(char c)
Returns true if the char can be part of a word.
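
One plausible way the three character hooks (isWordStart, isWordPart, mapChar) could cooperate is sketched below. This page does not state how an implementation uses them internally, so the scanning loop is an assumption; HookedSifter, IdentifierSifter and the sift method are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class CharHooksSketch {
    /** Hypothetical base class; only the three hook signatures mirror the interface. */
    static class HookedSifter {
        boolean isWordStart(char c) { return Character.isLetter(c); }
        boolean isWordPart(char c) { return Character.isLetterOrDigit(c); }
        char mapChar(char c) { return Character.toLowerCase(c); }

        /** Assumed scanning loop: find a start char, extend while chars are word parts. */
        List<String> sift(String chunk) {
            List<String> words = new ArrayList<>();
            int pos = 0;
            while (pos < chunk.length()) {
                if (!isWordStart(chunk.charAt(pos))) {
                    pos++;
                    continue;
                }
                StringBuilder word = new StringBuilder();
                do {
                    word.append(mapChar(chunk.charAt(pos++))); // normalize each kept char
                } while (pos < chunk.length() && isWordPart(chunk.charAt(pos)));
                words.add(word.toString());
            }
            return words;
        }
    }

    /** Variant that also accepts '_' inside a word, e.g. for identifiers. */
    static class IdentifierSifter extends HookedSifter {
        @Override
        boolean isWordPart(char c) { return c == '_' || super.isWordPart(c); }
    }

    public static void main(String[] args) {
        System.out.println(new HookedSifter().sift("Max_Size 42px"));     // prints [max, size, px]
        System.out.println(new IdentifierSifter().sift("Max_Size 42px")); // prints [max_size, px]
    }
}
```

Under this assumed design, changing what counts as a word only requires overriding a predicate, while the scanning loop stays untouched.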


wildcardSeveral

public char wildcardSeveral()
Returns the wildcard character that matches several characters. In SQL LIKE patterns it is '%'; in Unix glob patterns it is '*'.


wildcardSingle

public char wildcardSingle()
Returns the wildcard character that matches a single character. In SQL LIKE patterns it is '_'; in Unix glob patterns it is '?'.
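
To make the role of the two wildcard characters concrete, the sketch below translates a word pattern into a java.util.regex.Pattern, taking the wildcard characters as parameters. It is an illustration under assumptions: toRegex is a hypothetical helper, "several characters" is taken to mean zero or more, and the characters passed in main are the SQL and glob conventions mentioned above, not values confirmed for any particular sifter.

```java
import java.util.regex.Pattern;

public class WildcardSketch {
    /**
     * Builds a regex from a word pattern.
     * several: the character a sifter's wildcardSeveral() would return (assumed zero or more chars).
     * single:  the character a sifter's wildcardSingle() would return (exactly one char).
     */
    static Pattern toRegex(String pattern, char several, char single) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == several) {
                regex.append(".*");
            } else if (c == single) {
                regex.append('.');
            } else {
                // Quote literal characters so regex metacharacters are not interpreted.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        // Unix-glob style wildcards
        System.out.println(toRegex("ind*", '*', '?').matcher("indexing").matches());   // prints true
        // SQL LIKE style wildcards
        System.out.println(toRegex("s_arch", '%', '_').matcher("search").matches());   // prints true
        System.out.println(toRegex("s_arch", '%', '_').matcher("seearch").matches());  // prints false
    }
}
```

Passing the wildcard characters in, rather than hard-coding them, keeps the pattern translation independent of whichever convention a given sifter chooses.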


© 2005 Axyana Software