Qizx/open API

net.axyana.qizxopen.util
Class DefaultWordSifter

java.lang.Object
  extended by net.axyana.qizxopen.util.DefaultWordSifter
All Implemented Interfaces:
java.io.Serializable, WordSifter
Direct Known Subclasses:
DefaultWordSifterU

public class DefaultWordSifter
extends java.lang.Object
implements WordSifter, java.io.Serializable

A default word extractor suitable for European languages compatible with ISO-8859-1.

By default, a word starts with a letter and may contain letters and digits. Optionally (and by default), characters are folded to lowercase and accented letters are converted to the corresponding unaccented letters.
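
The folding described above can be illustrated with a minimal sketch. It does not use Qizx/open code: the class FoldingSketch and its methods are hypothetical, and the accent stripping relies on java.text.Normalizer, which is an assumption about one reasonable way to do it, not a statement about how DefaultWordSifter is implemented.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not part of the Qizx/open API.
final class FoldingSketch {

    // Folds one character according to the two sensitivity flags.
    static char fold(char c, boolean caseSensitive, boolean accentSensitive) {
        char out = c;
        if (!accentSensitive) {
            // Canonical decomposition (NFD) splits an accented Latin letter
            // into its base letter followed by combining marks; keep the base.
            // Intended for Latin-1 letters; other scripts may decompose differently.
            String decomposed =
                Normalizer.normalize(String.valueOf(out), Normalizer.Form.NFD);
            out = decomposed.charAt(0);
        }
        if (!caseSensitive) {
            out = Character.toLowerCase(out);
        }
        return out;
    }

    // Applies fold() to every character of a string.
    static String foldAll(String s, boolean caseSensitive, boolean accentSensitive) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(fold(s.charAt(i), caseSensitive, accentSensitive));
        }
        return sb.toString();
    }
}
```

With both flags false, as in the default configuration described above, the sketch maps an accented, capitalized word to plain lowercase letters. Letters without a canonical decomposition are left unchanged by this sketch.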

See Also:
Serialized Form

Constructor Summary
DefaultWordSifter()
          Builds a case-insensitive and accent-insensitive sifter.
DefaultWordSifter(boolean caseSensitive, boolean accentSensitive)
          Builds a sifter with the specified case and accent sensitivity.
 
Method Summary
 char charAt(int ahead)
          Returns the character at current position + ahead, or 0 if after end.
 boolean isWordPart(char c)
          Returns true if the char can be part of a word.
 boolean isWordStart(char c)
          Returns true if the char can be at start of a word.
 char mapChar(char c)
          Normalizes a character (belonging to a word).
 char nextChar()
          Moves to the next character and returns it, or returns 0 if at end.
 char[] nextWord()
          Gets the next normalized word, or null if no more words.
 void start(char[] text, int length)
          Starts the analysis of a new text chunk.
 char wildcardSeveral()
          Returns the wildcard character which matches several characters.
 char wildcardSingle()
          Returns the wildcard character which matches a single character.
 int wordLength()
          Returns the original length of the last word returned by nextWord.
 int wordOffset()
          Returns the offset of the last word returned by nextWord.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DefaultWordSifter

public DefaultWordSifter()
Builds a case-insensitive and accent-insensitive sifter.


DefaultWordSifter

public DefaultWordSifter(boolean caseSensitive,
                         boolean accentSensitive)
Builds a sifter with the specified case and accent sensitivity.

Parameters:
caseSensitive - if false, uppercase and lowercase characters are equivalent.
accentSensitive - if false, a letter with a diacritic is equivalent to the same letter without it; for example, 'é' is equivalent to 'e'.

Method Detail

start

public void start(char[] text,
                  int length)
Description copied from interface: WordSifter
Starts the analysis of a new text chunk.

Specified by:
start in interface WordSifter

isWordStart

public boolean isWordStart(char c)
Description copied from interface: WordSifter
Returns true if the char can be at start of a word.

Specified by:
isWordStart in interface WordSifter

isWordPart

public boolean isWordPart(char c)
Description copied from interface: WordSifter
Returns true if the char can be part of a word.

Specified by:
isWordPart in interface WordSifter

wildcardSeveral

public char wildcardSeveral()
Description copied from interface: WordSifter
Returns the wildcard character which matches several characters. In SQL LIKE patterns, it is '%', in Unix-glob patterns, it is '*'.

Specified by:
wildcardSeveral in interface WordSifter

wildcardSingle

public char wildcardSingle()
Description copied from interface: WordSifter
Returns the wildcard character which matches a single character. In SQL LIKE patterns, it is '_', in Unix-glob patterns, it is '?'.

Specified by:
wildcardSingle in interface WordSifter
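
As an illustration of how the two wildcard characters might be used by a caller, here is a minimal matcher sketch. The class WildcardSketch is hypothetical and not part of Qizx/open. The wildcard characters are passed in as parameters because this page does not state which characters DefaultWordSifter returns, and "several" is assumed to mean zero or more characters, as in SQL LIKE.

```java
// Hypothetical illustration only; not part of the Qizx/open API.
final class WildcardSketch {

    // Returns true if word matches pattern, where 'several' stands for any
    // run of characters (assumed: zero or more) and 'single' for exactly one.
    static boolean matches(String pattern, String word, char several, char single) {
        return match(pattern, 0, word, 0, several, single);
    }

    private static boolean match(String p, int pi, String w, int wi,
                                 char several, char single) {
        if (pi == p.length()) {
            // Pattern exhausted: succeed only if the word is exhausted too.
            return wi == w.length();
        }
        char pc = p.charAt(pi);
        if (pc == several) {
            // Try every possible length for the run, including the empty run.
            for (int k = wi; k <= w.length(); k++) {
                if (match(p, pi + 1, w, k, several, single)) {
                    return true;
                }
            }
            return false;
        }
        if (wi == w.length()) {
            return false; // pattern needs a character but the word has ended
        }
        if (pc == single || pc == w.charAt(wi)) {
            return match(p, pi + 1, w, wi + 1, several, single);
        }
        return false;
    }
}
```

A caller would typically obtain the two characters from wildcardSeveral() and wildcardSingle() and pass them to such a matcher, so the same code works with either SQL-style or glob-style conventions.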

mapChar

public char mapChar(char c)
Description copied from interface: WordSifter
Normalizes a character (belonging to a word).

Specified by:
mapChar in interface WordSifter

nextWord

public char[] nextWord()
Description copied from interface: WordSifter
Gets the next normalized word, or null if no more words. Must return a new char array for each word.

Specified by:
nextWord in interface WordSifter
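
The following sketch shows the calling protocol suggested by this page: call start once per text chunk, then call nextWord until it returns null, reading wordOffset and wordLength after each word. MiniSifter is a hypothetical stand-in written for this example so that it runs without the library; it only mimics the documented contract (lowercase folding, words starting on a letter) and is not the Qizx/open implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Qizx/open API.
final class SifterLoopSketch {

    // Stand-in that follows the contract documented for WordSifter.
    static final class MiniSifter {
        private char[] text;
        private int length;
        private int pos;
        private int wordOffset;
        private int wordLength;

        void start(char[] text, int length) {
            this.text = text;
            this.length = length;
            this.pos = 0;
        }

        boolean isWordStart(char c) { return Character.isLetter(c); }
        boolean isWordPart(char c)  { return Character.isLetterOrDigit(c); }
        char mapChar(char c)        { return Character.toLowerCase(c); }

        char[] nextWord() {
            // Skip characters that cannot start a word.
            while (pos < length && !isWordStart(text[pos])) pos++;
            if (pos >= length) return null;
            wordOffset = pos;
            while (pos < length && isWordPart(text[pos])) pos++;
            wordLength = pos - wordOffset;
            // A new array is allocated for each word, as the contract requires.
            char[] word = new char[wordLength];
            for (int i = 0; i < wordLength; i++) {
                word[i] = mapChar(text[wordOffset + i]);
            }
            return word;
        }

        int wordOffset() { return wordOffset; }
        int wordLength() { return wordLength; }
    }

    // Collects each normalized word as "word@offset+length".
    static List<String> tokens(String input) {
        MiniSifter sifter = new MiniSifter();
        char[] chars = input.toCharArray();
        sifter.start(chars, chars.length);
        List<String> out = new ArrayList<>();
        for (char[] w = sifter.nextWord(); w != null; w = sifter.nextWord()) {
            out.add(new String(w) + "@" + sifter.wordOffset() + "+" + sifter.wordLength());
        }
        return out;
    }
}
```

Because each call returns a fresh array, a caller can keep references to earlier words without copying them.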

charAt

public char charAt(int ahead)
Description copied from interface: WordSifter
Returns the character at current position + ahead, or 0 if after end.

Specified by:
charAt in interface WordSifter

nextChar

public char nextChar()
Description copied from interface: WordSifter
Moves to the next character and returns it, or returns 0 if at end.

Specified by:
nextChar in interface WordSifter
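
The cursor semantics of charAt and nextChar can be sketched as follows. The Cursor class is hypothetical and written only for this example. It assumes that nextChar advances first and then returns the character at the new position, and that any position at or past the end yields the character with code 0; the page does not spell out these details.

```java
// Hypothetical illustration only; not part of the Qizx/open API.
final class LookaheadSketch {

    static final class Cursor {
        private final char[] text;
        private final int length;
        private int pos = 0;

        Cursor(char[] text, int length) {
            this.text = text;
            this.length = length;
        }

        // Character at current position + ahead, or 0 when outside the text.
        char charAt(int ahead) {
            int i = pos + ahead;
            return (i >= 0 && i < length) ? text[i] : 0;
        }

        // Advances by one (never beyond the end) and returns the new current
        // character, or 0 once the end has been reached.
        char nextChar() {
            if (pos < length) {
                pos++;
            }
            return charAt(0);
        }
    }
}
```

Lookahead of this kind lets a subclass decide, for example, whether a punctuation character sits between two word characters before treating it as part of a word.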

wordOffset

public int wordOffset()
Description copied from interface: WordSifter
Returns the offset of the last word returned by nextWord.

Specified by:
wordOffset in interface WordSifter

wordLength

public int wordLength()
Description copied from interface: WordSifter
Returns the original length of the last word returned by nextWord. This is usually equal to the length of the returned token.

Specified by:
wordLength in interface WordSifter

© 2005 Axyana Software