public final class ChineseTokenizerAdapter extends Object implements ITokenizer
Fields inherited from interface ITokenizer: TF_COMMON_WORD, TF_QUERY_WORD, TF_SEPARATOR_DOCUMENT, TF_SEPARATOR_FIELD, TF_SEPARATOR_SENTENCE, TF_TERMINATOR, TT_ACRONYM, TT_BARE_URL, TT_EMAIL, TT_EOF, TT_FILE, TT_FULL_URL, TT_HYPHTERM, TT_NUMERIC, TT_PUNCTUATION, TT_TERM, TYPE_MASK

| Constructor and Description |
|---|
| `ChineseTokenizerAdapter()` |
| Modifier and Type | Method and Description |
|---|---|
| `short` | `nextToken()`: Returns the next token from the input stream. |
| `void` | `reset(Reader input)`: Resets the tokenizer to process new data. |
| `void` | `setTermBuffer(MutableCharArray array)`: Sets the current token image to the provided buffer. |
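Taken together, the methods above suggest a pull-style loop: call `reset()` with a `Reader`, call `nextToken()` until it returns `TT_EOF`, and call `setTermBuffer()` to obtain the image of each term. The following is a minimal, self-contained sketch of that loop under stated assumptions, not the library's implementation: `Tokenizer`, `TermImage`, and `WhitespaceTokenizer` are hypothetical stand-ins for `ITokenizer`, `MutableCharArray`, and this adapter (the real class segments Chinese text, while this toy splits on whitespace), and the constant values are invented.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {
    // Hypothetical stand-in for MutableCharArray: a view over a char buffer.
    static final class TermImage {
        private char[] buffer = new char[0];
        private int start;
        private int length;

        void reset(char[] buffer, int start, int length) {
            this.buffer = buffer;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return new String(buffer, start, length);
        }
    }

    // Hypothetical stand-in for ITokenizer; the constant values are made up.
    interface Tokenizer {
        short TT_TERM = 1;
        short TT_EOF = -1;

        short nextToken() throws IOException;
        void setTermBuffer(TermImage image);
        void reset(Reader input) throws IOException;
    }

    // Toy whitespace splitter; the real adapter performs Chinese word segmentation.
    static final class WhitespaceTokenizer implements Tokenizer {
        private char[] text = new char[0];
        private int pos, start, end;

        @Override
        public void reset(Reader input) throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = input.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
            // As documented for reset(), the reader is left open for the caller.
            text = sb.toString().toCharArray();
            pos = start = end = 0;
        }

        @Override
        public short nextToken() {
            while (pos < text.length && Character.isWhitespace(text[pos])) pos++;
            if (pos >= text.length) return TT_EOF;
            start = pos;
            while (pos < text.length && !Character.isWhitespace(text[pos])) pos++;
            end = pos;
            return TT_TERM;
        }

        @Override
        public void setTermBuffer(TermImage image) {
            // Points the caller's buffer at the current token image (no copy).
            image.reset(text, start, end - start);
        }
    }

    // Consumer loop: reset, pull tokens until TT_EOF, read each term image.
    static List<String> collectTerms(Tokenizer tokenizer, Reader input) throws IOException {
        List<String> terms = new ArrayList<>();
        TermImage image = new TermImage();
        tokenizer.reset(input);
        short type;
        while ((type = tokenizer.nextToken()) != Tokenizer.TT_EOF) {
            // Real token types may also carry TF_* flag bits; see TokenTypeUtils.
            if (type == Tokenizer.TT_TERM) {
                tokenizer.setTermBuffer(image);
                terms.add(image.toString());
            }
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Prints [data, mining, clustering]
        System.out.println(collectTerms(new WhitespaceTokenizer(), new StringReader("data mining  clustering")));
    }
}
```

Reusing one `TermImage` across iterations mirrors the idea of handing the tokenizer a caller-owned buffer rather than allocating a new string per token; whether the real `setTermBuffer` copies or shares the underlying characters is not stated here and is an assumption of this sketch.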
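`public short nextToken() throws IOException`

Description copied from interface: ITokenizer. Returns the next token from the input stream.

- Specified by: `nextToken` in interface `ITokenizer`
- Returns: the token type, such as `ITokenizer.TT_TERM` and other constants, or `ITokenizer.TT_EOF` when the end of the data stream has been reached.
- Throws: `IOException`
- See Also: `TokenTypeUtils`

`public void setTermBuffer(MutableCharArray array)`

Description copied from interface: ITokenizer. Sets the current token image to the provided buffer.

- Specified by: `setTermBuffer` in interface `ITokenizer`
- Parameters: `array` - buffer in which the current token image should be stored

`public void reset(Reader input) throws IOException`

Description copied from interface: ITokenizer. Resets the tokenizer to process new data.

- Specified by: `reset` in interface `ITokenizer`
- Parameters: `input` - the input to tokenize. The reader will not be closed by the tokenizer when the end of stream is reached.
- Throws: `IOException`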
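Because `reset()` does not close the reader at end of stream, the caller owns each `Reader` and must close it. A minimal sketch of that ownership rule, assuming hypothetical stand-ins: a trimmed `Tokenizer` interface with only `reset()` and `nextToken()`, a toy `WordTokenizer` that counts whitespace-separated words, and a `TrackingReader` that records `close()` calls. One tokenizer instance is reused across several inputs, and each reader is closed by the caller through try-with-resources.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReaderOwnershipSketch {
    // Hypothetical, trimmed stand-in for ITokenizer; constant values are made up.
    interface Tokenizer {
        short TT_TERM = 1;
        short TT_EOF = -1;

        void reset(Reader input) throws IOException;
        short nextToken() throws IOException;
    }

    // Toy tokenizer: one token per whitespace-separated word.
    // Like the documented reset(), it never closes the reader it is given.
    static final class WordTokenizer implements Tokenizer {
        private Reader in;

        public void reset(Reader input) {
            this.in = input;
        }

        public short nextToken() throws IOException {
            int c;
            do { c = in.read(); } while (c != -1 && Character.isWhitespace(c));
            if (c == -1) return TT_EOF;
            do { c = in.read(); } while (c != -1 && !Character.isWhitespace(c));
            return TT_TERM;
        }
    }

    // Records whether close() was called, so the ownership rule can be checked.
    static final class TrackingReader extends StringReader {
        boolean closed = false;

        TrackingReader(String s) {
            super(s);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    // Reuses one tokenizer; each reader is owned and closed by the caller.
    static List<Integer> countTokens(Tokenizer tokenizer, List<TrackingReader> inputs) throws IOException {
        List<Integer> counts = new ArrayList<>();
        for (TrackingReader input : inputs) {
            try (Reader r = input) {
                tokenizer.reset(r);
                int n = 0;
                while (tokenizer.nextToken() != Tokenizer.TT_EOF) n++;
                // End of stream was reached, yet the reader must still be open here.
                if (input.closed) throw new IllegalStateException("tokenizer closed the reader");
                counts.add(n);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<TrackingReader> docs = List.of(new TrackingReader("one two"), new TrackingReader("three"));
        System.out.println(countTokens(new WordTokenizer(), docs)); // [2, 1]
        System.out.println(docs.get(0).closed && docs.get(1).closed); // true
    }
}
```

Keeping the close in the caller lets the same reader type be used whether or not the caller wants to read further from the stream after tokenizing.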