public class PreprocessingContext.AllStems extends Object
Information about all unique stems found in PreprocessingContext.documents. Each entry in each array corresponds to one
base form that different words can be transformed to by the IStemmer used during
processing. E.g. the English words mining and mine will be aggregated
into one entry in the arrays, while they will have separate entries in
PreprocessingContext.AllWords.
All arrays in this class have the same length, and values stored at the same index in different arrays describe the same stem.
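The parallel-array layout described above can be illustrated with a minimal sketch. The arrays below are hypothetical stand-ins with invented values, not a real PreprocessingContext; WORD_IMAGE in particular is an assumed word-level array used only to resolve the pointer stored in mostFrequentOriginalWordIndex.

```java
public class AllStemsParallelArraysSketch {
    // Hypothetical stand-ins for the AllStems arrays; all values are invented.
    static final char[][] STEM_IMAGE = { "mine".toCharArray(), "data".toCharArray() };
    static final int[] STEM_TF = { 5, 3 };
    static final int[] MOST_FREQUENT_ORIGINAL_WORD_INDEX = { 1, 2 };

    // Assumed word-level array (one entry per distinct word form), for illustration only.
    static final char[][] WORD_IMAGE = {
        "mine".toCharArray(), "mining".toCharArray(), "data".toCharArray()
    };

    /** Describes one stem by reading the same index from every parallel array. */
    static String describe(int stemIndex) {
        int wordIndex = MOST_FREQUENT_ORIGINAL_WORD_INDEX[stemIndex];
        return new String(STEM_IMAGE[stemIndex])
            + " tf=" + STEM_TF[stemIndex]
            + " mostFrequentForm=" + new String(WORD_IMAGE[wordIndex]);
    }

    public static void main(String[] args) {
        // Every array has the same length, so one loop index addresses a whole stem.
        for (int i = 0; i < STEM_IMAGE.length; i++) {
            System.out.println(describe(i));
        }
    }
}
```

In this sketch, index 0 ties together the stem image, its frequency, and the pointer to its most frequent surface form, which is the correspondence the class relies on.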
| Modifier and Type | Field and Description |
|---|---|
| byte[] | fieldIndices: Bit-packed indices of all fields in which this word appears at least once. |
| char[][] | image: Stem image as produced by the IStemmer, may not correspond to any correct word. |
| int[] | mostFrequentOriginalWordIndex: Pointer to the PreprocessingContext.AllWords arrays, to the most frequent original form of the stem. |
| int[] | tf: Term frequency of the stem. |
| int[][] | tfByDocument: Term frequency of the stem for each document. |
| Constructor and Description |
|---|
| PreprocessingContext.AllStems() |
public char[][] image
Stem image as produced by the IStemmer; it may not correspond to any
correct word.
This array is produced by LanguageModelStemmer.
public int[] mostFrequentOriginalWordIndex
Pointer to the PreprocessingContext.AllWords arrays, to the most frequent original form of
the stem. Pointers to the less frequent variants are not available.
This array is produced by LanguageModelStemmer.
public int[] tf
Term frequency of the stem, i.e. the sum of all PreprocessingContext.AllWords.tf values
for which the PreprocessingContext.AllWords.stemIndex points to this stem.
This array is produced by LanguageModelStemmer.
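The relationship between word-level and stem-level frequencies can be sketched as follows. This is an illustration under stated assumptions, not the code LanguageModelStemmer runs: aggregateStemTf is a hypothetical helper, and the input arrays are invented stand-ins for PreprocessingContext.AllWords.tf and PreprocessingContext.AllWords.stemIndex.

```java
public class StemTfSketch {
    /**
     * Sums word-level term frequencies into per-stem totals.
     * wordStemIndex[w] is the index of the stem that word w maps to.
     */
    static int[] aggregateStemTf(int[] wordTf, int[] wordStemIndex, int stemCount) {
        int[] stemTf = new int[stemCount];
        for (int w = 0; w < wordTf.length; w++) {
            stemTf[wordStemIndex[w]] += wordTf[w];
        }
        return stemTf;
    }

    public static void main(String[] args) {
        // Invented data: words "mine", "mining", "data"; the first two share stem 0.
        int[] wordTf = { 2, 3, 3 };
        int[] wordStemIndex = { 0, 0, 1 };
        System.out.println(java.util.Arrays.toString(aggregateStemTf(wordTf, wordStemIndex, 2)));
    }
}
```

With the invented data, the two word forms mapped to stem 0 contribute 2 and 3 occurrences, so that stem's total is 5.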
public int[][] tfByDocument
Term frequency of the stem for each document. For the encoding of this array, see
PreprocessingContext.AllWords.tfByDocument.
This array is produced by LanguageModelStemmer. The order of documents in this
array is not defined.
public byte[] fieldIndices
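Bit-packed indices of all fields in which this word appears at least once. The positions of the set bits
are pointers to the PreprocessingContext.AllFields arrays. Fast conversion between the bit-packed representation
and a byte[] of index values is done by PreprocessingContext.toFieldIndexes(byte).
This array is produced by LanguageModelStemmer.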
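To make the bit-packed representation concrete, here is a minimal sketch of unpacking one byte into the indices of its set bits. It is not the implementation of PreprocessingContext.toFieldIndexes(byte); unpackFieldIndices is a hypothetical helper, and the assumption that the lowest bit corresponds to field index 0 is made for illustration only.

```java
public class FieldIndexBitsSketch {
    /** Returns the positions of the set bits in packed, lowest bit first. */
    static byte[] unpackFieldIndices(byte packed) {
        int bits = packed & 0xFF; // treat the byte as unsigned
        byte[] result = new byte[Integer.bitCount(bits)];
        int next = 0;
        for (byte position = 0; position < 8; position++) {
            if ((bits & (1 << position)) != 0) {
                result[next++] = position;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Bits 0 and 2 set: under the assumed bit order, the stem occurs in fields 0 and 2.
        byte packed = 0b0000_0101;
        System.out.println(java.util.Arrays.toString(unpackFieldIndices(packed)));
    }
}
```

A single byte can flag at most eight fields this way, which keeps the per-stem storage compact at the cost of an unpacking step whenever the individual indices are needed.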