Package org.apache.lucene.analysis.ngram
Class EdgeNGramTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.ngram.NGramTokenizer
org.apache.lucene.analysis.ngram.EdgeNGramTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
Tokenizes the input from an edge into n-grams of given size(s).
This Tokenizer creates n-grams from the beginning edge of an input token.
As of Lucene 4.4, this class supports pre-tokenization and correctly handles supplementary characters.
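To illustrate what "n-grams from the beginning edge" means, here is a minimal plain-Java sketch that does not use Lucene at all. The class `EdgeNGramSketch` and the method `edgeNGrams` are hypothetical names introduced for this illustration, and the argument check is an assumption of the sketch rather than documented behavior of this class. Lengths are counted in Unicode code points, so a supplementary character counts as one position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not Lucene code.
class EdgeNGramSketch {

    // Returns the prefixes of `token` whose length, counted in Unicode code
    // points, lies between minGram and maxGram inclusive. Prefixes longer
    // than the token itself are not produced.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        // Assumption of this sketch: reject empty or inverted ranges.
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int length = token.codePointCount(0, token.length());
        for (int n = minGram; n <= maxGram && n <= length; n++) {
            // Convert a code point count into a char index so that surrogate
            // pairs are never split in the middle.
            int end = token.offsetByCodePoints(0, n);
            grams.add(token.substring(0, end));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 3)); // prints [l, lu, luc]
    }
}
```

With the real class, the same range would be passed as `new EdgeNGramTokenizer(1, 3)`, and the resulting terms would be read through the usual `TokenStream` workflow of `reset`, `incrementToken`, `end` and `close`.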
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields
Modifier and Type    Field                    Description
static final int     DEFAULT_MAX_GRAM_SIZE
static final int     DEFAULT_MIN_GRAM_SIZE
Fields inherited from class org.apache.lucene.analysis.ngram.NGramTokenizer
DEFAULT_MAX_NGRAM_SIZE, DEFAULT_MIN_NGRAM_SIZE
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructors
Constructor    Description
EdgeNGramTokenizer(int minGram, int maxGram)
    Creates an EdgeNGramTokenizer that can generate n-grams of sizes in the given range.
EdgeNGramTokenizer(AttributeFactory factory, int minGram, int maxGram)
    Creates an EdgeNGramTokenizer that can generate n-grams of sizes in the given range.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.ngram.NGramTokenizer
end, incrementToken, isTokenChar, reset
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader, setReaderTestPoint
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
DEFAULT_MAX_GRAM_SIZE
public static final int DEFAULT_MAX_GRAM_SIZE
-
DEFAULT_MIN_GRAM_SIZE
public static final int DEFAULT_MIN_GRAM_SIZE
-
-
Constructor Details
-
EdgeNGramTokenizer
public EdgeNGramTokenizer(int minGram, int maxGram)
Creates an EdgeNGramTokenizer that can generate n-grams of sizes in the given range.
Parameters:
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
-
EdgeNGramTokenizer
public EdgeNGramTokenizer(AttributeFactory factory, int minGram, int maxGram)
Creates an EdgeNGramTokenizer that can generate n-grams of sizes in the given range.
Parameters:
factory - AttributeFactory to use
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
-