public final class CustomAnalyzer extends Analyzer
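A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes
TokenizerFactory, TokenFilterFactory, and CharFilterFactory.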
You can create an instance of this Analyzer using the builder:

    Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
        .withTokenizer("standard")
        .addTokenFilter("standard")
        .addTokenFilter("lowercase")
        .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
        .build();

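To see what a built analyzer produces, its token stream can be consumed directly. The following is a minimal sketch, not part of the original documentation. It assumes the Lucene 5.x analyzers-common module is on the classpath, uses the classpath-based `builder()` so that no configuration directory or stop word file is needed, and the class name `CustomAnalyzerDemo`, the field name `body`, and the sample text are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomAnalyzerDemo {

    /** Builds a small analyzer: standard tokenizer followed by lowercasing. */
    static Analyzer build() throws IOException {
        return CustomAnalyzer.builder()          // resources resolved from the classpath
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .build();
    }

    /** Runs the analyzer over the text and collects the emitted terms. */
    static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                          // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();                            // finalize end-of-stream state
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer ana = build()) {
            System.out.println(analyze(ana, "body", "Hello Lucene World"));
        }
    }
}
```

The reset / incrementToken / end / close sequence is the usual TokenStream consumer workflow; skipping `reset()` typically results in an exception.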
The parameters passed to components are also used by Apache Solr and are documented
on their corresponding factory classes. Refer to documentation of subclasses
of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
The list of names to be used for components can be looked up through:
TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(),
and CharFilterFactory.availableCharFilters().
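As a quick way to discover valid component names, the three lookup methods can simply be printed. This is a sketch under the assumption that the factories live in `org.apache.lucene.analysis.util` (the Lucene 5.x layout) and that the analyzers-common module is on the classpath; the class name `ListAnalysisComponents` is hypothetical.

```java
import org.apache.lucene.analysis.util.CharFilterFactory;
import org.apache.lucene.analysis.util.TokenFilterFactory;
import org.apache.lucene.analysis.util.TokenizerFactory;

public class ListAnalysisComponents {
    public static void main(String[] args) {
        // Each call returns the set of names registered for that component type.
        System.out.println("Tokenizers:    " + TokenizerFactory.availableTokenizers());
        System.out.println("Token filters: " + TokenFilterFactory.availableTokenFilters());
        System.out.println("Char filters:  " + CharFilterFactory.availableCharFilters());
    }
}
```

The printed names are the strings accepted by the builder's component methods, so the exact list depends on which analysis modules are present at runtime.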
| Modifier and Type | Class and Description |
|---|---|
| static class | CustomAnalyzer.Builder: Builder for CustomAnalyzer. |

Nested classes/interfaces inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

| Modifier and Type | Method and Description |
|---|---|
| static CustomAnalyzer.Builder | builder(): Returns a builder for custom analyzers that loads all resources from classpath. |
| static CustomAnalyzer.Builder | builder(Path configDir): Returns a builder for custom analyzers that loads all resources from the given file system base directory. |
| static CustomAnalyzer.Builder | builder(ResourceLoader loader): Returns a builder for custom analyzers that loads all resources using the given ResourceLoader. |
| protected Analyzer.TokenStreamComponents | createComponents(String fieldName): Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| List<CharFilterFactory> | getCharFilterFactories(): Returns the list of char filters that are used in this analyzer. |
| int | getOffsetGap(String fieldName): Just like Analyzer.getPositionIncrementGap(java.lang.String), except for Token offsets instead. |
| int | getPositionIncrementGap(String fieldName): Invoked before indexing an IndexableField instance if terms have already been added to that field. |
| List<TokenFilterFactory> | getTokenFilterFactories(): Returns the list of token filters that are used in this analyzer. |
| TokenizerFactory | getTokenizerFactory(): Returns the tokenizer that is used in this analyzer. |
| protected Reader | initReader(String fieldName, Reader reader): Override this if you want to add a CharFilter chain. |
| String | toString() |

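The getter methods listed above make it possible to inspect how an analyzer was assembled. A minimal sketch follows, assuming Lucene 5.x with the analyzers-common module on the classpath. The class name `InspectCustomAnalyzer` is hypothetical, and `addCharFilter` is a method of CustomAnalyzer.Builder that this page does not describe.

```java
import java.io.IOException;

import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class InspectCustomAnalyzer {

    /** Builds an analyzer with one char filter, a tokenizer, and one token filter. */
    static CustomAnalyzer build() throws IOException {
        return CustomAnalyzer.builder()
            .addCharFilter("htmlstrip")      // assumption: name registered by HTMLStripCharFilterFactory
            .withTokenizer("whitespace")
            .addTokenFilter("lowercase")
            .build();
    }

    public static void main(String[] args) throws IOException {
        try (CustomAnalyzer ana = build()) {
            // Each getter exposes the factories the analyzer was built from.
            System.out.println("char filters:  " + ana.getCharFilterFactories().size());
            System.out.println("tokenizer:     " + ana.getTokenizerFactory().getClass().getSimpleName());
            System.out.println("token filters: " + ana.getTokenFilterFactories().size());
        }
    }
}
```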
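Methods inherited from class Analyzer: close, getReuseStrategy, getVersion, setVersion, tokenStream, tokenStream

public static CustomAnalyzer.Builder builder()
Returns a builder for custom analyzers that loads all resources from classpath.

public static CustomAnalyzer.Builder builder(Path configDir)
Returns a builder for custom analyzers that loads all resources from the given file system base directory.

public static CustomAnalyzer.Builder builder(ResourceLoader loader)
Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.

protected Reader initReader(String fieldName, Reader reader)
Description copied from class: Analyzer
Override this if you want to add a CharFilter chain. The default implementation returns reader unchanged.
Overrides: initReader in class Analyzer
Parameters: fieldName - IndexableField name being indexed; reader - original Reader

protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
Overrides: createComponents in class Analyzer
Parameters: fieldName - the name of the field's content passed to the Analyzer.TokenStreamComponents sink as a reader
Returns: the Analyzer.TokenStreamComponents for this analyzer.

public int getPositionIncrementGap(String fieldName)
Description copied from class: Analyzer
Invoked before indexing an IndexableField instance if terms have already been added to that field.
Overrides: getPositionIncrementGap in class Analyzer
Parameters: fieldName - IndexableField name being indexed.
Returns: position increment gap, added to the next token emitted from Analyzer.tokenStream(String,Reader). This value must be >= 0.

public int getOffsetGap(String fieldName)
Description copied from class: Analyzer
Just like Analyzer.getPositionIncrementGap(java.lang.String), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
Overrides: getOffsetGap in class Analyzer
Parameters: fieldName - the field just indexed
Returns: offset gap, added to the next token emitted from Analyzer.tokenStream(String,Reader). This value must be >= 0.

public List<CharFilterFactory> getCharFilterFactories()
Returns the list of char filters that are used in this analyzer.

public TokenizerFactory getTokenizerFactory()
Returns the tokenizer that is used in this analyzer.

public List<TokenFilterFactory> getTokenFilterFactories()
Returns the list of token filters that are used in this analyzer.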
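
The effect of the value returned by getPositionIncrementGap on a field with several values can be illustrated with a small, Lucene-free model. This is a simplified sketch and not Lucene's indexing code: it assumes whitespace tokenization, that every token advances the position by one, and that the gap is added only when earlier values already produced terms. The class name `GapSketch` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GapSketch {

    /** Positions assigned to whitespace-separated tokens across several values of one field. */
    static List<Integer> positions(List<String> values, int positionIncrementGap) {
        List<Integer> out = new ArrayList<>();
        int position = -1;                 // no token has been indexed yet
        boolean hasTerms = false;
        for (String value : values) {
            if (hasTerms) {
                // The gap is applied before the next value, and only if terms already exist.
                position += positionIncrementGap;
            }
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;              // an empty value contributes no tokens
                }
                position += 1;             // each token advances the position by one
                out.add(position);
                hasTerms = true;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a b", "c d");
        System.out.println(positions(values, 0));    // [0, 1, 2, 3]
        System.out.println(positions(values, 100));  // [0, 1, 102, 103]
    }
}
```

A large gap keeps phrase and proximity queries from matching across the boundary between two values of the same field, which is the usual reason to configure one.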
Copyright © 2000–2015 The Apache Software Foundation. All rights reserved.