public final class CzechAnalyzer extends Analyzer
Analyzer for Czech language.
Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.
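The filtering described above (the tokenStream entry below specifies a StandardTokenizer followed by StandardFilter, LowerCaseFilter and StopFilter) can be approximated with the standard library alone. This is a simplified model, not Lucene's code. The class `StopwordAnalyzerModel` and its `SAMPLE_DEFAULTS` array are hypothetical. Splitting on non-letter characters is only a rough stand-in for StandardTokenizer, and the sample words are not the real contents of `CZECH_STOP_WORDS`. Assuming Lucene 3.0 is on the classpath, the real counterparts would be `new CzechAnalyzer(Version.LUCENE_30)` and `new CzechAnalyzer(Version.LUCENE_30, stopwords)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, stdlib-only model of a stopword-filtering analyzer (not Lucene code). */
public class StopwordAnalyzerModel {
    // Illustrative sample only; a stand-in for CZECH_STOP_WORDS, not its real contents.
    static final String[] SAMPLE_DEFAULTS = {"a", "i", "v", "na", "je"};

    private static final Locale CZECH = Locale.forLanguageTag("cs");
    private final Set<String> stopwords;

    /** Uses the default stopwords, mirroring CzechAnalyzer(Version). */
    public StopwordAnalyzerModel() {
        this(SAMPLE_DEFAULTS);
    }

    /** Uses caller-supplied stopwords, mirroring CzechAnalyzer(Version, String[]). */
    public StopwordAnalyzerModel(String[] stopwords) {
        this.stopwords = new HashSet<>(Arrays.asList(stopwords));
    }

    /** Splits on non-letters/digits, lowercases each token, then drops stopwords. */
    public List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue;                          // split() can yield a leading empty string
            }
            String term = raw.toLowerCase(CZECH);  // LowerCaseFilter stand-in
            if (!stopwords.contains(term)) {       // StopFilter stand-in
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new StopwordAnalyzerModel().analyze("Praha a Brno"));  // prints [praha, brno]
    }
}
```

Lowercasing happens before the stopword lookup, so capitalized stopwords such as "A" at the start of a sentence are removed as well.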
| Modifier and Type | Field and Description |
|---|---|
| static String[] | CZECH_STOP_WORDS: List of typical stopwords. |
Fields inherited from class org.apache.lucene.analysis.Analyzer: overridesTokenStreamMethod

| Constructor and Description |
|---|
| CzechAnalyzer(): Deprecated. Use CzechAnalyzer(Version) instead. |
| CzechAnalyzer(File stopwords): Deprecated. Use CzechAnalyzer(Version, File) instead. |
| CzechAnalyzer(HashSet stopwords): Deprecated. Use CzechAnalyzer(Version, HashSet) instead. |
| CzechAnalyzer(String[] stopwords): Deprecated. Use CzechAnalyzer(Version, String[]) instead. |
| CzechAnalyzer(Version matchVersion): Builds an analyzer with the default stop words (CZECH_STOP_WORDS). |
| CzechAnalyzer(Version matchVersion, File stopwords): Builds an analyzer with the given stop words. |
| CzechAnalyzer(Version matchVersion, HashSet stopwords) |
| CzechAnalyzer(Version matchVersion, String[] stopwords): Builds an analyzer with the given stop words. |
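The File constructors and loadStopWords take stopwords from an external word list. The page does not document the file format, so the sketch below assumes one word per line with blank lines ignored. It models only the encoding rule documented for loadStopWords: a named charset, or the platform default when the encoding is null. `WordlistModel.load` is a hypothetical helper, not a Lucene API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: reads a stopword list from a stream (not Lucene code). */
public final class WordlistModel {
    private WordlistModel() {}

    /**
     * Reads one stopword per line (assumed format). A null encoding falls back to
     * the platform default charset, as documented for loadStopWords.
     */
    public static Set<String> load(InputStream in, String encoding) throws IOException {
        Charset cs = (encoding == null) ? Charset.defaultCharset() : Charset.forName(encoding);
        Set<String> words = new HashSet<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = r.readLine()) != null) {
                String w = line.trim();
                if (!w.isEmpty()) {   // assumption: blank lines carry no stopword
                    words.add(w);
                }
            }
        }
        return words;
    }
}
```

For a Windows-1250 encoded list you would pass the Java charset name "windows-1250", assuming the running JRE provides that optional charset.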
| Modifier and Type | Method and Description |
|---|---|
| void | loadStopWords(InputStream wordfile, String encoding): Loads the stopwords hash from a resource stream (file, database, ...). |
| TokenStream | reusableTokenStream(String fieldName, Reader reader): Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader. |
| TokenStream | tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |
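reusableTokenStream returns a "(possibly reused)" stream. The inherited getPreviousTokenStream and setPreviousTokenStream methods suggest the cached stream is kept per thread. The sketch below models that idea with a plain ThreadLocal. `ReuseModel` and `ReusableStream` are hypothetical names, the tokenizer is a simplified stand-in with no stopword filtering, and none of this is Lucene's implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of per-thread stream reuse (not Lucene code). */
public class ReuseModel {
    /** Stand-in for a tokenizer chain that can be re-pointed at a new Reader. */
    static final class ReusableStream {
        private Reader input;

        void reset(Reader r) {
            this.input = r;   // reuse the same object; only the input changes
        }

        /** Drains the current Reader and returns lowercased tokens. */
        List<String> terms() throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = input.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            List<String> out = new ArrayList<>();
            for (String t : sb.toString().split("[^\\p{L}\\p{Nd}]+")) {
                if (!t.isEmpty()) {
                    out.add(t.toLowerCase(Locale.ROOT));
                }
            }
            return out;
        }
    }

    // One cached stream per thread, analogous in spirit to get/setPreviousTokenStream.
    private final ThreadLocal<ReusableStream> previous = new ThreadLocal<>();

    /** Returns this thread's cached stream reset to the new reader, creating it on first use. */
    public ReusableStream reusableStream(Reader reader) {
        ReusableStream s = previous.get();
        if (s == null) {
            s = new ReusableStream();
            previous.set(s);
        }
        s.reset(reader);
        return s;
    }
}
```

Keeping the cache per thread lets a thread re-point one stream at each new Reader without rebuilding the chain, while a stateful stream is never shared between threads.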
Methods inherited from class org.apache.lucene.analysis.Analyzer: close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream

public static final String[] CZECH_STOP_WORDS
List of typical stopwords.
public CzechAnalyzer()
Deprecated. Use CzechAnalyzer(Version) instead.
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

public CzechAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

public CzechAnalyzer(String[] stopwords)
Deprecated. Use CzechAnalyzer(Version, String[]) instead.

public CzechAnalyzer(Version matchVersion, String[] stopwords)
Builds an analyzer with the given stop words.

public CzechAnalyzer(HashSet stopwords)
Deprecated. Use CzechAnalyzer(Version, HashSet) instead.

public CzechAnalyzer(File stopwords) throws IOException
Deprecated. Use CzechAnalyzer(Version, File) instead.
Throws: IOException

public CzechAnalyzer(Version matchVersion, File stopwords) throws IOException
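Builds an analyzer with the given stop words.
Throws: IOException

public void loadStopWords(InputStream wordfile, String encoding)
Loads the stopwords hash from a resource stream (file, database, ...).
Parameters:
wordfile - stream containing the word list
encoding - encoding used (windows-1250, iso-8859-2, ...), or null for the default system encoding

public final TokenStream tokenStream(String fieldName, Reader reader)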
Creates a TokenStream which tokenizes all the text in the provided Reader.
Specified by: tokenStream in class Analyzer
Returns: a TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter

public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader.
Overrides: reusableTokenStream in class Analyzer
Returns: a TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
Throws: IOException

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.