@Bindable(inherit=LexicalDataLoader.class) public class DefaultLexicalDataFactory extends Object implements ILexicalDataFactory
resourceLookup, reloadResources, mergeResources.
Modifier and Type | Field and Description |
---|---|
boolean | mergeResources Merges stop words and stop labels from all known languages. |
boolean | reloadResources |
ResourceLookup | resourceLookup |
Constructor and Description |
---|
DefaultLexicalDataFactory() |
Modifier and Type | Method and Description |
---|---|
ILexicalData | getLexicalData(LanguageCode languageCode) The main logic for acquiring a shared ILexicalData instance. |
static HashSet<String> | load(IResource resource) Loads words from a given IResource (UTF-8, one word per line, #-starting lines are considered comments). |
@Processing @Input @Attribute(key="reload-resources", inherit=true) public boolean reloadResources
@Processing @Input @Attribute(key="merge-resources") @Label(value="Merge lexical resources") @Level(value=MEDIUM) @Group(value="Preprocessing") public boolean mergeResources
Merges stop words and stop labels from all known languages. If set to false, only stop words and stop labels of the active language will be used. If set to true, stop words from all LanguageCodes will be used together and stop labels from all languages will be used together, no matter the active language. Lexical resource merging is useful when clustering data in a mix of different languages and should increase clustering quality in such settings.
@Processing @Input @Internal @Attribute(key="resource-lookup", inherit=true) public ResourceLookup resourceLookup
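The effect of the mergeResources flag can be illustrated with a minimal, self-contained sketch. It does not use the Carrot2 classes: the class name MergeResourcesSketch, the method effectiveStopWords and the use of plain String language keys are assumptions made for illustration only, and the real factory may organize its data differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MergeResourcesSketch {
    /**
     * Hypothetical helper: returns the stop words that would be in effect
     * for the active language, depending on the mergeResources flag.
     */
    public static Set<String> effectiveStopWords(
            Map<String, Set<String>> stopWordsByLanguage,
            String activeLanguage,
            boolean mergeResources) {
        Set<String> result = new HashSet<>();
        if (mergeResources) {
            // true: stop words from all languages are used together.
            for (Set<String> words : stopWordsByLanguage.values()) {
                result.addAll(words);
            }
        } else {
            // false: only the active language contributes.
            Set<String> own = stopWordsByLanguage.get(activeLanguage);
            if (own != null) {
                result.addAll(own);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> byLanguage = new HashMap<>();
        byLanguage.put("en", new HashSet<>(Arrays.asList("the", "of")));
        byLanguage.put("de", new HashSet<>(Arrays.asList("der", "und")));

        System.out.println(effectiveStopWords(byLanguage, "en", false).size()); // prints 2
        System.out.println(effectiveStopWords(byLanguage, "en", true).size());  // prints 4
    }
}
```

The same union-or-single-language choice would apply to stop labels, which the description treats separately from stop words.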
public ILexicalData getLexicalData(LanguageCode languageCode)
The main logic for acquiring a shared ILexicalData instance.
Specified by: getLexicalData in interface ILexicalDataFactory
public static HashSet<String> load(IResource resource) throws IOException
Loads words from a given IResource (UTF-8, one word per line, #-starting lines are considered comments).
Throws: IOException
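The word list format described for load (UTF-8, one word per line, lines starting with # treated as comments) can be sketched with the JDK alone. This is not the Carrot2 implementation: the class WordListSketch and the method loadWords are hypothetical, a plain InputStream stands in for IResource, and trimming whitespace and skipping blank lines are assumptions that the description does not confirm.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;

public class WordListSketch {
    /** Reads one word per line from a UTF-8 stream, skipping '#' comment lines. */
    public static HashSet<String> loadWords(InputStream in) throws IOException {
        HashSet<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: surrounding whitespace is not significant.
                line = line.trim();
                // Assumption: blank lines are skipped along with comments.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                words.add(line);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String data = "# English stop words\nthe\nand\n\nof\n";
        InputStream in = new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
        System.out.println(loadWords(in).size()); // prints 3
    }
}
```

Returning a HashSet mirrors the documented return type of load and removes duplicate entries from the list as a side effect.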