@Bindable(prefix="MultilingualClustering") public class MultilingualClustering extends Object
Partitions the input documents by Document.LANGUAGE, clusters each such monolingual partition separately, and then aggregates the partial cluster lists based on the selected MultilingualClustering.LanguageAggregationStrategy.
Modifier and Type | Class and Description |
---|---|
static class | MultilingualClustering.LanguageAggregationStrategy: Defines how monolingual partial clusters will be combined to form final results. |
Modifier and Type | Field and Description |
---|---|
LanguageCode | defaultLanguage: Default clustering language. |
MultilingualClustering.LanguageAggregationStrategy | languageAggregationStrategy: Language aggregation strategy. |
Map<String,Integer> | languageCounts: Document languages. |
String | majorityLanguage: Majority language. |
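As a rough illustration of how the languageCounts and majorityLanguage outputs could relate to each other, here is a minimal sketch. It assumes each document's language is available as a plain string (null when undefined) and that undefined languages are tallied under an empty-string key. The class LanguageTallySketch and its methods are hypothetical and are not part of the library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the library: tallies document languages. */
public class LanguageTallySketch {

    /** Counts documents per language code; undefined (null) languages are counted under "". */
    public static Map<String, Integer> countLanguages(List<String> languages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String language : languages) {
            String key = (language == null) ? "" : language; // assumption: "" stands for undefined
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Returns the most frequent key. An empty result means that most documents
     * had no language set. On ties, the language seen first wins (an arbitrary choice).
     */
    public static String majorityLanguage(Map<String, Integer> counts) {
        String best = "";
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> languages = Arrays.asList("en", "de", "en", null);
        Map<String, Integer> counts = countLanguages(languages);
        System.out.println("counts   = " + counts);
        System.out.println("majority = " + majorityLanguage(counts));
    }
}
```

When the result is empty, a caller would fall back to the configured defaultLanguage, in line with the majorityLanguage description.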
Constructor and Description |
---|
MultilingualClustering() |
Modifier and Type | Method and Description |
---|---|
List<Cluster> | process(List<Document> documents, IMonolingualClusteringAlgorithm algorithm) |
@Input @Processing @Attribute @Required @Group(value="Multilingual clustering") @Level(value=MEDIUM) public MultilingualClustering.LanguageAggregationStrategy languageAggregationStrategy
Language aggregation strategy. See MultilingualClustering.LanguageAggregationStrategy for the list of available options.

@Input @Processing @Attribute @Required @Group(value="Multilingual clustering") @Level(value=MEDIUM) public LanguageCode defaultLanguage
Default clustering language, used for documents with an undefined Document.LANGUAGE.

@Output @Processing @Attribute @Group(value="Multilingual clustering") @Level(value=MEDIUM) public Map<String,Integer> languageCounts
Document languages.
@Output @Processing @Attribute @Group(value="Multilingual clustering") @Level(value=MEDIUM) public String majorityLanguage
Majority language. If languageAggregationStrategy is MultilingualClustering.LanguageAggregationStrategy.CLUSTER_IN_MAJORITY_LANGUAGE, this attribute provides the majority language that was used to cluster all the documents. If the majority of the documents have an undefined language, this attribute will be empty and the clustering will be performed in the defaultLanguage.

public List<Cluster> process(List<Document> documents, IMonolingualClusteringAlgorithm algorithm)
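The partition, cluster, and aggregate flow described for this class can be sketched in plain Java. This is only an illustration under stated assumptions and not the library's implementation. The Doc type, the BiFunction stand-in for the monolingual algorithm, and the simple concatenation used as the aggregation step are all hypothetical. Assigning documents with no language to the default language partition is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical sketch of partition-by-language clustering; not the library's code. */
public class PartitionAndClusterSketch {

    /** Stand-in document: a title plus an optional language code (null when undefined). */
    public static final class Doc {
        final String title;
        final String language;

        public Doc(String title, String language) {
            this.title = title;
            this.language = language;
        }
    }

    /** Groups documents by language, substituting defaultLanguage where none is set. */
    public static Map<String, List<Doc>> partitionByLanguage(List<Doc> docs, String defaultLanguage) {
        Map<String, List<Doc>> partitions = new LinkedHashMap<>();
        for (Doc doc : docs) {
            String language = (doc.language == null) ? defaultLanguage : doc.language;
            partitions.computeIfAbsent(language, key -> new ArrayList<>()).add(doc);
        }
        return partitions;
    }

    /**
     * Runs the supplied monolingual algorithm on each partition and concatenates
     * the partial cluster lists. Concatenation is an assumed aggregation step,
     * chosen only to keep the sketch short.
     */
    public static <C> List<C> process(List<Doc> docs, String defaultLanguage,
            BiFunction<List<Doc>, String, List<C>> algorithm) {
        List<C> clusters = new ArrayList<>();
        for (Map.Entry<String, List<Doc>> entry : partitionByLanguage(docs, defaultLanguage).entrySet()) {
            clusters.addAll(algorithm.apply(entry.getValue(), entry.getKey()));
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("Data mining", "en"),
                new Doc("Untitled", null),
                new Doc("Datenanalyse", "de"));

        // Toy algorithm: one label per partition, e.g. "en:2".
        List<String> clusters = process(docs, "en",
                (partition, language) -> Collections.singletonList(language + ":" + partition.size()));
        System.out.println(clusters);
    }
}
```

A strategy such as CLUSTER_IN_MAJORITY_LANGUAGE would instead pass all documents to the algorithm once, using the majority language.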