@Bindable(inherit=CommonAttributes.class) @Label(value="By URL Clustering") public class ByUrlClusteringAlgorithm extends ProcessingComponentBase implements IClusteringAlgorithm
The Document.CONTENT_URL property will be used to obtain a document's URL.
Groups at the top level of the hierarchy will correspond to the last segments of the URLs' host names, usually domain suffixes such as ".com" or ".co.uk". Subgroups will be created based on further segments of the URLs, very often domains and subdomains, e.g. "yahoo.com" and "bbc.co.uk", and then e.g. "mail.yahoo.com" and "news.yahoo.com". The "www" segment of the URLs will be ignored.
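The segment-based grouping described above can be illustrated with a minimal, self-contained sketch. It is not the component's actual implementation: the class `UrlSegmentsSketch` and its methods are hypothetical names introduced here, and the sketch assumes that only the host name is split on dots and that only a leading "www" is dropped.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the documented API.
final class UrlSegmentsSketch {

    /** Returns the host name segments of a URL, last segment first, without a leading "www". */
    static List<String> reversedHostSegments(String url) {
        // URI.create throws IllegalArgumentException for malformed input.
        String host = URI.create(url).getHost();
        if (host == null) {
            return Collections.emptyList();
        }
        List<String> segments =
            new ArrayList<>(Arrays.asList(host.toLowerCase(Locale.ROOT).split("\\.")));
        if (!segments.isEmpty() && segments.get(0).equals("www")) {
            segments.remove(0); // assumption: only a leading "www" is ignored
        }
        Collections.reverse(segments);
        return segments;
    }

    /** Builds the chain of group labels from the top-level group down to the deepest subgroup. */
    static List<String> groupPath(String url) {
        List<String> labels = new ArrayList<>();
        String suffix = "";
        for (String segment : reversedHostSegments(url)) {
            suffix = suffix.isEmpty() ? segment : segment + "." + suffix;
            labels.add(suffix);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(groupPath("http://www.mail.yahoo.com/inbox"));
        System.out.println(groupPath("http://news.bbc.co.uk/"));
    }
}
```

Note that this naive split yields "uk" and then "co.uk" for a British URL, whereas the description above mentions ".co.uk" as a top-level group, so the real component presumably handles such suffixes differently.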
Clusters will be ordered by size (number of documents) in descending order; clusters of equal size will be ordered alphabetically by URL. See Cluster.BY_REVERSED_SIZE_AND_LABEL_COMPARATOR.
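The ordering rule can be sketched with a plain `java.util.Comparator`. This is an illustration only: the `Group` class below is a hypothetical stand-in for a cluster, and the comparator is written from the description above, not taken from `Cluster.BY_REVERSED_SIZE_AND_LABEL_COMPARATOR` itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in types, not part of the documented API.
final class ClusterOrderSketch {

    /** Minimal stand-in for a cluster: a label and a document count. */
    static final class Group {
        final String label;
        final int size;

        Group(String label, int size) {
            this.label = label;
            this.size = size;
        }
    }

    /** Larger groups first; groups of equal size in alphabetical label order. */
    static final Comparator<Group> BY_SIZE_DESC_THEN_LABEL =
        Comparator.comparingInt((Group g) -> g.size)
            .reversed()
            .thenComparing(g -> g.label);

    /** Returns the labels of the given groups in sorted order, leaving the input untouched. */
    static List<String> sortedLabels(List<Group> groups) {
        List<Group> copy = new ArrayList<>(groups);
        copy.sort(BY_SIZE_DESC_THEN_LABEL);
        List<String> labels = new ArrayList<>();
        for (Group g : copy) {
            labels.add(g.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Group> groups = Arrays.asList(
            new Group("yahoo.com", 2),
            new Group("bbc.co.uk", 5),
            new Group("apple.com", 2));
        System.out.println(sortedLabels(groups));
    }
}
```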
Modifier and Type | Field and Description |
---|---|
List<Cluster> | clusters: Clusters created by the algorithm. |
List<Document> | documents: Documents to cluster. |
Constructor and Description |
---|
ByUrlClusteringAlgorithm() |
Modifier and Type | Method and Description |
---|---|
void | process(): Performs by URL clustering. |
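Methods inherited from class ProcessingComponentBase: afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface IProcessingComponent: afterProcessing, beforeProcessing, dispose, init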
@Processing @Input @Internal @Attribute(key="documents", inherit=true) public List<Document> documents
@Processing @Output @Internal @Attribute(key="clusters", inherit=true) public List<Cluster> clusters
public void process() throws ProcessingException
Specified by: process in interface IProcessingComponent
Overrides: process in class ProcessingComponentBase
Throws: ProcessingException - when processing failed. If thrown, the IProcessingComponent.afterProcessing() method will be called and the component will be ready to accept further requests or to be disposed of. Finally, the exception will be rethrown from the controller method that caused the component to perform processing.
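The failure contract described for `process()` resembles a try/finally pattern. The sketch below is a hedged illustration of that pattern only: `Component`, `SketchProcessingException`, and `runOnce` are hypothetical stand-ins written for this example, not the library's controller or component types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types, not part of the documented API.
final class LifecycleSketch {

    /** Stand-in for a checked processing exception. */
    static final class SketchProcessingException extends Exception {
        SketchProcessingException(String message) {
            super(message);
        }
    }

    /** Stand-in for the lifecycle methods named in the documentation. */
    interface Component {
        void beforeProcessing();
        void process() throws SketchProcessingException;
        void afterProcessing();
    }

    /** Runs one request; afterProcessing() runs even if process() throws, then the exception propagates. */
    static void runOnce(Component component) throws SketchProcessingException {
        component.beforeProcessing();
        try {
            component.process();
        } finally {
            component.afterProcessing();
        }
    }

    /** Runs a component whose process() always fails and records the order of events. */
    static List<String> traceFailingRun() {
        final List<String> events = new ArrayList<>();
        Component failing = new Component() {
            @Override public void beforeProcessing() { events.add("beforeProcessing"); }
            @Override public void process() throws SketchProcessingException {
                events.add("process");
                throw new SketchProcessingException("simulated failure");
            }
            @Override public void afterProcessing() { events.add("afterProcessing"); }
        };
        try {
            runOnce(failing);
        } catch (SketchProcessingException e) {
            events.add("rethrown"); // the caller sees the original exception
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(traceFailingRun());
    }
}
```

Because the cleanup sits in a `finally` block, a component in this sketch is left in a consistent state after a failure and can be reused or disposed of, which mirrors the behaviour the documentation describes.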