@Bindable(inherit=CommonAttributes.class) @Label(value="By URL Clustering") public class ByUrlClusteringAlgorithm extends ProcessingComponentBase implements IClusteringAlgorithm
The Document.CONTENT_URL property will be used to obtain a document's URL.
Groups at the top level of the hierarchy will correspond to the last segments of the URLs, usually domain suffixes, such as ".com" or ".co.uk". Subgroups will be created based on further segments of the URLs, very often domains and subdomains, e.g. "yahoo.com", "bbc.co.uk" and then e.g. "mail.yahoo.com", "news.yahoo.com". The "www" segment of the URLs will be ignored.
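The segment-based grouping described above can be illustrated with a minimal sketch. This is not the component's actual implementation: the class name UrlSegmentsSketch and its methods are hypothetical, and the sketch splits the host name naively on dots, so a multi-part suffix such as ".co.uk" is treated as two segments rather than one.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of the library.
public class UrlSegmentsSketch {

    /** Returns the host segments from last to first, skipping "www". */
    public static List<String> reversedHostSegments(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return Collections.emptyList();
        }
        List<String> segments = new ArrayList<>(
            Arrays.asList(host.toLowerCase(Locale.ROOT).split("\\.")));
        // The "www" segment carries no grouping information, so drop it.
        segments.removeIf(s -> s.isEmpty() || s.equals("www"));
        Collections.reverse(segments);
        return segments;
    }

    /** Builds the chain of group labels from the top level downwards. */
    public static List<String> groupPath(String url) {
        List<String> labels = new ArrayList<>();
        String label = "";
        for (String segment : reversedHostSegments(url)) {
            label = label.isEmpty() ? segment : segment + "." + label;
            labels.add(label);
        }
        return labels;
    }

    public static void main(String[] args) {
        // prints [com, yahoo.com, mail.yahoo.com]
        System.out.println(groupPath("http://www.mail.yahoo.com/inbox"));
        // prints [com, yahoo.com, news.yahoo.com]
        System.out.println(groupPath("http://news.yahoo.com/"));
    }
}
```

Both example URLs share the "com" and "yahoo.com" labels and diverge only at the third level, which is how documents from sibling subdomains would end up in sibling subgroups.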
Clusters will be ordered by size (number of documents) in descending order; in case of equal
sizes, alphabetically by URL, see
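The ordering rule (larger clusters first, ties broken alphabetically by URL) can be sketched with a plain comparator. The Group class and the BY_SIZE_THEN_LABEL constant below are hypothetical stand-ins introduced for illustration; they are not the library's cluster type or its comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the library.
public class ClusterOrderSketch {

    /** Stand-in for a cluster: a URL label plus a document count. */
    public static final class Group {
        public final String label;
        public final int size;

        public Group(String label, int size) {
            this.label = label;
            this.size = size;
        }
    }

    /** Size descending, then label ascending for equal sizes. */
    public static final Comparator<Group> BY_SIZE_THEN_LABEL =
        Comparator.comparingInt((Group g) -> g.size).reversed()
                  .thenComparing(g -> g.label);

    /** Returns the labels of the given groups in sorted order. */
    public static List<String> sortedLabels(List<Group> groups) {
        List<Group> copy = new ArrayList<>(groups);
        copy.sort(BY_SIZE_THEN_LABEL);
        List<String> labels = new ArrayList<>();
        for (Group g : copy) {
            labels.add(g.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Group> groups = Arrays.asList(
            new Group("yahoo.com", 3),
            new Group("bbc.co.uk", 5),
            new Group("apache.org", 3));
        // prints [bbc.co.uk, apache.org, yahoo.com]
        System.out.println(sortedLabels(groups));
    }
}
```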
|Modifier and Type||Field and Description|
Clusters created by the algorithm.
Documents to cluster.
|Constructor and Description|
|Modifier and Type||Method and Description|
Performs by URL clustering.
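Methods inherited from class ProcessingComponentBase: afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait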
@Processing @Input @Internal @Attribute(key="documents", inherit=true) public List<Document> documents
public void process() throws ProcessingException
ProcessingException - when processing failed. If thrown, the
IProcessingComponent.afterProcessing() method will be called and the component will be ready to accept further requests or to be disposed of. Finally, the exception will be rethrown from the controller method that caused the component to perform processing.