Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm" |
public final String | description | "Unlike other algorithms in Carrot2, this one creates hard clusterings (one document belongs only to one cluster). On the other hand, the clusters are labeled only with individual words that may not always fully correspond to all documents in the cluster." |
public final String | label | "" |
public final String | prefix | "BisectingKMeansClusteringAlgorithm" |
public final String | title | "A very simple implementation of bisecting k-means clustering" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTER_COUNT | "BisectingKMeansClusteringAlgorithm.clusterCount" |
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | LABEL_COUNT | "BisectingKMeansClusteringAlgorithm.labelCount" |
public static final String | MAX_ITERATIONS | "BisectingKMeansClusteringAlgorithm.maxIterations" |
public static final String | PARTITION_COUNT | "BisectingKMeansClusteringAlgorithm.partitionCount" |
public static final String | PREPROCESSING_PIPELINE | "BisectingKMeansClusteringAlgorithm.preprocessingPipeline" |
public static final String | USE_DIMENSIONALITY_REDUCTION | "BisectingKMeansClusteringAlgorithm.useDimensionalityReduction" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.lingo.ClusterBuilder" |
public final String | description | "" |
public final String | label | "" |
public final String | prefix | "LingoClusteringAlgorithm" |
public final String | title | "Builds cluster labels based on the reduced term-document matrix and assigns documents to the labels" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTER_MERGING_THRESHOLD | "LingoClusteringAlgorithm.clusterMergingThreshold" |
public static final String | LABEL_ASSIGNER | "LingoClusteringAlgorithm.labelAssigner" |
public static final String | PHRASE_LABEL_BOOST | "LingoClusteringAlgorithm.phraseLabelBoost" |
public static final String | PHRASE_LENGTH_PENALTY_START | "LingoClusteringAlgorithm.phraseLengthPenaltyStart" |
public static final String | PHRASE_LENGTH_PENALTY_STOP | "LingoClusteringAlgorithm.phraseLengthPenaltyStop" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.lingo.LingoClusteringAlgorithm" |
public final String | description | "Implementation as described in: <i> \"Stanis\u0142aw Osi\u0144ski, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, May/June, 3 (vol. 20), 2005, pp. 48\u201454.\"</i>." |
public final String | label | "Lingo Clustering" |
public final String | prefix | "LingoClusteringAlgorithm" |
public final String | title | "Lingo clustering algorithm" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DESIRED_CLUSTER_COUNT_BASE | "LingoClusteringAlgorithm.desiredClusterCountBase" |
public static final String | DOCUMENTS | "documents" |
public static final String | PREPROCESSING_PIPELINE | "LingoClusteringAlgorithm.preprocessingPipeline" |
public static final String | QUERY | "query" |
public static final String | SCORE_WEIGHT | "LingoClusteringAlgorithm.scoreWeight" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.lingo.SimpleLabelAssigner" |
public final String | description | "For each base vector chooses the label that maximizes the base vector--label term vector cosine similarity. Different vectors can get the same label assigned, which means the number of final labels (after duplicate removal) may be smaller than the number of base vectors on input." |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "A simple and fast label assigner" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.lingo.UniqueLabelAssigner" |
public final String | description | "For each base vector chooses the label that maximizes the base vector--label term vector cosine similarity and has not been previously selected. Once a label is selected, it will not be used to label any other vector. This algorithm does not create duplicate cluster labels, which usually means that this assignment method will create more clusters than <code>SimpleLabelAssigner</code>. This method is slightly slower than <code>SimpleLabelAssigner</code>." |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "Assigns unique labels to each base vector using a greedy algorithm" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.stc.STCClusteringAlgorithm" |
public final String | description | "Pretty much as described in: <i>Oren Zamir, Oren Etzioni, Grouper: A Dynamic Clustering Interface to Web Search Results, 1999.</i> Some liberties were taken wherever STC\'s description was not clear enough or where we thought some improvements could be made." |
public final String | label | "STC Clustering" |
public final String | prefix | "STCClusteringAlgorithm" |
public final String | title | "Suffix Tree Clustering (STC) algorithm" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENT_COUNT_BOOST | "STCClusteringAlgorithm.documentCountBoost" |
public static final String | DOCUMENTS | "documents" |
public static final String | IGNORE_WORD_IF_IN_FEWER_DOCS | "STCClusteringAlgorithm.ignoreWordIfInFewerDocs" |
public static final String | IGNORE_WORD_IF_IN_HIGHER_DOCS_PERCENT | "STCClusteringAlgorithm.ignoreWordIfInHigherDocsPercent" |
public static final String | MAX_BASE_CLUSTERS | "STCClusteringAlgorithm.maxBaseClusters" |
public static final String | MAX_CLUSTERS | "STCClusteringAlgorithm.maxClusters" |
public static final String | MAX_DESC_PHRASE_LENGTH | "STCClusteringAlgorithm.maxDescPhraseLength" |
public static final String | MAX_PHRASE_OVERLAP | "STCClusteringAlgorithm.maxPhraseOverlap" |
public static final String | MAX_PHRASES | "STCClusteringAlgorithm.maxPhrases" |
public static final String | MERGE_STEM_EQUIVALENT_BASE_CLUSTERS | "STCClusteringAlgorithm.mergeStemEquivalentBaseClusters" |
public static final String | MERGE_THRESHOLD | "STCClusteringAlgorithm.mergeThreshold" |
public static final String | MIN_BASE_CLUSTER_SCORE | "STCClusteringAlgorithm.minBaseClusterScore" |
public static final String | MIN_BASE_CLUSTER_SIZE | "STCClusteringAlgorithm.minBaseClusterSize" |
public static final String | MOST_GENERAL_PHRASE_COVERAGE | "STCClusteringAlgorithm.mostGeneralPhraseCoverage" |
public static final String | OPTIMAL_PHRASE_LENGTH | "STCClusteringAlgorithm.optimalPhraseLength" |
public static final String | OPTIMAL_PHRASE_LENGTH_DEV | "STCClusteringAlgorithm.optimalPhraseLengthDev" |
public static final String | PREPROCESSING_PIPELINE | "STCClusteringAlgorithm.preprocessingPipeline" |
public static final String | QUERY | "query" |
public static final String | SCORE_WEIGHT | "STCClusteringAlgorithm.scoreWeight" |
public static final String | SINGLE_TERM_BOOST | "STCClusteringAlgorithm.singleTermBoost" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.synthetic.ByFieldClusteringAlgorithm" |
public final String | description | "By default the <code>Document.SOURCES</code> field is used." |
public final String | label | "By Attribute Clustering" |
public final String | prefix | "ByAttributeClusteringAlgorithm" |
public final String | title | "Clusters documents into a flat structure based on the values of some field of the documents" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | FIELD_NAME | "ByAttributeClusteringAlgorithm.fieldName" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.synthetic.ByUrlClusteringAlgorithm" |
public final String | description | "<code>Document.CONTENT_URL</code> property will be used to obtain a document\'s URL. <p> Groups at the top level of the hierarchy will correspond to the last segments of the URLs, usually domain suffixes, such as \".com\" or \".co.uk\". Subgroups will be created based on further segments of the URLs, very often domains subdomains, e.g. \"yahoo.com\", \"bbc.co.uk\" and then e.g. \"mail.yahoo.com\", \"news.yahoo.com\". The \"www\" segment of the URLs will be ignored. <p> Clusters will be ordered by size (number of documents) descendingly; in case of equal sizes, alphabetically by URL, see <code>Cluster.BY_REVERSED_SIZE_AND_LABEL_COMPARATOR</code>." |
public final String | label | "By URL Clustering" |
public final String | prefix | "" |
public final String | title | "Hierarchically clusters documents according to their content URLs" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.clustering.synthetic.PassthroughClusteringAlgorithm" |
public final String | description | "If no clusters are provided from predecessor components, it produces an empty set of clusters. Otherwise it just passes through the input cluster set." |
public final String | label | "By Attribute Clustering" |
public final String | prefix | "PassthroughClusteringAlgorithm" |
public final String | title | "A do-nothing implementation of <code>IClusteringAlgorithm</code>" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | OTHER_TOPICS | "other-topics" |
public static final String | OTHER_TOPICS_LABEL | "Other Topics" |
public static final String | SCORE | "score" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLICK_URL | "click-url" |
public static final String | CONTENT_URL | "url" |
public static final String | LANGUAGE | "language" |
public static final String | PARTITIONS | "partitions" |
public static final String | SCORE | "score" |
public static final String | SIZE | "size" |
public static final String | SOURCES | "sources" |
public static final String | SUMMARY | "snippet" |
public static final String | THUMBNAIL_URL | "thumbnail-url" |
public static final String | TITLE | "title" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | PASSWORD_PROPERTY | "http.auth.password" |
public static final String | USERNAME_PROPERTY | "http.auth.username" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | PROCESSING_RESULT_TITLE | "processing-result.title" |
public static final String | PROCESSING_TIME_ALGORITHM | "processing-time-algorithm" |
public static final String | PROCESSING_TIME_SOURCE | "processing-time-source" |
public static final String | PROCESSING_TIME_TOTAL | "processing-time-total" |
public static final String | QUERY | "query" |
public static final String | RESULTS | "results" |
public static final String | RESULTS_TOTAL | "results-total" |
public static final String | START | "start" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.core.attribute.CommonAttributes" |
public final String | description | "Extracted for consistency." |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "Attributes shared and inherited by many clustering algorithms" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | PROCESSING_RESULT_TITLE | "processing-result.title" |
public static final String | PROCESSING_TIME_ALGORITHM | "processing-time-algorithm" |
public static final String | PROCESSING_TIME_SOURCE | "processing-time-source" |
public static final String | PROCESSING_TIME_TOTAL | "processing-time-total" |
public static final String | QUERY | "query" |
public static final String | RESULTS | "results" |
public static final String | RESULTS_TOTAL | "results-total" |
public static final String | START | "start" |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final double | big | 4.503599627370496E15 |
protected static final double | biginv | 2.220446049250313E-16 |
protected static final double | LOGPI | 1.1447298858494002 |
protected static final double | MACHEP | 1.1102230246251565E-16 |
protected static final double | MAXGAM | 171.6243769563027 |
protected static final double | MAXLOG | 709.782712893384 |
protected static final double | MINLOG | -745.1332191019412 |
protected static final double | SQRTH | 0.7071067811865476 |
protected static final double | SQTPI | 2.5066282746310007 |
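
Several of the numeric values above coincide with familiar double-precision quantities. The sketch below recomputes a few of them with standard `java.lang.Math` calls so the magnitudes are easier to interpret. The correspondence is an observation drawn from the numbers themselves, not something the table states, and the `MathConstantsSketch` class is hypothetical. `MAXGAM`, `MINLOG` and `SQTPI` are left out because their derivation is not evident from the values alone.

```java
/** Hypothetical sketch: recomputes some of the listed constants from first principles. */
final class MathConstantsSketch {
    static final double BIG = (double) (1L << 52);            // 2^52
    static final double BIGINV = Math.ulp(1.0);               // 2^-52, spacing of doubles at 1.0
    static final double MACHEP = Math.ulp(1.0) / 2;           // 2^-53, half that spacing
    static final double LOGPI = Math.log(Math.PI);            // natural log of pi
    static final double SQRTH = Math.sqrt(0.5);               // square root of one half
    static final double MAXLOG = Math.log(Double.MAX_VALUE);  // largest x with finite exp(x)

    public static void main(String[] args) {
        System.out.println("big    = " + BIG);
        System.out.println("biginv = " + BIGINV);
        System.out.println("MACHEP = " + MACHEP);
        System.out.println("LOGPI  = " + LOGPI);
        System.out.println("SQRTH  = " + SQRTH);
        System.out.println("MAXLOG = " + MAXLOG);
    }
}
```
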
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int | COL | 1 |
public static final int | ROW | 0 |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final byte | FREE | 0 |
protected static final byte | FULL | 1 |
protected static final int | NO_KEY_VALUE | 0 |
protected static final byte | REMOVED | 2 |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int | largestPrime | 2147483647 |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int | defaultCapacity | 277 |
public static final double | defaultMaxLoadFactor | 0.5 |
public static final double | defaultMinLoadFactor | 0.2 |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_K | 15 |
protected static final int | DEFAULT_MAX_ITERATIONS | 15 |
protected static final boolean | DEFAULT_ORDERED | true |
protected static final double | DEFAULT_STOP_THRESHOLD | -1.0 |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_MAX_ITERATIONS | 15 |
protected static final boolean | DEFAULT_ORDERED | false |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_MAX_ITERATIONS | 15 |
protected static final boolean | DEFAULT_ORDERED | false |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_MAX_ITERATIONS | 15 |
protected static final boolean | DEFAULT_ORDERED | false |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_MAX_ITERATIONS | 15 |
protected static final boolean | DEFAULT_ORDERED | false |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_K | -1 |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final int | DEFAULT_K | -1 |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.output.metrics.ClusteringMetricsCalculator" |
public final String | description | "" |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "Calculates a set of quality metrics for clusters" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CONTAMINATION | "contamination" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.output.metrics.ContaminationMetric" |
public final String | description | "If a cluster groups documents found in the same <code>Document.PARTITIONS</code>, its contamination is 0. If a cluster groups an equally distributed mix of all partitions, its contamination is 1.0. For a full definition, please see section 4.4.1 of <a href=\"http://project.carrot2.org/publications/osinski04-dimensionality.pdf\">this work</a>. <p> Contamination is calculated for top-level clusters only, taking into account documents from the cluster and all subclusters. Finally, contamination will be calculated only if all input documents have non-blank <code>Document.PARTITIONS</code>s. </p>" |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "Computes cluster contamination" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | ENABLED | "org.carrot2.output.metrics.ContaminationMetric.enabled" |
public static final String | WEIGHTED_AVERAGE_CONTAMINATION | "org.carrot2.output.metrics.ContaminationMetric.weightedAverageContamination" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.output.metrics.IdealPartitioningBasedMetric" |
public final String | description | "" |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "A base class for metrics based on some reference partitioning" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | PARTITION_ID_FIELD_NAME | "org.carrot2.output.metrics.IdealPartitioningBasedMetric.partitionIdFieldName" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.output.metrics.NormalizedMutualInformationMetric" |
public final String | description | "<p> Metrics will be calculated only if all input documents have non-blank <code>Document.PARTITIONS</code>. </p>" |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "Computes Normalized Mutual Information (NMI) metric for the cluster set" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | ENABLED | "org.carrot2.output.metrics.NormalizedMutualInformationMetric.enabled" |
public static final String | NORMALIZED_MUTUAL_INFORMATION | "org.carrot2.output.metrics.NormalizedMutualInformationMetric.normalizedMutualInformation" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | BEST_F_MEASURE_PARTITION | "best-f-measure-partition" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.output.metrics.PrecisionRecallMetric" |
public final String | description | "<p> Metrics will be calculated only if all input documents have non-blank <code>Document.PARTITIONS</code>. </p>" |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "Computes precision, recall and F-metric for all partitions against the provided clusters" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CLUSTERS | "clusters" |
public static final String | DOCUMENTS | "documents" |
public static final String | ENABLED | "org.carrot2.output.metrics.PrecisionRecallMetric.enabled" |
public static final String | F_MEASURE_BY_PARTITION | "org.carrot2.output.metrics.PrecisionRecallMetric.fMeasureByPartition" |
public static final String | PRECISION_BY_PARTITION | "org.carrot2.output.metrics.PrecisionRecallMetric.precisionByPartition" |
public static final String | RECALL_BY_PARTITION | "org.carrot2.output.metrics.PrecisionRecallMetric.recallByPartition" |
public static final String | WEIGHTED_AVERAGE_F_MEASURE | "org.carrot2.output.metrics.PrecisionRecallMetric.weightedAverageFMeasure" |
public static final String | WEIGHTED_AVERAGE_PRECISION | "org.carrot2.output.metrics.PrecisionRecallMetric.weightedAveragePrecision" |
public static final String | WEIGHTED_AVERAGE_RECALL | "org.carrot2.output.metrics.PrecisionRecallMetric.weightedAverageRecall" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.MultipageSearchEngine" |
public final String | description | "This class implements helper methods for concurrent querying of search services that limit the number of search results returned in one request." |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "A base class facilitating implementation of <code>IDocumentSource</code>s wrapping external search engines with remote/ network-based interfaces" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | SEARCH_MODE | "search-mode" |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final String | POSTPROCESSING | "Postprocessing" |
public static final String | SERVICE | "Service" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.SearchEngineBase" |
public final String | description | "The base class defines the common attribute fields used by more specific base classes and concrete implementations." |
public final String | label | "" |
public final String | prefix | "SearchEngineBase" |
public final String | title | "A base class facilitating implementation of <code>IDocumentSource</code>s wrapping external search engines with remote/ network-based interfaces" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | COMPRESSED | "SearchEngineBase.compressed" |
public static final String | DOCUMENTS | "documents" |
public static final String | QUERY | "query" |
public static final String | RESULTS | "results" |
public static final String | RESULTS_TOTAL | "results-total" |
public static final String | START | "start" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | COMPRESSION_KEY | "compression" |
public static final String | RESULTS_TOTAL_KEY | "resultsTotal" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.SearchEngineStats" |
public final String | description | "" |
public final String | label | "" |
public final String | prefix | "SearchEngineStats" |
public final String | title | "Usage statistics for an instance of <code>MultipageSearchEngine</code>" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | PAGE_REQUESTS | "SearchEngineStats.pageRequests" |
public static final String | QUERIES | "SearchEngineStats.queries" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.SimpleSearchEngine" |
public final String | description | "This implementation assumes that all requested results can be fetched from the search engine using one request." |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "A base class facilitating implementation of <code>IDocumentSource</code>s wrapping external search engines with remote/ network-based interfaces" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.ambient.AmbientDocumentSource" |
public final String | description | "Ambient (AMBIgous ENTries) is a data set designed for evaluating subtopic information retrieval. It consists of 44 topics, each with a set of subtopics and a list of 100 ranked documents. For more information, please see <a href=\"http://credo.fub.it/ambient/\">Ambient home page</a>." |
public final String | label | "" |
public final String | prefix | "AmbientDocumentSource" |
public final String | title | "Serves documents from the Ambient test set" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | RESULTS | "results" |
public static final String | RESULTS_TOTAL | "results-total" |
public static final String | TOPIC | "AmbientDocumentSource.topic" |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final String | TOPIC_ID | "Topic ID" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.ambient.FubDocumentSource" |
public final String | description | "" |
public final String | label | "" |
public final String | prefix | "FubDocumentSource" |
public final String | title | "A base document source for test collections developed at Fondazione Ugo Bordoni" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | DOCUMENTS | "documents" |
public static final String | INCLUDE_DOCUMENTS_WITHOUT_TOPIC | "FubDocumentSource.includeDocumentsWithoutTopic" |
public static final String | MIN_TOPIC_SIZE | "FubDocumentSource.minTopicSize" |
public static final String | QUERY | "query" |
public static final String | TOPIC_IDS | "FubDocumentSource.topicIds" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.ambient.Odp239DocumentSource" |
public final String | description | "For more details, please see: http://credo.fub.it/odp239/." |
public final String | label | "" |
public final String | prefix | "Odp239DocumentSource" |
public final String | title | "Serves documents from the ODP239 test set" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | RESULTS | "results" |
public static final String | RESULTS_TOTAL | "results-total" |
public static final String | TOPIC | "Odp239DocumentSource.topic" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.etools.EToolsDocumentSource" |
public final String | description | "For commercial licensing of the eTools feed, please e-mail: <code>contact@comcepta.com</code>." |
public final String | label | "" |
public final String | prefix | "EToolsDocumentSource" |
public final String | title | "A Carrot2 input component for the eTools service (https://www.etools.ch)" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | COUNTRY | "EToolsDocumentSource.country" |
public static final String | CUSTOMER_ID | "EToolsDocumentSource.customerId" |
public static final String | DATA_SOURCES | "EToolsDocumentSource.dataSources" |
public static final String | LANGUAGE | "EToolsDocumentSource.language" |
public static final String | PARTNER_ID | "EToolsDocumentSource.partnerId" |
public static final String | SAFE_SEARCH | "EToolsDocumentSource.safeSearch" |
public static final String | SERVICE_URL_BASE | "EToolsDocumentSource.serviceUrlBase" |
public static final String | SITE | "EToolsDocumentSource.site" |
public static final String | TIMEOUT | "EToolsDocumentSource.timeout" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.idol.IdolDocumentSource" |
public final String | description | "Please note that you will need to install an XSLT stylesheet in your IDOL instance that transforms the search results into the OpenSearch format. The XSLT stylesheet is available under the <tt>org.carrot2.source.idol</tt> package, next to the binaries of this class. <p> Based on code donated by Julien Nioche. Autonomy IDOL support contributed by James Sealey. </p>" |
public final String | label | "" |
public final String | prefix | "IdolDocumentSource" |
public final String | title | "A <code>IDocumentSource</code> fetching <code>Document</code>s (search results) from an IDOL Search Engine" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | IDOL_SERVER_NAME | "IdolDocumentSource.idolServerName" |
public static final String | IDOL_SERVER_PORT | "IdolDocumentSource.idolServerPort" |
public static final String | MAXIMUM_RESULTS | "IdolDocumentSource.maximumResults" |
public static final String | MIN_SCORE | "IdolDocumentSource.minScore" |
public static final String | OTHER_SEARCH_ATTRIBUTES | "IdolDocumentSource.otherSearchAttributes" |
public static final String | RESULTS_PER_PAGE | "IdolDocumentSource.resultsPerPage" |
public static final String | USER_AGENT | "IdolDocumentSource.userAgent" |
public static final String | USER_NAME | "IdolDocumentSource.userName" |
public static final String | XSL_TEMPLATE_NAME | "IdolDocumentSource.xslTemplateName" |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final String | INDEX_PROPERTIES | "Index properties" |
public static final String | LUCENE_DOCUMENT_FIELD | "luceneDocument" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.lucene.LuceneDocumentSource" |
public final String | description | "The index should be binary-compatible with the Lucene version actually imported by this plugin." |
public final String | label | "" |
public final String | prefix | "LuceneDocumentSource" |
public final String | title | "A <code>IDocumentSource</code> fetching <code>Document</code>s from a local Apache Lucene index" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | ANALYZER | "LuceneDocumentSource.analyzer" |
public static final String | DIRECTORY | "LuceneDocumentSource.directory" |
public static final String | DOCUMENTS | "documents" |
public static final String | FIELD_MAPPER | "LuceneDocumentSource.fieldMapper" |
public static final String | KEEP_LUCENE_DOCUMENTS | "LuceneDocumentSource.keepLuceneDocuments" |
public static final String | QUERY | "query" |
public static final String | RESULTS | "results" |
public static final String | RESULTS_TOTAL | "results-total" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.lucene.SimpleFieldMapper" |
public final String | description | "" |
public final String | label | "" |
public final String | prefix | "" |
public final String | title | "A simple <code>IFieldMapper</code> with one-to-one mapping between the default title, url and summary fields" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | CONTENT_FIELD | "org.carrot2.source.lucene.SimpleFieldMapper.contentField" |
public static final String | CONTEXT_FRAGMENTS | "org.carrot2.source.lucene.SimpleFieldMapper.contextFragments" |
public static final String | FORMATTER | "org.carrot2.source.lucene.SimpleFieldMapper.formatter" |
public static final String | FRAGMENT_JOIN | "org.carrot2.source.lucene.SimpleFieldMapper.fragmentJoin" |
public static final String | SEARCH_FIELDS | "org.carrot2.source.lucene.SimpleFieldMapper.searchFields" |
public static final String | TITLE_FIELD | "org.carrot2.source.lucene.SimpleFieldMapper.titleField" |
public static final String | URL_FIELD | "org.carrot2.source.lucene.SimpleFieldMapper.urlField" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String | SYSPROP_BING7_API | "bing7.key" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String | bindableClassName | "org.carrot2.source.microsoft.v7.Bing7DocumentSource" |
public final String | description | "<p>Important: there are limits for free use of the above API (beyond which it is a paid service)." |
public final String | label | "" |
public final String | prefix | "Bing7DocumentSource" |
public final String | title | "A <code>IDocumentSource</code> fetching web page search results from Bing, using Search API V7" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ADULT |
"Bing7DocumentSource.adult" |
public static final String |
API_KEY |
"Bing7DocumentSource.apiKey" |
public static final String |
MARKET |
"Bing7DocumentSource.market" |
public static final String |
REDIRECT_STRATEGY |
"Bing7DocumentSource.redirectStrategy" |
public static final String |
RESPECT_RATE_LIMITS |
"Bing7DocumentSource.respectRateLimits" |
public static final String |
SITE |
"Bing7DocumentSource.site" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.microsoft.v7.Bing7NewsDocumentSource" |
public final String |
description |
"<p>Important: there are limits for free use of the above API (beyond which it is a paid service)." |
public final String |
label |
"" |
public final String |
prefix |
"Bing7NewsDocumentSource" |
public final String |
title |
"A <code>IDocumentSource</code> fetching news search results from Bing, using Search API V7" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
FRESHNESS |
"Bing7NewsDocumentSource.freshness" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.opensearch.OpenSearchDocumentSource" |
public final String |
description |
"<p> Based on code donated by Julien Nioche." |
public final String |
label |
"" |
public final String |
prefix |
"OpenSearchDocumentSource" |
public final String |
title |
"A <code>IDocumentSource</code> fetching <code>Document</code>s (search results) from an OpenSearch feed" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
FEED_URL_PARAMS |
"OpenSearchDocumentSource.feedUrlParams" |
public static final String |
FEED_URL_TEMPLATE |
"OpenSearchDocumentSource.feedUrlTemplate" |
public static final String |
MAXIMUM_RESULTS |
"OpenSearchDocumentSource.maximumResults" |
public static final String |
RESULTS_PER_PAGE |
"OpenSearchDocumentSource.resultsPerPage" |
public static final String |
USER_AGENT |
"OpenSearchDocumentSource.userAgent" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
E_FETCH_URL |
"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" |
public static final String |
E_SEARCH_URL |
"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" |
public static final int |
PUBMED_TIMEOUT |
24000 |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.pubmed.PubMedDocumentSource" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"PubMedDocumentSource" |
public final String |
title |
"Performs searches on the PubMed database using its on-line e-utilities: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
MAX_RESULTS |
"PubMedDocumentSource.maxResults" |
public static final String |
REDIRECT_STRATEGY |
"PubMedDocumentSource.redirectStrategy" |
public static final String |
TOOL_NAME |
"PubMedDocumentSource.toolName" |
Modifier and Type | Constant Field | Value |
---|---|---|
protected static final String |
FIELD_MAPPING |
"Index field mapping" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.solr.SolrDocumentSource" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"SolrDocumentSource" |
public final String |
title |
"Fetches documents from an instance of Solr" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
CLUSTERS |
"clusters" |
public static final String |
COPY_FIELDS |
"SolrDocumentSource.copyFields" |
public static final String |
READ_CLUSTERS |
"SolrDocumentSource.readClusters" |
public static final String |
SERVICE_URL_BASE |
"SolrDocumentSource.serviceUrlBase" |
public static final String |
SOLR_FILTER_QUERY |
"SolrDocumentSource.solrFilterQuery" |
public static final String |
SOLR_ID_FIELD_NAME |
"SolrDocumentSource.solrIdFieldName" |
public static final String |
SOLR_SUMMARY_FIELD_NAME |
"SolrDocumentSource.solrSummaryFieldName" |
public static final String |
SOLR_TITLE_FIELD_NAME |
"SolrDocumentSource.solrTitleFieldName" |
public static final String |
SOLR_URL_FIELD_NAME |
"SolrDocumentSource.solrUrlFieldName" |
public static final String |
SOLR_XSLT_ADAPTER |
"SolrDocumentSource.solrXsltAdapter" |
public static final String |
USE_HIGHLIGHTER_OUTPUT |
"SolrDocumentSource.useHighlighterOutput" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.xml.RemoteXmlSimpleSearchEngineBase" |
public final String |
description |
"The XSLT stylesheet will be loaded once during component initialization and cached for all further requests." |
public final String |
label |
"" |
public final String |
prefix |
"" |
public final String |
title |
"A base class for implementing data sources based on XML/XSLT" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
REDIRECT_STRATEGY |
"org.carrot2.source.xml.RemoteXmlSimpleSearchEngineBase.redirectStrategy" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.xml.XmlDocumentSource" |
public final String |
description |
"For additional flexibility, an XSLT stylesheet can be applied to the XML stream before it is deserialized into Carrot2 data." |
public final String |
label |
"" |
public final String |
prefix |
"XmlDocumentSource" |
public final String |
title |
"Fetches documents from XML files and streams" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
CLUSTERS |
"clusters" |
public static final String |
DOCUMENTS |
"documents" |
public static final String |
QUERY |
"query" |
public static final String |
READ_ALL |
"XmlDocumentSource.readAll" |
public static final String |
READ_CLUSTERS |
"XmlDocumentSource.readClusters" |
public static final String |
RESULTS |
"results" |
public static final String |
TITLE |
"processing-result.title" |
public static final String |
XML |
"XmlDocumentSource.xml" |
public static final String |
XML_PARAMETERS |
"XmlDocumentSource.xmlParameters" |
public static final String |
XSLT |
"XmlDocumentSource.xslt" |
public static final String |
XSLT_PARAMETERS |
"XmlDocumentSource.xsltParameters" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.source.xml.XmlDocumentSourceHelper" |
public final String |
description |
"This helper does note expose any attributes, so that different implementations can decide which attributes they expose." |
public final String |
label |
"" |
public final String |
prefix |
"XmlDocumentSourceHelper" |
public final String |
title |
"Exposes the common functionality a <code>IDocumentSource</code> based on XML/XSLT is likely to need" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
TIMEOUT |
"XmlDocumentSourceHelper.timeout" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int |
YYEOF |
-1 |
public static final int |
YYINITIAL |
0 |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final short |
TF_COMMON_WORD |
4096 |
public static final short |
TF_QUERY_WORD |
8192 |
public static final short |
TF_SEPARATOR_DOCUMENT |
512 |
public static final short |
TF_SEPARATOR_FIELD |
1024 |
public static final short |
TF_SEPARATOR_SENTENCE |
256 |
public static final short |
TF_TERMINATOR |
2048 |
public static final int |
TT_ACRONYM |
5 |
public static final int |
TT_BARE_URL |
7 |
public static final int |
TT_EMAIL |
4 |
public static final int |
TT_EOF |
-1 |
public static final int |
TT_FILE |
8 |
public static final int |
TT_FULL_URL |
6 |
public static final int |
TT_HYPHTERM |
9 |
public static final int |
TT_NUMERIC |
2 |
public static final int |
TT_PUNCTUATION |
3 |
public static final int |
TT_TERM |
1 |
public static final int |
TYPE_MASK |
15 |
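The values in this table suggest a bit layout: every `TT_*` token type fits in the low four bits covered by `TYPE_MASK` (15, or `0x000F`), while each `TF_*` flag occupies a single higher bit (256 is `0x0100`, 8192 is `0x2000`). A minimal sketch of decoding such a value, assuming the flags are combined with the type by bitwise OR; the class `TokenTypeDemo` and its helper methods are hypothetical and only copy the constants listed above:

```java
/** Hypothetical demo class; the constants are copied from the table above. */
final class TokenTypeDemo {
    static final int TYPE_MASK = 0x000F;               // 15
    static final int TT_TERM = 1;
    static final int TT_PUNCTUATION = 3;
    static final short TF_SEPARATOR_SENTENCE = 0x0100; // 256
    static final short TF_COMMON_WORD = 0x1000;        // 4096
    static final short TF_QUERY_WORD = 0x2000;         // 8192

    /** Extracts the TT_* token type stored in the low four bits. */
    static int tokenType(short rawType) {
        return rawType & TYPE_MASK;
    }

    /** Tests whether a single TF_* flag bit is set. */
    static boolean hasFlag(short rawType, short flag) {
        return (rawType & flag) != 0;
    }

    public static void main(String[] args) {
        // Assumption: a stop word is a term with the "common word" flag set.
        short stopWord = (short) (TT_TERM | TF_COMMON_WORD);
        System.out.println(tokenType(stopWord));               // prints 1
        System.out.println(hasFlag(stopWord, TF_COMMON_WORD)); // prints true
        System.out.println(hasFlag(stopWord, TF_QUERY_WORD));  // prints false
    }
}
```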
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.clustering.MultilingualClustering" |
public final String |
description |
"The helper partitions the input documents by <code>org.carrot2.core.Document.LANGUAGE</code>, clusters each such monolingual partition separately and then aggregates the partial cluster lists based on the selected <code>LanguageAggregationStrategy</code>." |
public final String |
label |
"" |
public final String |
prefix |
"MultilingualClustering" |
public final String |
title |
"A helper for clustering multilingual collections of documents" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
DEFAULT_LANGUAGE |
"MultilingualClustering.defaultLanguage" |
public static final String |
LANGUAGE_AGGREGATION_STRATEGY |
"MultilingualClustering.languageAggregationStrategy" |
public static final String |
LANGUAGE_COUNTS |
"MultilingualClustering.languageCounts" |
public static final String |
MAJORITY_LANGUAGE |
"MultilingualClustering.majorityLanguage" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.linguistic.DefaultLexicalDataFactory" |
public final String |
description |
"Resources are read from disk, cached and shared between <b>all</b> threads using this class. Additional attributes control resource reloading and merging: <code>org.carrot2.text.linguistic.DefaultLexicalDataFactory.resourceLookup</code>, <code>org.carrot2.text.linguistic.DefaultLexicalDataFactory.reloadResources</code>, <code>org.carrot2.text.linguistic.DefaultLexicalDataFactory.mergeResources</code>." |
public final String |
label |
"" |
public final String |
prefix |
"" |
public final String |
title |
"The default management of lexical resources" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
MERGE_RESOURCES |
"merge-resources" |
public static final String |
RELOAD_RESOURCES |
"reload-resources" |
public static final String |
RESOURCE_LOOKUP |
"resource-lookup" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.linguistic.DefaultStemmerFactory" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"" |
public final String |
title |
"" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.linguistic.DefaultTokenizerFactory" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"" |
public final String |
title |
"" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.linguistic.LexicalDataLoader" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"" |
public final String |
title |
"Common attributes related to loading and caching of lexical resources" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
RELOAD_RESOURCES |
"reload-resources" |
public static final String |
RESOURCE_LOOKUP |
"resource-lookup" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.CaseNormalizer" |
public final String |
description |
"The aim of case normalization is to find the most frequently appearing variants of words in terms of case. For example, if in the input documents <i>MacOS</i> appears 20 times, <i>Macos</i> 5 times and <i>macos</i> 2 times, case normalizer will select <i>MacOS</i> to represent all variants and assign the aggregated term frequency of 27 to it. <p> This class saves the following results to the <code>PreprocessingContext</code>: <ul> <li><code>AllTokens.wordIndex</code></li> <li><code>AllWords.image</code></li> <li><code>AllWords.tf</code></li> <li><code>AllWords.tfByDocument</code></li> </ul> <p> This class requires that <code>Tokenizer</code> be invoked first." |
public final String |
label |
"" |
public final String |
prefix |
"CaseNormalizer" |
public final String |
title |
"Performs case normalization and calculates a number of frequency statistics for words" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
DF_THRESHOLD |
"CaseNormalizer.dfThreshold" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.DocumentAssigner" |
public final String |
description |
"For each label candidate from <code>AllLabels.featureIndex</code> an <code>BitSet</code> with the assigned documents is constructed. The assignment algorithm is rather simple: in order to be assigned to a label, a document must contain at least one occurrence of each non-stop word from the label. <p> This class saves the following results to the <code>PreprocessingContext</code> : <ul> <li><code>AllLabels.documentIndices</code></li> </ul> <p> This class requires that <code>Tokenizer</code>, <code>CaseNormalizer</code>, <code>StopListMarker</code>, <code>PhraseExtractor</code> and <code>LabelFilterProcessor</code> be invoked first." |
public final String |
label |
"" |
public final String |
prefix |
"DocumentAssigner" |
public final String |
title |
"Assigns document to label candidates" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
EXACT_PHRASE_ASSIGNMENT |
"DocumentAssigner.exactPhraseAssignment" |
public static final String |
MIN_CLUSTER_SIZE |
"DocumentAssigner.minClusterSize" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.LabelFilterProcessor" |
public final String |
description |
"Filtering is applied to <code>AllWords</code> and <code>AllPhrases</code>, the results are saved to <code>AllLabels</code>. Currently, the following filters are applied: <ol> <li><code>StopWordLabelFilter</code></li> <li><code>CompleteLabelFilter</code></li> </ol> This class saves the following results to the <code>PreprocessingContext</code>: <ul> <li><code>AllLabels.featureIndex</code></li> </ul> <p> This class requires that <code>Tokenizer</code>, <code>CaseNormalizer</code>, <code>StopListMarker</code> and <code>PhraseExtractor</code> be invoked first." |
public final String |
label |
"" |
public final String |
prefix |
"LabelFilterProcessor" |
public final String |
title |
"Applies basic filtering to words and phrases to produce candidates for cluster labels" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.LabelFormatter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"" |
public final String |
title |
"Formats cluster labels for final rendering" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.LanguageModelStemmer" |
public final String |
description |
"<p> This class saves the following results to the <code>PreprocessingContext</code>: <ul> <li><code>AllWords.stemIndex</code></li> <li><code>AllStems.image</code></li> <li><code>AllStems.mostFrequentOriginalWordIndex</code></li> <li><code>AllStems.tf</code></li> <li><code>AllStems.tfByDocument</code></li> <li><code>AllWords.type</code> is populated with <code>ITokenizer.TF_QUERY_WORD</code></li> </ul> This class requires that <code>Tokenizer</code> and <code>CaseNormalizer</code> be invoked first." |
public final String |
label |
"" |
public final String |
prefix |
"LanguageModelStemmer" |
public final String |
title |
"Applies stemming to words and calculates a number of frequency statistics for stems" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.PhraseExtractor" |
public final String |
description |
"A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates different inflection variants of phrase words into one phrase, returning the most frequent variant. For example, if phrase <i>computing science</i> appears 2 times and <i>computer sciences</i> appears 4 times, the latter will be returned with aggregated frequency of 6. <p> This class saves the following results to the <code>PreprocessingContext</code>: <ul> <li><code>AllPhrases.wordIndices</code></li> <li><code>AllPhrases.tf</code></li> <li><code>AllPhrases.tfByDocument</code></li> <li><code>AllTokens.suffixOrder</code></li> <li><code>AllTokens.lcp</code></li> </ul> <p> This class requires that <code>Tokenizer</code>, <code>CaseNormalizer</code> and <code>LanguageModelStemmer</code> be invoked first." |
public final String |
label |
"" |
public final String |
prefix |
"PhraseExtractor" |
public final String |
title |
"Extracts frequent phrases from the provided document" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
DF_THRESHOLD |
"PhraseExtractor.dfThreshold" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.StopListMarker" |
public final String |
description |
"<p> This class saves the following results to the <code>PreprocessingContext</code>: <ul> <li><code>AllWords.type</code></li> </ul> <p> This class requires that <code>Tokenizer</code> and <code>CaseNormalizer</code> be invoked first." |
public final String |
label |
"" |
public final String |
prefix |
"StopListMarker" |
public final String |
title |
"Marks stop words based on the current language model" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.Tokenizer" |
public final String |
description |
"<p> This class saves the following results to the <code>PreprocessingContext</code>: <ul> <li><code>AllTokens.image</code></li> <li><code>AllTokens.documentIndex</code></li> <li><code>AllTokens.fieldIndex</code></li> <li><code>AllTokens.type</code></li> </ul>" |
public final String |
label |
"" |
public final String |
prefix |
"Tokenizer" |
public final String |
title |
"Performs tokenization of documents" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
DOCUMENT_FIELDS |
"Tokenizer.documentFields" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.CompleteLabelFilter" |
public final String |
description |
"<p> See <a href=\"http://project.carrot2.org/publications/osinski-2003-lingo.pdf\">this document</a>, page 31 for a definition of a complete phrase." |
public final String |
label |
"" |
public final String |
prefix |
"CompleteLabelFilter" |
public final String |
title |
"A filter that removes \"incomplete\" labels" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"CompleteLabelFilter.enabled" |
public static final String |
LABEL_OVERRIDE_THRESHOLD |
"CompleteLabelFilter.labelOverrideThreshold" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.GenitiveLabelFilter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"GenitiveLabelFilter" |
public final String |
title |
"Accepts labels that do not end in words in the Saxon Genitive form (e.g. \"Threatening the Country\'s\")" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"GenitiveLabelFilter.enabled" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.MinLengthLabelFilter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"MinLengthLabelFilter" |
public final String |
title |
"Accepts labels whose length in characters is greater or equal to the provided value" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"MinLengthLabelFilter.enabled" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.NumericLabelFilter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"NumericLabelFilter" |
public final String |
title |
"Accepts labels that do not consist only of numeric tokens and start with a non-numeric token" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"NumericLabelFilter.enabled" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.QueryLabelFilter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"QueryLabelFilter" |
public final String |
title |
"Accepts labels that do not consist only of query words" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"QueryLabelFilter.enabled" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.StopLabelFilter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"StopLabelFilter" |
public final String |
title |
"Accepts labels that are not declared as stop labels in the stoplabels.<lang> files" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"StopLabelFilter.enabled" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.filter.StopWordLabelFilter" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"StopWordLabelFilter" |
public final String |
title |
"Accepts words that are not stop words and phrases that do not start nor end in a stop word" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
ENABLED |
"StopWordLabelFilter.enabled" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline" |
public final String |
description |
"The preprocessing consists of the following steps: <ol> <li><code>Tokenizer.tokenize(PreprocessingContext)</code></li> <li><code>CaseNormalizer.normalize(PreprocessingContext)</code></li> <li><code>LanguageModelStemmer.stem(PreprocessingContext)</code></li> <li><code>StopListMarker.mark(PreprocessingContext)</code></li> </ol>" |
public final String |
label |
"" |
public final String |
prefix |
"PreprocessingPipeline" |
public final String |
title |
"Performs basic preprocessing steps on the provided documents" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
LEXICAL_DATA_FACTORY |
"PreprocessingPipeline.lexicalDataFactory" |
public static final String |
STEMMER_FACTORY |
"PreprocessingPipeline.stemmerFactory" |
public static final String |
TOKENIZER_FACTORY |
"PreprocessingPipeline.tokenizerFactory" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline" |
public final String |
description |
"The preprocessing consists of the following steps: <ol> <li><code>Tokenizer.tokenize(PreprocessingContext)</code></li> <li><code>CaseNormalizer.normalize(PreprocessingContext)</code></li> <li><code>LanguageModelStemmer.stem(PreprocessingContext)</code></li> <li><code>StopListMarker.mark(PreprocessingContext)</code></li> <li><code>PhraseExtractor.extractPhrases(PreprocessingContext)</code></li> <li><code>LabelFilterProcessor.process(PreprocessingContext)</code></li> <li><code>DocumentAssigner.assign(PreprocessingContext)</code></li> </ol>" |
public final String |
label |
"" |
public final String |
prefix |
"PreprocessingPipeline" |
public final String |
title |
"Performs a complete preprocessing on the provided documents" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int |
NO_EDGE |
-1 |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
MATRIX_MODEL |
"Matrix model" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.vsm.TermDocumentMatrixBuilder" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"TermDocumentMatrixBuilder" |
public final String |
title |
"Builds a term document matrix based on the provided <code>PreprocessingContext</code>" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
MAX_WORD_DF |
"TermDocumentMatrixBuilder.maxWordDf" |
public static final String |
MAXIMUM_MATRIX_SIZE |
"TermDocumentMatrixBuilder.maximumMatrixSize" |
public static final String |
TERM_WEIGHTING |
"TermDocumentMatrixBuilder.termWeighting" |
public static final String |
TITLE_WORDS_BOOST |
"TermDocumentMatrixBuilder.titleWordsBoost" |
Modifier and Type | Constant Field | Value |
---|---|---|
public final String |
bindableClassName |
"org.carrot2.text.vsm.TermDocumentMatrixReducer" |
public final String |
description |
"" |
public final String |
label |
"" |
public final String |
prefix |
"TermDocumentMatrixReducer" |
public final String |
title |
"Reduces the dimensionality of a term-document matrix using a matrix factorization algorithm" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
FACTORIZATION_FACTORY |
"TermDocumentMatrixReducer.factorizationFactory" |
public static final String |
FACTORIZATION_QUALITY |
"TermDocumentMatrixReducer.factorizationQuality" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int |
MILLIS |
1 |
public static final int |
MINUTE |
60000 |
public static final int |
SECOND |
1000 |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
CLUSTERS |
"Clusters" |
public static final String |
DOCUMENTS |
"Documents" |
public static final String |
FILTERING |
"Filtering" |
public static final String |
LABELS |
"Labels" |
public static final String |
PHRASE_EXTRACTION |
"Phrase extraction" |
public static final String |
PREPROCESSING |
"Preprocessing" |
public static final String |
QUERY |
"Search query" |
public static final String |
RESULT_INFO |
"Search result information" |
public static final String |
SOURCE_PAGING |
"Data source paging" |
public static final String |
WORD_FILTERING |
"Word filtering" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final int |
DEFAULT_TIMEOUT |
8000 |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
TEMPLATE_CACHING_PROPERTY |
"template.caching" |
Modifier and Type | Constant Field | Value |
---|---|---|
public static final String |
NO_XSLT_PROCESSING |
"xslt.filter:disable" |
public static final String |
XSLT_PARAMS_MAP |
"xslt.filter:stylesheet-params" |