public final class Document extends Object implements Cloneable
Each document is a collection of named fields, e.g. TITLE or CONTENT_URL.

Modifier and Type | Class and Description |
---|---|
static class | Document.DocumentToId Deprecated. Please use getStringId() directly or use your own Function implementation. |
static interface | Document.IDocumentSerializationListener Enables listening to events related to XML/JSON serialization of Documents. |
Modifier and Type | Field and Description |
---|---|
static Comparator<Document> | BY_ID_COMPARATOR Deprecated. The semantics of the identifiers depend on the document source; please roll your own comparator that is aware of the actual id semantics. |
static String | CLICK_URL Click URL. |
static String | CONTENT_URL Field name for a URL pointing to the full version of the document. |
static String | LANGUAGE Field name for the language in which the document is written. |
static String | PARTITIONS Identifiers of reference clustering partitions this document belongs to. |
static String | SCORE Document score. |
static String | SIZE Document size. |
static String | SOURCES Field name for a list of sources the document was found in. |
static String | SUMMARY Field name for a short summary of the document. |
static String | THUMBNAIL_URL Field name for a URL pointing to the thumbnail image associated with the document. |
static String | TITLE Field name for the title of the document. |
Constructor and Description |
---|
Document() Creates an empty document with no fields. |
Document(String title) Creates a document with the provided title. |
Document(String title, String summary) Creates a document with the provided title and summary. |
Document(String title, String summary, LanguageCode language) Creates a document with the provided title, summary and language. |
Document(String title, String summary, String contentUrl) Creates a document with the provided title, summary and contentUrl. |
Document(String title, String summary, String contentUrl, LanguageCode language) Creates a document with the provided title, summary, contentUrl and language. |
Document(String title, String summary, String contentUrl, LanguageCode language, String id) Creates a document with the provided title, summary, contentUrl, language and ID. |
Modifier and Type | Method and Description |
---|---|
void | addSerializationListener(Document.IDocumentSerializationListener listener) Adds a serialization listener to this document. |
static void | assignDocumentIds(Collection<Document> documents) Assigns sequential identifiers to the provided documents. |
Document | clone() Creates a shallow clone of itself. |
String | getContentUrl() Returns this document's CONTENT_URL field. |
<T> T | getField(String name) Returns the value of the specified field of this document. |
Map<String,Object> | getFields() Returns all fields of this document. |
Integer | getId() Deprecated. Please use getStringId() instead. Currently, this method attempts to parse the string identifier returned by getStringId() into an integer. |
LanguageCode | getLanguage() Returns this document's LANGUAGE. |
Double | getScore() Returns this document's SCORE. |
List<String> | getSources() Returns this document's SOURCES field. |
String | getStringId() Identifier of this document. |
String | getSummary() Returns this document's SUMMARY field. |
String | getTitle() Returns this document's TITLE field. |
Document | setContentUrl(String contentUrl) Sets this document's CONTENT_URL field. |
Document | setField(String name, Object value) Sets a field in this document. |
Document | setLanguage(LanguageCode language) Sets this document's LANGUAGE. |
Document | setScore(Double score) Sets this document's SCORE. |
Document | setSources(List<String> sources) Sets this document's SOURCES field. |
Document | setSummary(String summary) Sets this document's SUMMARY field. |
Document | setTitle(String title) Sets this document's TITLE field. |
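public static final String TITLE
Field name for the title of the document.

public static final String SUMMARY
Field name for a short summary of the document.

public static final String CONTENT_URL
Field name for a URL pointing to the full version of the document.

public static final String CLICK_URL
Click URL. See also CONTENT_URL.

public static final String THUMBNAIL_URL
Field name for a URL pointing to the thumbnail image associated with the document.

public static final String SIZE
Document size.

public static final String SCORE
Document score.

public static final String SOURCES
Field name for a list of sources the document was found in. Value type: List<String>.

public static final String LANGUAGE
Field name for the language in which the document is written. Value type: LanguageCode. If the language field is not defined or is null, the language of the document is unknown or it is outside of the list defined in LanguageCode.

public static final String PARTITIONS
Identifiers of reference clustering partitions this document belongs to. Value type: Collection<Object>. There is no constraint on the actual type of the partition identifier in the collection. Identifiers are assumed to correctly implement the Object.equals(Object) and Object.hashCode() methods.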
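
The equals/hashCode requirement on partition identifiers matters as soon as identifiers are put into hash-based collections. The following is a minimal, self-contained sketch of that contract only; TopicId and countDistinctPartitions are hypothetical names invented for illustration and are not part of Carrot2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class PartitionIdExample {
    /** Hypothetical composite partition identifier; not a Carrot2 class. */
    static final class TopicId {
        private final String corpus;
        private final int topic;

        TopicId(String corpus, int topic) {
            this.corpus = corpus;
            this.topic = topic;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof TopicId)) return false;
            TopicId that = (TopicId) other;
            return topic == that.topic && corpus.equals(that.corpus);
        }

        @Override
        public int hashCode() {
            // Must agree with equals: equal identifiers produce equal hash codes.
            return Objects.hash(corpus, topic);
        }
    }

    /** Counts distinct identifiers across the partition collections of several documents. */
    static int countDistinctPartitions(List<Collection<Object>> partitionsPerDocument) {
        Set<Object> distinct = new HashSet<>();
        for (Collection<Object> partitions : partitionsPerDocument) {
            distinct.addAll(partitions);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        List<Collection<Object>> perDocument = new ArrayList<>();
        perDocument.add(Arrays.<Object>asList(new TopicId("news", 1)));
        perDocument.add(Arrays.<Object>asList(new TopicId("news", 1), new TopicId("news", 2)));
        // Two separately created but equal TopicId("news", 1) objects count once.
        System.out.println(countDistinctPartitions(perDocument)); // prints 2
    }
}
```

Without the equals and hashCode overrides, the two TopicId("news", 1) instances would be treated as different partitions and the count would be 3.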
public static final Comparator<Document> BY_ID_COMPARATOR
Deprecated. The semantics of the identifiers depend on the document source; please roll your own comparator that is aware of the actual id semantics.

public Document()
Creates an empty document with no fields.

public Document(String title)
Creates a document with the provided title.

public Document(String title, String summary)
Creates a document with the provided title and summary.

public Document(String title, String summary, LanguageCode language)
Creates a document with the provided title, summary and language.

public Document(String title, String summary, String contentUrl)
Creates a document with the provided title, summary and contentUrl.

public Document(String title, String summary, String contentUrl, LanguageCode language)
Creates a document with the provided title, summary, contentUrl and language.

public Document(String title, String summary, String contentUrl, LanguageCode language, String id)
Creates a document with the provided title, summary, contentUrl, language and ID. IDs should be unique for clustering. If all documents passed for clustering have null IDs then IDs are automatically generated.

public Integer getId()
Deprecated. Please use getStringId() instead. Currently, this method attempts to parse the string identifier returned by getStringId() into an integer.
Throws:
NumberFormatException - if the identifier could not be converted to an integer number

public String getStringId()
Identifier of this document. The semantics of the identifier depend on the IDocumentSource that produced the documents. When processing documents produced by Carrot2-provided IDocumentSource, the framework generates unique integer identifiers for all the documents. However, when XML document sets are loaded using the ProcessingResult.deserialize(java.io.InputStream) or ProcessingResult.deserialize(CharSequence) methods, the original document identifiers are preserved, which means they may be non-unique or not present at all.
Returns:
the identifier of this document, which may be null
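
Because string identifiers may be missing or non-numeric, code that still needs an integer can guard the conversion itself instead of relying on the deprecated getId(). The sketch below is a standalone illustration using only java.lang.Integer; tryParseId is a hypothetical helper, not a Carrot2 method, and returning null for unparseable input is a design choice made for this example.

```java
public class StringIdExample {
    /**
     * Hypothetical helper: converts a string identifier to an Integer,
     * returning null when the identifier is absent or not a valid integer.
     */
    static Integer tryParseId(String stringId) {
        if (stringId == null) {
            return null; // identifier not present at all
        }
        try {
            return Integer.valueOf(stringId.trim());
        } catch (NumberFormatException e) {
            return null; // identifier present, but not an integer
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseId("42"));     // prints 42
        System.out.println(tryParseId("doc-42")); // prints null
        System.out.println(tryParseId(null));     // prints null
    }
}
```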
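public Document setTitle(String title)
Sets this document's TITLE field.
Parameters:
title - title to set

public Document setSummary(String summary)
Sets this document's SUMMARY field.
Parameters:
summary - summary to set

public String getContentUrl()
Returns this document's CONTENT_URL field.

public Document setContentUrl(String contentUrl)
Sets this document's CONTENT_URL field.
Parameters:
contentUrl - content URL to set

public Document setSources(List<String> sources)
Sets this document's SOURCES field.
Parameters:
sources - the sources list to set

public LanguageCode getLanguage()
Returns this document's LANGUAGE.

public Document setLanguage(LanguageCode language)
Sets this document's LANGUAGE.
Parameters:
language - the language to set

public Document setScore(Double score)
Sets this document's SCORE.
Parameters:
score - the SCORE to set

public Map<String,Object> getFields()
Returns all fields of this document.

public <T> T getField(String name)
Returns the value of the specified field of this document. If the document has no field with the given name, null will be returned.
Parameters:
name - name of the field to be returned
Returns:
value of the field, or null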
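
The generic return type of getField(String) lets callers assign the result to a typed variable without an explicit cast. The following sketch shows one way such a method can work over a Map<String,Object>; FieldLookupExample is a hypothetical stand-in, not the Carrot2 implementation, and the field names "title" and "score" are illustrative strings rather than the actual values of the TITLE and SCORE constants.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldLookupExample {
    // Hypothetical backing store: field name -> value.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Stores a field and returns this object so calls can be chained. */
    FieldLookupExample setField(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    /** Returns the field value cast to the caller's expected type, or null if absent. */
    @SuppressWarnings("unchecked")
    <T> T getField(String name) {
        return (T) fields.get(name);
    }

    public static void main(String[] args) {
        FieldLookupExample doc = new FieldLookupExample()
            .setField("title", "Example")
            .setField("score", 0.75);

        String title = doc.getField("title");          // T inferred as String
        Double score = doc.getField("score");          // T inferred as Double
        Object missing = doc.getField("no-such-field"); // absent field yields null

        System.out.println(title + " " + score + " " + missing); // prints Example 0.75 null
    }
}
```

The cast is unchecked, so asking for the wrong type (for example assigning the "score" value to a String variable) compiles but fails with a ClassCastException at the call site.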
public Document setField(String name, Object value)
Sets a field in this document.
Parameters:
name - name of the field to set
value - value of the field

public Document clone()
Creates a shallow clone of itself.

public static void assignDocumentIds(Collection<Document> documents)
Assigns sequential identifiers to the provided documents. If any document in the set has a non-empty identifier, no identifiers will be generated at all.
Parameters:
documents - documents to assign identifiers to.
Throws:
IllegalArgumentException - if the collection of documents already contains identifiers and they are not unique.

public void addSerializationListener(Document.IDocumentSerializationListener listener)
Adds a serialization listener to this document.
Parameters:
listener - the listener to add
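
The rules described for assignDocumentIds(Collection<Document>) can be sketched in isolation as follows. This is not the Carrot2 implementation: Doc is a hypothetical stand-in that models only the identifier, and several details are assumptions made for the example, namely that numbering starts at 0, that generated identifiers are decimal strings, that null and empty identifiers both count as absent, and that uniqueness is checked only among non-empty identifiers.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AssignIdsSketch {
    /** Hypothetical stand-in for a document; only the identifier is modeled. */
    static final class Doc {
        String id;

        Doc(String id) {
            this.id = id;
        }
    }

    static void assignIds(Collection<Doc> documents) {
        Set<String> seen = new HashSet<>();
        boolean anyPresent = false;

        // First pass: detect existing identifiers and reject duplicates.
        for (Doc d : documents) {
            if (d.id != null && !d.id.isEmpty()) {
                anyPresent = true;
                if (!seen.add(d.id)) {
                    throw new IllegalArgumentException("Identifiers must be unique, duplicate: " + d.id);
                }
            }
        }

        // If any document already has an identifier, generate nothing at all.
        if (anyPresent) {
            return;
        }

        // Second pass: assign sequential identifiers (assumption: starting at 0).
        int next = 0;
        for (Doc d : documents) {
            d.id = Integer.toString(next++);
        }
    }

    public static void main(String[] args) {
        List<Doc> fresh = Arrays.asList(new Doc(null), new Doc(null));
        assignIds(fresh);
        System.out.println(fresh.get(0).id + ", " + fresh.get(1).id); // prints 0, 1
    }
}
```

Note that under these rules a set where only some documents carry identifiers is left untouched, so the remaining documents keep their missing identifiers.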