Mapping a document
The document is also referred to as the root object. It has special parameters that control its behavior, which are mainly used internally to do special processing, such as routing or managing the time-to-live of documents.
In this recipe, we'll take a look at these special fields and learn how to use them.
Getting ready
You need a working ElasticSearch cluster.
How to do it...
You can extend the preceding order example by adding some special fields, as follows:
{
  "order": {
    "_id": { "path": "order_id" },
    "_type": { "store": "yes" },
    "_source": { "store": "yes" },
    "_all": { "enabled": false },
    "_analyzer": { "path": "analyzer_field" },
    "_boost": { "null_value": 1.0 },
    "_routing": { "path": "customer_id", "required": true },
    "_index": { "enabled": true },
    "_size": { "enabled": true, "store": "yes" },
    "_timestamp": { "enabled": true, "store": "yes", "path": "date" },
    "_ttl": { "enabled": true, "default": "3y" },
    "properties": {
      … truncated ….
    }
  }
}
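If the mapping is generated by a script rather than written by hand, the same settings can be kept as a plain data structure. The following is a minimal Python sketch and not part of the original recipe: the helper name build_order_mapping is hypothetical, its properties argument stands in for the field definitions that are truncated in the example, and _all is written with the enabled key that this recipe describes for that field.

```python
import json


def build_order_mapping(properties):
    """Assemble the special-field settings of the order type as a dict.

    `properties` is a placeholder for the regular field definitions,
    which the recipe's example truncates.
    """
    return {
        "order": {
            "_id": {"path": "order_id"},
            "_type": {"store": "yes"},
            "_source": {"store": "yes"},
            "_all": {"enabled": False},
            "_analyzer": {"path": "analyzer_field"},
            "_boost": {"null_value": 1.0},
            "_routing": {"path": "customer_id", "required": True},
            "_index": {"enabled": True},
            "_size": {"enabled": True, "store": "yes"},
            "_timestamp": {"enabled": True, "store": "yes", "path": "date"},
            "_ttl": {"enabled": True, "default": "3y"},
            "properties": properties,
        }
    }


# An empty dict is used here only because the real properties are not shown.
mapping = build_order_mapping(properties={})
body = json.dumps(mapping, indent=2)
```

The serialized body string is what you would submit when putting the mapping in an index; keeping it as a dict first makes it easy to toggle a single special field in code before serializing.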
How it works...
Every special field has its own parameters and value options, as follows:
- _id (by default, it's not indexed or stored): This allows you to index only the ID part of the document. It can be associated with a path field that will be used to extract the id from the source of the document: "_id" : { "path" : "order_id" }
- _type (by default, it's indexed and not stored): This allows you to index the type of the document.
- _index (the default value is enabled=false): This controls whether the index name must be stored as part of the document. It can be enabled by setting the parameter to enabled=true.
- _boost (the default value is null_value=1.0): This controls the boost level (the value used to increment the score) of the document. It can be overridden in the boost parameter for the field.
- _size (the default value is enabled=false): This controls whether the size of the source record is stored.
- _timestamp (by default, enabled=false): This enables the automatic indexing of the document's timestamp. If given a path value, the timestamp is extracted from that field in the source of the document. It can be queried as a standard datetime.
- _ttl (by default, enabled=false): The time-to-live parameter sets the expiration time of the document. When a document expires, it is removed from the index. It accepts an optional parameter, default, which provides a default value at the type level.
- _all (the default is enabled=true): This controls the creation of the _all field (a special field that aggregates all the text of all the document fields). Because this functionality requires a lot of CPU and storage, it is better to disable it if it is not required.
- _source (by default, enabled=true): This controls the storage of the document source. Storing the source is very useful, but it's a storage overhead; so, if it is not required, it's better to turn it off.
- _parent: This defines the parent document (see the Mapping a child document recipe in this chapter).
- _routing: This controls in which shard the document is to be stored. It supports the following additional parameters:
  - path: This is used to provide a field to be used for routing (customer_id in the earlier example).
  - required (true/false): This is used to force the presence of the routing value, raising an exception if it is not provided.
- _analyzer: This allows you to define a document field that contains the name of the analyzer to be used for fields that do not explicitly define an analyzer or an index_analyzer.
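To make the path and required parameters more concrete, the following Python sketch imitates how the _id and _routing values could be derived from a document source. It is an illustration only and not ElasticSearch's actual implementation: the function resolve_special_fields, the exception RoutingMissingError, and the sample document values are all hypothetical.

```python
class RoutingMissingError(Exception):
    """Hypothetical error for a required routing value that is absent."""


def resolve_special_fields(type_mapping, source):
    """Return (doc_id, routing) extracted from `source` via `path` settings.

    Mirrors the behavior described in the list: `_id.path` and
    `_routing.path` name fields in the document source, and
    `_routing.required` turns a missing routing value into an error.
    """
    doc_id = None
    id_path = type_mapping.get("_id", {}).get("path")
    if id_path is not None:
        doc_id = source.get(id_path)

    routing = None
    routing_cfg = type_mapping.get("_routing", {})
    routing_path = routing_cfg.get("path")
    if routing_path is not None:
        routing = source.get(routing_path)
    if routing is None and routing_cfg.get("required", False):
        raise RoutingMissingError("routing value is required but missing")

    return doc_id, routing


# Only the two special fields relevant to this sketch are included.
order_mapping = {
    "_id": {"path": "order_id"},
    "_routing": {"path": "customer_id", "required": True},
}

# Sample document with made-up values.
doc = {"order_id": "o-1001", "customer_id": "c-42", "date": "2014-01-01"}
doc_id, routing = resolve_special_fields(order_mapping, doc)
```

A document without customer_id would raise RoutingMissingError in this sketch, which corresponds to the exception that required=true is said to trigger when no routing value is provided.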
Fine-grained control over how a document is indexed and processed is very important, as it allows you to resolve issues related to complex data types.
Every special field has parameters to set a particular configuration, and some of their behaviors may change in different releases of ElasticSearch.
See also
- The Using dynamic templates in document mapping recipe in this chapter
- The Putting a mapping in an index recipe in Chapter 4, Basic Operations