ElasticSearch Cookbook (Second Edition)

Mapping a document

The document is also referred to as the root object. It has special parameters that control its behavior, which are mainly used internally to do special processing, such as routing or managing the time-to-live of documents.

In this recipe, we'll take a look at these special fields and learn how to use them.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

You can extend the preceding order example by adding some special fields, as follows:

{
  "order": {
    "_id": {
      "path": "order_id"
    },
    "_type": {
      "store": "yes"
    },
    "_source": {
      "enabled": true
    },
    "_all": {
      "enabled": false
    },
    "_analyzer": {
      "path": "analyzer_field"
    },
    "_boost": {
      "null_value": 1.0
    },
    "_routing": {
      "path": "customer_id",
      "required": true
    },
    "_index": {
      "enabled": true
    },
    "_size": {
      "enabled": true,
      "store": "yes"
    },
    "_timestamp": {
      "enabled": true,
      "store": "yes",
      "path": "date"
    },
    "_ttl": {
      "enabled": true,
      "default": "3y"
    },
    "properties": {
… truncated ….
    }
  }
}
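
As a minimal sketch, a subset of these special fields can be assembled as a Python dictionary and serialized to JSON, which is the form the put-mapping call expects as its body. The helper name build_order_mapping and the empty properties placeholder are assumptions made for illustration; the real field definitions come from the order example, which is truncated above.

```python
import json


def build_order_mapping(properties=None):
    """Return a root-object mapping for the "order" type.

    `properties` is a hypothetical stand-in for the field definitions
    that are truncated in the recipe.
    """
    return {
        "order": {
            # Take the document ID from the order_id field of the source.
            "_id": {"path": "order_id"},
            # Route by customer and reject documents without a routing value.
            "_routing": {"path": "customer_id", "required": True},
            # Read the timestamp from the date field of the source.
            "_timestamp": {"enabled": True, "store": "yes", "path": "date"},
            # Expire documents after the type-level default.
            "_ttl": {"enabled": True, "default": "3y"},
            "properties": properties or {},
        }
    }


# Serialize the mapping; this string would be the request body.
mapping_body = json.dumps(build_order_mapping(), indent=2)
print(mapping_body)
```

Building the mapping in code keeps the special fields in one place, so that a parameter whose behavior changes between releases only has to be adjusted once.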

How it works...

Every special field has its own parameters and value options, as follows:

  • _id (by default, it's not indexed or stored): This allows you to index only the ID part of the document. It can be associated with a path field that is used to extract the ID from the source of the document:
            "_id" : {
              "path" : "order_id"
            },
  • _type (by default, it's indexed and not stored): This allows you to index the type of the document.
  • _index (the default value is enabled=false): This controls whether the index name is stored as part of the document. It can be enabled by setting the parameter to enabled=true.
  • _boost (the default value is null_value=1.0): This controls the boost (the value used to increment the score) level of the document. It can be overridden in the boost parameter for the field.
  • _size (the default value is enabled=false): This controls whether the size of the source record is stored.
  • _timestamp (by default, enabled=false): This enables the indexing of the document's timestamp. If a path value is given, the timestamp is extracted from that field in the source of the document. It can be queried as a standard datetime.
  • _ttl (by default, enabled=false): The time-to-live parameter sets the expiration time of the document. When a document expires, it is removed from the index. It accepts an optional parameter, default, which provides a default value at the type level.
  • _all (the default is enabled=true): This controls the creation of the _all field (a special field that aggregates all the text of all the document fields). Because this functionality requires a lot of CPU and storage, if it is not required it is better to disable it.
  • _source (by default, enabled=true): This controls the storage of the document source. Storing the source is very useful, but it's a storage overhead; so, if it is not required, it's better to turn it off.
  • _parent: This defines the parent document (see the Mapping a child document recipe in this chapter).
  • _routing: This controls in which shard the document is to be stored. It supports the following additional parameters:
    • path: This is used to provide a field to be used for routing (customer_id in the earlier example).
    • required (true/false): This is used to force the presence of the routing value, raising an exception if it is not provided.
  • _analyzer: This allows you to define a document field that contains the name of the analyzer to be used for fields that do not explicitly define an analyzer or an index_analyzer.
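
The path and required parameters described above can be illustrated with a short sketch. It pulls the ID and routing values out of a document source and then maps the routing value to a shard number. All function names here are hypothetical, the sample order is made up, and CRC32 is only a stand-in for the hash that ElasticSearch uses internally; the point is that the same routing value always selects the same shard.

```python
import zlib


def extract_path(source, path):
    """Follow a dotted path (for example "customer.id") into a nested dict."""
    value = source
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def resolve_special_fields(source, id_path, routing_path, routing_required):
    """Mimic how _id.path and _routing.path read values from the source."""
    doc_id = extract_path(source, id_path)
    routing = extract_path(source, routing_path)
    if routing_required and routing is None:
        # Corresponds to required=true: a missing routing value is an error.
        raise ValueError("routing is required but %r is missing" % routing_path)
    return doc_id, routing


def pick_shard(routing, number_of_shards):
    """Illustrative only: CRC32 is an assumption, not ElasticSearch's hash."""
    return zlib.crc32(str(routing).encode("utf-8")) % number_of_shards


# A made-up order document with the fields named in the mapping example.
order = {"order_id": "o-1001", "customer_id": "c-42", "date": "2014-01-01"}
doc_id, routing = resolve_special_fields(order, "order_id", "customer_id", True)
shard = pick_shard(routing, 5)
print(doc_id, routing, shard)
```

Because the shard depends only on the routing value, all the orders of one customer end up on the same shard, which is what makes routing by customer_id useful at search time.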

Fine-grained control over how a document is indexed and processed is very important, and it allows you to resolve issues related to complex data types.

Every special field has parameters to set a particular configuration, and some of their behaviors may change in different releases of ElasticSearch.

See also

  • The Using dynamic templates in document mapping recipe in this chapter
  • The Putting a mapping in an index recipe in Chapter 4, Basic Operations