Mapping base types
Using a schema-less approach lets you start inserting data quickly, without being concerned about field types. However, to achieve better results and better performance when indexing, it's necessary to manually define a mapping.
Fine-tuning the mapping has some advantages, as follows:
- Reduces the size of the index on disk (disabling functionalities for custom fields)
- Indexes only interesting fields (a general boost to performance)
- Precooks data for a fast search or real-time analytics (such as aggregations)
- Correctly defines whether a field must be analyzed in multiple tokens or whether it should be considered as a single token
ElasticSearch also allows you to use base fields with a wide range of configurations.
Getting ready
You need a working ElasticSearch cluster and an index named test (refer to the Creating an index recipe in Chapter 4, Basic Operations) where you can put the mappings.
How to do it...
Let's use a semi-real-world example: a shop order for our eBay-like shop.
Initially, we define the following order:
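A sample order document with these fields might look like the following (the values are illustrative and match the field types of the mapping below):

```json
{
  "id": "1234",
  "date": "2013-06-07T12:14:54",
  "customer_id": "customer1",
  "sent": true,
  "name": "a beautiful item",
  "quantity": 300,
  "vat": 20.0
}
```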
Our order record must be converted to an ElasticSearch mapping definition:
```json
{
  "order": {
    "properties": {
      "id":          { "type": "string",  "store": "yes", "index": "not_analyzed" },
      "date":        { "type": "date",    "store": "no",  "index": "not_analyzed" },
      "customer_id": { "type": "string",  "store": "yes", "index": "not_analyzed" },
      "sent":        { "type": "boolean", "index": "not_analyzed" },
      "name":        { "type": "string",  "index": "analyzed" },
      "quantity":    { "type": "integer", "index": "not_analyzed" },
      "vat":         { "type": "double",  "index": "no" }
    }
  }
}
```
Now the mapping is ready to be put in the index. We'll see how to do this in the Putting a mapping in an index recipe in Chapter 4, Basic Operations.
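As a quick preview, assuming a cluster listening on localhost:9200 and the mapping saved in a file named order_mapping.json (both are assumptions for this sketch), the call looks along these lines:

```sh
# Put the mapping for the "order" type into the "test" index (ES 1.x API).
# order_mapping.json contains the mapping JSON shown earlier.
curl -XPUT 'http://localhost:9200/test/order/_mapping' -d @order_mapping.json
```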
How it works...
The field type must be mapped to one of ElasticSearch's base types, adding options for how the field must be indexed.
The following table is a reference of the base mapping types:

| Type | Description |
| --- | --- |
| `string` | A text field, such as a name or a description |
| `integer`, `long`, `short`, `byte` | Integer numbers of various sizes |
| `float`, `double` | Floating-point numbers |
| `boolean` | A true/false value |
| `date` | A date or datetime value, such as an order date |
| `binary` | Binary data, such as a Base64-encoded file |
Depending on the data type, it's possible to give explicit directives to ElasticSearch on how to process the field for better management. The most commonly used options are as follows:

- `store`: This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space, but reduces computation if you need to extract the field from a document (that is, in scripting and aggregations). The possible values are `no` and `yes` (the default is `no`).
- `index`: This configures whether and how the field is indexed (the default is `analyzed`). The following are the possible values for this parameter:
    - `no`: The field is not indexed at all. This is useful for holding data that must not be searchable.
    - `analyzed`: The field is analyzed with the configured analyzer. Using the default ElasticSearch configuration (`StandardAnalyzer`), it is generally lowercased and tokenized.
    - `not_analyzed`: The field is processed and indexed, but without being changed by an analyzer. The default ElasticSearch configuration uses the `KeywordAnalyzer`, which processes the field as a single token.
- `null_value`: This defines a default value to use if the field is missing.
- `boost`: This changes the importance of a field (the default is `1.0`).
- `index_analyzer`: This defines the analyzer used to process the field at index time. If it is not defined, the analyzer of the parent object is used (the default is `null`).
- `search_analyzer`: This defines the analyzer used during the search. If it is not defined, the analyzer of the parent object is used (the default is `null`).
- `analyzer`: This sets both `index_analyzer` and `search_analyzer` to the given value (the default is `null`).
- `include_in_all`: This marks the field to be indexed in the special `_all` field (a field that contains the concatenated text of all the fields). The default is `true`.
- `index_name`: This is the name under which the field is stored in the index. This property allows you to rename the field at indexing time; it can be used to manage data migration over time without breaking the application layer.
- `norms`: This controls the Lucene norms, which are used to score queries. If the field is used only for filtering, it's best practice to disable norms to reduce resource usage (the default is `true` for `analyzed` fields and `false` for `not_analyzed` ones).
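As a sketch of how these options combine, the following hypothetical mapping fragment (the field names are assumptions, not part of the recipe's order example) applies several of them at once:

```json
{
  "item": {
    "properties": {
      "description": {
        "type": "string",
        "index": "analyzed",
        "store": "yes",
        "include_in_all": true
      },
      "code": {
        "type": "string",
        "index": "not_analyzed",
        "index_name": "item_code",
        "norms": { "enabled": false }
      },
      "price": {
        "type": "double",
        "null_value": 0.0,
        "boost": 2.0
      }
    }
  }
}
```

Here, `code` is kept as a single token, renamed on disk to `item_code`, and has norms disabled because it is only used for filtering; `price` falls back to `0.0` when missing.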
There's more...
In this recipe, we saw the most-used options for the base types, but there are many other options that are useful for advanced usage.
An important parameter, available only for string mappings, is `term_vector` (the vector of the terms that compose a string; check out the Lucene documentation for further details at http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/Terms.html). Its possible values are:

- `no`: This is the default value; the term vector is not stored
- `yes`: This stores the term vector
- `with_offsets`: This stores the term vector with token offsets (the start and end positions in a block of characters)
- `with_positions`: This stores the positions of the tokens in the term vector
- `with_positions_offsets`: This stores all the term vector data
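For example, a string field that needs fast highlighting might store full term vectors (the field name is illustrative):

```json
{
  "description": {
    "type": "string",
    "index": "analyzed",
    "term_vector": "with_positions_offsets"
  }
}
```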
See also
- The ElasticSearch online documentation provides a full description of all the properties for the different mapping fields at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
- The Specifying a different analyzer recipe in this chapter shows alternative analyzers to the standard one.