ElasticSearch Cookbook (Second Edition)

Managing a child document

In the previous recipe, you saw how to manage relationships between objects with the nested object type. The disadvantage of nested objects is their dependency on their parent: to change the value of a nested object, you need to reindex the parent, which can become a performance overhead if the nested objects change frequently. To solve this problem, ElasticSearch allows you to define child documents.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

You can modify the mapping of the order example from the Mapping a document recipe by indexing the items as separate child documents.

You need to extract the item object and create a new document type, item, with the _parent property set:

{
  "order": {
    "properties": {
      "id": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "date": {
        "type": "date",
        "store": "no",
        "index": "not_analyzed"
      },
      "customer_id": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "sent": {
        "type": "boolean",
        "store": "no",
        "index": "not_analyzed"
      }
    }
  },
  "item": {
    "_parent": {
      "type": "order"
    },
    "properties": {
      "name": {
        "type": "string",
        "store": "no",
        "index": "analyzed"
      },
      "quantity": {
        "type": "integer",
        "store": "no",
        "index": "not_analyzed"
      },
      "vat": {
        "type": "double",
        "store": "no",
        "index": "not_analyzed"
      }
    }
  }
}

The preceding mapping is similar to the mapping shown in the previous recipes. The item object is extracted from the order (in the previous example, it was nested) and added as a new mapping. There are two differences: "type": "nested" becomes "type": "object" (which can be omitted), and there is a new special field, _parent, which defines the parent-child relation.
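As a rough illustration, the following Python sketch wraps an abbreviated copy of the mapping in an index-creation body and checks that every _parent refers to a type present in the same mapping. The helper name undefined_parents and the idea of running this check are assumptions made for this example; it is a client-side sanity check against typos, not something the recipe says ElasticSearch enforces.

```python
import json

# Abbreviated copy of the mapping above (only one field per type is kept).
mappings = {
    "order": {
        "properties": {
            "id": {"type": "string", "store": "yes", "index": "not_analyzed"}
        }
    },
    "item": {
        "_parent": {"type": "order"},
        "properties": {
            "name": {"type": "string", "store": "no", "index": "analyzed"}
        }
    },
}


def undefined_parents(mappings):
    """Return child types whose _parent.type is not a type in this mapping.

    Hypothetical helper: a typo check done on the client side.
    """
    return [
        child
        for child, definition in mappings.items()
        if "_parent" in definition
        and definition["_parent"].get("type") not in mappings
    ]


# JSON body that could be sent when creating the index.
create_index_body = json.dumps({"mappings": mappings}, indent=2)
```

An empty list from undefined_parents(mappings) means that each child type points to a parent type that is actually defined.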

How it works...

The child object is a standard root object (document) with one extra property defined: _parent.

The type property of _parent refers to the type of the parent document.

The child document must be indexed in the same shard as the parent, so an extra parameter must be passed when it is indexed: the parent ID. (We'll see how to do this in later chapters.)
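As a preview, here is a minimal Python sketch of how such a request could be put together, assuming the parent ID is passed as the parent query-string parameter of the index call. The function name child_index_request, the index name myindex, the host, and the sample item values are all hypothetical; the sketch only builds the request and does not send it.

```python
import json
from urllib.parse import quote, urlencode


def child_index_request(base_url, index, child_type, doc_id, parent_id, document):
    """Build the HTTP method, URL, and JSON body for indexing one child document."""
    # Escape each path segment so unusual IDs cannot break the URL.
    path = "/".join(
        quote(str(part), safe="") for part in (index, child_type, doc_id)
    )
    # The parent ID travels in the query string; it is also what ties the
    # child to the parent's shard.
    query = urlencode({"parent": parent_id})
    return "PUT", f"{base_url}/{path}?{query}", json.dumps(document)


# Illustrative values only (not taken from the recipe).
method, url, body = child_index_request(
    "http://localhost:9200",
    "myindex",
    "item",
    "1",
    "order-1",
    {"name": "tshirt", "quantity": 10, "vat": 20.0},
)
```

The resulting triple can be handed to any HTTP client; keeping request construction separate from sending makes it easy to inspect the URL before anything reaches the cluster.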

Child documents don't require you to reindex the parent document when you want to change their values, so they are faster for indexing, reindexing (updating), and deleting.

There's more...

In ElasticSearch, there are different ways in which you can manage relationships between objects:

  • Embedding with type=object: This is implicitly managed by ElasticSearch, which considers the embedded object as part of the main document. It's fast, but you need to reindex the main document to change a value of the embedded object.
  • Nesting with type=nested: This allows a more accurate search and filtering of the parent, using a nested query on the children. Everything works as in the case of an embedded object, except for the query.
  • External child documents: Here the children are external documents, with a _parent property to bind them to the parent. They must be indexed in the same shard as the parent. The join with the parent is a bit slower than with nested objects, because nested objects are stored in the same data block as the parent in the Lucene index and are loaded with it, whereas child documents require more read operations.
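The difference in search accuracy between embedded and nested objects can be illustrated with a small in-memory Python simulation. It assumes the usual behavior in which the fields of embedded objects are pooled per field across the array, while nested objects are matched one at a time; the sample order and both function names are hypothetical, and no ElasticSearch call is involved.

```python
# Hypothetical order with two items.
order = {
    "id": "order-1",
    "items": [
        {"name": "tshirt", "quantity": 10},
        {"name": "socks", "quantity": 2},
    ],
}


def matches_embedded(doc, name, quantity):
    """Embedded objects: values are pooled per field, so pairs can mix."""
    names = {item["name"] for item in doc["items"]}
    quantities = {item["quantity"] for item in doc["items"]}
    return name in names and quantity in quantities


def matches_nested(doc, name, quantity):
    """Nested objects: both conditions must hold within a single item."""
    return any(
        item["name"] == name and item["quantity"] == quantity
        for item in doc["items"]
    )
```

With this data, a search for a tshirt with quantity 2 matches in the embedded simulation, because the two values come from different items, but not in the nested one.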

Choosing how to model the relationship between objects depends on your application scenario.

There is another approach, a decoupled join relation, but it performs poorly on large data sets. You have to do the join query in two steps: first, you collect the IDs of the children/other documents, and then you search for them in a field of their parent.
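The two steps can be sketched in plain Python over in-memory lists standing in for two searches. The field name item_ids, the sample documents, and both function names are assumptions made for illustration; in a real application each step would be a separate query against the cluster.

```python
# Hypothetical data: each parent keeps the IDs of its related documents in a field.
orders = [
    {"id": "order-1", "item_ids": ["item-1", "item-2"]},
    {"id": "order-2", "item_ids": ["item-3"]},
]
items = [
    {"id": "item-1", "name": "tshirt"},
    {"id": "item-2", "name": "socks"},
    {"id": "item-3", "name": "tshirt"},
]


def collect_item_ids(items, name):
    """Step 1: collect the IDs of the documents that match the condition."""
    return [item["id"] for item in items if item["name"] == name]


def find_orders(orders, item_ids):
    """Step 2: find the parents whose ID field contains any collected ID."""
    wanted = set(item_ids)
    return [o["id"] for o in orders if wanted.intersection(o["item_ids"])]


item_ids = collect_item_ids(items, "socks")
order_ids = find_orders(orders, item_ids)
```

The cost is visible in the structure: the ID list from the first step has to travel to the client and back, so it grows with the number of matching documents.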

See also

  • The Using a has_child query/filter, Using a top_children query, and Using a has_parent query/filter recipes in Chapter 5, Search, Queries, and Filters, for more information on child/parent queries.