
Elasticsearch aggregation with hierarchical category, subcategory; limit the levels

Question:

I have products with a categories field. Using an aggregation, I can get the full categories with all their subcategories, but I want to limit the number of levels returned in the facet.

For example, I have facets like:

auto, tools & travel (115)
auto, tools & travel > luggage tags (90)
auto, tools & travel > luggage tags > luggage spotters (40)
auto, tools & travel > luggage tags > something else (50)
auto, tools & travel > car organizers (25)

Using an aggregation like

"aggs": {
  "cat_groups": {
    "terms": {
      "field": "categories.keyword",
      "size": 10,
      "include": "auto, tools & travel > .*"
    }
  }
}

I am getting buckets like

"buckets": [
  { "key": "auto, tools & travel > luggage tags", "doc_count": 90 },
  { "key": "auto, tools & travel > luggage tags > luggage spotters", "doc_count": 40 },
  { "key": "auto, tools & travel > luggage tags > something else", "doc_count": 50 },
  { "key": "auto, tools & travel > car organizers", "doc_count": 25 }
]

But I want to limit the level. For example, I want to get only the results for auto, tools & travel > luggage tags, without its subcategories. How can I limit the levels? By the way, "exclude": ".* > .* > .*" does not work for me.

<strong>I need buckets for different levels depending on the search: sometimes the first level, sometimes the second or third. When I want the first level, the second level should not appear in the buckets, and so on for the other levels.</strong>

Elasticsearch version 6.4

Answer1:

I was finally able to work out the technique below.

I have implemented a custom analyzer using the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html" rel="nofollow">Path Hierarchy Tokenizer</a>, and I have made categories a multi-field so that you can use categories.facet for aggregations/facets and do normal text search on categories.

The custom analyzer applies only to categories.facet.

Note the property "fielddata": "true" on the categories.facet field. It is needed because a terms aggregation on a text field only works when fielddata is enabled.
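To make the tokenizer's effect concrete, here is a rough Python sketch of the terms that path_hierarchy produces for one category string when the delimiter is ">". It only imitates the tokenizer outside Elasticsearch, and path_hierarchy_tokens is a made-up helper name, not an Elasticsearch API.

```python
def path_hierarchy_tokens(path, delimiter=">"):
    """Imitate path_hierarchy: emit every prefix of the path, one per level."""
    parts = path.split(delimiter)
    # Join the first 1, 2, ... parts back together with the delimiter.
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


tokens = path_hierarchy_tokens(
    "auto, tools & travel > luggage tags > luggage spotters"
)
for token in tokens:
    print(repr(token))
```

Because the delimiter is ">" rather than " > ", the shorter tokens keep a trailing space, which is also how the keys show up in the aggregation buckets.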

<h2>Mapping</h2>

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": ">"
        }
      }
    }
  },
  "mappings": {
    "mydocs": {
      "properties": {
        "categories": {
          "type": "text",
          "fields": {
            "facet": {
              "type": "text",
              "analyzer": "my_analyzer",
              "fielddata": "true"
            }
          }
        }
      }
    }
  }
}

<h2>Sample Documents</h2>

POST myindex/mydocs/1
{ "categories" : "auto, tools & travel > luggage tags > luggage spotters" }

POST myindex/mydocs/2
{ "categories" : "auto, tools & travel > luggage tags > luggage spotters" }

POST myindex/mydocs/3
{ "categories" : "auto, tools & travel > luggage tags > luggage spotters" }

POST myindex/mydocs/4
{ "categories" : "auto, tools & travel > luggage tags > something else" }

<h2>Query</h2>

You can try the query below. I've used a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html" rel="nofollow">Filter Aggregation</a>, because you only need documents matching specific words, together with a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html" rel="nofollow">Terms Aggregation</a>.

{
  "size": 0,
  "aggs": {
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage" } }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet"
          }
        }
      }
    }
  }
}

<h2>Response</h2>

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "facets": {
      "doc_count": 4,
      "categories": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "auto, tools & travel ", "doc_count": 4 },
          { "key": "auto, tools & travel > luggage tags ", "doc_count": 4 },
          { "key": "auto, tools & travel > luggage tags > luggage spotters", "doc_count": 3 },
          { "key": "auto, tools & travel > luggage tags > something else", "doc_count": 1 }
        ]
      }
    }
  }
}
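The parent buckets carry cumulative counts because every document contributes one term per ancestor level. The Python sketch below reproduces the doc_count values from the four sample documents. It is only a model of the counting, assuming each document is counted once per distinct term. level_terms is a hypothetical helper that mimics the tokenizer's prefixes.

```python
from collections import Counter

# The four sample documents indexed above.
docs = [
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]


def level_terms(path, delimiter=">"):
    """Hypothetical helper: the set of prefix terms one document contributes."""
    parts = path.split(delimiter)
    return {delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)}


# A terms aggregation counts documents per term, so each document
# adds 1 to every distinct prefix it contains.
bucket_counts = Counter()
for doc in docs:
    bucket_counts.update(level_terms(doc))

for key, count in bucket_counts.most_common():
    print(repr(key), count)
```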

Final Answer Post Discussion On Chat

POST myindex/_search
{
  "size": 0,
  "aggs": {
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage" } }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet",
            "exclude": ".*>{1}.*>{1}.*"
          }
        }
      }
    }
  }
}

Note that I've added exclude with a regular expression so that any facet key containing more than one occurrence of > is dropped, which leaves only the first two levels.

Let me know if this helps.
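Since the level you need changes per search, the exclude pattern can be generated from the desired depth. Below is a minimal Python sketch of that idea. exclude_pattern, facet_aggregation and kept_keys are hypothetical helper names, and re.fullmatch is used only to imitate the whole-term matching that Elasticsearch applies to exclude. Python's regex dialect differs from Lucene's, so treat this as a check of the logic rather than of Elasticsearch itself.

```python
import re


def exclude_pattern(max_level):
    """Pattern for keys with at least max_level '>' delimiters (too deep)."""
    # max_level=2 gives ".*>.*>.*", equivalent to ".*>{1}.*>{1}.*".
    return ".*" + ">.*" * max_level


def facet_aggregation(max_level):
    """Build the terms aggregation body for the given depth limit."""
    return {
        "terms": {
            "field": "categories.facet",
            "exclude": exclude_pattern(max_level),
        }
    }


def kept_keys(keys, max_level):
    """Imitate exclude locally: drop keys that the pattern matches in full."""
    pattern = re.compile(exclude_pattern(max_level))
    return [key for key in keys if not pattern.fullmatch(key)]


keys = [
    "auto, tools & travel ",
    "auto, tools & travel > luggage tags ",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]

level_1 = kept_keys(keys, 1)  # only the top-level category survives
level_2 = kept_keys(keys, 2)  # top level plus its direct children
print(level_1)
print(level_2)
```

This caps the depth rather than selecting exactly one level, so the shallower ancestors are still returned alongside the requested level.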
