evitaDB - Fast e-commerce database

Histogram

Histograms serve a pivotal role in e-commerce parametrized filtering by visually representing the distribution of product attributes, enabling customers to adjust their search criteria efficiently. They facilitate a more interactive and precise filtering experience, allowing users to modify the range of properties like price or size based on actual item availability.

There are actually only a few use cases in e-commerce websites where histograms are used. The most common is the price histogram, which is used to filter products by price. You can see an example of such a histogram on the Booking.com website:

[Figure: Booking.com price histogram filter]

It's a shame that histograms aren't used more often, because they are a very useful tool for gaining insight into the distribution of high-cardinality product attributes such as weight, height, or width.

The histogram data structure is optimized for frontend rendering. It contains the following fields:

  • min - the minimum value of the attribute in the current filter context
  • max - the maximum value of the attribute in the current filter context
  • overallCount - the number of elements whose attribute value falls into any of the buckets (i.e., the sum of occurrences across all buckets)
  • buckets - a sorted array of buckets, each of which contains the following fields:
    • threshold - the minimum value of the attribute in the bucket; the bucket's upper bound is the threshold of the next bucket (or max for the last bucket)
    • occurrences - the number of elements whose attribute value falls into the bucket
    • relativeFrequency - a value used for visualizing bucket height in UI (0-100 scale):
      • For standard histograms: percentage of total occurrences, calculated as (occurrences / overallCount) * 100
      • For equalized histograms: normalized value density that considers both occurrences and bucket width:
        1. Raw frequency is calculated as occurrences * (totalRange / bucketWidth) - this rewards buckets with many occurrences packed into narrow ranges
        2. Values are then normalized to sum to 100 across all buckets
        3. Empty buckets always have relativeFrequency = 0
    • requested - a flag indicating whether the bucket lies within the range the user asked for:
      • true if the query contains no attributeBetween or priceBetween constraint
      • true if the query contains an attributeBetween or priceBetween constraint for the particular attribute / price and the bucket threshold lies within the range of the constraint (inclusive)
      • false otherwise
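To make the field semantics concrete, here is a minimal Python sketch of the data structure and of the two relativeFrequency formulas described above. The class and function names (Bucket, Histogram, build_histogram and the helpers) are hypothetical illustrations, not evitaDB's API, and the requested range is assumed to be a closed (low, high) tuple.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    threshold: float            # inclusive lower bound of the bucket
    occurrences: int            # number of elements falling into the bucket
    relative_frequency: float   # 0-100 value used for the bucket height in the UI
    requested: bool             # does the bucket lie in the user's requested range?


@dataclass
class Histogram:
    min: float
    max: float
    overall_count: int          # sum of occurrences across all buckets
    buckets: List[Bucket]


def bucket_widths(thresholds: List[float], max_value: float) -> List[float]:
    # The upper bound of bucket i is the threshold of bucket i + 1 (or max for the last one).
    upper_bounds = thresholds[1:] + [max_value]
    return [upper - lower for lower, upper in zip(thresholds, upper_bounds)]


def standard_frequencies(occurrences: List[int]) -> List[float]:
    # (occurrences / overallCount) * 100
    total = sum(occurrences)
    return [occ / total * 100 if total else 0.0 for occ in occurrences]


def equalized_frequencies(thresholds: List[float], occurrences: List[int],
                          min_value: float, max_value: float) -> List[float]:
    total_range = max_value - min_value
    raw = []
    for occ, width in zip(occurrences, bucket_widths(thresholds, max_value)):
        if occ == 0:
            raw.append(0.0)  # empty buckets always get 0
        elif width <= 0:
            raise ValueError("this sketch assumes non-empty buckets have a positive width")
        else:
            # many occurrences packed into a narrow range => high density
            raw.append(occ * (total_range / width))
    raw_sum = sum(raw)
    # normalize so that the values sum to 100 across all buckets
    return [r / raw_sum * 100 if raw_sum else 0.0 for r in raw]


def build_histogram(thresholds: List[float], occurrences: List[int],
                    min_value: float, max_value: float, equalized: bool = False,
                    requested_range: Optional[Tuple[float, float]] = None) -> Histogram:
    if equalized:
        frequencies = equalized_frequencies(thresholds, occurrences, min_value, max_value)
    else:
        frequencies = standard_frequencies(occurrences)
    buckets = [
        Bucket(
            threshold=t,
            occurrences=occ,
            relative_frequency=f,
            # no between constraint => every bucket is requested;
            # otherwise the threshold must lie within the range (inclusive)
            requested=requested_range is None
            or requested_range[0] <= t <= requested_range[1],
        )
        for t, occ, f in zip(thresholds, occurrences, frequencies)
    ]
    return Histogram(min_value, max_value, sum(occurrences), buckets)
```

With thresholds [0, 10, 20], max 40 and occurrences [5, 0, 5], the standard mode yields [50, 0, 50], while the equalized mode yields roughly [66.7, 0, 33.3], because the first five occurrences are packed into half the width occupied by the last five.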

Attribute histogram

argument:int!

the number of columns (buckets) in the histogram; the number should be chosen so that the histogram fits well into the available space on the screen

argument:enum(STANDARD|OPTIMIZED|EQUALIZED|EQUALIZED_OPTIMIZED)

The behavior of the histogram calculation:

  • STANDARD (default): Returns exactly the requested number of buckets with equal-width intervals across the value range.
  • OPTIMIZED: Returns fewer buckets when data is sparse to avoid large gaps (empty buckets).
  • EQUALIZED: Returns exactly the requested number of buckets, but positions bucket boundaries based on the cumulative frequency distribution so that each bucket covers an approximately equal portion of the total records. This provides a better user experience when data is heavily skewed.
  • EQUALIZED_OPTIMIZED: Combines EQUALIZED bucketing with optimization to reduce empty buckets.

argument:string+

one or more names of the entity attributes whose values will be used to generate the histograms

The attribute histogram can be computed from any filterable attribute of a numeric type. The histogram is computed only from the attributes of elements that match the current mandatory part of the filter. Interval-related constraints, i.e. attributeBetween and priceBetween in the userFilter part, are excluded from the histogram calculation. If this weren't the case, a user narrowing the filtered range based on the histogram results would be driven into a narrower and narrower range and eventually into a dead end.
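The following Python sketch illustrates this rule under simplifying assumptions: records are plain dicts, mandatory_filter is a hypothetical predicate standing in for the mandatory part of the filter, user_range stands for an attributeBetween constraint on the same attribute, and buckets are equal-width as in the STANDARD behavior. The user's range never shrinks the histogram input; it only drives the requested flag.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def standard_buckets(values: List[float], bucket_count: int) -> Tuple[List[float], List[int]]:
    # STANDARD behavior: equal-width intervals between min and max
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    thresholds = [low + i * width for i in range(bucket_count)]
    occurrences = [0] * bucket_count
    for v in values:
        # the maximum value falls into the last bucket
        index = bucket_count - 1 if width == 0 else min(int((v - low) / width), bucket_count - 1)
        occurrences[index] += 1
    return thresholds, occurrences


def attribute_histogram(records: List[Dict[str, Any]], attribute: str, bucket_count: int,
                        mandatory_filter: Callable[[Dict[str, Any]], bool],
                        user_range: Optional[Tuple[float, float]] = None) -> Optional[dict]:
    # Only the mandatory filter narrows the input; user_range is deliberately NOT applied,
    # otherwise each narrowing would shrink the histogram until the user hits a dead end.
    values = [r[attribute] for r in records if attribute in r and mandatory_filter(r)]
    if not values:
        return None
    thresholds, occurrences = standard_buckets(values, bucket_count)
    return {
        "min": min(values),
        "max": max(values),
        "overallCount": sum(occurrences),
        "buckets": [
            {
                "threshold": t,
                "occurrences": occ,
                # the user's range only marks buckets as requested
                "requested": user_range is None or user_range[0] <= t <= user_range[1],
            }
            for t, occ in zip(thresholds, occurrences)
        ],
    }
```

For example, ten brand-A records with widths 1 to 10 and a mandatory filter of brand == "A" produce thresholds [1.0, 4.0, 7.0] and occurrences [3, 3, 4] whether or not the user has already narrowed the range to 4-6; the narrowed range only sets requested to true for the middle bucket.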

To demonstrate the use of the histogram, we will use the following example:

The simplified result looks like this:

The histogram result in JSON format is a bit more verbose, but it's still quite readable:

Attribute histogram contents optimization

During user testing, we found that histograms with scarce data are not very useful. Besides the fact that they don't look good, they are often harder to manipulate with the widget that controls the histogram and tries to stick to the bucket thresholds. Therefore, we have introduced a new histogram calculation mode - OPTIMIZED. In this mode, the histogram calculation algorithm tries to reduce the number of buckets when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.
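The text above does not spell out the exact OPTIMIZED algorithm, so the Python sketch below is only one hypothetical heuristic in its spirit, not evitaDB's implementation: starting from the requested bucket count, it lowers the count until the share of empty equal-width buckets drops to an acceptable level. The max_empty_ratio parameter is an invented tuning knob.

```python
from typing import List


def equal_width_occurrences(values: List[float], bucket_count: int) -> List[int]:
    # counts per equal-width bucket; the maximum value falls into the last bucket
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    counts = [0] * bucket_count
    for v in values:
        index = bucket_count - 1 if width == 0 else min(int((v - low) / width), bucket_count - 1)
        counts[index] += 1
    return counts


def optimized_bucket_count(values: List[float], requested: int, max_empty_ratio: float = 0.0) -> int:
    # Hypothetical heuristic: try the requested count first, then progressively fewer buckets,
    # and return the first count whose share of empty buckets is acceptable.
    for count in range(requested, 0, -1):
        counts = equal_width_occurrences(values, count)
        empty = sum(1 for c in counts if c == 0)
        if empty / count <= max_empty_ratio:
            return count
    return 1  # not reached for non-empty input: a single bucket is never empty
```

For values [1, 2, 3, 100] and 10 requested buckets, every count from 10 down to 3 leaves empty buckets between the cluster and the outlier, so the sketch settles on 2 buckets, while evenly spread values such as 1 to 10 keep all 5 requested buckets.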

To demonstrate the optimization of the histogram, we will use the following example:

The simplified result looks like this:

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.

Attribute histogram equalization

Standard histograms use equal-width buckets across the entire value range. This works well for uniformly distributed data but can be problematic when data is heavily skewed. For example, if 90% of products have a width of 10-50 cm and only 10% have a width of 50-500 cm, equal-width buckets would cram most products into the first few buckets while leaving many empty buckets in the upper range.

The EQUALIZED behavior solves this by positioning bucket boundaries based on cumulative frequency distribution. Instead of dividing the value range into equal intervals, it divides the records into approximately equal groups. Each bucket then covers roughly the same number of items, providing a more balanced and informative histogram.

This technique is inspired by histogram equalization in image processing, adapted for filter slider UX. The algorithm:
  1. Calculates the total weight (sum of all record counts)
  2. Calculates the cumulative frequency for each unique value
  3. Positions bucket boundaries at the points where the cumulative frequency crosses the threshold i/bucketCount
  4. Counts the actual occurrences in each resulting bucket
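The four steps above can be sketched in Python as follows. This is a minimal illustration, not evitaDB's implementation: in this sketch the first bucket starts at the minimum, bucket i (for i of 1 or more) starts at the first distinct value below which at least i/bucketCount of all records lie, and large ties may produce repeated thresholds and therefore empty buckets.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Tuple


def equalized_histogram(values: List[float], bucket_count: int) -> Tuple[List[float], List[int]]:
    if not values:
        raise ValueError("this sketch expects at least one value")
    counts = Counter(values)
    unique = sorted(counts)
    total = sum(counts.values())      # 1. total weight (sum of all record counts)
    thresholds = [unique[0]]          # the first bucket always starts at the minimum
    below = 0                         # 2. cumulative count of records strictly below `value`
    i = 1
    for value in unique:
        # 3. place boundary i where the cumulative share crosses i / bucket_count;
        #    integer arithmetic (below / total >= i / bucket_count) avoids rounding issues
        while i < bucket_count and below * bucket_count >= i * total:
            thresholds.append(value)
            i += 1
        below += counts[value]
    while len(thresholds) < bucket_count:
        thresholds.append(unique[-1])  # a large tie at the maximum can prevent a crossing
    # 4. count actual occurrences: a value belongs to the last bucket whose threshold <= value
    occurrences = [0] * bucket_count
    for value, count in counts.items():
        occurrences[bisect_right(thresholds, value) - 1] += count
    return thresholds, occurrences


widths_cm = [10, 11, 12, 13, 14, 15, 16, 17, 200, 500]
print(equalized_histogram(widths_cm, 5))  # ([10, 12, 14, 16, 200], [2, 2, 2, 2, 2])
```

For comparison, five equal-width buckets over the same range 10-500 (width 98) would start at 10, 108, 206, 304 and 402 and hold [8, 1, 0, 0, 1] records: the eight narrow items collapse into one bar while two bars stay empty.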

To demonstrate equalized histogram, we will use the following example:

The simplified result looks like this:

The equalized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, unlike standard histograms where bucket widths are equal, equalized histograms adjust bucket widths to distribute records more evenly. This makes the histogram more useful for filtering when data has a skewed distribution.

Price histogram

argument:int!

the number of columns (buckets) in the histogram; the number should be chosen so that the histogram fits well into the available space on the screen

argument:enum(STANDARD|OPTIMIZED|EQUALIZED|EQUALIZED_OPTIMIZED)

The behavior of the histogram calculation:

  • STANDARD (default): Returns exactly the requested number of buckets with equal-width intervals across the value range.
  • OPTIMIZED: Returns fewer buckets when data is sparse to avoid large gaps (empty buckets).
  • EQUALIZED: Returns exactly the requested number of buckets, but positions bucket boundaries based on the cumulative frequency distribution so that each bucket covers an approximately equal portion of the total records. This provides a better user experience when data is heavily skewed.
  • EQUALIZED_OPTIMIZED: Combines EQUALIZED bucketing with optimization to reduce empty buckets.

The price histogram is computed from the price for sale. Interval-related constraints, i.e. attributeBetween and priceBetween in the userFilter part, are excluded from the histogram calculation. If this weren't the case, a user narrowing the filtered range based on the histogram results would be driven into a narrower and narrower range and eventually into a dead end.

The priceType requirement determines the source price property for the histogram computation. If no such requirement is present, the histogram visualizes the price with tax.
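A minimal Python sketch of how the source price property could be chosen. PriceForSale, PriceMode and histogram_source_values are hypothetical names; the WITH_TAX / WITHOUT_TAX values mirror the two options described above, with WITH_TAX as the default when no priceType requirement is present. Products without a price for sale are assumed to contribute no value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PriceMode(Enum):
    # hypothetical stand-in for the argument of the priceType requirement
    WITH_TAX = "WITH_TAX"
    WITHOUT_TAX = "WITHOUT_TAX"


@dataclass
class PriceForSale:
    price_with_tax: float
    price_without_tax: float


def histogram_source_values(prices_for_sale: List[Optional[PriceForSale]],
                            price_type: Optional[PriceMode] = None) -> List[float]:
    if price_type is None:
        price_type = PriceMode.WITH_TAX  # no priceType requirement => price with tax
    values = []
    for price in prices_for_sale:
        if price is None:
            continue  # assumption: products without a price for sale add no value
        if price_type is PriceMode.WITH_TAX:
            values.append(price.price_with_tax)
        else:
            values.append(price.price_without_tax)
    return values


prices = [PriceForSale(121.0, 100.0), None, PriceForSale(242.0, 200.0)]
print(histogram_source_values(prices))                         # [121.0, 242.0]
print(histogram_source_values(prices, PriceMode.WITHOUT_TAX))  # [100.0, 200.0]
```

The selected values then feed the same bucketing as attribute histograms; a priceBetween constraint in the userFilter part does not remove any of them.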

To demonstrate the use of the histogram, we will use the following example:

The simplified result looks like this:

The histogram result in JSON format is a bit more verbose, but it's still quite readable:

Price histogram contents optimization

During user testing, we found that histograms with scarce data are not very useful. Besides the fact that they don't look good, they are often harder to manipulate with the widget that controls the histogram and tries to stick to the bucket thresholds. Therefore, we have introduced a new histogram calculation mode - OPTIMIZED. In this mode, the histogram calculation algorithm tries to reduce the number of buckets when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.

To demonstrate the optimization of the histogram, we will use the following example:

The simplified result looks like this:

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.

Price histogram equalization

Just as with attribute histograms, standard price histograms use equal-width buckets which can be problematic for skewed price distributions. For example, in a marketplace where most items cost $10-$50 but a few luxury items cost $500-$5000, equal-width buckets would waste slider space on the expensive (but sparse) end.

The EQUALIZED behavior for price histograms positions bucket boundaries based on cumulative frequency distribution, so each bucket covers approximately the same number of products. This provides a better filtering experience, especially for e-commerce catalogs with diverse price ranges.

To demonstrate equalized price histogram, we will use the following example:

The simplified result looks like this:

The equalized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the bucket boundaries are positioned to distribute products more evenly across the slider range.

Author: Ing. Jan Novotný

Date updated: 7.11.2023
