DAX Patterns
https://www.daxpatterns.com
Patterns for Microsoft Excel 2010 & 2013

Ranking
https://www.daxpatterns.com/ranking/
Mon, 10 Aug 2020

The ability to rank things is a very common requirement. Finding the best customers, computing the ranking position of products, or detecting the countries with the best sales volumes are among the questions most frequently asked by management.

Ranking can be either static or dynamic. Static ranking assigns to each product a ranking position that is not affected by filters, whereas in dynamic ranking the position is computed every time the user interacts with the report. For example, in dynamic ranking the year selected in the report defines a new calculation of the ranking value.

All the basic ranking calculations are based on the RANKX function, whereas more advanced techniques – like filtering the top 10 products – require the TOPN function and advanced table calculations.

Static ranking

You assign a static ranking to a product by using a calculated column. The calculated column is computed during data refresh; therefore, the value of the static ranking does not depend on the report filters. For example, in Figure 1 the first product is ranked 1 because the LCD HDTV M140 is the top seller among products of any category, whereas the second product (SV 16xDVD M360 Black) shows a product rank equal to 4 instead of 2. The reason is that there are two other products, ranked 2 and 3, that are not included in the TV and Video category selected in the Category slicer. Because the ranking is static, it does not consider report filters: it shows the overall ranking of the products visible in the report.

When the filter on Category is removed, the overall ranking shows all the products, as one would expect. This is shown in the following figure.

To compute the static ranking of a product based on the Sales Amount measure we need a calculated column in the Product table:
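
Its definition can be sketched as follows. This is a reconstruction based on the description in this pattern; it assumes a Sales Amount measure is already defined in the model:

Calculated column in the Product table

```dax
Product Rank = RANKX ( ALL ( 'Product' ), [Sales Amount] )
```

Because [Sales Amount] is a measure, context transition turns the row context of the calculated column into a filter context, so each product is ranked by its own sales total.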

In this code, the ALL function is not strictly needed, because a calculated column is evaluated in a row context with no filters on the Product table. However, it clarifies the intention of ranking against all the products, which is why we added it; it makes the code easier to read over time.

A similar formula can be used to obtain the ranking over a subset of products. For example, the following calculated column computes the ranking of a product inside its category:
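
A sketch of this column, under the same assumptions: a variable captures the category of the current product, and RANKX only iterates the products of that category:

Calculated column in the Product table

```dax
Rank in Category =
VAR CurrentCategory = 'Product'[Category]        -- Category of the current row
RETURN
    RANKX (
        FILTER (                                 -- Rank only against products
            'Product',                           -- of the same category
            'Product'[Category] = CurrentCategory
        ),
        [Sales Amount]
    )
```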

As shown in the figure below, the fourth row (SV 16xDVD M360 Black) has a Product Rank of 4 and a Rank in Category of 2, because the latter is the ranking in the TV and Video category.

Dynamic ranking

The dynamic ranking pattern produces a ranking that changes depending on the report filters. Consequently, it is based on measures instead of calculated columns.

The code of the Product Rank measure is the following:

Measure in the Product table

Product Rank :=
IF (
    ISINSCOPE ( 'Product'[Product Name] ),
    VAR SalesAmountCurrentProduct = [Sales Amount]
    VAR ProductRank =
        RANKX (
            ALLSELECTED ( 'Product' ),
            [Sales Amount]
        )
    VAR Result =
        IF (
            NOT ISBLANK ( SalesAmountCurrentProduct ),
            ProductRank
        )
    RETURN
        Result
)

Obtaining different rankings requires modifying the table iterated by RANKX. For example, the following figure shows a Rank in Category measure that returns the ranking of a product among the products of the same category, while still considering any other filters existing in the report.

The definition of the Rank in Category measure is the following:

Measure in the Product table

Rank in Category :=
VAR SalesAmountCurrentProduct = [Sales Amount]
VAR ProductsInCategory =
    CALCULATETABLE (
        'Product',
        REMOVEFILTERS ( 'Product'[Product Name] ),
        ALLSELECTED ( 'Product' ),
        VALUES ( 'Product'[Category] )
    )
VAR ProductRank =
    IF (
        ISINSCOPE ( 'Product'[Product Name] ),
        RANKX (
            ProductsInCategory,
            [Sales Amount]
        )
    )
VAR Result =
    IF (
        NOT ISBLANK ( SalesAmountCurrentProduct ),
        ProductRank
    )
RETURN
    Result

Showing the top 3 products by category

Ranking is useful to obtain reports that filter products based on their local ranking in a given group. For example, the report below shows how to obtain the top three products for each category. There are two possible solutions to this scenario, depending on whether the product name is part of the report or not.

If the report contains the product name, then we can use the Rank in Category measure of the dynamic pattern and rely on Power BI visual filters.

Although this technique is not the most powerful, we show it because it is a very efficient way of filtering the top three products. Besides, it solves the most common requirement: actually showing by name the products included in the top three.

Nevertheless, if the product name is not part of the visual, then this technique cannot be used: the granularity of the visual is not compatible with the measure, so the previous technique no longer works. In the figure below, we removed the product names from the report above.

The reason the visual filter is not effective is that it is only applied at the maximum granularity of the visual. Therefore, the visual filter does not necessarily apply to the products. In order to enforce the filter over product names, the measure displaying Sales Amount must enforce the computation of the ranking at the correct granularity, determining the products to be included in the calculation and then using those products as a filter. The report must display the amount using the following definition of Sales Top 3 Products:

Measure in the Sales table

Sales Top 3 Products :=
VAR TopThreeProducts =
    GENERATE (
        ALLSELECTED ( 'Product'[Category] ),         -- For each category
        TOPN (                                       -- retrieve the top
            3,                                       -- three product
            ALLSELECTED ( 'Product'[Product Name] ), -- names based on the
            [Sales Amount]                           -- sales amount
        )
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                              -- Compute sales amount
        KEEPFILTERS ( TopThreeProducts )             -- using TopThreeProducts as a
    )                                                -- further filter
RETURN
    Result

The following figure shows the result of Sales Top 3 Products side by side with Sales Amount. Though the product name is not part of the report, the formula for Sales Top 3 Products retrieves sales strictly for the top three products of each category, ignoring all other products. This also applies to the grand total of the report.

Performance-wise, the formula used for Sales Top 3 Products is slightly slower than the one using the visual-level filter. Therefore, we suggest implementing the first solution, if feasible, and reverting to the full pattern only if strictly necessary or if the client tool does not support visual-level filters.

Transition matrix
https://www.daxpatterns.com/transition-matrix/
Mon, 10 Aug 2020

The Transition matrix pattern analyzes changes in an attribute assigned to an entity at regular intervals. For example, customers might receive a ranking evaluation every month, or products might have a rating score measured every week. Measuring the changes in rating between two points in time might require the evaluation of how many items moved from one rating to another within the considered interval. The transition matrix enables the end user to make this kind of analysis by just manipulating filters in a report and without having to write any custom query.

Introduction

Each product is assigned a monthly rating based on the comparison between the percentage of sales in the current month and in the previous month. The configuration is depicted in Figure 1.

A simple implementation of the dynamic segmentation lets you analyze how many products fall under each rating every month, like in Figure 2.

As you can imagine, the same product might be assigned different ratings over time. The same matrix, focused on a single product, shows the situation for the A. Datum SLR Camera in Figure 3.

From a broader point of view, an interesting analysis is: taking all the products that had a given rating in a starting month, how did they evolve over time? Has their rating improved or worsened in the following months? You can see the result in Figure 4.

The report shows that there are 36 products rated Stable in March 2007. The rating for that same set of products changes in different months, and a product only has a rating in months when there are sales; the number of products with a rating might thus change over time. In April, for example, 8 out of the 36 products have a lower rating, 10 have the same rating, and 13 have a higher rating. 5 of the original 36 products have no rating in April 2007, because there were no sales for those 5 products. The same reasoning applies to all the other months, always based on the 36 products that were Stable in March.

There are multiple ways of generating the transition matrix; here, we outline two possible solutions. The first solution is based on a snapshot table, generating a very fast static transition matrix. The second solution is based on pure DAX code, resulting in a slower but more flexible dynamic transition matrix.

Both patterns share some of the data modeling requirements. Therefore, we first explain the easier static transition matrix. Later on, we dive into more details with the dynamic transition matrix. In the dynamic transition matrix section, we will not repeat some of the details explained in the static transition matrix. Therefore, if you need to implement the dynamic transition matrix pattern, please review the static pattern first, in order to gather the required information on how to setup your model.

Static transition matrix

The static transition matrix uses a snapshot table containing the rating assigned to each product on a monthly basis. In the example provided, we generated this snapshot through a DAX calculated table. In your scenario, you might have the same information already provided in the data source. The important thing is that the table must contain the month, the product, and the rating assigned. In Figure 5 you can see an excerpt of the Monthly Ratings snapshot table.

The snapshot table is not enough to solve the scenario. We need two additional tables that enable the user to select a starting month and a starting rating. The user interface is visible in Figure 6.

The slicer for Starting Month (1) cannot be based on the Date[Calendar Year Month] column. Indeed, the Date table is already used in the rows of the matrix (3). Therefore, the Date table cannot be filtered by an external slicer in order to show – for example – September 2007 even though the starting month is March 2007. Similarly, the slicer with the Starting Rating (2) cannot use the same snapshot rating attribute applied to the columns of the matrix (4). The columns of the matrix and the slicer must be fed by different tables.

We need two calculated tables for the slicers that we call Starting Month and Starting Rating:

Calculated table

Starting Month =
SELECTCOLUMNS (
    SUMMARIZE (
        Sales,
        'Date'[Calendar Year Month],
        'Date'[Calendar Year Month Number]
    ),
    "Starting Month", 'Date'[Calendar Year Month],
    "Starting Month Sort", 'Date'[Calendar Year Month Number]
)
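
The Starting Rating table can be sketched the same way. Here we assume it is derived from the Rating configuration table used by the segmentation; add a sort column as in Starting Month if your ratings need a custom sort order:

Calculated table

```dax
Starting Rating =
SELECTCOLUMNS (
    ALLNOBLANKROW ( Rating[Rating] ),   -- One row per rating value
    "Starting Rating", Rating[Rating]
)
```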

These two slicer tables are not linked with any of the other tables in the model. Only the DAX code will read their selection and use it to compute the result of the measures.

However, the snapshot table must be linked with the remaining part of the model through appropriate relationships. In this example we use a weak many-to-many relationship with Date based on the Calendar Year Month Number column, and a strong one-to-many relationship with Product based on the ProductKey column. The diagram is visible in Figure 7.

Once the model is set, the DAX code must read the current selection on the two slicer tables and use the information to determine the list of products that – in the selected month – are in the selected status. Once the list of products is computed, it is used as a filter over the snapshot table in order to restrict the calculation strictly to the relevant products:
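
The measure can be sketched as follows. This is a reconstruction based on the description above: the measure name and the snapshot column names (Rating, ProductKey, Calendar Year Month Number) are assumptions derived from the model described in this section, and may differ in your implementation:

Measure in the Monthly Ratings table

```dax
# Products Matrix :=
VAR SelectedStartingMonths =                      -- Selection of the Starting Month slicer,
    TREATAS (                                     -- moved onto the snapshot month column
        VALUES ( 'Starting Month'[Starting Month Sort] ),
        'Monthly Ratings'[Calendar Year Month Number]
    )
VAR SelectedStartingRatings =                     -- Selection of the Starting Rating slicer,
    TREATAS (                                     -- moved onto the snapshot rating column
        VALUES ( 'Starting Rating'[Starting Rating] ),
        'Monthly Ratings'[Rating]
    )
VAR ProductsInStartingStatus =                    -- Products with the selected rating
    CALCULATETABLE (                              -- in the selected starting month,
        VALUES ( 'Monthly Ratings'[ProductKey] ), -- ignoring the filters coming from
        SelectedStartingMonths,                   -- the rows and columns of the matrix
        SelectedStartingRatings,
        REMOVEFILTERS ( 'Date' ),
        REMOVEFILTERS ( 'Monthly Ratings'[Rating] )
    )
VAR Result =
    CALCULATE (
        DISTINCTCOUNT ( 'Monthly Ratings'[ProductKey] ), -- Count those products in the
        KEEPFILTERS ( ProductsInStartingStatus )         -- month/rating of the current cell
    )
RETURN
    Result
```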

Because the static transition matrix is based on a calculated table, its results are not dynamic. This means that if the user filters the customers in a specific country, the numbers in the transition matrix will not change. The only tables that affect the result are the ones linked through physical relationships with the snapshot. In this example these tables are Date and Product.

If you need a dynamic transition matrix that recomputes its result every time based on the current selection across the entire data model, then you need to implement the more powerful (albeit slower) dynamic transition matrix.

Dynamic transition matrix

The dynamic transition matrix solves the same scenario as the static transition matrix, with the noticeable difference that it does not require the snapshot table. Instead, it computes the result every time the measure is evaluated, resulting in a dynamic calculation.

The data model is the same as the static transition matrix, but no snapshot table is required this time. The result is visible in Figure 8, where we added a slicer filtering one continent – the same slicer would have no effect on a static transition matrix.

Because the pattern requires computing the rating of a product multiple times, this time we created a measure that returns the rating of a product in a given month:

Measure in the Sales table

Status :=
CALCULATE (
    DISTINCT ( Rating[Rating] ),
    FILTER (
        ALLNOBLANKROW ( Rating ),
        VAR CurrentValue = [% Of Previous Month]
        VAR LowerBoundary = Rating[Min Growth]
        VAR UpperBoundary = Rating[Max Growth]
        RETURN
            AND ( CurrentValue >= LowerBoundary, CurrentValue < UpperBoundary )
    )
)

The final measure is quite intricate. It is divided into two separate steps:

Compute the list of the products that are in one of the selected starting ratings during the selected starting months, as chosen by the user with the slicers. To perform this operation – since the rating of each product is unknown at the beginning – the formula computes the rating of each product and then filters out the ones that are not selected.

Compute the status of the products determined in the first step, this time in the current filter context. This second step is very similar to the previous one; the only important difference is in the filtering of dates and products, as better outlined in the code comments:

Measure in Sales table

# Products Matrix Dynamic :=
VAR SelectedStartingMonths =                         -- First we save the
    TREATAS (                                        -- selected starting month
        VALUES ( 'Starting Month'[Starting Month] ), -- as a Date column to make
        'Date'[Calendar Year Month]                  -- it filter Date, later on
    )
VAR SelectedStartingRatings =                        -- Save the currently selected
    VALUES ( 'Starting Rating'[Starting Rating] )    -- rating as the starting point
VAR CurrentRatings =                                 -- Save the ratings of the current filter
    VALUES ( Rating[Rating] )                        -- context (not the starting, the current ones)
VAR StartingProdsAndMonths =                         -- We store the products and months
    CALCULATETABLE (                                 -- of the starting time period in a variable
        SUMMARIZE (
            'Sales',
            'Product'[ProductKey],
            'Date'[Calendar Year Month]              -- Beware that the report is filtering
        ),                                           -- a different period. For this reason we
        SelectedStartingMonths,                      -- need to remove the outer filter on date
        REMOVEFILTERS ( 'Date' )                     -- and replace it with the starting month
    )
VAR StartingProdAndStatus =                          -- Here we compute the Status assigned to
    CALCULATETABLE (                                 -- the products in the starting months,
        FILTER (                                     -- only keeping the products whose status
            StartingProdsAndMonths,                  -- is among the selected starting ratings
            [Status] IN SelectedStartingRatings
        ),
        REMOVEFILTERS ( 'Date' )                     -- Required to get rid of the original filter
    )
VAR StartingProductsInStatus =                       -- Finally, we are only interested in the
    DISTINCT (                                       -- product keys, so we remove other columns
        SELECTCOLUMNS (                              -- from the previous table
            StartingProdAndStatus,                   -- This variable will only filter Product
            "ProductKey", 'Product'[ProductKey]
        )
    )
--
-- At this point, we determined the products that were in the given status
-- in the starting month. The next step is to use the current filter context
-- created by the matrix to check where those products are in the target period.
--
-- The code is very similar to the previous code, this time using StartingProductsInStatus
-- as a filter over Sales, so to restrict the analysis
--
VAR CurrentProdsAndMonths =                          -- Determines the products and months
    CALCULATETABLE (                                 -- in the current time period
        SUMMARIZE (
            'Sales',
            'Product'[ProductKey],
            'Date'[Calendar Year Month]
        ),                                           -- Restricting the products visible to the ones
        StartingProductsInStatus                     -- determined in the previous steps
    )
VAR CurrentProdAndStatus =
    CALCULATETABLE (                                 -- Here we compute the Status assigned to
        FILTER (                                     -- each product, for each month in the current period
            CurrentProdsAndMonths,
            [Status] IN CurrentRatings
        ),
        REMOVEFILTERS ( 'Date' )
    )
VAR CurrentProducts =                                -- We want to count products, therefore we remove
    DISTINCT (                                       -- all other columns and only keep the unique
        SELECTCOLUMNS (                              -- values in ProductKey
            CurrentProdAndStatus,
            "ProductKey", 'Product'[ProductKey]
        )
    )
VAR Result =
    COUNTROWS ( CurrentProducts )
RETURN
    Result

As you see, this code is not trivial at all. Changing it to make it fit your specific needs requires a deep understanding of its inner workings.

The dynamic transition matrix, albeit very powerful, is extremely demanding on CPU and RAM. Its speed mainly depends on the number of products. In data models with hundreds of thousands of products, it is unlikely to be usable. On the other hand, on smaller models it works just fine, though the static transition matrix displays much better performance.

Like-for-like comparison
https://www.daxpatterns.com/like-for-like-comparison/
Mon, 10 Aug 2020

The like-for-like sales comparison is an adjusted metric that compares two time periods, restricting the comparison to products or stores with the same characteristics. In this example, we use the like-for-like technique to compare the sales of Contoso stores that had sales in all the time periods considered. The stores are continuously updated: new stores are opened, other stores are closed or renovated. The like-for-like comparison only evaluates those stores that were open in all the periods considered. This way, the report does not show a store that seems to be underperforming simply because it was closed during the period analyzed.

As is the case with many other patterns, like-for-like can be computed statically or dynamically. The choice between the two depends both on performance and on business requirements. The variations of the "Same store sales" measure described in the following sections are examples of like-for-like sales comparisons.

Introduction

If you analyze sales figures without considering whether stores were open or closed within the time period you are analyzing, looking at the following report might mislead you into thinking that there were issues in 2009 because of the dramatic drop in sales.

In 2009 many stores were closed. Therefore, the numbers reflect a substantial drop in sales due to the lower number of open stores, as you can see in the following report that shows which stores were open in different years. A blank cell means that the store was closed in that particular year.

In the “same store sales” measure, you must compute the sales amount just for the stores that were open during the entire time period (2007-2009), namely three stores.

The measure must compute the correct value even when sliced by different attributes, as shown in Figure 4.

Same store sales with snapshot

The best method to solve the same store sales scenario is to use a snapshot table to manage store statuses. Later in this pattern we also demonstrate how to compute same store sales in a dynamic way without a snapshot table. Nevertheless, the snapshot table is the best option for both performance and manageability.

The snapshot table must contain all the stores and years, with an additional column indicating the status.

The StoreStatus snapshot table can be created with the following calculated table:

Calculated table

StoreStatus =
VAR AllStores =
    CROSSJOIN (
        FILTER (
            ALLNOBLANKROW ( 'Date'[Calendar Year Number] ),
            'Date'[Calendar Year Number] IN { 2007, 2008, 2009 }
        ),
        ALLNOBLANKROW ( Store[StoreKey] )
    )
VAR OpenStores =
    SUMMARIZE (
        Receipts,
        'Date'[Calendar Year Number],
        Receipts[StoreKey]
    )
VAR Result =
    UNION (
        ADDCOLUMNS ( OpenStores, "Status", "Open" ),
        ADDCOLUMNS ( EXCEPT ( AllStores, OpenStores ), "Status", "Closed" )
    )
RETURN
    Result

The StoreStatus snapshot table has a granularity of store and year. Therefore, it has a regular strong relationship with the Store table and a weak many-to-many relationship (MMR) with the Date table. If weak relationships are not available in your tool – like in Power Pivot – then you must transfer the filter from Date to Store in DAX using TREATAS or INTERSECT.

The Same Store Sales measure checks the stores whose status is always “Open” during the entire selected period. If a store is “Closed” at any point, then SELECTEDVALUE returns either blank or “Closed”, filtering out that store:

Measure in the Receipts table

Same Store Sales :=
VAR OpenStores =
    CALCULATETABLE (
        FILTER (
            ALLSELECTED ( StoreStatus[StoreKey] ),    -- Filter the stores
            CALCULATE (                               -- where the Status is
                SELECTEDVALUE ( StoreStatus[Status] ) -- always OPEN
            ) = "Open"                                --
        ),                                            --
        ALLSELECTED ( 'Date' )                        -- Over all selected years
    )
VAR FilterOpenStores =
    TREATAS (                                         -- Use OpenStores to filter
        OpenStores,                                   -- Store[StoreKey]
        Store[StoreKey]                               -- by changing its data lineage
    )
VAR Result =
    CALCULATE (
        [Sales Amount],
        KEEPFILTERS ( FilterOpenStores )
    )
RETURN
    Result

The formula requires the snapshot table to contain the rows for all the years and stores. If you store in the snapshot table only the years when a store was open, then the code no longer works.

Same store sales without snapshot

In case you do not have the option of building a snapshot table, same store sales can be computed in a more dynamic way using only DAX code.

If the snapshot table is not available, then you must compute the number of years of the report dynamically, and then filter all the stores that have sales in all the years. In other words, if the report is showing three years, then only the stores that have sales in all three years should survive the filter. If a store does not have sales in any one of the selected years, then that store will not be considered for the calculation:

Measure in the Receipts table

Same Store Sales Dynamic :=
VAR NumberOfYears =
    CALCULATE (
        DISTINCTCOUNT ( 'Date'[Calendar Year] ),
        CROSSFILTER ( Receipts[Sale Date], 'Date'[Date], BOTH ),
        ALLSELECTED ( )
    )
VAR StoresAndYears =
    CALCULATETABLE (
        SUMMARIZE (                  -- Group the Receipts table
            Receipts,                -- by store and year
            Store[StoreKey],         -- in order to count how
            'Date'[Calendar Year]    -- many years a store is present in
        ),                           --
        ALLSELECTED ( )              -- Over all selected years and stores
    )
VAR StoresAndYearCount =
    GROUPBY (
        StoresAndYears,
        Store[StoreKey],
        "@Years", SUMX ( CURRENTGROUP (), 1 )
    )
VAR OpenStores =
    FILTER (
        StoresAndYearCount,
        [@Years] = NumberOfYears
    )
VAR Result =
    CALCULATE (
        [Sales Amount],
        KEEPFILTERS ( OpenStores )   -- Filters Store[StoreKey]
    )
RETURN
    Result

From a computation perspective, this formula is much more expensive than the one using the snapshot. Besides, the entire logic to determine whether a store is open or closed lies inside the formula. In our experience, such business logic is better handled outside of DAX, possibly stored in the data source. Therefore, if you do not have that information available in the data source, we suggest the implementation using the snapshot – even for smaller data models.

The Same Store Sales Dynamic measure shows three stores that were open in Canada for the entire time period (2007-2009).

Events in progress
https://www.daxpatterns.com/events-in-progress/
Mon, 10 Aug 2020

The Events in Progress pattern has a broad field of application. It is useful whenever dealing with events with a duration – events that have a start date and an end date. The event is considered to be in progress between the two dates. As an example, we use Contoso orders.

Each order has an order date and a delivery date. The date when the order was placed is considered the start date, and the date when the order was delivered is considered the end date. An order is considered open when it has been placed but has not yet been delivered. We are interested in counting how many orders are open at a certain date, and what their value is.

As with many of the patterns described, Events in Progress can be handled both dynamically through measures, or statically by using a snapshot table.

Definition of events in progress

You need to compute how many orders are open at a specific time for Contoso. In Figure 1 you can see the result of the calculation at the day level; on each day, the number of open orders is the number of open orders from the previous day, plus the orders received, minus the orders delivered that day. EOP stands for End Of Period.

However, this way of computing the number of open orders could be ambiguous when we consider a period of several days, such as a month or a year. To avoid this ambiguity, it is important to clearly define the desired result. When looking at a single day, the number of open orders is evident, as you can see in Figure 2.

In Figure 2 only orders number 2 and 5 are open at the date considered (October 15, 2019). Order 1 is already delivered, whereas orders 3 and 4 are yet to be received by Contoso. Therefore, the calculation is clearly defined. Nevertheless, when you report on a larger period of time, like a month, the calculation is harder to define. Look at Figure 3 where the time duration is much larger, including the full month of October.

In Figure 3, order 1 is completed before the beginning of October, and order 4 is yet to be received by Contoso after the end of October. Therefore, their status is obvious. However, order 2 is open at the beginning of the month, but it is closed at the end. Order 3 is opened during the month and still open at the end of the month. Order 5 is received and closed during the month. As you see, a calculation that is straightforward on an individual day requires a better definition at an aggregate level.

We do not want to provide an extensive description of every possible option. In this pattern we only consider the following three definitions for the orders in a period longer than one day – each measure is identified with a suffix from the list:

ALL: Returns the orders that were open at any time during the period. For Figure 3, we report three orders; we consider order 5 as open because it has been open for some time during the period considered.

EOP: Considers the status of each order at the end of the period. For Figure 3, this means reporting only one order (order 3), because all the other orders are either closed or not yet opened at the end of the period.

AVG: Computes the daily average of the orders open in the period. This requires computing the number of open orders day by day, and then averaging it over longer periods.

There might be different definitions of open orders, which usually are slight variations of the three scenarios described above.

Open orders

If the Orders table stores data at the correct granularity – storing one row for each order along with order date and delivery date – then the model looks like the one in Figure 4.

The DAX code computing the open orders for this model is rather simple:

Measure in the Orders table

# Open Orders ALL :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR MaxDate = MAX ( 'Date'[Date] )
VAR Result =
    CALCULATE (
        COUNTROWS ( Orders ),
        Orders[Order Date] <= MaxDate,
        Orders[Deliver Date] > MinDate,
        REMOVEFILTERS ( 'Date' )
    )
RETURN
    Result

It is worth noting that REMOVEFILTERS is required, in order to remove any report filters that may be affecting the Date table.

Based on this measure, you can compute the two variations (end of period and average) using the following formulas:

Measure in the Orders table

# Open Orders EOP :=
CALCULATE (
    [# Open Orders ALL],
    LASTDATE ( 'Date'[Date] )
)

Measure in the Orders table

# Open Orders AVG :=
AVERAGEX (
    'Date',
    [# Open Orders ALL]
)

You can see the result of these formulas in Figure 5.

The Orders table might have more than one row for each order. If the Orders table has one row for each line in the order instead of one row for each order, then you should use DISTINCTCOUNT over a column containing a unique identifier for each order – instead of using COUNTROWS over the Orders table. This is not the case in our sample model, but in that scenario the # Open Orders ALL formula would differ by just one line:

Measure in the Orders table

# Open Orders ALL :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR MaxDate = MAX ( 'Date'[Date] )
VAR Result =
    CALCULATE (                                 -- If any order can have several rows
        DISTINCTCOUNT ( Orders[Order Number] ), -- use DISTINCTCOUNT instead of COUNTROWS
        Orders[Order Date] <= MaxDate,
        Orders[Deliver Date] > MinDate,
        REMOVEFILTERS ( 'Date' )
    )
RETURN
    Result

If you want to compute the dollar value of the open orders, you use the Sales Amount measure instead of the COUNTROWS or DISTINCTCOUNT functions. For example, this is the definition of the Open Amount ALL measure:

Measure in the Orders table

Open Amount ALL :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR MaxDate = MAX ( 'Date'[Date] )
VAR Result =
    CALCULATE (
        [Sales Amount],      -- Use Sales Amount instead of DISTINCTCOUNT or COUNTROWS
        Orders[Order Date] <= MaxDate,
        Orders[Deliver Date] > MinDate,
        REMOVEFILTERS ( 'Date' )
    )
RETURN
    Result

The other two measures just reference the underlying Open Amount ALL measure instead of # Open Orders ALL:
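
Their definitions follow the same structure as # Open Orders EOP and # Open Orders AVG, shown earlier:

Measures in the Orders table

```dax
Open Amount EOP :=
CALCULATE (
    [Open Amount ALL],       -- Value of the open orders
    LASTDATE ( 'Date'[Date] ) -- on the last day of the period
)

Open Amount AVG :=
AVERAGEX (                   -- Daily average of the value
    'Date',                  -- of the open orders
    [Open Amount ALL]
)
```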

Figure 6 shows the results of the measures defined above.

The formulas described in this section work well on small datasets, but they require a big effort from the Formula Engine; this results in reduced performance starting from medium-sized databases and above – think hundreds of thousands of orders. If you need better performance, using a snapshot table is a very good option.

Open orders with snapshot

Building a snapshot simplifies the calculation and speeds up performance. A daily snapshot contains one row for each day and each order open on that day. Therefore, a single order that has been open for 10 days requires 10 rows in the snapshot.

In Figure 7 you can see an excerpt from the snapshot table.

For large data models where the snapshot requires tens of millions of rows, it is suggested to use specific ETLs or queries in SQL to get the snapshot result. For smaller data models, the snapshot table can be created by using either Power Query or DAX. For example, the snapshot of our sample model can be created using the following definition of the Open Orders calculated table:
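
Such a snapshot can be sketched in DAX as follows – a minimal version that only keeps the order number and the date. Depending on the relationships required in your model (Figure 8), you may need to keep additional key columns from Orders:

Calculated table

```dax
Open Orders =
SELECTCOLUMNS (
    GENERATE (
        Orders,
        DATESBETWEEN (               -- One row for each day the order is open:
            'Date'[Date],            -- from the order date up to the day
            Orders[Order Date],      -- before delivery, consistent with the
            Orders[Deliver Date] - 1 -- filters used in the previous section
        )
    ),
    "Order Number", Orders[Order Number],
    "Date", 'Date'[Date]
)
```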

In Figure 8 you can see that the Open Orders snapshot has a set of regular relationships with the other tables in the model.

Using the snapshot, the formulas that compute the number of open orders are faster and simpler to write:

Measure in the Orders table

# Open Orders ALL :=
DISTINCTCOUNT ( 'Open Orders'[Order Number] )

Measure (hidden) in the Orders table

# Rows Open Orders :=
COUNTROWS ( 'Open Orders' )

Measure in the Orders table

# Open Orders EOP :=
CALCULATE (
[# Rows Open Orders],
LASTDATE ( 'Date'[Date] )
)

Measure in the Orders table

# Open Orders AVG :=
AVERAGEX (
'Date',
[# Rows Open Orders]
)

Only the # Open Orders ALL measure requires the DISTINCTCOUNT function. The other two measures, # Open Orders EOP and # Open Orders AVG, count the number of open orders one day at a time, which can be done with a faster COUNTROWS over the snapshot table.

The Open Amount ALL measure requires you to apply the list of open orders as a filter to the Orders table. This is achieved using TREATAS:

Measure in the Orders table

Open Amount ALL :=
VAR OpenOrders =
DISTINCT ( 'Open Orders'[Order Number] )
VAR FilterOpenOrders =
TREATAS (
OpenOrders,
Orders[Order Number]
)
VAR Result =
CALCULATE (
[Sales Amount],
FilterOpenOrders,
REMOVEFILTERS ( Orders )
)
RETURN
Result

The Open Amount EOP and Open Amount AVG measures just reference the underlying Open Amount ALL measure instead of # Open Orders ALL:

The size of the snapshot depends on the number of orders and on the average duration of an order. If an order typically stays open for a few days, then it is fine to use a daily granularity. If an order is usually active for a much longer period of time (think years) then you should consider moving the snapshot granularity to the month level – one row for each order open in each month.

A possible optimization requires an inactive relationship between Orders and Open Orders. This is only useful when the Orders table has one row per order – the Orders[Order Number] column is thus unique and the relationship has a one-to-many cardinality. The inactive relationship should be like the one highlighted in Figure 9. Do not use this technique with a many-to-many cardinality because the pure DAX approach based on TREATAS would be similar in performance and simpler to manage.

Then you should replace the previous Open Amount ALL measure with the code of the Open Amount ALL optimized measure defined as follows:
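The optimized measure is not included in this excerpt. Based on the description – activating the inactive relationship between Orders and Open Orders, and disabling the relationship between Orders and Date – a plausible sketch follows; the column used by the Orders–Date relationship is an assumption:

Measure in the Orders table

Open Amount ALL optimized :=
CALCULATE (
    [Sales Amount],
    -- Disable the Orders-Date relationship (assumed to be on Order Date)
    CROSSFILTER ( Orders[Order Date], 'Date'[Date], NONE ),
    -- Activate the inactive relationship and let the snapshot filter Orders
    USERELATIONSHIP ( Orders[Order Number], 'Open Orders'[Order Number] ),
    CROSSFILTER ( Orders[Order Number], 'Open Orders'[Order Number], BOTH )
)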

Leveraging the relationship to transfer the filter reduces the workload of the Formula Engine and improves the performance of all the measures based on Open Amount ALL. You should consider the side effects of USERELATIONSHIP in different models, applying the required CROSSFILTER to remove possible ambiguities. Usually it is enough to disable the relationship between Orders and Date, as in this example, but carefully check the accuracy of the results by comparing the optimized measure with the one based on TREATAS.

Parent-child hierarchies
https://www.daxpatterns.com/parent-child-hierarchies/
Mon, 10 Aug 2020 08:42:32 +0000

Parent-child hierarchies are often used to represent charts of accounts, stores, salespersons, and the like. Parent-child hierarchies store the hierarchy in a peculiar way: their depth is variable. In this pattern we show how to use parent-child hierarchies to display budget, actual, and forecast values in a report using both a chart of accounts and a geographic hierarchy.

Introduction

In the parent-child pattern the hierarchy is not defined by the presence of columns in the table of the original data source. The hierarchy is based on a structure where each node of the hierarchy is related to the key of its parent node. For example, Figure 1 shows the first few rows of a parent-child hierarchy that defines a geographic structure for sales.

Based on this data structure, we need to display a hierarchy showing Contoso United States under Contoso North America, as shown in Figure 2.

The parent-child pattern implements some sort of self-join of the table containing the entities, which is not supported in Tabular. Because of their nature, parent-child hierarchies may also have a variable depth: the number of levels traversing the hierarchy top to bottom can be different depending on the navigated path. For these reasons, a parent-child hierarchy should be implemented following the technique described in this pattern.

Parent-child hierarchies are often used with charts of accounts. In this case, the nodes also define the sign to use to aggregate a value to its parent. The chart of accounts in Figure 3 shows expenses that are subtracted from the total – despite the numbers displayed being all positive – whereas incomes are added.

The DAX expressions aggregating data over a parent-child hierarchy must consider the sign used to aggregate data at the lower levels of a hierarchy node.

Basic parent-child pattern

Neither hierarchies of variable depth nor self-joins are directly supported in a Tabular model. The first step in handling parent-child hierarchies is to flatten the hierarchical structure to a regular hierarchy made up of one column for each possible level of the hierarchy. We must move from the data structure of Figure 4 to that of Figure 5. In Figure 4 we only have the three columns required to define a parent-child hierarchy.

The full expansion of the parent-child hierarchy in this example requires four levels. Figure 5 shows that there is one column for each level of the hierarchy, named Level1 to Level4. The number of columns required depends on the data, so it is possible to add levels to accommodate future changes in the data.

The first step is to create a technical column called EntityPath by using the PATH function:
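The column definition is not shown in this excerpt. Assuming the parent key column is named Entity[ParentEntityKey] (an assumed name – the excerpt does not show it), the column can be sketched as:

Calculated column in the Entity table

EntityPath =
PATH ( Entity[EntityKey], Entity[ParentEntityKey] ) -- pipe-delimited list of ancestor keys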

The EntityPath column contains the full path to reach the node corresponding to the row of the table, as shown in Figure 6. This technical column is useful to define the Level columns.

The code for all the Level columns is similar, and only differs in the value assigned to the LevelNumber variable. This is the code for the Level1 column:

Calculated column in the Entity table

Level1 =
VAR LevelNumber = 1
VAR LevelKey = PATHITEM ( Entity[EntityPath], LevelNumber, INTEGER )
VAR LevelName = LOOKUPVALUE ( Entity[EntityName], Entity[EntityKey], LevelKey )
VAR Result = LevelName
RETURN
Result

The other columns have a different name and a different value assigned to LevelNumber, corresponding to the relative position of their level in the hierarchy. Once all the Level columns are defined, we hide them and create a regular hierarchy in the table that includes all the Level columns. Only exposing these columns through a hierarchy is important to make sure they are used properly by the user navigating a report.

If used straight in a report, the hierarchy still does not provide an optimal result. Indeed, all the levels are always shown, even though they might contain no value. Figure 7 shows a blank row under Contoso Asia Online Store, even though the Level4 column for that node is blank – thus meaning that the node can be expanded only three levels, not four.

To hide the unwanted rows, for each row we must check whether the current level exists for the visited node. This can be accomplished by checking the depth of each node. We need a calculated column in the hierarchy table containing the depth of the node defined by each row:

Calculated column in the Entity table

Depth =
PATHLENGTH ( Entity[EntityPath] )

We need two measures: EntityRowDepth returns the maximum depth of the current node, whereas EntityBrowseDepth returns the current depth of the matrix by leveraging the ISINSCOPE function:
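The definitions of the two measures are not included in this excerpt; a sketch consistent with the four-level hierarchy described above:

Measure (hidden) in the Entity table

EntityRowDepth :=
MAX ( Entity[Depth] ) -- depth of the nodes visible in the filter context

Measure (hidden) in the Entity table

EntityBrowseDepth :=
-- ISINSCOPE returns TRUE (1) for each level currently grouped by the matrix
ISINSCOPE ( Entity[Level1] )
    + ISINSCOPE ( Entity[Level2] )
    + ISINSCOPE ( Entity[Level3] )
    + ISINSCOPE ( Entity[Level4] )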

Finally, we use these two measures to blank out the result when the browsing depth is greater than EntityRowDepth:

Measure in the StrategyPlan table

Sum Amount :=
SUM ( StrategyPlan[Amount] )

Measure in the StrategyPlan table

Total Base :=
VAR Val = [Sum Amount]
VAR EntityShowRow =
[EntityBrowseDepth] <= [EntityRowDepth]
VAR Result =
IF ( EntityShowRow, Val )
RETURN
Result

The report obtained by using the Total Base measure no longer contains rows with an empty description, as shown in Figure 8.

The same pattern must be applied to any measure that could be reported by using the parent-child hierarchy.

Chart of accounts hierarchy

The Chart of accounts pattern is a variation of the basic Parent-child hierarchy pattern, where the hierarchy also drives the calculations. Each row in the hierarchy is tagged as either Income, Expense, or Taxation. Incomes need to be summed, whereas expenses and taxation must be subtracted from the total. Figure 9 shows the content of the table containing the hierarchy items.

The implementation is similar to the parent-child pattern, grouping the calculation by AccountType and applying the proper sign to the calculation depending on the value of AccountType:

Measure in the StrategyPlan table

Total :=
VAR Val =
SUMX (
SUMMARIZE ( StrategyPlan, Account[AccountType] ),
VAR SignToUse =
IF ( Account[AccountType] = "Income", +1, -1 )
VAR Amount = [Sum Amount]
RETURN
Amount * SignToUse
)
VAR AccountShowRow = [AccountBrowseDepth] <= [AccountRowDepth]
VAR EntityShowRow = [EntityBrowseDepth] <= [EntityRowDepth]
VAR Result =
IF ( AccountShowRow && EntityShowRow, Val )
RETURN
Result

The Total measure can use both parent-child hierarchies: the hierarchy defined in the Entity table – shown in the previous example – and the hierarchy defined in the Account table, which is the subject of this section.

The formula in Total returns the right result for each node of the hierarchy. However, in these types of reports it is commonly requested that the numbers be shown as positive despite being expenses. The requirement can be fulfilled by changing the sign of the result at the report level. The following Total No Signs measure implements the calculation this way: It first determines the sign to use for the report, and then it changes the sign of the result in order to show expenses as positive numbers, even though they are internally managed as negative numbers:

Measure in the StrategyPlan table

Total No Signs :=
VAR BrowseLevel = [AccountBrowseDepth]
VAR AccountName =
SWITCH (
BrowseLevel,
1, SELECTEDVALUE ( Account[Level1] ),
2, SELECTEDVALUE ( Account[Level2] ),
3, SELECTEDVALUE ( Account[Level3] ),
4, SELECTEDVALUE ( Account[Level4] ),
5, SELECTEDVALUE ( Account[Level5] ),
6, SELECTEDVALUE ( Account[Level6] ),
7, SELECTEDVALUE ( Account[Level7] )
)
VAR AccountType =
LOOKUPVALUE ( Account[AccountType], Account[AccountName], AccountName )
VAR ValueToShow = [Total]
VAR Result =
IF ( AccountType IN { "Expense", "Taxation" }, -1, +1 ) * ValueToShow
RETURN
Result

The report obtained using Total No Signs is visible in Figure 10.

The pattern shown above works fine if the chart of accounts contains the AccountType column, which defines each item as being either an income or an expense. Sometimes the chart of accounts has a different way of defining the sign to use. For example, there could be a column defining the sign to use when aggregating an account to its parent. This is the case of the Operator column shown in Figure 11.

In this case, the code to author is more complex. We need one column for each level of the hierarchy, stating how that account needs to be shown when aggregated at any given level of the hierarchy. A single account can be aggregated at one level with a plus, but at a different level with a minus.

These columns need to be built from the bottom of the hierarchy. In this example we need seven columns because there are seven levels. The column indicates the sign to use when aggregating that specific item of the hierarchy at the desired level. Figure 12 shows the result of the seven columns in this example.

For instance, examine the rows with AccountKey 4 and 5: account 4 (Sale Revenue) must be summed when aggregated at levels 1, 2, 3 and 4, whereas it is not visible at other levels. Account 5 (Cost of Goods Sold) must be summed when aggregated at level 4, but it must be subtracted when aggregated at levels 1, 2, and 3.

The DAX formula computing the sign at each level starts from the most granular level – level 7 in our example. At this most granular level, the sign to use is just the operator converted into +1 or -1, for convenience in further calculations:

Calculated column in the Account table

SignToLevel7 =
VAR LevelNumber = 7
VAR Depth = Account[Depth]
RETURN
IF ( LevelNumber = Depth, IF ( Account[Operator] = "-", -1, +1 ) )

All the other columns (from level 1 to level 6) follow a similar pattern, though for each level the DAX expression must consider the sign at the more granular, adjacent level (stored in the PrevSign variable) and invert the result when that level shows a "-" sign, as shown in the column for level 6:

Calculated column in the Account table

SignToLevel6 =
VAR LevelNumber = 6
VAR PrevSign = Account[SignToLevel7]
VAR Depth = Account[Depth]
VAR LevelKey =
PATHITEM ( Account[AccountPath], LevelNumber, INTEGER )
VAR LevelSign =
LOOKUPVALUE ( Account[Operator], Account[AccountKey], LevelKey )
RETURN
IF (
LevelNumber = Depth,
IF ( Account[Operator] = "-", -1, +1 ),
IF ( LevelNumber < Depth, IF ( LevelSign = "-", -1, +1 ) * PrevSign )
)

Once the level columns are ready, the Signed Total measure computing the total with custom signs is the following:
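The Signed Total measure itself is not shown in this excerpt. A sketch that picks the SignToLevel column matching the current browse depth – assuming helper measures ([AccountBrowseDepth], [AccountRowDepth]) analogous to those of the basic pattern – could look like this:

Measure in the StrategyPlan table

Signed Total :=
VAR BrowseDepth = [AccountBrowseDepth]
VAR Val =
    SUMX (
        Account, -- iterate the visible accounts; context transition in [Sum Amount]
        VAR SignToUse =
            SWITCH (
                BrowseDepth,
                1, Account[SignToLevel1],
                2, Account[SignToLevel2],
                3, Account[SignToLevel3],
                4, Account[SignToLevel4],
                5, Account[SignToLevel5],
                6, Account[SignToLevel6],
                7, Account[SignToLevel7]
            )
        RETURN [Sum Amount] * SignToUse
    )
VAR AccountShowRow = [AccountBrowseDepth] <= [AccountRowDepth]
VAR Result =
    IF ( AccountShowRow, Val )
RETURN
    Result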

We can compare the result of this last Signed Total measure with that of the previous Total measure in Figure 13.

The amount for “Internet” is negative in Total, because it is an expense. However, in Signed Total the same row holds a positive number and it becomes negative only when it traverses the Expense node, which is aggregated to the parent with a minus sign.

Security pattern for a parent-child hierarchy

A common security requirement for parent-child hierarchies is to restrict the visibility to a node (or a set of nodes) including all of its children. In that scenario, the PATHCONTAINS function is useful.

By applying the following expression to a security role on the Account table, we limit the visibility to the node provided in the second argument of PATHCONTAINS. This way, all the children of the node are made visible to the user, because the node requested (2, corresponding to Income) is also part of the AccountPath value of all the children nodes:

PATHCONTAINS (
Account[AccountPath],
2 -- Key of Income
)

If we used the AccountKey column to limit the visibility, we would end up limiting the visibility to only one row and the user would not see the children nodes. By leveraging the path column, we can easily select multiple rows by including all the nodes that can be reached when traversing a path that includes the filtered node.

When the security role is active, the user can only see the nodes (and the values) included in the tree starting from the Income node, as shown in Figure 14.

The nodes above the Income node (Level3) no longer consider other children nodes in the Total measure. In case this is misleading in the report, consider removing the initial levels from the report (in this case Level1 and Level2) or using different descriptions of the nodes in Level1 and Level2 in order to better explain the result.

It is worth noting that the security role defined by using PATHCONTAINS may slow down the performance if used with a hierarchy with thousands of nodes. The expression in the role security must be evaluated for every node of the hierarchy when the end user opens a connection, and PATHCONTAINS can be expensive if it is applied to thousands of rows or more.

Hierarchies
https://www.daxpatterns.com/hierarchies/
Mon, 10 Aug 2020 08:39:33 +0000

Hierarchies are often created in data models to simplify the browsing of the model by providing users with suggested paths of navigation through attributes. The definition of the hierarchies follows the requirements of the model. For example, the Date table usually contains a hierarchy with levels like year, quarter, month, and day. Similarly, the Product table usually includes a common hierarchy like Category, Subcategory, and Product.

Hierarchies make it possible to insert multiple columns at once in a report, but hierarchies are also useful to drive calculations. For example, a measure can show sales as a percentage over the parent of the current level of the hierarchy. Any other calculation can use the same approach by just customizing the calculation associated to each level of the hierarchy.

Detecting the current level of a hierarchy

Any calculation involving hierarchies requires the DAX code to detect the current level of the hierarchy. Therefore, it is important to understand how to detect the level of a hierarchy where a measure is being evaluated. Figure 1 shows the Product Level measure whose only goal is to detect the hierarchy level being browsed. The Product Level measure is usually hidden in the model because it is only used in other measures and implements a calculation related to the hierarchy level.

The Product Level measure is defined as follows:

Measure (hidden) in the Product table

Product Level :=
VAR IsProductInScope = ISINSCOPE ( 'Product'[Product Name] )
VAR IsSubcatInScope = ISINSCOPE ( 'Product'[Subcategory] )
VAR IsCatInScope = ISINSCOPE ( 'Product'[Category] )
VAR Result =
SWITCH (
TRUE (),
IsProductInScope, "Product",
IsSubcatInScope, "Subcategory",
IsCatInScope, "Category",
"No filter"
)
RETURN
Result

By using ISINSCOPE, the three variables IsProductInScope, IsSubcatInScope, and IsCatInScope check whether each level of the hierarchy is currently being used to group data. In that case, the corresponding column has a single value visible in the filter context.

The SWITCH statement detects the level by looking for the first level visible, starting from the most granular one. The order of the conditions in SWITCH is relevant. Indeed, when the product is in scope, both category and subcategory are in scope too. Therefore, the measure must check the most restrictive filter first. The evaluation of the active level must always start from the lowest level of the hierarchy, moving up one step at a time.

The Product Level measure is of no use by itself. The technique used in the measure is frequently used to implement a calculation depending on the current level of the hierarchy. We use this measure as a convenient way to detect the hierarchy level in the measures described further in this pattern.

NOTE When ISINSCOPE is not available, ISFILTERED can be used as an alternative technique – this is the case in Excel up to version 2019. However, by using ISFILTERED, the DAX expression operating over hierarchies must assume that the levels beyond the top-level of the hierarchy displayed in a visualization are not filtered outside of the visualization itself – that is, they should not be used in slicers, filters, or selected in other visuals. In order to prevent the user from doing that, if ISINSCOPE is not available it is a best practice to create a hierarchy using only hidden columns – this means duplicating the columns used in levels of a hierarchy so that they are also available as separate filters and slicers without affecting the DAX calculations over the hierarchy itself.

Percentage of parent node

A common hierarchical calculation shows a measure as a percentage over the parent node, as shown in Figure 2.

The % Parent measure detects the level of the hierarchy for the cell being evaluated and uses the value of the parent at the denominator of the ratio:

Measure in the Sales table

% Parent :=
VAR AllSelProds =
ALLSELECTED ( 'Product' )
VAR ProdsInCat =
CALCULATETABLE (
'Product',
AllSelProds,
VALUES ( 'Product'[Category] )
)
VAR ProdsInSub =
CALCULATETABLE (
'Product',
ProdsInCat,
VALUES ( 'Product'[Subcategory] )
)
VAR Numerator = [Sales Amount]
VAR Denominator =
SWITCH (
[Product Level],
"Category", CALCULATE ( [Sales Amount], AllSelProds ),
"Subcategory", CALCULATE ( [Sales Amount], ProdsInCat ),
"Product", CALCULATE ( [Sales Amount], ProdsInSub )
)
VAR Result =
DIVIDE (
Numerator,
Denominator
)
RETURN
Result

Currency conversion
https://www.daxpatterns.com/currency-conversion/
Mon, 10 Aug 2020 08:36:54 +0000

Currency conversion is a complex scenario where both the data model and the quality of the DAX code play an important role. There are two kinds of currencies: the currency used to collect orders and the currency used to produce the report. Indeed, you might collect orders in multiple currencies, but need to report on those orders using only one currency, so as to compare all the values with the same unit of measure. Alternatively, you might collect (or store) orders in a single currency, but need to report the values using different currencies. Finally, you might have both orders collected in different currencies and reports that need to show many different currencies.

In this pattern, we cover three different scenarios where we simplified the description by only using EUR and USD:

Multiple sources, single target: orders are in both EUR and USD, but the report must convert all currencies into USD.

Single source, multiple targets: orders are only in USD, but the user can choose to see the report in either EUR or USD.

Multiple sources, multiple targets: orders are in both EUR and USD, but the user can choose to see the report in either EUR or USD.

The formulas depend on the currency conversion table available. The requirement is often to perform the currency conversion for each day of the year. Sometimes it is only possible to perform the currency conversion at a different granularity, for example at the month level. The differences in managing these different cases are minimal, and we highlight them when showing the DAX code.

For demo purposes, we created models with both the daily and the monthly currency conversions. Therefore, you find both formulas and models in the same demo file, though you should only use one of the two exchange rate granularities for a specific implementation.

We created the daily currency conversion tables by tweaking the data available in Contoso. Therefore, these examples contain imaginary currency conversion rates with the sole purpose of showing a technique – and no guarantee of accuracy at all.

Multiple source currencies, single reporting currency

In this scenario, the source data contains orders in different currencies, and the report converts values into a single currency. For example, orders are in EUR, USD, and other currencies; the report must convert the order currency to USD.

The first thing to analyze is the model shown in Figure 1.

The Sales table stores the transaction value with the local currency. Every column that contains a monetary amount uses the local currency, like Net Price, Unit Price, and Unit Discount. The Sales table has a relationship with the Currency table that depends on the currency of the transaction.

A simple measure computing the sales amount would only work if sliced by the source currency; indeed, it is not possible to aggregate values in different currencies without performing a currency conversion first. For this reason, we called the measure doing this calculation Sales (Internal), and we also hide this measure from the user:
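The definition of Sales (Internal) is not included in this excerpt. Assuming the usual Sales[Quantity] and Sales[Net Price] columns of the Contoso model, it could be as simple as:

Measure (hidden) in the Sales table

Sales (Internal) :=
SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ) -- amount in the local currency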

As shown in Figure 2, Sales (Internal) produces a meaningless total, because it is summing values in different source currencies. Instead, the two measures Sales USD (Monthly) and Sales USD (Daily) produce results that make sense, because they convert the Sales (Internal) value to USD. The differences in the report between the Sales USD (Monthly) and Sales USD (Daily) measures are due to the fluctuation of the currency exchange rates within each month.

To perform an efficient currency conversion, we aggregate Sales (Internal) at the granularity of the exchange rate for each currency, and then we apply the conversion rate. For example, the Sales USD (Daily) measure implements the calculation at a day granularity by using SUMX to iterate over a table that has one row for each date and currency:

Measure in the Sales table

Sales USD (Daily) :=
VAR AggregatedSalesInCurrency =
ADDCOLUMNS (
SUMMARIZE (
Sales,
'Date'[Date], -- Day granularity
'Currency'[Currency]
),
"@SalesAmountInCurrency", [Sales (Internal)],
"@Rate", CALCULATE (
SELECTEDVALUE ( 'Daily Exchange Rates'[Rate] )
)
)
VAR Result =
SUMX (
AggregatedSalesInCurrency,
[@SalesAmountInCurrency] / [@Rate]
)
RETURN
Result

To achieve optimal performance, it is essential to reduce the number of iterations needed to retrieve the currency exchange rate. Performing the currency conversion for every transaction would be time-consuming, because all the transactions made on the same day with the same currency share the same exchange rate. SUMMARIZE over Sales significantly reduces the granularity of the entire formula. In case the currency exchange rates are only available at the month level, the formula must reduce the granularity to the month level, like Sales USD (Monthly):

Measure in the Sales table

Sales USD (Monthly) :=
VAR AggregatedSalesInCurrency =
ADDCOLUMNS (
SUMMARIZE (
Sales,
'Date'[Calendar Year Month], -- Month granularity
'Currency'[Currency]
),
"@SalesAmountInCurrency", [Sales (Internal)],
"@Rate", CALCULATE (
SELECTEDVALUE ( 'Monthly Exchange Rates'[Rate] )
)
)
VAR Result =
SUMX (
AggregatedSalesInCurrency,
[@SalesAmountInCurrency] / [@Rate]
)
RETURN
Result

The measures used in this example do not check whether a currency exchange rate is available, because the operation being performed is a division – which results in a division-by-zero error in case a rate is missing. An alternative approach is the conditional statement used in the following examples, which controls the error message displayed when a currency exchange rate is missing. You should use one of these two techniques that raise an error when a rate is missing; otherwise, the report would show inaccurate numbers without any warning to the user.

Single source currency, multiple reporting currencies

In this scenario, the source data contains orders in a single currency (USD in our example), and the user changes the currency to use in the report through a slicer. The report converts the original amount according to the date of the transaction and to the currency selected by the user.

The model shown in Figure 3 does not show any direct relationship between the Sales and Currency tables. Indeed, all the sales transactions are in USD, and the Currency table allows the user to select the desired report currency.

The user can either choose the desired currency with a slicer, or use the Currency[Currency] column in a matrix as shown in Figure 4, which performs the conversion using the monthly currency exchange rates.

The structure of the formula to obtain the desired result is similar to the previous example, even though its implementation is slightly different because of the data model being different. The Sales (Daily) measure applies a different currency conversion rate for every day:

Measure in the Sales table

Sales (Daily) :=
IF (
HASONEVALUE ( 'Currency'[Currency] ),
VAR AggregatedSalesInUSD =
ADDCOLUMNS (
SUMMARIZE (
Sales,
'Date'[Date] -- Day granularity
),
"@Rate", CALCULATE ( SELECTEDVALUE ( 'Daily Exchange Rates'[Rate] ) ),
"@USDSalesAmount", [Sales (internal)]
)
VAR Result =
SUMX (
AggregatedSalesInUSD,
IF (
NOT ( ISBLANK ( [@Rate] ) ),
[@USDSalesAmount] * [@Rate],
ERROR ( "Missing conversion rate" )
)
)
RETURN
Result
)

The initial test with HASONEVALUE ensures that only one currency is visible in the current filter context. The AggregatedSalesInUSD variable stores a table with the sales amount in USD and the corresponding currency exchange rate at the day granularity. The @Rate column retrieves the proper exchange rate thanks to the existing filter over Currency[Currency] and to the context transition from Date[Date], aggregated by SUMMARIZE. The Result variable computes the final value by summing the products of @Rate and @USDSalesAmount, raising an error in case @Rate is not available. The error stops the report with a message that describes the data quality issue (Missing conversion rate).

If the currency exchange rate is only available at the month level, Sales (Monthly) only differs from Sales (Daily) by the argument of SUMMARIZE:

Measure in the Sales table

Sales (Monthly) :=
IF (
HASONEVALUE ( 'Currency'[Currency] ),
VAR AggregatedSalesInUSD =
ADDCOLUMNS (
SUMMARIZE (
Sales,
'Date'[Calendar Year Month Number] -- Month granularity
),
"@Rate", CALCULATE ( SELECTEDVALUE ( 'Monthly Exchange Rates'[Rate] ) ),
"@USDSalesAmount", [Sales (internal)]
)
VAR Result =
SUMX (
AggregatedSalesInUSD,
IF (
NOT ( ISBLANK ( [@Rate] ) ),
[@USDSalesAmount] * [@Rate],
ERROR ( "Missing conversion rate" )
)
)
RETURN
Result
)

Multiple source currencies, multiple reporting currencies

This scenario is a combination of the previous two. The source data contains orders in different currencies, and the user changes the currency to use in the report through a slicer. The report converts the original amount according to the date of the transaction, the original currency, and the reporting currency selected by the user.

There are two currency tables in the data model: Source Currency and Target Currency. The Source Currency table has a relationship with Sales and represents the currency of the transaction. The Target Currency table allows the user to select the desired currency for the report. The model is visible in Figure 5.

This model enables the conversion of any source currency into any target currency. Figure 6 shows orders collected in different currencies from several countries using the monthly currency exchange rates. The report converts the original amount into the currency displayed in the column of the matrix.

The formula of the measure – like the model – is a mix of the two previous ones. The HASONEVALUE function checks that only one target currency is selected. The AggregatedSalesInCurrency variable contains a table with the sales amount aggregated at the available granularity of the currency exchange rate, also including the source currency. The @Rate column fetches the proper exchange rate thanks to the existing filter over 'Target Currency'[Currency], and to the context transition from Date[Date] and 'Source Currency'[Currency], aggregated by SUMMARIZE. The Result variable computes the final value by summing the products of @Rate and @SalesAmount, raising an error in case @Rate is not available:

Measure in the Sales table

Sales (Daily) :=
IF (
HASONEVALUE ( 'Target Currency'[Currency] ),
VAR AggregatedSalesInCurrency =
ADDCOLUMNS (
SUMMARIZE (
Sales,
'Date'[Date], -- Day granularity
'Source Currency'[Currency]
),
"@SalesAmount", [Sales (Internal)],
"@Rate", CALCULATE ( SELECTEDVALUE ( 'Daily Exchange Rates'[Rate] ) )
)
VAR Result =
SUMX (
AggregatedSalesInCurrency,
IF (
NOT ( ISBLANK ( [@Rate] ) ),
[@SalesAmount] * [@Rate],
ERROR ( "Missing conversion rate" )
)
)
RETURN
Result
)

As with the previous examples, it is important to match the granularity of the currency exchange table. If the currency exchange rate is only available at the month level, the Sales (Monthly) measure only differs from Sales (Daily) by the argument of SUMMARIZE:

Measure in the Sales table

Sales (Monthly) :=
IF (
    HASONEVALUE ( 'Target Currency'[Currency] ),
    VAR AggregatedSalesInCurrency =
        ADDCOLUMNS (
            SUMMARIZE (
                Sales,
                'Date'[Calendar Year Month], -- Month granularity
                'Source Currency'[Currency]
            ),
            "@SalesAmount", [Sales (Internal)],
            "@Rate", CALCULATE ( SELECTEDVALUE ( 'Monthly Exchange Rates'[Rate] ) )
        )
    VAR Result =
        SUMX (
            AggregatedSalesInCurrency,
            IF (
                NOT ( ISBLANK ( [@Rate] ) ),
                [@SalesAmount] * [@Rate],
                ERROR ( "Missing conversion rate" )
            )
        )
    RETURN
        Result
)

New and returning customers
https://www.daxpatterns.com/new-and-returning-customers/
Mon, 10 Aug 2020

The New and returning customers pattern helps in understanding how many customers in a period are new, returning, lost, or recovered. There are several variations to this pattern, each with different performance and results depending on the requirements. Moreover, it is a very flexible pattern that allows the identification of new and returning customers, or the computation of these customers’ purchase volume – also known as sales amount.

Before using this pattern, you need to clearly define the meaning of new and returning customers, as well as when a customer is lost or recovered. Indeed, depending on the definition you give to these calculations, the formulas are quite different both in their writing and – most important – in performance. Even though you could use the most flexible formula to compute any variation, we would advise you to spend some time experimenting in order to find the best version that fits your needs. The most flexible formula is very expensive from a computational point of view. Therefore, it might be slow even on small datasets.

Introduction

Given a certain time period, you want to compute these formulas:

Customers: the number of customers who made a purchase within that time period.

New customers: the number of customers who made their first purchase within that time period.

Returning customers: the number of customers who have already purchased something in the past, and are returning in that time period.

Lost customers: the number of customers whose last purchase occurred at least 2 months before the start of the current period.

Recovered customers: the number of customers who were considered lost in a previous time period, and then made a purchase in the current period.

The report looks like the one in Figure 1.

As shown in the report, in January 2007 all customers were new. In February, 116 customers were returning and 1,037 were new, for a total of 1,153 customers. In March, 603 customers were lost.

While the measures computing the number of customers and the number of new customers are easy to describe, calculating the number of lost customers is already complex. In this example, we consider a customer lost two months after their last purchase. Therefore, the number reported (603) is made up of customers who made their last purchase in January. In other words, out of the 1,375 customers in January 2007, 603 did not buy anything in February, March, and the following months; for this reason, we consider them lost at the end of March.
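This two-month rule relies on EOMONTH, which returns the last day of the month a given number of months after a date. A minimal sketch, using a hypothetical last-purchase date of January 15, 2007:

```dax
-- Hypothetical check of the lost date: the last day of the month
-- two months after the last purchase.
EVALUATE
{ EOMONTH ( DATE ( 2007, 1, 15 ), 2 ) }  -- returns March 31, 2007
```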

The definition of lost customers may be different in your business. For example, you might define a customer as lost if they made their last purchase two months ago, even though you already know that they will be making another purchase next month. Imagine a customer who bought something in January and April: are they lost at the end of March or not? The answer leads to different formulations of the same calculation. Indeed, we consider the customer as being temporarily lost at the end of March, because we know the same customer will be recovered later. A report counting the temporarily-lost customers (who did not buy anything for two months, but then made a purchase afterwards) is visible in Figure 2.

The number of temporarily-lost customers is higher than the number of lost customers previously shown. The reason is that many of the temporarily-lost customers will buy something in future months. In that case, the report counts them as recovered customers in the month when they make a new purchase.

Another important element to take into account when selecting the right pattern is how you want to look at filters on the report. If the user selects a category of products, how does this filter affect the calculation? Let us say that you filter the Cell Phones category. Do you consider a customer as new the first time they buy a cell phone? If so, then a single customer will be new multiple times, depending on the filter. Otherwise, if you want to consider a customer as new only once, then you need to ignore the filters when computing the number of new customers. Similarly, all the remaining measures might or might not be affected by the filters.

Let us clarify the concept with another example. Figure 3 shows the raw data of a reduced version of Contoso with only three customers.

Considering the data in Figure 3, can you tell when Dale Lal is a new customer, if a user added a filter for Games and Toys? He bought a toy for the first time in April, even though he was already a customer for Cameras and camcorders products. Now focus on Tammy Metha: is she to be considered lost two months after her game purchase in January? She did not buy any other game product, even though she bought products of other categories. Answering these questions is of paramount importance to support your choice of the pattern that will best suit your specific business needs.

Additionally, counting customers is useful, but sometimes you are interested in analyzing the amounts sold to new, returning, and recovered customers. Or you might want to estimate the amount lost because of customer losses, in a report like the one in Figure 4. In the report we used the average sales volumes of our lost customers over the last 12 months, as an estimate for lost sales.

Another important note is to think about how the formulas count the different statuses of a customer inside each time period. For example, if you consider a full year, then it is possible that the same customer is new, temporarily lost, returning, and then permanently lost – all within the same period. On a given day, the status of a customer is well defined. However, throughout longer time frames the same customer can be in different statuses. Our formulas are designed to account for the customer in all their statuses. Figure 5 shows a sample report that only filters and shows one customer: Lal Dale.

The customer is both new and lost in the same year. Lal Dale was a returning customer for a few months, but not at the year level because he was new during the year. In Figure 6 the same report filters out January, thus showing the customer as returning three times within the period, and never showing them as a new customer.

If we were to describe all the possible combinations of measures in this pattern, this alone would require an entire book. Instead, we show some of the most common patterns, leaving to the reader the task of changing the formulas in case their scenario is different from any of the patterns described.

Finally, the New and returning customers pattern requires heavy calculations. Therefore, we present both a dynamic and a snapshot version of the formulas.

Pattern description

The pattern is based on two types of formulas:

Internal formulas: their goal is to compute the relevant dates for a given customer.

External formulas: these are the formulas used in reports. They use the internal formulas to compute the number of customers, the sales amount, or any other measure.

For example, in order to compute the number of new customers, for each customer the internal formula computes the date of their first purchase. The external formula then computes the number of customers whose first purchase happens to fall within the time period currently filtered.

An example is helpful to better understand this technique. Look at Figure 7, which shows the reduced dataset we use in order to explain the different formulas.

Using this data as an example, think about how you can compute the number of new customers in March. The new customers external measure checks how many customers made their first purchase in March. To obtain its result, the external formula queries the internal formulas on a customer-by-customer basis, checking their first purchase. The internal formula returns March 14 for the first purchase of Gerald Suri, whereas the first purchases of the other customers occurred earlier than that. Consequently, the external formula returns 1 as the number of new customers.

Other measures behave the same way, although each comes with peculiarities worthy of a more complete description.

As a first example of code, look at the internal formula that computes the date when a customer must be considered new. Be mindful that each pattern uses different formulas; we provide greater detail on this code in subsequent sections. This first example of DAX is reported here only as an introduction:

Measure (hidden) in the Sales table

Date New Customer :=
CALCULATE (
    MIN ( Sales[Order Date] ),
    ALLEXCEPT (
        Sales,
        Sales[CustomerKey],
        Customer
    )
)

The internal formula is then used by the external formula, which computes the number of customers who are new in the given period:

Measure in the Sales table

# New Customers :=
VAR CustomersWithNewDate =
    CALCULATETABLE (                          -- Prepares a table that
        ADDCOLUMNS (                          -- for each customer contains
            VALUES ( Sales[CustomerKey] ),    -- the date of their first purchase ever
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),             -- Regardless of local filters on customer
        ALLSELECTED ( 'Date' )                -- and on date
    )
VAR CustomersWithLineage =                    -- Here we change the data lineage
    TREATAS (                                 -- of the CustomersWithNewDate variable
        CustomersWithNewDate,                 -- so that it will filter the
        Customer[CustomerKey],                -- Customer table and the
        'Date'[Date]                          -- Date table
    )
VAR Result =
    CALCULATE (
        DISTINCTCOUNT ( Sales[CustomerKey] ), -- Counts the number of customers only
        KEEPFILTERS ( CustomersWithLineage )  -- if included in @NewCustomerDate variable
    )
RETURN
    Result

Using this approach, the pattern is more flexible. Indeed, if you need to change the logic that determines when a customer is to be considered new, lost, or temporarily lost, you only need to update the internal formulas – thus leaving the external formula untouched. Still, we need to raise a big warning for our readers: the formulas shown in this pattern are extremely complex and delicate in the way the filter context is handled. You will certainly need to change them to suit your needs. But do so only after having thoroughly understood their behavior; indeed, each line of DAX in this pattern is the result of hours of thinking and endless tests, as we systematically had to make sure that it was the correct way to write it. In other words, get ready to walk on eggshells with this pattern; we certainly had to!

We organized the patterns in two families: dynamic and snapshot. The dynamic version computes the measures in a dynamic way, considering all the filters of the report. The snapshot version precomputes the values of the internal measures in calculated tables, in order to speed up the calculation of the external measures. Therefore, the snapshot version provides less flexibility, albeit with improved speed.

We also provide three different implementations, depending on how the measure should consider the active filters in the report:

Relative: a customer is considered new the first time they buy one of the products selected in the report.

Absolute: a customer is considered new the first time they buy a product, regardless of any filter present in the report.

By category: a customer is considered new the first time they buy a product from any of the product categories selected in the report. If they buy two products of the same category then they are considered new only once, whereas if they buy two products of different categories then they are considered new twice.

You can find a more complete explanation of the various calculations in the corresponding section of each pattern. Our suggestion is to read the chapter start-to-finish before attempting an implementation on your model. It is better to understand your requirements well before proceeding with the implementation, rather than only finding out at the end that you chose the wrong pattern.

Finally, the demo files of this pattern include two versions: the full version includes the complete database, whereas the base version only includes three customers. The base version is useful to better understand the pattern, because you can easily check the numbers thanks to the limited number of rows in the model. The full version is more useful to evaluate the performance of the different calculations.

Internal measures

There are three internal measures:

Date New Customer: returns the date when the customer is to be considered new.

Date Lost Customer: returns the date when the customer is to be considered permanently lost, checking that there are no sales in following time periods.

Date Temporary Lost Customer: returns the date when the customer might be lost, without checking whether the customer comes back in a following period.

These measures are not intended to be used in reports – they exist only to be used by the external measures. The code of the internal measures is different for each pattern.

External measures

Each pattern defines several measures to count customers and evaluate sales in the various customer states:

# New Customers: counts the number of customers who are new.

# Returning Customers: counts the number of customers who were new in a previous period and made a new purchase within the time period considered.

# Lost Customers: counts the number of customers permanently lost.

# Temporarily Lost Customers: counts the number of customers who are only lost when we look at the current time period, even though they might return in a later period.

# Recovered Customers: counts the number of customers who were temporarily lost and then made a new purchase within the time period considered.

Sales New Customers: returns the value of Sales Amount by filtering only the new customers.

Sales Returning Customers: computes the value of Sales Amount by filtering only the customers who were new in a previous period and made a new purchase in the period considered.

Sales Lost Customers (12M): computes the value of Sales Amount for 12 months prior to the start of the selected time period, filtering only customers permanently lost in the selected period.

Sales Recovered Customers: returns the value of Sales Amount filtering only customers who were previously temporarily lost and who made a new purchase in the period considered.

The code of the external measures is very similar in all the patterns. There are minor variations for some scenarios that are highlighted when we describe the individual patterns.

How to use pattern measures

The formulas presented in the pattern can be grouped into two categories. The measures starting with the # prefix compute the number of unique customers by applying a certain filter. Usually these measures are used as-is and are optimized for this purpose. For example, the following measure returns the number of new customers:

Measure in the Sales table

# New Customers :=
VAR CustomersWithNewDate =
    CALCULATETABLE (                          -- Prepares a table that
        ADDCOLUMNS (                          -- for each customer contains
            VALUES ( Sales[CustomerKey] ),    -- the date of their first purchase ever
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),             -- Regardless of any local filters on Customer
        ALLSELECTED ( 'Date' )                -- and on Date
    )
VAR CustomersWithLineage =                    -- Here we change the data lineage
    TREATAS (                                 -- of the CustomersWithNewDate variable
        CustomersWithNewDate,                 -- so that it filters the
        Customer[CustomerKey],                -- Customer table and the
        'Date'[Date]                          -- Date table
    )
VAR Result =
    CALCULATE (
        DISTINCTCOUNT ( Sales[CustomerKey] ), -- Counts the number of customers only
        KEEPFILTERS ( CustomersWithLineage )  -- if included in @NewCustomerDate variable
    )
RETURN
    Result

The measures that do not start with the # prefix create a filter of customers that is applied to another measure. For example, the measures with the Sales prefix are measures that apply a filter of customers to the Sales Amount measure. The following measure can be reused to compute other measures by just changing the Sales Amount measure reference in the last CALCULATE function:

Measure in the Sales table

Sales New Customers :=
VAR CustomersWithFirstSale =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the date of their first purchase ever
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),          -- Regardless of local filters on Customer
        ALLSELECTED ( 'Date' )             -- and on Date
    )
VAR NewCustomers =
    FILTER (
        CustomersWithFirstSale,            -- Filters the customers
        [@NewCustomerDate]                 -- whose new customer date
            IN VALUES ( 'Date'[Date] )     -- is in the current time period
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                    -- Computes the Sales Amount measure
        KEEPFILTERS ( NewCustomers )       -- by applying the filter for new customers
    )
RETURN
    Result

In each pattern we show the two measures (with the # and Sales prefixes) when there are differences in the measure structure, even just for performance optimization. If the two measures only differ by the calculation made in the last CALCULATE function, then we only include the # prefix version of the measure.
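To make the reuse concrete, here is a sketch that assumes a hypothetical Margin measure exists in the model; compared to Sales New Customers, only the measure reference in the last CALCULATE changes:

```dax
Margin New Customers :=
VAR CustomersWithFirstSale =
    CALCULATETABLE (
        ADDCOLUMNS (
            VALUES ( Sales[CustomerKey] ),
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),
        ALLSELECTED ( 'Date' )
    )
VAR NewCustomers =
    FILTER (
        CustomersWithFirstSale,
        [@NewCustomerDate] IN VALUES ( 'Date'[Date] )
    )
VAR Result =
    CALCULATE (
        [Margin],                    -- Hypothetical measure: the only line changed
        KEEPFILTERS ( NewCustomers ) -- compared to Sales New Customers
    )
RETURN
    Result
```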

Dynamic relative

The Dynamic relative pattern takes into account all the filters in the report for the calculation. Therefore, if the report filters one category (Audio, for example), a customer is reported as new the first time they buy a product of the Audio category. Similarly, a customer is considered lost a certain number of days after they last purchased a product of the Audio category. Figure 8 is useful to better understand the behavior of this pattern.

The report only takes one customer into account: Lal Dale. He is reported as new in January, when Cameras and camcorders is selected, and he is also considered new in April, for the Games and Toys category. All the other measures behave similarly, by considering the filter where they are evaluated.

Internal measures

The internal measures are the following:

Measure (hidden) in the Sales table

Date New Customer :=
CALCULATE (
    MIN ( Sales[Order Date] ), -- The date of the first sale is the MIN of Order Date
    REMOVEFILTERS ( 'Date' )   -- at any time in the past
)

Measure (hidden) in the Sales table

Date Lost Customer :=
CALCULATE (                                   -- A customer is lost two months after
    EOMONTH ( MAX ( Sales[Order Date] ), 2 ), -- their last transaction (end of month)
    REMOVEFILTERS ( 'Date' )                  -- at any time
)

Measure (hidden) in the Sales table

Date Temporary Lost Customer :=
VAR MaxDate =                  -- The date of the last sale is the MAX of the Order Date
    MAX ( Sales[Order Date] )  -- in the current period (set by the calling measure)
VAR Result =
    IF (
        NOT ISBLANK ( MaxDate ),
        EOMONTH ( MaxDate, 2 ) -- two months later (end of month)
    )
RETURN
    Result

New customers

The measure that computes the number of new customers is the following:

Measure in the Sales table

# New Customers :=
VAR CustomersWithNewDate =
    CALCULATETABLE (                          -- Prepares a table that
        ADDCOLUMNS (                          -- for each customer contains
            VALUES ( Sales[CustomerKey] ),    -- the date of their first purchase
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),             -- Regardless of local filters on Customer
        ALLSELECTED ( 'Date' )                -- and on Date
    )
VAR CustomersWithLineage =                    -- Here we change the data lineage
    TREATAS (                                 -- of the CustomersWithNewDate variable
        CustomersWithNewDate,                 -- so that it filters the
        Sales[CustomerKey],                   -- Customer Key and the
        'Date'[Date]                          -- Date columns in different tables
    )
VAR Result =
    CALCULATE (
        DISTINCTCOUNT ( Sales[CustomerKey] ), -- Counts the number of customers only
        KEEPFILTERS ( CustomersWithLineage )  -- if they appear in their @NewCustomerDate
    )
RETURN
    Result

The code computes the date when each customer is new. ALLSELECTED is useful for optimization purposes: it lets the engine reuse the value of the CustomersWithNewDate variable in multiple executions of the same expression.

Then, in CustomersWithLineage the formula updates the lineage of CustomersWithNewDate to let the variable filter Sales[CustomerKey] and Date[Date]. When used as a filter, CustomersWithLineage makes the customers only visible on dates when they are considered new. The final CALCULATE applies the CustomersWithLineage filter using KEEPFILTERS to intersect with the current filter context. This way the new filter context ignores customers that are not new in the range of dates considered.
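The effect of TREATAS can be sketched in isolation with a hypothetical inline table of key/date pairs; the values acquire the lineage of the target columns and act as a filter:

```dax
-- Hypothetical example: the pair below filters Sales[CustomerKey] and
-- 'Date'[Date], so the customer is only visible on that specific date.
CALCULATE (
    DISTINCTCOUNT ( Sales[CustomerKey] ),
    KEEPFILTERS (
        TREATAS (
            { ( 123, DATE ( 2007, 3, 14 ) ) },  -- hypothetical key/date pair
            Sales[CustomerKey],
            'Date'[Date]
        )
    )
)
```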

In order to apply the new customers as a filter for another measure like Sales Amount we need a slightly different approach, as shown in the following Sales New Customers measure:

Measure in the Sales table

Sales New Customers :=
VAR CustomersWithFirstSale =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the date of their first purchase ever
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),          -- Regardless of local filters on Customer
        ALLSELECTED ( 'Date' )             -- and on Date
    )
VAR NewCustomers =
    FILTER (
        CustomersWithFirstSale,            -- Filters the customers
        [@NewCustomerDate]                 -- where the new customer date
            IN VALUES ( 'Date'[Date] )     -- is in the current time period
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                    -- Evaluates Sales Amount by applying
        KEEPFILTERS ( NewCustomers )       -- the filter for new customers
    )
RETURN
    Result

The NewCustomers variable holds a list of the values in Sales[CustomerKey] corresponding to the new customers, obtained by checking whether the @NewCustomerDate is within the filter context of the current evaluation. The NewCustomers variable obtained this way is then applied as a filter to compute the Sales Amount measure. Even though the variable contains two columns (Sales[CustomerKey] and @NewCustomerDate), the only column actively filtering the model is Sales[CustomerKey], because the newly added column does not share the lineage with any other column in the model.

Lost customers

The measure computing the number of lost customers needs to count customers that are not part of the current filter context. Indeed, in March we might lose a customer who made a purchase in January. Therefore, when filtering March the customer is not visible. The formula must look back at January to find that customer. This is the reason why the structure of the code is different from the New Customers measure:

Measure in the Sales table

# Lost Customers :=
VAR LastDateLost =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR CustomersWithLostDate =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the date when they are considered lost
            "@LostCustomerDate", [Date Lost Customer]
        ),
        ALLSELECTED ( Customer ),          -- Regardless of local filters on Customer
        'Date'[Date] <= LastDateLost       -- considering only dates up to LastDateLost
    )
VAR LostCustomers =
    FILTER (
        CustomersWithLostDate,             -- Filters the customers
        [@LostCustomerDate]                -- whose Lost Customer Date
            IN VALUES ( 'Date'[Date] )     -- falls within the current time period
    )
VAR Result =
    COUNTROWS ( LostCustomers )            -- The count of the lost customers does not
                                           -- use the Sales table (no sales in the period)
RETURN
    Result

The CustomersWithLostDate variable computes the date of loss for each customer. LostCustomers filters out customers whose date of loss is not in the current period. Eventually, the measure computes the number of customers left by counting the rows in LostCustomers that correspond to the customers whose date of loss falls within the period visible in the current filter context.
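The Sales Lost Customers (12M) measure listed earlier follows the same structure. A possible sketch – not the optimized version in the sample files – reuses the LostCustomers filter and evaluates Sales Amount over the 12 months preceding the current period:

```dax
Sales Lost Customers (12M) :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR LastDateLost =
    CALCULATE ( MAX ( 'Date'[Date] ), ALLSELECTED ( 'Date' ) )
VAR CustomersWithLostDate =
    CALCULATETABLE (                       -- Same structure as # Lost Customers
        ADDCOLUMNS (
            VALUES ( Sales[CustomerKey] ),
            "@LostCustomerDate", [Date Lost Customer]
        ),
        ALLSELECTED ( Customer ),
        'Date'[Date] <= LastDateLost
    )
VAR LostCustomers =
    FILTER (
        CustomersWithLostDate,
        [@LostCustomerDate] IN VALUES ( 'Date'[Date] )
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                    -- Sales of the lost customers
        KEEPFILTERS ( LostCustomers ),     -- over the 12 months before
        DATESINPERIOD ( 'Date'[Date], MinDate - 1, -12, MONTH )
    )
RETURN
    Result
```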

Temporarily-lost customers

The measure computing the number of temporarily-lost customers is a major variation of the measure computing the lost customers. The measure must check that in the current context the customer who is potentially lost did not make a purchase prior to the date when they would have been lost. This is the code that implements this calculation:

Measure in the Sales table

# Temporarily Lost Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR CustomersWithLostDateComplete =
    CALCULATETABLE (                          -- Prepares a table that
        ADDCOLUMNS (                          -- for each customer contains
            VALUES ( Sales[CustomerKey] ),    -- the temporarily-lost date
            "@TemporarilyLostCustomerDate", CALCULATE (
                [Date Temporary Lost Customer],
                'Date'[Date] < MinDate
            )
        ),
        ALLSELECTED ( Customer ),             -- Regardless of local filters on Customer
        ALLSELECTED ( 'Date' )                -- and on Date
    )
VAR CustomersWithLostDate =
    FILTER (                                  -- Removes the customers without a
        CustomersWithLostDateComplete,        -- temporarily-lost date
        NOT ISBLANK ( [@TemporarilyLostCustomerDate] )
    )
VAR PotentialTemporarilyLostCustomers =
    FILTER (
        CustomersWithLostDate,                -- Filters the customers
        [@TemporarilyLostCustomerDate]        -- whose lost-customer date
            IN VALUES ( 'Date'[Date] )        -- falls within the current period
    )
VAR ActiveCustomers =
    ADDCOLUMNS (                              -- Gets the first order date of
        VALUES ( Sales[CustomerKey] ),        -- customers in the current selection
        "@MinOrderDate", CALCULATE ( MIN ( Sales[Order Date] ) )
    )
VAR TemporarilyLostCustomers =
    FILTER (                                  -- Filters the temporarily-lost
        NATURALLEFTOUTERJOIN (                -- customers by combining
            PotentialTemporarilyLostCustomers,-- potential lost customers
            ActiveCustomers                   -- and active customers
        ),                                    -- and then comparing dates
        OR (
            ISBLANK ( [@MinOrderDate] ),
            [@MinOrderDate] > [@TemporarilyLostCustomerDate]
        )
    )
VAR Result =
    COUNTROWS ( TemporarilyLostCustomers )
RETURN
    Result

The measure first computes the potential date of loss of each customer; it applies a filter on the date so that it only considers transactions made before the start of the current time period. Then, it checks which customers have a loss date that falls within the current period.

The resulting table (PotentialTemporarilyLostCustomers) contains the customers that can be potentially lost in the current period. Before returning a result, a final check is required: these customers must not have purchased anything in the current period before the date when they would be considered lost. This validation happens by computing TemporarilyLostCustomers, which checks for each customer whether there are sales in the current period before the date when the customer would be considered lost.

Recovered customers

The number of recovered customers is the number of customers that were temporarily lost before a purchase was made in the current period. It is computed by the following measure:

Measure in the Sales table

# Recovered Customers :=
VAR MinDate =
    MIN ( 'Date'[Date] )
VAR CustomersWithLostDateComplete =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the temporarily-lost date
            "@TemporarilyLostCustomerDate", CALCULATE (
                [Date Temporary Lost Customer],
                'Date'[Date] < MinDate
            )
        ),
        ALLSELECTED ( Customer ),          -- Regardless of local filters on Customer
        ALLSELECTED ( 'Date' )             -- and on Date
    )
VAR CustomersWithLostDate =
    FILTER (                               -- Removes the customers without a
        CustomersWithLostDateComplete,     -- temporarily-lost date
        NOT ISBLANK ( [@TemporarilyLostCustomerDate] )
    )
VAR ActiveCustomers =
    ADDCOLUMNS (                           -- Gets the first order date of
        VALUES ( Sales[CustomerKey] ),     -- customers in the current selection
        "@MinOrderDate", CALCULATE ( MIN ( Sales[Order Date] ) )
    )
VAR RecoveredCustomers =
    FILTER (
        NATURALINNERJOIN (                 -- Filters the recovered customers
            ActiveCustomers,               -- by combining active customers
            CustomersWithLostDate          -- and temporarily-lost customers
        ),                                 -- and then comparing dates
        [@MinOrderDate] > [@TemporarilyLostCustomerDate]
    )
VAR Result =
    COUNTROWS ( RecoveredCustomers )
RETURN
    Result

The CustomersWithLostDateComplete variable computes the temporarily-lost date for the customers. Out of this list, the CustomersWithLostDate variable removes the customers who do not have a temporarily-lost date. The ActiveCustomers variable retrieves the first purchase date for the customers in the current selection. The RecoveredCustomers variable filters customers that are in both ActiveCustomers and CustomersWithLostDate lists and have a transaction date greater than the temporarily-lost date.

Finally, the Result variable counts the recovered customers.

Returning customers

The last measure in the set of counting measures is # Returning Customers:

Measure in the Sales table

# Returning Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR CustomersWithNewDate =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- their first purchase date
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),          -- Regardless of local filters on Customer
        ALLSELECTED ( 'Date' )             -- and on Date
    )
VAR ExistingCustomers =                    -- To get the existing customers,
    FILTER (                               -- this filters all customers
        CustomersWithNewDate,              -- and checks that their first purchase took
        [@NewCustomerDate] < MinDate       -- place before the start of the current period
    )
VAR ReturningCustomers =                   -- Obtains the returning customers
    INTERSECT (                            -- as the intersection between
        VALUES ( Sales[CustomerKey] ),     -- the active customers in the selection
        SELECTCOLUMNS (                    -- and the existing customers
            ExistingCustomers,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( ReturningCustomers )
RETURN
    Result

The measure first prepares a table in CustomersWithNewDate with the first purchase date for every customer. The ExistingCustomers variable filters out all the customers whose date is not strictly earlier than the start of the currently selected period. What remains in ExistingCustomers is the set of customers who already purchased products before the current period started. Therefore, if those customers also made purchases within the current period, then they are returning customers. This last condition is obtained by combining ExistingCustomers with the customers active in the selected period. The result in the ReturningCustomers variable can be used to count the returning customers – as in this measure – or to filter them in a different calculation.
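For example, a hedged sketch of Sales Returning Customers – not necessarily the optimized version in the sample files – reuses the same ReturningCustomers variable as a filter instead of counting its rows:

```dax
Sales Returning Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR CustomersWithNewDate =
    CALCULATETABLE (
        ADDCOLUMNS (
            VALUES ( Sales[CustomerKey] ),
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),
        ALLSELECTED ( 'Date' )
    )
VAR ExistingCustomers =
    FILTER (
        CustomersWithNewDate,
        [@NewCustomerDate] < MinDate
    )
VAR ReturningCustomers =
    INTERSECT (
        VALUES ( Sales[CustomerKey] ),
        SELECTCOLUMNS (
            ExistingCustomers,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                    -- Applies the returning customers
        KEEPFILTERS ( ReturningCustomers ) -- as a filter instead of counting them
    )
RETURN
    Result
```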

Dynamic absolute

The Dynamic absolute pattern ignores the filters on the report when computing the relevant dates for the customer. Its implementation is a variation of the basic dynamic relative pattern, with a different set of CALCULATE modifiers to explicitly ignore filters.

The result is an absolute assignment of the status of a customer regardless of report filters, as shown in Figure 9: Dale Lal is considered new in January when Games and Toys is selected, even though he purchased cameras and no games.

The only measure that changes depending on the category is # Customers, which shows when Lal Dale purchased products. All the other measures ignore the filter on the product: customers are new only the first time they make a purchase regardless of the report filter.

Internal measures

The internal measures are the following:

Measure (hidden) in the Sales table

Date New Customer :=
CALCULATE (                    -- The first sale is
    MIN ( Sales[Order Date] ), -- the MIN of the order date
    ALLEXCEPT (
        Sales,                 -- ignoring any filter
        Sales[CustomerKey],    -- other than the customer
        Customer
    )
)

Measure (hidden) in the Sales table

Date Lost Customer :=
VAR MaxDate =
    CALCULATE (                    -- The last sale is the MAX of Order Date
        MAX ( Sales[Order Date] ), -- at any time in the past
        ALLEXCEPT (
            Sales,                 -- ignoring any filter
            Sales[CustomerKey],    -- other than Customer
            Customer
        )
    )
VAR Result =
    IF (
        NOT ISBLANK ( MaxDate ),
        EOMONTH ( MaxDate, 2 )     -- two months later (end of month)
    )
RETURN
    Result

Measure (hidden) in the Sales table

Date Temporary Lost Customer :=
VAR MaxDate =
    CALCULATE (                    -- The last sale is the MAX of Order Date
        MAX ( Sales[Order Date] ), -- in the current period (set by the calling measure)
        ALLEXCEPT (
            Sales,                 -- ignoring any filter
            'Date',                -- other than Date
            Sales[CustomerKey],    -- and Customer
            Customer
        )
    )
VAR Result =
    IF (
        NOT ISBLANK ( MaxDate ),
        EOMONTH ( MaxDate, 2 )     -- two months later (end of month)
    )
RETURN
    Result

As shown in the previous code, the internal measures are designed to ignore all filters other than the ones on Customer – with the notable exception of Date Temporary Lost Customer, which also needs to consider the filters on Date.

Please note that the internal measures have been designed to behave properly when called from the external measures. This is the reason why ALLEXCEPT explicitly keeps the filter on Sales[CustomerKey] in a somewhat unusual way. If called within an iteration that includes that column, the internal measures do not remove the filter, thereby observing the requirements of the external measure.

New customers

The measure that computes the new customers is the following:

Measure in the Sales table

# New Customers :=
VAR CustomersWithNewDate =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the date of their first purchase
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLEXCEPT ( Sales, Customer )
    )
VAR NewCustomers =
    FILTER (
        CustomersWithNewDate,              -- Filters the customers
        [@NewCustomerDate]                 -- whose new customer date
            IN VALUES ( 'Date'[Date] )     -- falls within the current period
    )
VAR Result =                               -- The count of the new customers
    COUNTROWS ( NewCustomers )             -- does not use the Sales table
RETURN
    Result

There are two things to note about this measure. First, the filter in the calculation of CustomersWithNewDate uses ALLEXCEPT to ignore any filter apart from the ones on the Customer table. Second, in order to check whether a customer is new, the measure filters the content of CustomersWithNewDate and then counts the rows in the NewCustomers variable, instead of using TREATAS as the corresponding measure in the Dynamic relative pattern does. This technique may turn out to be slower than the one used in the Dynamic relative pattern; it is still required here, because the measure must count a customer even though they might not be visible in the current filter context.
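The filter-and-count technique can be sketched in Python with hypothetical data. The point being illustrated is that the count is driven only by the precomputed first-purchase dates and the visible dates, so a customer is counted even when no Sales row for them is otherwise visible:

```python
from datetime import date

# Hypothetical first-purchase dates per customer, computed while
# ignoring any filter other than Customer (the role of ALLEXCEPT)
first_purchase = {
    "A": date(2020, 1, 5),
    "B": date(2020, 3, 2),
    "C": date(2020, 3, 15),
}

# Dates visible in the current period (VALUES ( 'Date'[Date] ))
visible_dates = {date(2020, 3, day) for day in range(1, 32)}

# NewCustomers: customers whose first-purchase date falls in the
# period, counted without touching the (possibly filtered) Sales rows
new_customers = [c for c, d in first_purchase.items() if d in visible_dates]
result = len(new_customers)
```

With March selected, B and C are counted as new, while A (first purchase in January) is not.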

Lost customers

The measure computing the number of lost customers is the following:

Measure in the Sales table

# Lost Customers :=
VAR LastDateLost =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR CustomersWithLostDate =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the date when they are considered lost
            "@LostCustomerDate", [Date Lost Customer]
        ),
        ALLEXCEPT ( Sales, Customer ),
        'Date'[Date] <= LastDateLost
    )
VAR LostCustomers =
    FILTER (
        CustomersWithLostDate,             -- Filters the customers
        [@LostCustomerDate]                -- whose lost customer date
            IN VALUES ( 'Date'[Date] )     -- falls within the current period
    )
VAR Result =
    COUNTROWS ( LostCustomers )            -- The count of the lost customers does not
                                           -- use the Sales table (no sales in the period)
RETURN
    Result

Its structure is close to the # New Customers measure, the main difference being in the calculation of the CustomersWithLostDate variable.

Temporarily-lost customers

The measure computing the number of temporarily-lost customers is a variation of the measure computing the lost customers:

Measure in the Sales table

# Temporarily Lost Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR CustomersWithLostDateComplete =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the temporarily-lost date
            "@TemporarilyLostCustomerDate", CALCULATE (
                [Date Temporary Lost Customer],
                'Date'[Date] < MinDate
            )
        ),                                 -- ignoring any filter
        ALLEXCEPT ( Sales, Customer )      -- other than Customer
    )
VAR CustomersWithLostDate =
    FILTER (                               -- Removes the customers without a
        CustomersWithLostDateComplete,     -- temporarily-lost date
        NOT ISBLANK ( [@TemporarilyLostCustomerDate] )
    )
VAR PotentialTemporarilyLostCustomers =
    FILTER (
        CustomersWithLostDate,             -- Filters the customers
        [@TemporarilyLostCustomerDate]     -- whose lost customer date
            IN VALUES ( 'Date'[Date] )     -- falls within the current period
    )
VAR ActiveCustomers =
    CALCULATETABLE (
        ADDCOLUMNS (                       -- Gets the first order date of
            VALUES ( Sales[CustomerKey] ), -- customers in the current selection
            "@MinOrderDate", CALCULATE ( MIN ( Sales[Order Date] ) )
        ),
        ALLEXCEPT ( Sales, Customer, 'Date' )
    )
VAR TemporarilyLostCustomers =
    FILTER (                               -- Filters the temporarily-lost
        NATURALLEFTOUTERJOIN (             -- customers by combining
            PotentialTemporarilyLostCustomers, -- potential lost customers
            ActiveCustomers                -- and active customers
        ),                                 -- and then by comparing dates
        OR (
            ISBLANK ( [@MinOrderDate] ),
            [@MinOrderDate] > [@TemporarilyLostCustomerDate]
        )
    )
VAR Result =
    COUNTROWS ( TemporarilyLostCustomers )
RETURN
    Result

Its behavior is very close to the corresponding measure in the Dynamic relative pattern. The main differences are in the use of ALLEXCEPT in the evaluation of CustomersWithLostDateComplete and ActiveCustomers: in CustomersWithLostDateComplete all the filters other than the ones on Customer are removed, whereas in ActiveCustomers the filters on Date and Customer are kept.

Recovered customers

The number of recovered customers is the number of customers that were temporarily lost before a purchase made in the current period. It is computed by the following measure:

Measure in the Sales table

# Recovered Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR CustomersWithLostDateComplete =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- the temporarily-lost date
            "@TemporarilyLostCustomerDate", CALCULATE (
                [Date Temporary Lost Customer],
                'Date'[Date] < MinDate
            )
        ),                                 -- ignoring any filter
        ALLEXCEPT ( Sales, Customer )      -- other than Customer
    )
VAR CustomersWithLostDate =
    FILTER (                               -- Removes the customers without a
        CustomersWithLostDateComplete,     -- temporarily-lost date
        NOT ISBLANK ( [@TemporarilyLostCustomerDate] )
    )
VAR ActiveCustomers =
    CALCULATETABLE (
        ADDCOLUMNS (                       -- Gets the first order date of
            VALUES ( Sales[CustomerKey] ), -- customers in the current selection
            "@MinOrderDate", CALCULATE ( MIN ( Sales[Order Date] ) )
        ),
        ALLEXCEPT ( Sales, Customer, 'Date' )
    )
VAR RecoveredCustomers =
    FILTER (
        NATURALINNERJOIN (                 -- Filters the recovered customers
            ActiveCustomers,               -- by combining active customers
            CustomersWithLostDate          -- and temporarily-lost customers
        ),                                 -- then by comparing dates
        [@MinOrderDate] > [@TemporarilyLostCustomerDate]
    )
VAR Result =
    COUNTROWS ( RecoveredCustomers )
RETURN
    Result

Its behavior is very close to the corresponding measure in the Dynamic relative pattern. The main difference is the use of ALLEXCEPT in the evaluation of the CustomersWithLostDateComplete and ActiveCustomers variables to correctly set the required filter.
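The core comparison made by the RecoveredCustomers variable – joining active customers with temporarily-lost customers and keeping only those whose first purchase in the period comes after the lost date – can be sketched in Python with hypothetical data:

```python
from datetime import date

# Hypothetical temporarily-lost dates (computed before the period)
lost_date = {"A": date(2020, 2, 29), "B": date(2020, 3, 20)}

# First order date per customer within the current selection
min_order = {"A": date(2020, 3, 5), "B": date(2020, 3, 10)}

# NATURALINNERJOIN + FILTER: customers present in both tables whose
# first purchase in the period is later than their temporarily-lost date
recovered = {
    c for c in min_order.keys() & lost_date.keys()
    if min_order[c] > lost_date[c]
}
```

A is recovered (bought on March 5, after being lost on February 29); B is not, because the purchase on March 10 precedes the lost date of March 20.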

Returning customers

The last measure in the set of counting measures is # Returning Customers:

Measure in the Sales table

# Returning Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR CustomersWithNewDate =
    CALCULATETABLE (                       -- Prepares a table that
        ADDCOLUMNS (                       -- for each customer contains
            VALUES ( Sales[CustomerKey] ), -- their first sale date
            "@NewCustomerDate", [Date New Customer]
        ),                                 -- ignoring any filter
        ALLEXCEPT ( Sales, Customer )      -- other than Customer
    )
VAR ExistingCustomers =                    -- To compute the existing customers
    FILTER (                               -- we filter all customers
        CustomersWithNewDate,              -- and check that their
        [@NewCustomerDate] < MinDate       -- first sale happened
    )                                      -- before the current period
VAR ActiveCustomers =
    CALCULATETABLE (
        VALUES ( Sales[CustomerKey] ),     -- Gets the active customers
        ALLEXCEPT ( Sales, Customer, 'Date' )
    )
VAR ReturningCustomers =                   -- Obtains the returning customers
    INTERSECT (                            -- as the intersection between
        ActiveCustomers,                   -- the active customers in the selection
        SELECTCOLUMNS (                    -- and the existing customers
            ExistingCustomers,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( ReturningCustomers )
RETURN
    Result

Its behavior is very close to the corresponding measure in the Dynamic relative pattern. The main difference is the use of ALLEXCEPT in the evaluation of the CustomersWithNewDate and ActiveCustomers variables, to accurately set the required filter.

Generic dynamic pattern (dynamic by category)

The generic dynamic pattern is an intermediate level between the absolute and the dynamic patterns. The pattern ignores all the filters from the report except for attributes determined by the business logic. In the examples used in this section, the measures are local to each product category. The result is dynamic for product category and absolute for all the other attributes in the data model. For instance, one customer can be new for a product category and a returning customer for another product category within the same month. The same customers might be considered new multiple times if they buy different categories of products over time. In other words, the analysis of new and returning customers is made by product category. You can customize the pattern by replacing product category with one or more other attributes, so that it fits your business logic.

We purposely avoided excessive optimizations when writing the code of this pattern: the primary goal of this set of measures is to make them easier to update. If you plan on modifying the pattern to fit your needs, this set of measures should be a good starting point.

The rules of this pattern are the following:

The same customer might be considered a new customer multiple times, once for each combination of dynamic attributes (product category in the example).

Customers are considered returning customers if they already purchased the same combination of dynamic attributes (product category in the example) they are purchasing in the selected period.

Customers are temporarily lost if they did not purchase a combination of dynamic attributes (product category in the example) for two months, even though they may have purchased different combinations of dynamic attributes (product category in the example) in the meantime.

Customers are considered recovered customers if they make a new purchase of products of the very combination of dynamic attributes (product category in the example) for which they were temporarily lost.

It is important to note that the pattern detects the customers, not the combination of dynamic attributes and customers – like customer and product category in the example. Therefore, the measures with the # prefix always return the number of unique customers, whereas the measures with the Sales prefix always evaluate the Sales Amount measure regardless of the combination of dynamic attributes (product category in the example) for which a customer is considered new/lost/recovered. The difference is visible by filtering two or more combinations of the dynamic attributes. For example, by filtering two product categories, the Sales measures for new and returning customers could add up to more than the value of Sales Amount; indeed, the same amount can be computed considering the same customer both new and returning, because of their having different states for different categories.
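The difference between counting customers and evaluating statuses per category can be illustrated in Python with a hypothetical single customer who holds two different states in the same month:

```python
from datetime import date

# A single customer, two categories in the same month: the status is
# computed per (customer, category) pair, not per customer
status = {
    ("A", "Audio"): "new",        # A never bought Audio before
    ("A", "TV"): "returning",     # A already bought TV in the past
}

new_customers = {c for (c, cat), s in status.items() if s == "new"}
returning_customers = {c for (c, cat), s in status.items() if s == "returning"}

# The same customer contributes to both counts, so the sum of the
# status counts can exceed the number of distinct customers
overlap = new_customers & returning_customers
```

Here # New Customers and # Returning Customers are both 1, while # Customers is also 1: the sum of the status measures (2) exceeds the distinct customer count because A holds both states for different categories.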

Your requirements might be different from those assumed in this example. In that case, as we already stated in the introduction, you need to very carefully understand the filtering happening in all the measures before implementing any change. These measures are quite complex and easy to break with small changes.

Internal measures

The internal measures are the following:

Measure (hidden) in the Sales table

Date New Customer :=
CALCULATE (
    MIN ( Sales[Order Date] ),         -- The first sale is the MIN of Order Date
    ALLEXCEPT (
        Sales,                         -- ignoring filters
        Sales[CustomerKey],            -- other than Customer
        Customer,
        'Product'[Category]            -- and Product Category
    )
)

Measure (hidden) in the Sales table

Date Lost Customer :=
VAR MaxDate =
    CALCULATE (                        -- The last sale is the MAX of Order Date in the
        MAX ( Sales[Order Date] ),     -- current time period (set by the calling measure)
        ALLEXCEPT (
            Sales,                     -- ignoring any filter
            Sales[CustomerKey],        -- other than Customer
            Customer,
            'Product'[Category]        -- and Product Category
        )
    )
VAR Result =
    IF (
        NOT ISBLANK ( MaxDate ),
        EOMONTH ( MaxDate, 2 )         -- two months later (end of month)
    )
RETURN
    Result

Measure (hidden) in the Sales table

Date Temporary Lost Customer :=
VAR MaxDate =
    CALCULATE (                        -- The last sale is the MAX of Order Date
        MAX ( Sales[Order Date] ),     -- in the current period (set by the calling measure)
        ALLEXCEPT (
            Sales,                     -- ignoring any filter
            'Date',                    -- other than Date
            Sales[CustomerKey],        -- and Customer
            Customer,
            'Product'[Category]        -- and Product Category
        )
    )
VAR Result =
    IF (
        NOT ISBLANK ( MaxDate ),
        EOMONTH ( MaxDate, 2 )         -- two months later (end of month)
    )
RETURN
    Result

As shown in this code, the internal measures are designed to ignore filters other than the ones on Customer and Product[Category].

New customers

The measure that computes the new customers is the following:

Measure in the Sales table

# New Customers :=
VAR FilterCategories =
    CALCULATETABLE (
        VALUES ( 'Product'[Category] ),
        ALLSELECTED ( 'Product' )
    )
VAR CustomersWithNewDate =
    CALCULATETABLE (                       -- Prepares a table that contains
        ADDCOLUMNS (                       -- for each customer and category,
            SUMMARIZE (                    -- the date of their first purchase
                Sales,
                Sales[CustomerKey],
                'Product'[Category]
            ),
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),
        FilterCategories,                  -- Filter Product Category from ALLSELECTED
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            Sales[CustomerKey],            -- the result is unchanged in
            Customer                       -- different cells of the report
        )
    )
VAR CustomersCategoryNewDate =
    TREATAS (
        CustomersWithNewDate,              -- Changes the data lineage so that
        Sales[CustomerKey],                -- NewCustomerDate maps Date[Date]
        'Product'[Category],               -- and can be used to join or filter
        'Date'[Date]                       -- that same column in the model
    )
VAR ActiveCustomersCategories =
    CALCULATETABLE (
        SUMMARIZE (                        -- Retrieves combinations of
            Sales,                         -- Customer, Category, and Date
            Sales[CustomerKey],            -- active in the current selection
            'Product'[Category],
            'Date'[Date]
        ),
        ALLEXCEPT (                        -- Removes any filter other than Date and
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            'Date',                        -- the result is unchanged in
            Sales[CustomerKey],            -- different cells of the report
            Customer
        ),
        VALUES ( 'Product'[Category] )     -- Restore related Product[Category] filter
    )
VAR ActiveNewCustomers =
    NATURALINNERJOIN (                     -- Filters the customers
        CustomersCategoryNewDate,          -- within the current selection
        ActiveCustomersCategories          -- joining Date and Category
    )
VAR NewCustomers =
    DISTINCT (                             -- Gets the list of unique
        SELECTCOLUMNS (                    -- new customers
            ActiveNewCustomers,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( NewCustomers )
RETURN
    Result

In this version of the measure, the CustomersWithNewDate variable might compute a different date for each product category. Indeed, SUMMARIZE uses the Product[Category] column as a group-by condition. Consequently, TREATAS specifies the lineage for the three columns in CustomersWithNewDate so that the @NewCustomerDate column can be used later to filter or join the Date[Date] column.

For performance reasons, the CustomersWithNewDate and CustomersCategoryNewDate variables are invariant to the filter context of cells in a report, so their result is computed only once for a single visualization. In order to get the actual new customers, it is necessary to filter out the combinations that are not visible in the filter context where # New Customers is evaluated. This is accomplished by the NATURALINNERJOIN in ActiveNewCustomers, which joins the combinations of customer, date, and category visible in the filter context (ActiveCustomersCategories) with the combinations in CustomersCategoryNewDate.

The NewCustomers variable removes the duplicated customers that could be new for different categories in the same period. This way, NewCustomers can be used as a filter in following calculations or it can be counted to obtain the number of new customers, as the # New Customers measure does.
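The join-then-deduplicate logic can be sketched in Python over hypothetical (customer, category, date) triples: the inner join keeps only the first purchases visible in the current selection, and the final projection removes customers that are new for more than one category.

```python
from datetime import date

# (customer, category, first-purchase date) triples, relined by TREATAS
customers_category_new_date = {
    ("A", "Audio", date(2020, 3, 2)),
    ("A", "TV", date(2020, 1, 10)),
    ("B", "Audio", date(2020, 3, 15)),
}

# (customer, category, date) combinations visible in the filter context
active = {
    ("A", "Audio", date(2020, 3, 2)),
    ("B", "Audio", date(2020, 3, 15)),
    ("A", "TV", date(2020, 3, 20)),
}

# NATURALINNERJOIN: keep only first purchases visible in the selection
active_new = customers_category_new_date & active

# DISTINCT over CustomerKey: a customer new for two categories counts once
new_customers = {customer for customer, _, _ in active_new}
```

The ("A", "TV") first purchase in January does not match any visible March combination, so only the Audio first purchases survive the join.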

The Sales New Customers measure is similar to # New Customers; the only difference is the Result variable, which uses NewCustomersCat as a filter in CALCULATE instead of just counting the rows of the NewCustomers variable. Therefore, we show here only the last part of the code, using an ellipsis for the unchanged sections:

Measure in the Sales table

Sales New Customers :=
...
VAR NewCustomersCat =
    SELECTCOLUMNS (                        -- New customers/category, removing the date
        ActiveNewCustomers,
        "CustomerKey", Sales[CustomerKey],
        "Category", 'Product'[Category]
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                    -- Computes Sales Amount
        KEEPFILTERS ( NewCustomersCat )    -- applying the filter for new customers
    )
RETURN
    Result

Lost customers

The measure computing the number of lost customers is the following:

Measure in the Sales table

# Lost Customers :=
VAR LastDateLost =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR CustomersWithLostDate =
    CALCULATETABLE (                       -- Prepares a table that contains
        ADDCOLUMNS (                       -- for each customer and category,
            SUMMARIZE (                    -- the corresponding lost date
                Sales,
                Sales[CustomerKey],
                'Product'[Category]
            ),
            "@LostCustomerDate", [Date Lost Customer]
        ),
        'Date'[Date] <= LastDateLost,
        ALLSELECTED ( Customer ),
        VALUES ( 'Product'[Category] ),
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            Sales[CustomerKey],            -- the result is unchanged in
            Customer                       -- different cells of the report
        )
    )
VAR LostCustomersCategories =
    FILTER (
        CustomersWithLostDate,             -- Filters the customers
        [@LostCustomerDate]                -- where the lost customer date
            IN VALUES ( 'Date'[Date] )     -- falls within the current period
    )
VAR LostCustomers =
    DISTINCT (                             -- Gets the list of unique
        SELECTCOLUMNS (                    -- lost customers
            LostCustomersCategories,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( LostCustomers )
RETURN
    Result

In this version of the measure, the CustomersWithLostDate variable might compute a different date for each product category. The reason is that SUMMARIZE uses the Product[Category] column as a group-by condition and that the customer might have different dates of loss – one for each category.

The LostCustomersCategories variable only filters the combinations of customers and categories that have a lost date included in the selected time period. Similarly to the New Customers measure, the LostCustomers variable removes the duplicated customers so it can be used both as a filter and to count the lost customers.

Temporarily-lost customers

The measure computing the number of temporarily-lost customers is a variation of the measure computing the lost customers:

Measure in the Sales table

# Temporarily Lost Customers :=
VAR LastDateLost =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR MinDate = MIN ( 'Date'[Date] )
VAR FilterCategories =
    CALCULATETABLE (
        VALUES ( 'Product'[Category] ),
        ALLSELECTED ( 'Product' )
    )
VAR CustomersWithLostDateComplete =
    CALCULATETABLE (                       -- Prepares a table that contains
        ADDCOLUMNS (                       -- for each customer and category,
            SUMMARIZE (                    -- the corresponding lost date
                Sales,
                Sales[CustomerKey],
                'Product'[Category]
            ),
            "@TemporarilyLostCustomerDate", CALCULATE (
                [Date Temporary Lost Customer],
                'Date'[Date] < MinDate
            )
        ),
        ALLSELECTED ( Customer ),
        FilterCategories,                  -- Filter Product Category from ALLSELECTED
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            Sales[CustomerKey],            -- the result is unchanged in
            Customer                       -- different cells of the report
        )
    )
VAR CustomersWithLostDate =
    FILTER (                               -- Removes the customers without a
        CustomersWithLostDateComplete,     -- temporarily-lost date
        NOT ISBLANK ( [@TemporarilyLostCustomerDate] )
    )
VAR PotentialTemporarilyLostCustomers =
    FILTER (
        CustomersWithLostDate,             -- Filters the customers
        [@TemporarilyLostCustomerDate]     -- where the lost customer date
            IN VALUES ( 'Date'[Date] )     -- falls within the current period
    )
VAR ActiveCustomersCategories =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE (                    -- Gets the first order date
                Sales,                     -- for each combination of
                Sales[CustomerKey],        -- customer and category
                'Product'[Category]        -- in the current selection
            ),
            "@MinOrderDate", CALCULATE ( MIN ( Sales[Order Date] ) )
        ),
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- customer and date
            Sales[CustomerKey],
            Customer,
            'Date'
        ),
        VALUES ( 'Product'[Category] )     -- Restore related Product[Category] filter
    )
VAR TemporarilyLostCustomersCategories =
    FILTER (                               -- Filters the temporarily-lost
        NATURALLEFTOUTERJOIN (             -- customers by combining
            PotentialTemporarilyLostCustomers, -- potential lost customers
            ActiveCustomersCategories      -- and active customers
        ),                                 -- and then comparing dates
        OR (
            ISBLANK ( [@MinOrderDate] ),
            [@MinOrderDate] > [@TemporarilyLostCustomerDate]
        )
    )
VAR TemporarilyLostCustomers =
    DISTINCT (                             -- Gets the list of unique
        SELECTCOLUMNS (                    -- temporarily-lost customers
            TemporarilyLostCustomersCategories,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( TemporarilyLostCustomers )
RETURN
    Result

The CustomersWithLostDateComplete variable needs to enforce the filter on the Product[Category] column by using the VALUES function – even though the filter might not be applied directly to that column but rather to other columns cross-filtering Product[Category].

Similarly, the ActiveCustomersCategories variable creates a table of combinations of Sales[CustomerKey] and Product[Category] along with the first purchase date for each combination of customers and product category. This table is then joined to the PotentialTemporarilyLostCustomers variable, which contains the content of CustomersWithLostDate visible in the current selection. The result of the join filtered by date over the limit of the temporarily-lost date is returned in the TemporarilyLostCustomersCategories variable.

Finally, to avoid counting the same customer multiple times, the measure extracts the customer key before finally counting the number of temporarily-lost customers.
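The left-outer-join and date comparison described above can be sketched in Python over hypothetical (customer, category) pairs. A pair with no first order at all plays the role of a blank @MinOrderDate:

```python
from datetime import date

# Potential temporarily-lost pairs with their temporarily-lost date
potential = {
    ("A", "Audio"): date(2020, 3, 10),
    ("B", "TV"): date(2020, 3, 5),
}

# First order date per (customer, category) in the current selection;
# a missing key plays the role of a blank @MinOrderDate
min_order = {("B", "TV"): date(2020, 3, 2)}

# NATURALLEFTOUTERJOIN + FILTER: keep pairs with no purchase at all,
# or whose first purchase comes after the temporarily-lost date
temporarily_lost_pairs = {
    pair for pair, lost in potential.items()
    if pair not in min_order or min_order[pair] > lost
}

# DISTINCT over CustomerKey: count each customer once
temporarily_lost = {customer for customer, _ in temporarily_lost_pairs}
```

("A", "Audio") survives because A made no Audio purchase in the period; ("B", "TV") is dropped because B bought TV products on March 2, before the lost date of March 5.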

Recovered customers

The number of recovered customers is the number of customers that were temporarily lost before a purchase was made in the current period. It is computed by using the following measure:

Measure in the Sales table

# Recovered Customers :=
VAR LastDateLost =
    CALCULATE (
        MAX ( 'Date'[Date] ),
        ALLSELECTED ( 'Date' )
    )
VAR MinDate = MIN ( 'Date'[Date] )
VAR FilterCategories =
    CALCULATETABLE (
        VALUES ( 'Product'[Category] ),
        ALLSELECTED ( 'Product' )
    )
VAR CustomersWithLostDateComplete =
    CALCULATETABLE (                       -- Prepares a table that contains
        ADDCOLUMNS (                       -- for each customer and category,
            SUMMARIZE (                    -- the corresponding lost date
                Sales,
                Sales[CustomerKey],
                'Product'[Category]
            ),
            "@TemporarilyLostCustomerDate", CALCULATE (
                [Date Temporary Lost Customer],
                'Date'[Date] < MinDate
            )
        ),
        ALLSELECTED ( Customer ),
        FilterCategories,                  -- Filter Product Category from ALLSELECTED
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            Sales[CustomerKey],            -- the result is unchanged in
            Customer                       -- different cells of the report
        )
    )
VAR CustomersWithLostDate =
    FILTER (                               -- Removes the customers without a
        CustomersWithLostDateComplete,     -- temporarily-lost date
        NOT ISBLANK ( [@TemporarilyLostCustomerDate] )
    )
VAR ActiveCustomersCategories =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE (                    -- Gets the first order date
                Sales,                     -- for each combination of
                Sales[CustomerKey],        -- customer and category
                'Product'[Category]        -- in the current selection
            ),
            "@MinOrderDate", CALCULATE ( MIN ( Sales[Order Date] ) )
        ),
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- customer and date
            Sales[CustomerKey],
            Customer,
            'Date'
        ),
        VALUES ( 'Product'[Category] )     -- Restore related Product[Category] filter
    )
VAR RecoveredCustomersCategories =
    FILTER (                               -- Filters the recovered customers
        NATURALINNERJOIN (                 -- by combining active customers
            ActiveCustomersCategories,     -- and temporarily-lost customers
            CustomersWithLostDate          -- and then by comparing dates
        ),
        [@MinOrderDate] > [@TemporarilyLostCustomerDate]
    )
VAR RecoveredCustomers =
    DISTINCT (                             -- Gets the list of unique
        SELECTCOLUMNS (                    -- recovered customers
            RecoveredCustomersCategories,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( RecoveredCustomers )
RETURN
    Result

The measure first determines the customers that were temporarily lost before the current date, also summarizing by Product[Category]. Because the Sales[CustomerKey] and Product[Category] columns are part of the tables stored in the CustomersWithLostDateComplete and ActiveCustomersCategories variables, the join made in RecoveredCustomersCategories returns a table that has both columns. This ensures that a customer that was considered lost for a given category is recovered only if they buy a product of the same category. The customer might appear multiple times in this table, so duplicated customers are removed in RecoveredCustomers in order to count or filter only the unique recovered customers.

The Sales Recovered Customers measure is similar to # Recovered Customers; the only difference is the Result variable, which uses RecoveredCustomersCat as a filter in CALCULATE instead of counting the unique customers as # Recovered Customers does. Therefore, here we only show the last part of the code, using an ellipsis for the identical sections:

Measure in the Sales table

Sales Recovered Customers :=
...
VAR RecoveredCustomersCat =
    DISTINCT (                             -- Gets the list of unique
        SELECTCOLUMNS (                    -- recovered customers and categories
            RecoveredCustomersCategories,
            "CustomerKey", Sales[CustomerKey],
            "Category", 'Product'[Category]
        )
    )
VAR Result =
    CALCULATE (
        [Sales Amount],
        KEEPFILTERS ( RecoveredCustomersCat )
    )
RETURN
    Result

Returning customers

The last measure in the set of counting measures is # Returning Customers:

Measure in the Sales table

# Returning Customers :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR FilterCategories =
    CALCULATETABLE (
        VALUES ( 'Product'[Category] ),
        ALLSELECTED ( 'Product' )
    )
VAR CustomersWithNewDate =
    CALCULATETABLE (                       -- Prepares a table that contains
        ADDCOLUMNS (                       -- for each customer and category,
            SUMMARIZE (                    -- the date of their first purchase
                Sales,
                Sales[CustomerKey],
                'Product'[Category]
            ),
            "@NewCustomerDate", [Date New Customer]
        ),
        ALLSELECTED ( Customer ),
        FilterCategories,                  -- Filter Product Category from ALLSELECTED
        ALLEXCEPT (                        -- Removes any filter other than
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            Sales[CustomerKey],            -- the result is unchanged in
            Customer                       -- different cells of the report
        )
    )
VAR ExistingCustomers =                    -- To get the existing customers,
    FILTER (                               -- filters all customers
        CustomersWithNewDate,              -- and checks that their first purchase
        [@NewCustomerDate] < MinDate       -- took place before the current time period
    )
VAR ActiveCustomersCategories =
    CALCULATETABLE (
        SUMMARIZE (                        -- Retrieves combinations of
            Sales,                         -- Customer, Category, and Date
            Sales[CustomerKey],            -- active in the current selection
            'Product'[Category],
            'Date'[Date]
        ),
        ALLEXCEPT (                        -- Removes any filter other than Date and
            Sales,                         -- Customer retrieved by ALLSELECTED so that
            'Date',                        -- the result is unchanged in
            Sales[CustomerKey],            -- different cells of the report
            Customer
        ),
        VALUES ( 'Product'[Category] )     -- Restore related Product[Category] filter
    )
VAR ReturningCustomersCategories =
    NATURALINNERJOIN (
        ActiveCustomersCategories,         -- Combines active customers
        ExistingCustomers                  -- and existing customers
    )
VAR ReturningCustomers =
    DISTINCT (                             -- Gets the list of unique
        SELECTCOLUMNS (                    -- returning customers
            ReturningCustomersCategories,
            "CustomerKey", Sales[CustomerKey]
        )
    )
VAR Result =
    COUNTROWS ( ReturningCustomers )
RETURN
    Result

The measure creates a CustomersWithNewDate variable, which obtains the first sale date for each combination of customer and product category. This result is joined with the combinations of customer and product category present in the current filter context over Sales. The result is the set of returning customers in the ReturningCustomers variable, which is counted by the # Returning Customers measure. The Sales Returning Customers measure uses the following ReturningCustomersCat variable as a filter instead of the ReturningCustomers variable. Here we only show its final lines of code, all the remaining code being identical to the previous formula:

Measure in the Sales table

Sales Returning Customers :=
...
VAR ReturningCustomersCat =
    SELECTCOLUMNS (                        -- Returning customers/category, removing the date
        ReturningCustomersCategories,
        "CustomerKey", Sales[CustomerKey],
        "Category", 'Product'[Category]
    )
VAR Result =
    CALCULATE (
        [Sales Amount],                    -- Computes Sales Amount
        KEEPFILTERS ( ReturningCustomersCat ) -- applying the filter for returning customers
    )
RETURN
    Result

Snapshot absolute

Computing new and returning customers dynamically is a very expensive operation. Therefore, this pattern is oftentimes implemented by using precomputed tables (snapshots) to store the most relevant dates at the desired granularity.

By using precomputed tables, we get a much faster solution, albeit with reduced flexibility. In the snapshot absolute pattern, the state of new and returning customers does not depend on the filters applied to the report. The results obtained by using this pattern correspond to those of the Dynamic absolute pattern.

The pattern uses a snapshot table containing the relevant states of each customer (New, Lost, Temporarily lost, and Recovered) shown in Figure 10.

The New and Lost events are unique for each customer, whereas the Temporarily lost and Recovered events can have multiple occurrences over time for each customer.

The resulting table is linked to Customer and Date through regular relationships. The resulting model is visible in Figure 11.

Building the CustomerEvents table is a critical step. Creating this table as a derived snapshot by using a calculated table in DAX is relatively efficient for the New and Lost states, whereas it can be very expensive for the Temporarily lost and Recovered states. Keep in mind that Temporarily lost is needed in order to compute the Recovered state. In models with hundreds of thousands of customers or with hundreds of millions of sales you should consider preparing this table outside of the data model, and importing it as a simple table.

Once this model is in place, the DAX measures are simple and efficient. Indeed, for this model there is no need to create external and internal measures – the external measures are already simple. The full logic that defines the status of a customer is in the table itself. This is the reason why the resulting DAX code is much simpler.
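The structure of such a snapshot can be sketched in Python with hypothetical data. This simplified sketch only derives the New and Lost events per customer (the Temporarily lost and Recovered events, which can occur multiple times, are omitted); the `eomonth` helper mimics the DAX EOMONTH function used to place the Lost date two months after the last purchase:

```python
from datetime import date, timedelta

def eomonth(d, months):
    # Last day of the month `months` after d (like DAX EOMONTH)
    y, m = divmod(d.month - 1 + months + 1, 12)
    first_of_next = date(d.year + y, m + 1, 1)
    return first_of_next - timedelta(days=1)

# Hypothetical order dates per customer
sales = {
    "A": [date(2020, 1, 5), date(2020, 6, 20)],
    "B": [date(2020, 2, 10)],
}

# CustomerEvents snapshot rows: (customer, event, date)
events = []
for customer, dates in sales.items():
    events.append((customer, "New", min(dates)))              # first purchase
    events.append((customer, "Lost", eomonth(max(dates), 2))) # two months after last purchase
```

With the snapshot related to Customer and Date, counting new or lost customers in a period reduces to counting rows of the appropriate event type.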

The only calculation that requires some attention is the # Returning Customers measure, because it computes the number of customers dynamically while ignoring any filter other than Date and Customer. It then subtracts the number of new customers obtained by querying the snapshot table:

# Returning Customers :=
VAR NewCustomers = [# New Customers]
VAR NumberOfCustomers =
    CALCULATE (
        [# Customers],
        ALLEXCEPT ( Sales, 'Date', Customer )
    )
VAR ReturningCustomers =
    NumberOfCustomers - NewCustomers
VAR Result =
    IF ( ReturningCustomers <> 0, ReturningCustomers )
RETURN
    Result

The measures computing the sales amount for new and returning customers take advantage of the physical relationship between the CustomerEvents snapshot table and the Customer table, thus reducing the DAX code required and providing higher efficiency:

Sales Returning Customers :=
VAR SalesAmount = [Sales Amount]
VAR SalesNewCustomers = [Sales New Customers]
VAR SalesReturningCustomers = SalesAmount - SalesNewCustomers
VAR Result =
    IF (
        SalesReturningCustomers <> 0,
        SalesReturningCustomers
    )
RETURN
    Result

Creating the derived snapshot table in DAX

We suggest creating the CustomerEvents snapshot table outside of the data model. Indeed, creating it in DAX is an expensive operation that requires large amounts of memory and processing power to refresh the data model. The DAX implementation described in this section works well on models with up to a few thousand customers and up to a few million sales transactions. If your model is larger than that, you can implement a similar business logic using other tools or languages that are more optimized for data preparation.

The complex part of the calculation is retrieving the dates when a customer is temporarily lost and then possibly recovered. These events can happen multiple times for each customer. For this reason, for each transaction we compute two dates in two calculated columns in the Sales table:

TemporarilyLostDate: this is the date obtained by the Date Temporary Lost Customer measure when there are no other transactions between the current row in Sales and that date. If the same customer made multiple purchases on the same date, all of those rows share the same value in the TemporarilyLostDate column.

RecoveredDate: this is the date of the first purchase made by that same customer after TemporarilyLostDate. This column is blank if there are no transactions after TemporarilyLostDate.

The code of these calculated columns is the following:

Calculated column in the Sales table

TemporarilyLostDate =
VAR TemporarilyLostDate =
    CALCULATE (
        [Date Temporary Lost Customer],
        ALLEXCEPT ( Sales, Sales[Order Date], Sales[CustomerKey] )
    )
VAR CurrentCustomerKey = Sales[CustomerKey]
VAR CurrentDate = Sales[Order Date]
VAR CheckTemporarilyLost =
    ISEMPTY (
        CALCULATETABLE (
            Sales,
            REMOVEFILTERS ( Sales ),
            Sales[CustomerKey] = CurrentCustomerKey,
            Sales[Order Date] > CurrentDate
                && Sales[Order Date] <= TemporarilyLostDate
        )
    )
VAR Result =
    IF ( CheckTemporarilyLost, TemporarilyLostDate )
RETURN
    Result

Calculated column in the Sales table

RecoveredDate =
VAR TemporarilyLostDate = Sales[TemporarilyLostDate]
VAR Result =
    IF (
        NOT ISBLANK ( TemporarilyLostDate ),
        CALCULATE (
            MIN ( Sales[Order Date] ),
            ALLEXCEPT ( Sales, Sales[CustomerKey] ),
            DATESBETWEEN ( 'Date'[Date], TemporarilyLostDate + 1, BLANK () )
        )
    )
RETURN
    Result

We then use these calculated columns to obtain two calculated tables as an intermediate step to compute the CustomerEvents snapshot table. If you want to leverage an external tool to only compute the Temporarily Lost and Recovered events, you should consider importing these two tables from the data source, where you prepare their content by using dedicated tools for data preparation. The two tables are visible in Figure 12 and Figure 13.

Using these intermediate tables, the CustomerEvents calculated table is obtained with a final UNION of the four states:

Calculated table

CustomerEvents =
VAR CustomerGranularity =
    ALLNOBLANKROW ( Sales[CustomerKey] )
VAR NewDates =
    ADDCOLUMNS (
        CustomerGranularity,
        "Date", [Date New Customer],
        "Event", "New"
    )
VAR LostDates =
    ADDCOLUMNS (
        CustomerGranularity,
        "Date", [Date Lost Customer],
        "Event", "Lost"
    )
VAR Result =
    UNION (
        NewDates,
        LostDates,
        TempLostDates,
        RecoveredDates
    )
RETURN
    Result

Splitting the calculation into smaller steps is useful for educational purposes and to provide a guide in case you want to implement part of the calculation outside of the data model. However, if you implement the calculation entirely in DAX then you can skip the intermediate TempLostDates and RecoveredDates calculated tables. In this case you must pay attention to the CALCULATE functions in order to avoid circular dependencies, by implementing explicit filters obtained by iterating the result of ALLNOBLANKROW. This results in a more verbose definition of the CustomerEvents table, proposed here under the name CustomerEventsSingleTable:

Calculated table

CustomerEventsSingleTable =
VAR CustomerGranularity =
    ALLNOBLANKROW ( Sales[CustomerKey] )
VAR NewDates =
    ADDCOLUMNS (
        CustomerGranularity,
        "Date", [Date New Customer],
        "Event", "New"
    )
VAR LostDates =
    ADDCOLUMNS (
        CustomerGranularity,
        "Date", [Date Lost Customer],
        "Event", "Lost"
    )
VAR _TempLostDates =
    CALCULATETABLE (
        SUMMARIZE (
            Sales,
            Sales[CustomerKey],
            Sales[TemporarilyLostDate],
            "Event", "Temporarily lost"
        ),
        FILTER (
            ALLNOBLANKROW ( Sales[TemporarilyLostDate] ),
            NOT ISBLANK ( Sales[TemporarilyLostDate] )
        )
    )
VAR _RecoveredDates =
    CALCULATETABLE (
        SUMMARIZE (
            Sales,
            Sales[CustomerKey],
            Sales[RecoveredDate],
            "Event", "Recovered"
        ),
        FILTER (
            ALLNOBLANKROW ( Sales[RecoveredDate] ),
            NOT ISBLANK ( Sales[RecoveredDate] )
        )
    )
VAR Result =
    UNION (
        NewDates,
        LostDates,
        _TempLostDates,
        _RecoveredDates
    )
RETURN
    Result

Although the sample file includes a definition of CustomerEventsSingleTable, the measures in the report do not use that table. If you want to use this approach, you can replace the definition of CustomerEvents with the expression in CustomerEventsSingleTable and remove the former expression from the model – you also want to remove the TempLostDates and RecoveredDates calculated tables that are no longer being used.

Basket analysis

The Basket analysis pattern builds on a specific application of the Survey pattern. The goal of Basket analysis is to analyze relationships between events. A typical example is to analyze which products are frequently purchased together. This means they are in the same “basket”, hence the name of this pattern.

Two products are related when they are present in the same basket. In other words, the event granularity is the purchase of a product. The most intuitive basket is a sales order; however, the basket can also be a customer: in that case, products are related if they are purchased by the same customer, albeit across different orders.

Because the pattern is about checking when there is a relationship between two products, the data model contains two copies of the same table of products. The two copies are named Product and And Product. The user chooses a set of products from the Product table; the measures show how likely it is that products in the And Product table are associated with the original selection.

We included additional association-rule metrics in this pattern: support, confidence, and lift. These measures make it easier to understand the results, and they extract richer insights from the pattern.

Defining association rules metrics

The pattern contains several measures, which we describe in detail in this section. In order to provide the definitions, we examine the orders containing at least one product of the Cameras and camcorders category and at least one product of the Computers category, as shown in Figure 1.

The report in Figure 1 uses two slicers: the Category slicer shows a selection of the Product[Category] column, whereas the And Category slicer shows a selection of the ‘And Product'[And Category] column. The # Orders measure shows how many orders contain at least one product of the “Cameras and camcorders” category, whereas the # Orders And measure shows how many orders contain at least one product of both the “Cameras and camcorders” and “Computers” categories. We describe the other measures later. First, an important note: inverting the selection between Category and And Category changes some results by design. Most measures return the same value (# Orders Both, % Orders Support, Orders Lift), whereas confidence (% Orders Confidence) depends on the order of the selection. In Figure 2 you can see the report from Figure 1, with the difference that the selections were inverted between the Category and And Category slicers.

Next, you find the definition of all the measures used in the pattern. There are two versions of all the measures: one considering the order as a basket, the other using the customer as a basket. For example, the description of # And applies to both # Orders And and # Customers And.

#

# Orders and # Customers return the number of unique baskets in the current filter context. Figure 1 shows 2,361 orders containing at least one product from the “Cameras and camcorders” category, whereas Figure 2 shows 2,933 orders containing at least one product from the “Computers” category.

# And

# Orders And and # Customers And return the number of unique baskets containing products of the And Product selection in the current filter context. These measures ignore the Product selection. Figure 1 shows 2,933 orders containing at least one product from the “Computers” category.

# Total

# Orders Total and # Customers Total return the total number of baskets and ignore any filter over Product and And Product. Both Figure 1 and Figure 2 report 21,601 orders. Be mindful that the filter on And Product is ignored by default because the relationship is not active; the only filter being explicitly ignored in the measure is the filter on Product. If there were a filter over Date, the measure would report only the baskets in the selected time period, and still ignore the filter over Product.
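Based on this description, # Orders Total can be sketched as a simple variation of # Orders that explicitly removes the filter from Product; this is a sketch consistent with the text above, not necessarily the exact code in the sample files:

```dax
# Orders Total :=
CALCULATE (
    [# Orders],                   -- baskets in the current filter context
    REMOVEFILTERS ( 'Product' )   -- explicitly ignore the Product selection
)
```

The And Product filter needs no special treatment here, because its relationship with Sales is inactive by default.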

# Both

# Orders Both and # Customers Both return the number of unique baskets containing products from both the categories selected with the slicers. Figure 1 shows that 400 orders contain products from both categories: “Cameras and camcorders” and “Computers”.

% Support

% Orders Support and % Customers Support return the support of the association rule. Support is the ratio between # Both and # Total. Figure 1 shows that 1.85% of the orders contain products from both categories: “Cameras and camcorders” and “Computers”.

% Confidence

% Orders Confidence and % Customers Confidence return the confidence of the association rule. Confidence is the ratio between # Both and #. Figure 1 shows that out of all the orders containing “Cameras and camcorders”, 16.94% also contain “Computers” products.
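Following this definition, % Orders Confidence can be sketched as the ratio of the two measures, using DIVIDE to handle a blank denominator gracefully; again, a sketch consistent with the definition above rather than the verbatim code from the sample files:

```dax
% Orders Confidence :=
DIVIDE (
    [# Orders Both],   -- baskets containing both selections
    [# Orders]         -- baskets containing the Product selection
)
```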

Lift

Orders Lift and Customers Lift return the ratio of confidence to the probability of the selection in And Product.

A lift greater than 1 indicates an association rule which is good enough to predict events. The greater the lift, the stronger the association. Figure 1 reports that the lift of the association rule between “Cameras and camcorders” and “Computers” is 1.25, obtained by dividing the % Confidence (16.94%) by the probability of # Orders And over # Orders Total (2933/21601 = 13.58%).

Sample reports

This section describes several reports generated on our sample model. These reports are useful to better understand the capabilities of the pattern.

The report in Figure 3 shows the products that are more likely to be present in orders containing “Contoso Optical USB Mouse M45 White”.

“SV Keyboard E90 White” is present in 99.45% (confidence) of the orders that contain the selected mouse. The support of 3.33% indicates that the orders with this combination of products represent 3.33% of the total number of orders (21,601 as shown in Figure 1). The high lift value (29.67) is also a good indicator of the quality of the association rule between these two products.

The report in Figure 4 shows the pairs of products that are most likely to be in the same order, sorted by confidence.

The dataset used in this example returns somewhat similar confidence values when the order of the two products is reversed. However, this is not common. Focus on the highlighted lines: when “Contoso USB Cable M250 White” is in the first column the confidence of an association with “SV 40GB USB2.0 Portable Hard Disk E400 Silver” is slightly smaller than the other way around. In real datasets, these differences are usually bigger. Even though support and lift are identical, the order matters for confidence.

The same pattern can use the customer as a basket instead of the order. By using the customer, there are many more products in each basket. With more data, it is possible to perform an analysis by category of product instead of by individual product. For example, the report in Figure 5 shows what the associations are between categories in the customers’ purchase history.

Customers buying “Cell phones” are likely to buy “Computers” too (confidence is 48.19%), whereas only 12.74% of customers buying “Computers” also buy “Cell phones”.

Basic pattern example

The model requires a copy of the Product table, needed to select the And Product in a report. The And Product table can be created as a calculated table using the following definition:
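The calculated-table definition did not survive in this extract. A minimal sketch renames the relevant columns with the “And” prefix used throughout the pattern; the column list is an assumption (the actual definition in the sample files renames every column of Product):

```dax
And Product =
SELECTCOLUMNS (
    'Product',
    -- rename columns so slicers and measures can tell the two copies apart
    "And ProductKey", 'Product'[ProductKey],
    "And Product Name", 'Product'[Product Name],   -- assumed column name
    "And Category", 'Product'[Category]
)
```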

There is an inactive relationship between the And Product table and Sales, based on the same Sales[ProductKey] column used in the relationship between Product and Sales. The relationship must be inactive because it is only used in the measures of this pattern and should not affect other measures in the model. Figure 6 shows the relationships between Product, And Product, and Sales.

We use two baskets: orders and customers. An order is identified by Sales[Order Number], whereas a customer is identified by Sales[CustomerKey]. From now on, we show only the measures for orders, because the measures for the customers are a basic variation – obtained by replacing Sales[Order Number] with Sales[CustomerKey]. The curious reader can find the customer measures in the sample files.

The first measure counts the number of unique orders in the current filter context:
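The measure definition is missing from this extract; based on the digression that follows, it is the SUMX-over-SUMMARIZE variant rather than a plain DISTINCTCOUNT:

```dax
# Orders :=
SUMX (
    SUMMARIZE ( Sales, Sales[Order Number] ),   -- one row per unique order
    1                                           -- count the rows
)
```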

Before we describe the remaining measures in the pattern, a small digression is required. The # Orders measure is conceptually a DISTINCTCOUNT over Sales[Order Number], but we used an alternative implementation for both flexibility and performance reasons. Let us elaborate on the rationale of this choice.

The # Orders measure could have been written using the following formula with DISTINCTCOUNT:

DISTINCTCOUNT ( Sales[Order Number] )

In DAX this is a shorter way to perform a COUNTROWS over DISTINCT:
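The snippet is missing here; the expanded equivalents, which the following sentence refers to, are the COUNTROWS versions over DISTINCT and over SUMMARIZE:

```dax
COUNTROWS ( DISTINCT ( Sales[Order Number] ) )

// or, equivalently, over SUMMARIZE:
COUNTROWS ( SUMMARIZE ( Sales, Sales[Order Number] ) )
```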

These versions of the formula return the same result with the same performance and query plan. Using SUMX instead of COUNTROWS also returns the same result:

SUMX ( SUMMARIZE ( Sales, Sales[Order Number] ), 1 )

Usually, replacing COUNTROWS with SUMX produces a query plan with lower performance. However, the specifics of Basket analysis make this alternative much faster in this pattern. More details about this optimization are available in this article: Analyzing the performance of DISTINCTCOUNT in DAX.

The advantage of using SUMMARIZE is that we can replace the second argument with a column that represents the basket even if it is in another table, as long as the table is related to Sales. For example, the measure computing the number of unique customer cities can be written this way:

SUMX ( SUMMARIZE ( Sales, Customer[City] ), 1 )

The first argument of SUMMARIZE needs to be the table containing the transactions, like Sales. If the second argument is a column in Customer, then you have no choice: you must use that column. For example, for the city of the customer you specify Customer[City]. If instead you use the column that defines the relationship, like CustomerKey for Customer, then you can choose either Sales[CustomerKey] or Customer[CustomerKey]. Whenever possible, it is better to use the column available in Sales to avoid traversing the relationship. This is why, to identify the customer as a basket, we used Sales[CustomerKey] instead of Customer[CustomerKey]:
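The # Customers definition did not survive extraction here; consistent with the explanation above, a sketch is:

```dax
# Customers :=
SUMX (
    SUMMARIZE ( Sales, Sales[CustomerKey] ),   -- basket = customer, key taken from Sales
    1
)
```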

Now that we have explained why we use SUMMARIZE instead of DISTINCT to identify the basket attribute, we can move forward with the other measures of the pattern.

# Orders And computes the number of orders by using the selection made in And Product. It activates the inactive relationship between Sales and And Product:
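The measure code is missing in this extract. A sketch consistent with the description — remove the Product filter, then activate the inactive relationship so the And Product selection filters Sales — is:

```dax
# Orders And :=
CALCULATE (
    [# Orders],
    REMOVEFILTERS ( 'Product' ),   -- ignore the Product selection
    USERELATIONSHIP ( Sales[ProductKey], 'And Product'[And ProductKey] )
)
```

Activating the And Product relationship on Sales[ProductKey] also deactivates the Product relationship on the same column for the duration of the calculation.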

# Orders Both (Internal) is a hidden measure used to compute the number of orders including at least one item of Product and one item of And Product:

Measure in the Sales table

# Orders Both (Internal) :=
VAR OrdersWithAndProducts =
    CALCULATETABLE (
        SUMMARIZE ( Sales, Sales[Order Number] ),
        REMOVEFILTERS ( 'Product' ),
        REMOVEFILTERS ( Sales[ProductKey] ),
        USERELATIONSHIP ( Sales[ProductKey], 'And Product'[And ProductKey] )
    )
VAR Result =
    CALCULATE (
        [# Orders],
        KEEPFILTERS ( OrdersWithAndProducts )
    )
RETURN
    Result

This hidden measure is useful to compute # Orders Both and other calculations described later in the optimized version of the pattern. # Orders Both adds a check to return blank in case the selection in Product and And Product contains at least one identical product. This is required to prevent the report from showing associations between a product and itself:

Measure in the Sales table

# Orders Both :=
IF (
    ISEMPTY (
        INTERSECT (
            DISTINCT ( 'Product'[ProductKey] ),
            DISTINCT ( 'And Product'[And ProductKey] )
        )
    ),
    [# Orders Both (Internal)]
)

% Orders Support is the ratio of # Orders Both to # Orders Total:
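The measure code is missing from this extract; a sketch consistent with the definition, using DIVIDE to protect against a blank denominator, is:

```dax
% Orders Support :=
DIVIDE ( [# Orders Both], [# Orders Total] )
```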

Orders Lift is the result of the division of % Orders Confidence by the ratio of # Orders And to # Orders Total, as per the formula we had introduced earlier:
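The code is missing here as well; a sketch that follows the formula stated in the text is:

```dax
Orders Lift :=
DIVIDE (
    [% Orders Confidence],
    DIVIDE ( [# Orders And], [# Orders Total] )   -- probability of the And Product selection
)
```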

The code described in this section works. However, the measures might suffer performance issues when there are more than a few thousand products. The optimized pattern provides a faster solution, at the cost of additional calculated tables and relationships.

Optimized pattern example

The optimized pattern reduces the effort required at query time to find the best combinations of products to consider. The performance improvement is obtained by creating calculated tables that pre-compute the existing combinations of products in the available baskets. Because we consider orders and customers as baskets, we created two calculated tables that are related to Product, as shown in Figure 7.

Each row of the RawProductsOrders and RawProductsCustomers tables contains a combination of two product keys, alongside the number of baskets containing both products. Rows that would combine a product with itself are excluded:
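The calculated-table code is missing from this extract. One possible sketch for RawProductsOrders is a self-join of the distinct (order, product) pairs, grouped to count the shared baskets. The column names other than those mentioned in the text are assumptions, and the sample files may use a different, more optimized definition:

```dax
RawProductsOrders =
VAR BasketProducts =
    SELECTCOLUMNS (
        SUMMARIZE ( Sales, Sales[Order Number], Sales[ProductKey] ),
        "Basket", Sales[Order Number],
        "ProductKey", Sales[ProductKey]
    )
VAR BasketAndProducts =
    SELECTCOLUMNS (
        SUMMARIZE ( Sales, Sales[Order Number], Sales[ProductKey] ),
        "Basket", Sales[Order Number],
        "And ProductKey", Sales[ProductKey]
    )
VAR Pairs =
    -- joins on the common "Basket" column, producing all product pairs per basket
    NATURALINNERJOIN ( BasketProducts, BasketAndProducts )
RETURN
    FILTER (
        GROUPBY (
            Pairs,
            [ProductKey],
            [And ProductKey],
            "# Baskets", SUMX ( CURRENTGROUP (), 1 )
        ),
        [ProductKey] <> [And ProductKey]   -- exclude self-pairs
    )
```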

The filter from Product automatically propagates to the two RawProducts tables. Only the filter from And Product must be moved through a DAX expression in the # Orders Both measure. Indeed, # Orders Both is the only measure that differs from the ones in the basic pattern.

# Orders Both cannot use the # Orders Both (Internal) implementation because of the way it applies the filters. # Orders Both transfers the filter from And Product to RawProductsOrders, and then to Sales, in order to retrieve the orders that include any of the items in And Product. This technique is somewhat complex, but it reduces the workload of the formula engine, resulting in better performance at query time.

Survey

The Survey pattern uses a data model to analyze correlations between different events related to the same entity, such as customer answers to survey questions. For example, in healthcare organizations the Survey pattern can be used to analyze data about patient status, diagnoses, and medicine prescribed.

Pattern description

You have a model that stores answers to questions. Consider a Questions table containing questions and their possible answers, shown in Figure 1.

The answers are stored in an Answers table, where each row contains the survey target (the customer, in this case), one question, and one answer. There are multiple rows when the same customer provides multiple answers to the same question. A real model would store this information using integer keys; in Figure 2 we use strings to clarify the concept.

By using a DAX formula, we can answer a request like, “How many customers enjoy cartoons, broken down by job and gender?” Consider Figure 3 as an example. In this matrix, the totals are not the sum of the visible rows; this behavior is explained later.

The report includes two slicers to select the questions to intersect in the report. The columns in the matrix have the answers to the question selected in the Question 1 slicer, whereas the rows of the matrix provide the details of questions and answers corresponding to the selection made in the Question 2 slicer. The highlighted cell shows that 9 customers who answered Cartoons to the Movie Preferences question also answered Female to the Gender question.

In order to implement this pattern, you need to load the Questions table twice. This way you can use two slicers for the questions to analyze. Moreover, the relationship between the two copies of the questions must be inactive. Because we use the tables as filters, we named them Filter1 and Filter2. You can see the resulting diagram in Figure 4.

To compute the number of customers who answered Q1 (the question filtered by Filter1) and Q2 (the question filtered by Filter2) you can use the following formula:
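The formula itself is missing from this extract. A sketch consistent with the description that follows is shown below; the column names Answers[CustomerKey], Answers[AnswerKey], and Filter2[AnswerKey] are assumptions, and the exact handling of the active and inactive relationships may differ in the sample files:

```dax
CustomersQ1andQ2 :=
VAR CustomersQ1 =
    CALCULATETABLE (
        DISTINCT ( Answers[CustomerKey] ),   -- customers who answered Q1
        REMOVEFILTERS ( Filter2 )
    )
VAR CustomersQ2 =
    CALCULATETABLE (
        DISTINCT ( Answers[CustomerKey] ),   -- customers who answered Q2
        REMOVEFILTERS ( Filter1 ),
        USERELATIONSHIP ( Answers[AnswerKey], Filter2[AnswerKey] )
    )
RETURN
    CALCULATE (
        COUNTROWS ( Customer ),
        CustomersQ1,                         -- intersection of the two sets
        CustomersQ2,
        CROSSFILTER ( Answers[CustomerKey], Customer[CustomerKey], BOTH )
    )
```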

The formula activates the correct relationship when computing CustomersQ1 and CustomersQ2. It then uses the two variables as filters for the Answers table, which filters the customers through the CROSSFILTER modifier.

You can compute any calculation using the previous formula – provided that the CROSSFILTER modifier makes the Answers table filter the table your code is based on. Therefore, you can replace COUNTROWS ( Customer ) with any expression involving the Customer table. For example, the RevenueQ1andQ2 measure returns the total revenue generated by the customers included in the selection; the only difference with the CustomersQ1andQ2 measure is the Revenue Amount measure reference that replaces the previous COUNTROWS ( Customer ) expression:
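The RevenueQ1andQ2 code is missing from this extract. Per the description, it is identical to CustomersQ1andQ2 except for the expression being computed; a sketch, where the Answers[CustomerKey] and Answers[AnswerKey] column names are assumptions, is:

```dax
RevenueQ1andQ2 :=
CALCULATE (
    [Revenue Amount],   -- replaces COUNTROWS ( Customer )
    CALCULATETABLE (
        DISTINCT ( Answers[CustomerKey] ),
        REMOVEFILTERS ( Filter2 )
    ),
    CALCULATETABLE (
        DISTINCT ( Answers[CustomerKey] ),
        REMOVEFILTERS ( Filter1 ),
        USERELATIONSHIP ( Answers[AnswerKey], Filter2[AnswerKey] )
    ),
    CROSSFILTER ( Answers[CustomerKey], Customer[CustomerKey], BOTH )
)
```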

It is important to understand the condition computed in each cell. We use Figure 6 to explain this further, where we labeled a few cells from A to E.

Here is what is computed in each cell:

A: Female and prefers Cartoons

B: ( Female or Male ) and prefers Cartoons

C: ( Female or Male or Consultant or IT Pro or Teacher ) and prefers Cartoons

D: ( Female or Male ) and prefers ( Cartoons or Comedy or Horror )

E: ( Female or Male or Consultant or IT Pro or Teacher ) and prefers ( Cartoons or Comedy or Horror )

The formula uses an and condition for the intersection between the questions selected in Question 1 and Question 2, whereas it uses an or condition for multiple answers to the same question. Remember that the or condition means “any combination” (do not confuse it with an “exclusive or”), and it also implies a non-additive behavior of the measure.