DAX variables can be used to optimize performance of queries against SSAS Tabular, Power BI Desktop, and Power Pivot for Excel models by re-using a DAX expression multiple times within the same metric or query. However, performance can be negatively impacted by defining multiple variables and only returning or using one of the variables through conditional logic. In this latter scenario, the DAX engine will generate storage engine queries regardless of whether the variable is needed in the RETURN clause or not.
Note: One of the performance improvements from ‘Super DAX’ in SSAS 2016 (Super DAX) is the elimination of storage engine queries when a branch of an IF or Switch condition is false. Perhaps in a future update to the engine DAX Variables that are not needed by an IF or SWITCH will also not be evaluated.
One of the top new features of DAX for SSAS 2016 Tabular, Power BI Desktop, and Excel 2016 Power Pivot is variables. With the simple keywords of VAR and RETURN you can define multiple expressions, including tables, and reference them within the same expression. Solid overviews are available at the following links: SQL BI, PowerPivotBlog, MSDN.
The three main benefits of DAX Variables are as follows:
- Readability: By moving the logic and filtering conditions of DAX expressions into a variable the code following the RETURN clause is much more concise and logical.
- Context Transition: Variables are evaluated where you define them (after the VAR) so rather than using recursive functions like EARLIER to ‘step out’ of row context you can simply reference a named variable that’s already outside of row context.
- Performance: If you have multiple conditions to apply to a single value such as the example below you can split # of queries required by evaluating a single variable and re-using for each condition.
Now what if we go in reverse – multiple variables defined but returning one result? Is there a penalty?
Let’s say you need to apply different filtering logic on one dimension based on the value of another dimension. In this example we apply filtering conditions on the product dimension for different sales region. The data is small (Adventure Works) and maybe the scenario itself isn’t very common but it helps illustrate a point about DAX variable declaration and evaluation.
Design A: No Variables
With the following metric we simply ‘bake in’ the different filtering expressions by region within the SWITCH condition – no variables. We rely on the improved DAX handling of IF/SWITCH to only run the query it needs at run time – if only France is needed for the query/report then the other expressions should be discarded.
Let’s jump over to DAX Studio to confirm this is true against a query:
Yes, only three tiny Scan events despite no variables being declared.
Design B: Variables
Now what if we build the filtering rules into variables – one for each country as follows:
Clearly the readability is improved by separating expressions/filter from business logic but code readability/manageability usually takes a clear backseat to performance and functionality.
Look at this – we went from 3 storage engine queries up to 11 SE queries by switching to variables. Performance in duration also suffered.
I’ve compared the two metrics in other queries and found similar results – the metric declaring many variables consistently generates a multiple of the storage engine queries the no variable metric does. Against a large dataset this behavior and approach with variables could be very painful in terms of performance.
Two final takeaways:
- Yes, DAX variables are indeed difference makers and I’m sure they’ll make their way into new and existing SSAS Tabular 2016 and Power BI Desktop models soon for the reasons identified. (Even if you don’t define them explicitly in your model, the Power BI visuals will take advantage of them)
- Just remember that, as of this writing, you can’t expect the engine to avoid computing a variable you define ‘up front’ in an expression even if it’s not needed by your metric. Just as defining a variable once and re-using it multiple times benefits performance, defining many variables and only calling/invoking one can diminish performance.
*Next week I plan to blog about nested CALCULATE and CALCULATETABLE functions relative to OR and AND conditions.