Technical Debt Limit
One of the top news stories here in America is growing fiscal deficits and the politics involved in raising the federal debt limit to allow for greater borrowing. Almost all market participants expect that debt limit to be raised as in past years but, given recent movements in US Treasury yields (up) and the US Dollar (down), America could soon be facing something of a credit limit.
I believe this issue relates well to business intelligence and I have a basic outline in place for next week’s blog. For this week, however, I thought I would list examples of technical debts in BI projects, particularly from a Power BI and MSBI perspective. Additionally, I call out some of the causes of these debts and some common symptoms or characteristics of highly indebted BI environments.
Is it all bad?
No, not when it’s managed and ultimately paid back. Some debt can be useful in highlighting and bridging gaps with the current state. The problem I usually see is that the existence of the debt is forgotten once essential requirements, testing, and timelines are met and teams move on to new projects. At some point in the future, the debt re-appears in various forms such as high support costs, lack of accuracy, data refresh failures, query performance issues, lack of scalability, and more.
As one example, let’s say a business user creates several DAX measures in a Power BI report which uses a live connection to an Analysis Services model. In this scenario, either the user isn’t aware that the necessary measures have already been created or there’s a particular analysis or business question that the current model doesn’t support. In the short term, these report-scoped measures may prove sufficient as the business user may only be experimenting within a small group. However, if they become common or widely utilized it probably makes sense to build them into the model.
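As a sketch of this scenario, the business user's report-scoped measure might look something like the following (the measure, table, and column names are hypothetical, purely for illustration):

```dax
-- Report-level measure created by a business user in a live connection report.
-- If this calculation becomes widely used, it likely belongs in the
-- Analysis Services model itself, with a single owned definition.
Internet Sales YTD =
TOTALYTD ( [Internet Net Sales], 'Date'[Date] )
```

Promoting the measure into the model gives every report the same definition and puts it under the model owner's testing and documentation process.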
Where does it come from?
Like all forms of debt, technical debt pulls forward future consumption to the present such that someone (a BI organization, for example), can consume beyond their current ability to produce (their income). In less economic terms, the BI/IT team has a certain amount of resources (people, tools, time, infrastructure, existing BI architecture) and these resources are sometimes insufficient to meet certain aspects of projects. Technical debt thus arises to inflate the team’s ability to produce or deliver for their stakeholders.
A few examples of sources of technical debt:
- A data warehouse system exists but it essentially represents an operational data store (ODS) or copies of source system tables.
- A true data warehouse of fact and dimension tables exists with a supporting ELT/ETL process, but no master data management or data quality tools are utilized.
- The BI team is fairly new to the tool(s) such as Power BI or Analysis Services.
- It might not be clear whether an M query or a DAX measure is efficient.
- The BI team manager (under some pressure) commits to a delivery date that’s driven more by stakeholder expectations than by team resources or estimates (if any).
- Each BI project is its own island of custom requirements and delivery approaches.
- There’s not a central BI strategy and set of delivery standards in place.
- There are no defined roles (e.g., report author, dataset designer) and no regular communication between those roles.
Technical Debt Examples
- Multi-layered Data Transformation
- In addition to a standard nightly ETL/ELT process, complex SQL queries are built into source views of the Power BI datasets.
- On top of these SQL views, complex M queries implement further transformations.
- On top of these M queries, DAX calculated columns further extend the dataset.
- In some cases, even the DAX measures have built-in logic to account for data transformation issues.
With all these dependencies spread across different tools, the dataset becomes harder to support and reuse. This is a very big topic, so I’ll stop here for now, but I will be giving a webinar on April 11th via the PUG Academy.
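The last layer above (transformation logic hiding inside measures) might look something like this hedged, hypothetical example:

```dax
-- Illustrative anti-pattern: a measure compensating for data quality
-- issues (test rows, bad statuses) that should be handled in the
-- ETL/ELT layer. Table, column, and status values are assumptions.
Net Sales =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    NOT Sales[OrderStatus] IN { "Test", "Unknown" }
)
```

Every measure that embeds this kind of cleansing logic is one more place the business rule has to be found, understood, and kept in sync.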
- M Queries (or SQL) as Substitute for Data Warehouse
- You don’t have data transformation spread across many layers, but nonetheless the data retrieval process supporting your data model carries a heavy burden.
- The query tries to build actual dimensions (via joins), implement business rules, and cleanse the data. Of course these are all things that other tools are built for.
- Maybe the M query runs without failure, for now, but it requires heavy RAM on your gateway server because it can’t be folded back to the source.
In my view some minimal level of data transformation via M or SQL is tolerable. Over time, however, this logic should be built into a data warehouse such that the data model stays true to its purpose as an analytical platform.
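A sketch of this kind of heavy-lifting M query is below; the server, database, table, and column names are all hypothetical:

```m
// Illustrative M query doing warehouse-style work (joins, cleansing)
// against raw source tables instead of a data warehouse.
let
    Source = Sql.Database("ServerName", "SourceDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    // Join that builds a de facto customer dimension inside the query
    Merged = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"}, "Customer", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Customer", {"CustomerName", "Region"}),
    // Row-by-row cleansing that can prevent the query from folding to SQL
    Cleansed = Table.TransformColumns(Expanded, {{"CustomerName", each Text.Proper(Text.Trim(_)), type text}})
in
    Cleansed
```

When logic like this lives in the dataset, every model that needs clean, conformed customers has to repeat it; in a warehouse it would be built once.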
- Single Point of Failure
- You have one gateway server for the On-premises data gateway rather than a cluster of gateway resources.
- You have one Power BI Report Server rather than multiple instances in a scale-out deployment.
With almost all technical debts, someone says “we’ll take care of this at some point” or “we’re just getting started for now”.
- Multiple Datasets
- Dataset A was built to support shipping and inventory while Dataset B includes purchase orders.
- Alternatively, you allow or even build duplicates of the same dataset with slightly different business definitions for different projects.
As separate datasets, you can’t build reports that cut across these distinct business processes (fact tables) with a common set of dimensions. Assuming reports have already been built against both datasets, migrating to a consolidated dataset could be messy.
- DirectQuery Source Not Optimized
- You build a DirectQuery model on top of a source which hasn’t been enhanced or optimized for the BI workload.
- For example, the organization has upgraded to SQL Server 2016 but still hasn’t taken advantage of clustered or nonclustered columnstore indexes.
DirectQuery performance is significantly improving this year but it might not be enough to overcome an insufficiently designed or tuned source.
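As one example of tuning the source, a nonclustered columnstore index could be added to a fact table to better support the scan-heavy queries DirectQuery generates. The table and column names here are hypothetical:

```sql
-- SQL Server 2016+: nonclustered columnstore index over the columns
-- a DirectQuery model's aggregation queries typically scan.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, CustomerKey, SalesAmount, OrderQuantity);
```

The right design depends on the workload; the point is simply that the source needs deliberate optimization for BI queries, not just its transactional defaults.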
- Pervasive Report-Level Filtering
- In almost all Power BI reports, the same filter condition(s) have to be applied.
Maybe there’s a certain use case where this data needs to be in the dataset, or maybe this data should be filtered out upstream. Alternatively, a rule can be created and a single column with two values can be used to apply the filter.
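That single-column approach could be sketched as follows; the view, rule, and column names are assumptions for illustration:

```sql
-- Push the repeated report filter logic into one flag column in the
-- source view so every report filters on the same, centrally defined rule.
CREATE VIEW dbo.vFactSales
AS
SELECT
    s.*,
    CASE WHEN s.OrderStatus = 'Completed' AND s.SalesAmount > 0
         THEN 'Include' ELSE 'Exclude'
    END AS ReportFilterFlag
FROM dbo.FactSales AS s;
```

Report authors then apply a single filter on `ReportFilterFlag` rather than re-creating (and potentially mis-stating) the underlying conditions in every report.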
- Hyper Visual Reports
- Power BI reports are created with excessive levels of customization to provide certain aesthetics or user experience.
- These reports often have several layers of objects (visuals, shapes, images) which, while visually attractive, often can’t be quickly changed to meet new requirements. Additionally, they can’t be pinned to dashboards.
If you’re following solid report design and visualization practices then high levels of customization may not be needed. For example, using bookmarks and drillthrough features the way they’re intended to be used makes sense. Stacking multiple visuals on top of each other to give the impression of a single visual might not make as much sense.
- Lack of Documentation and Metadata
- Knowledge of the various definitions and rules built into the BI solutions is scattered throughout the team.
- Business users aren’t sure which measure to use or which columns are included in which tables.
I’ve previously blogged and written about using the Analysis Services DMVs and the Field Descriptions.
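As a quick sketch of the DMV approach, the measure names, expressions, and descriptions of a Tabular model (compatibility level 1200+) can be retrieved with a query like this, run from SSMS against the Analysis Services instance:

```sql
-- Tabular metadata DMV: one row per measure in the deployed model.
SELECT [Name], [Expression], [Description]
FROM $SYSTEM.TMSCHEMA_MEASURES;
```

Results like these can feed a simple data dictionary so business users can see which measures exist and how they’re defined.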
Signs You’ve Reached a Technical Debt Limit
At a certain point, you might not realistically be able to take out more debt to pay off the old debt. For example, you simply have to rebuild the fundamentals of the environment or certain components of it rather than take on new BI projects. This could be very painful (just as America paying off its debts will be) but it’s necessary.
- You have version control issues between the multiple or duplicate datasets.
- You have availability issues with datasets not refreshing or users reporting errors.
- You have performance issues (despite adequate hardware) with users complaining.
- It’s difficult or costly to implement further changes given existing customizations.
- The BI team’s time is largely devoted to supporting the existing environment and the issues that crop up.
- Perhaps someone on the team leaves as she would prefer to develop or build rather than support.
- You’re unable to provide a consistent, integrated, organizational view.
- Your monitoring solution surprises you with how much or how little various resources are being used – there’s not a consistent pattern.
As mentioned, next week’s post will take a look at America’s financial debt but I’ll be sure to include plenty of Power BI/MSBI content such that it’s relevant to this blog. Given other projects and a few events planned for April I may not be able to blog in March.