Analyze SQL Agent Job History with Power BI

The purpose of this post is to share and describe a Power BI-based solution for analyzing SQL Server Agent job history data. SQL Server Agent has been around for many years and though increasingly my projects involve other scheduling and orchestration tools like Azure Automation, SQL Agent remains relevant both for on-premises BI/DW processes and in Azure via SQL Managed Instance.

In fact, dependencies on SQL Agent for scheduled jobs is one of the main reasons SQL Managed Instance was created as this reduces the friction in migrating on-premises workloads to Azure.

You can download the Power BI template (.PBIT) described in this blog from GitHub:

Power BI Template for SQL Agent Job History

Power BI Template: SQL Instance Parameter Required

I’d strongly recommend you read through this blog post to better understand the context and details of this solution before you attempt to execute the template. At a minimum, you should check that you have the SQL Agent service running, note the name of the server its running on (the one required parameter per the above image), and confirm that you can query the SQL Agent tables in the MSDB database such as dbo.sysjobs.

This template uses native T-SQL queries so in Power BI Desktop you might also disable the user approval requirements for new native queries prior to running the template:

Global Security Options in Power BI Desktop

SQL Server Agent Background

I assume many readers of this blog are familiar with SQL Agent and thus can skip ahead to later sections. For others, the SQL Agent service allows for the creation and scheduling of ‘jobs’ with each ‘job’ comprised of a sequence of one more ‘steps’. For example, a job could first execute a T-SQL stored procedure, then run an SSIS package, and maybe finally process/refresh an SSAS database or execute a PowerShell script. SQL Agent logs the duration of each step, each instance/execution of the overall job, any retries attempted, error messages, and other useful information.

In a common use case, a BI and data warehouse team will link together multiple steps (or multiple jobs) to support a nightly data warehouse and data model update process. If the nightly SQL Agent job fails, the team/user closest to the technology involved in the given step will review the agent log and apply changes necessary to successfully complete the process from the point of failure and in future instances of the job.

The following image from SQL Server Management Studio (SSMS) is from a four-step SQL Agent job scheduled to occur nightly:

SQL Server Agent Log File Viewer

In the above example, Power BI and Azure AD data is retrieved via PowerShell scripts in steps 1 and 2, then an SSIS package is used to load this data to a SQL Server database in step 3, and finally a Power BI dataset used for administering Power BI is triggered to refresh. This entire process executed in about 2.5 hours with the Azure AD data retrieval step requiring 1.5 hours itself.

As an aside, some of the PowerShell scripts I’ve featured in recent blog posts containing loops over Power BI app workspaces require time (e.g. 52 minutes per image) and of course loops are suboptimal from a performance standpoint. There’s a recent blog post on avoiding workspace loops you may consider. However, if you filter down your workspaces to exclude personal/my workspaces prior to executing the loops, a nightly 50+ minute process such as this example for a large Power BI environment might be acceptable.

What problem are we solving?

When you’re getting started with SQL Agent just viewing the job history and error log files via the graphical interfaces in SSMS might be sufficient. If you want to become more productive and proactive in your analysis, you’ll look for a programmatic solution such as writing T-SQL queries against the SQL Agent tables in the MSDB system database including dbo.sysjobs, dbo.sysjobhistory, dbo.sysjobsteps, and others. Over the years several blog posts have been written containing T-SQL example approaches to querying this data and MS Docs itself provides an example T-SQL query for the job history table.

The Power BI-based solution starts with T-SQL queries but builds on top of this layer to leverage several of Power BI’s rich and integrated features including Power Query functions, DAX expressions, reporting visualizations, and more. The core idea of this solution is to have a data model in place (a Power BI ‘dataset’), including fact to dimension relationships and measures with built-in calculation logic to make it relatively easy to build SQL Agent reports and dashboards and to browse/analyze SQL agent history data.

Since the data model already includes the SQL Agent data stored in the MSDB database and can be refreshed on a schedule, users (typically DBAs and other IT admins) don’t have to write new or modified SQL queries on an ad hoc basis to answer different questions. Within the PBIX file containing the data model itself or a separate file or tool connected to the published dataset users can drag-and-drop and filter to find what they need.

Its outside the scope of this blog post but you could leverage any number of tools to analyze the SQL Agent History Power BI dataset beyond Power BI itself. Given XMLA endpoint connectivity, you could use Excel, Paginated/SSRS reports, and other 3rd Party BI tools just like an Analysis Services database.

Report Pages (Visual Layer)

In describing any Power BI solution I’d prefer to start from the source systems and data and walk through the logic and components of the Power Query Editor and then the Modeling View before saying anything about any reports or dashboards. However, experience suggests that introducing the report/visual layer, at least a representative sample, further addresses the ‘Why does this exist?’ question and will help maintain attention to these two more fundamental layers.

With that being said, the following short slideshow displays the three report pages and highlights two tooltip pages included in this initial version of the template:

The two main report pages of the PBIX file that’s generated from the template are the Job Level and Job Step Level pages. As mentioned earlier, SQL Agent is comprised of ‘jobs’ and ‘jobs’ contain one or many ‘steps’ which will execute on a schedule. Therefore, these two pages help address the most basic questions such as “Do I have any job failures?”, “How long are my jobs taking to complete?”, “Which steps within my jobs are taking the longest to complete or are failing?”, etc.

There’s a report-level filter (impacting all report pages) that’s set for the trailing last 30 days and you can also use the Date slider at the top to modify the date range of the report based on the run date of the job. You can also use the slicer at the top right to filter for one or multiple ‘jobs’ of interest and the slicers are synchronized across the pages such that these selections will be carried over.

The tooltip pages are intended to provide additional context as you hover over a specific agent job or a specific agent job step. The bar charts on the Job Level and Job Step Level pages will render these tooltip pages to give additional details relevant to these specific items (or ‘dimension members’ if you prefer) including a line chart of the duration for the given job or step over time.

Many additional and more visually impressive and analytically powerful reports could be developed. Depending on the extent of SQL Agent usage these additional reports could make sense but for common use cases such as routine monitoring and troubleshooting these reports or slight variations might be sufficient and represent quite an improvement from ad hoc SQL queries).

The Data Model (Sematic Layer)

At the center of the solution are two star schema models – one for job history at the job level and one at the job step level. You might think of this as the header or order level of a sales or purchase order versus the line level of these transactions which includes the individual products.

The following two images from Modeling view expose the fact to dimension (many to one) relationships.

Both the SQL Agent Job Step History and the SQL Agent Job History fact tables have many-to-one relationships with the Date and the SQL Agent Jobs dimension tables. The SQL Agent Job Step History fact table also has an active many-to-one relationship to the SQL Agent Job Steps dimension table.

Each row of the SQL Agent Job History table reflects one instance of a job being executed from start to finish inclusive of all steps. Each row of the SQL Agent Job Step History table reflects the execution of one step of a job. The source table in the MSDB database for both the aggregate/job level fact table and the step-level fact table is dbo.sysjobhistory. Every row of a fact table should have the same meaning and therefore there’s filtering logic in the solution to separate the rows representing the overall job from those representing an individual step of a job.

In addition to the relationships, there are display folders of basic DAX measures in each fact table per the following image:

SQL Agent DAX Measures

As their names indicate, these measures either count the number of jobs or steps that succeeded or failed or apply an aggregation to a duration fact column (in minutes). The presence of the DAX measures and the relationships makes development of the report pages relatively straight forward.

Power Query (M) Data (ETL Layer)

The fact and dimension tables of the data model are the product of a data transformation process executed via T-SQL statements against the Agent tables in the MSDB system database and Power Query (M) expressions. The T-SQL statements contain logic to prep and stage the Agent tables in MSDB such as adjusting date and time columns into data types and structures more suitable for reporting. The results of these SQL queries are further enhanced and modified with Power Query (M) expressions.

Here’s a look at the Power Query Editor:

Power Query Editor

Like normal, I like to use query groups to isolate queries into those that are being loaded to the data model (dimension and fact tables) from those that are not but are required for staging/prepping processes.

One item you might find a bit novel or interesting in this design is that I’m storing T-SQL statements in their own ‘queries’ returning text primitive values – the text value being the T-SQL statement. In this way, the queries which invoke the Sql.Database() function can simply reference these ‘queries’ as the value in the Query parameter. This means that A) the staging queries against the MSDB tables are shorter and cleaner to read and B) the T-SQL statements can be more easily viewed and modified if necessary in the future.

In this version, I filter out common system Agent jobs such as ‘SSIS Server Maintenance Job’ and ‘Sys Policy History Purge’ in order to focus the solution on user/business created jobs. There are filters and queries to split job level and step level data into their own flows, joins to integrate job schedule attributes and to support the relationship between job history at the step level with job steps.

The date table is completely created on the fly via Power Query (M) expressions. For this version, the query creates a date table from the current date (at the time of execution) back 365 days. This might be acceptable for the limited scope of this solution or you might want to replace this date table with an ‘official’ date dimension table if you have one.

Given the length of this post I can’t go into significant depth here on the queries but there are comments included in the M expressions and I think the structure and syntax is simple enough that an intermediate user of Power Query (M) could reason her way through the queries.

Don’t mix native queries and M, normally

If you’ve been working with Power Query (M) for any significant period of time you know that any logic/transform you apply after a native query will not be folded back to the source system. The general guidance is always to either A) accomplish all required logic in the native query or B) accomplish all required logic in a Power Query (M) expression which can be folded/translated into a query for execution by the source system.

Therefore, I want to be very clear that I’m only using both tools in the same solution because the scale of this data is very small. When you’re querying DMV and system table sources which contain maybe hundreds or a few thousand rows, some level of processing executed via the Power Query engine is probably acceptable and even advisable in some cases.

In this case, Power Query functions such as #duration, #time, and Duration.TotalMinutes() are used on top of the SQL statements and in my view this is actually a simpler approach than consolidating all logic into much more extensive T-SQL queries. If you beg to differ, perhaps given relative comfort/skill in T-SQL, you can always re-factor the T-SQL queries and remove the M expressions.

Wrapping Up

If you work with SQL Agent regularly and are at least open to using Power BI I hope you’re able to give this template a try and find some value in it. If you don’t use SQL Agent maybe this blog post and sample solution can give you some idea of leveraging Power BI to build your own monitoring and administrative solutions for the tools/services you are responsible for.

If you’d like to receive email notification of future blog posts you can click ‘Follow’ in the upper right.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s