The new Power BI Activity Log makes it much easier for Power BI administrators to access audit log data to monitor and analyze activities occurring within the tenant they support. This blog post describes one solution for accessing, processing, and loading Power BI activity log data as part of an automated process to support ongoing Power BI administrative analysis and reporting.
What problem is being solved? In the absence of an existing audit log retrieval and storage process, the admin team and other interested parties are unable to answer even the most basic questions such as “Which content is being consumed/accessed the most?” or “Is adoption of Power BI, in terms of content creation and usage, increasing or decreasing?”
A simple but complete and reliable process is needed for retrieving one day’s worth of Power BI activity log data (the current maximum) and loading this log data to a database. The database table, which should have integrity constraints to prevent duplicates, can then serve as the source object for reports or possibly as part of a larger Power BI Admin dataset.
High Level Solution
As is generally the case, there are several approaches involving different tools and techniques to consider. Here are the essentials of the example in this blog post:
- A PowerShell script running under the identity of a Power BI admin account retrieves all activity log data (all users, all activity types) for a full day and writes the data to a CSV file.
- This includes various Power BI ‘activities’ ranging from the most common (e.g. ‘ViewReport’, ‘ViewDashboard’) to less common (but very important) activities related to creating, editing, and interacting with content.
- Ultimately each row is an ‘event’ and thus you’ll generally be counting the number of events of a given activity type (e.g. report views).
- A Data Flow task within an SSIS package reads the CSV file, removes any duplicate activity log rows, and then loads the data to a SQL Server database table.
- The SQL Server database table, which could be within an Azure SQL Database or an on-premises instance of SQL Server, will prevent any duplicate activity log rows from being loaded and serve as either the direct source for reporting/analysis via T-SQL queries or the source for a fact table in a Power BI dataset or Analysis Services model (a ‘cube’).
- This specific example also leverages other administrative jobs/scripts against Azure Active Directory and Power BI to support relationships with Users, Workspaces, and Date tables.
- SQL Server Agent is used for scheduled execution of the PowerShell script and the SSIS package for loading the activity log table.
A PowerShell script and a T-SQL DDL statement to create an activity log table have been uploaded to my GitHub repository for your review. To implement this example, you would create the SQL table, re-factor the script for your authentication and CSV source file path, and then build an SSIS package to load the table.
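For reference, a table along these lines is what the DDL statement would create. This is only a sketch: the column names, data types, and the choice of primary key columns here are illustrative assumptions, so align them with the actual script in the repository and the activity log properties you select:

```sql
CREATE TABLE dbo.PowerBIActivityLog
(
    [Id]            NVARCHAR(50)   NOT NULL,  -- unique event identifier from the activity log
    [CreationTime]  DATETIME2(3)   NOT NULL,  -- UTC timestamp of the event
    [UserId]        NVARCHAR(256)  NOT NULL,
    [Activity]      NVARCHAR(100)  NOT NULL,  -- e.g. 'ViewReport', 'ViewDashboard'
    [ItemName]      NVARCHAR(400)  NULL,
    [WorkSpaceName] NVARCHAR(200)  NULL,
    [LoadDateTime]  DATETIME2(0)   NOT NULL,  -- datetime the PowerShell script executed
    -- The primary key constraint is what prevents duplicate log rows from loading
    CONSTRAINT PK_PowerBIActivityLog PRIMARY KEY CLUSTERED ([Id])
);
```

The primary key is the integrity constraint referenced earlier: any row that would duplicate an existing event is rejected by the table itself.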
Since SQL Server Agent is familiar and common in Microsoft IT shops, scheduling the script and SSIS package as part of a nightly job is outside the scope of this post.
The PowerShell script references two variables ($PBIAdminUPN, $PBIAdminPW) defined in an external PowerShell script, which holds the credentials of a Power BI admin account for authentication via the Connect-PowerBIServiceAccount command.
The script then defines the path location of the CSV file to write the log data to and builds two string values based on the current date to provide the starting and ending date parameter values. For example, the $StartDT and $EndDT variable values for today are ‘2019-12-23T00:00:00.000’ and ‘2019-12-23T23:59:59.999’, respectively.
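The date string construction described above can be sketched as follows. The file path and variable names are assumptions for illustration; the actual script in the repository may differ:

```powershell
# Build start/end timestamps that cover the full current day
$Today   = Get-Date
$StartDT = $Today.ToString("yyyy-MM-dd") + "T00:00:00.000"
$EndDT   = $Today.ToString("yyyy-MM-dd") + "T23:59:59.999"

# CSV destination path (illustrative; point this at your own location)
$ExportPath = "C:\PBIActivityLog\PBIActivityLog_" + $Today.ToString("yyyyMMdd") + ".csv"
```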
The script executes the Get-PowerBIActivityEvent command for the StartDateTime and EndDateTime interval and converts the JSON response to a custom PowerShell object. Finally, the properties (columns) of the PowerShell object are selected, along with the datetime value at which the script was executed. This explicitly defines the schema of the CSV source file structure, and the object is then exported to the CSV file path.
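Put together, the retrieval and export steps look roughly like this. The properties selected here are illustrative assumptions; choose whichever activity log properties match the columns of your SQL table:

```powershell
# Retrieve one day's worth of activity events (returned as a JSON string by default)
# and convert the JSON to PowerShell objects
$Events = Get-PowerBIActivityEvent -StartDateTime $StartDT -EndDateTime $EndDT |
    ConvertFrom-Json

# Select the columns explicitly (fixing the CSV schema) and stamp each row
# with the script execution datetime, then export to the CSV path
$Events |
    Select-Object Id, CreationTime, UserId, Activity, ItemName, WorkSpaceName,
        @{ Name = 'LoadDateTime'; Expression = { Get-Date } } |
    Export-Csv -Path $ExportPath -NoTypeInformation
```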
Since the source CSV file is already filtered to a single day’s worth of data, the SSIS package doesn’t need to apply any incremental logic that’s common in ETL (extract-transform-load) scenarios. However, it’s definitely possible that the activity log file will have a few duplicate rows (honestly I’m not sure why, but they do exist) and thus the SSIS package can remove these duplicates prior to loading the SQL DB via the Sort transformation.
Here’s a look at the data flow task from source (flat file connection) to destination (Ole DB to Azure SQL DB):
Be sure to set the OutputColumnWidth property of each column to align with the data types you use in your target SQL table. The sort transformation is used to specify the activity log columns that should define a unique log row as defined in the SQL table’s primary key constraint.
Duplicates are discarded prior to loading, as shown in the execution results (1,588 rows in the source file, 1,587 rows loaded to the SQL database). Without this step, the primary key constraint would cause the data flow task to fail.
Activity Log Analysis
You could simply start filling the SQL table now and build more useful reporting once you have enough history to support anything resembling a trend. Even simple T-SQL queries filtering and grouping on the columns can yield important insights.
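For example, a query along these lines answers the “which content is consumed the most?” question. The table and column names here are illustrative assumptions matching the sketch earlier in the post:

```sql
-- Count events by activity type over the trailing 30 days
SELECT
    [Activity],
    COUNT(*) AS [EventCount]
FROM dbo.PowerBIActivityLog
WHERE [CreationTime] >= DATEADD(DAY, -30, SYSUTCDATETIME())
GROUP BY [Activity]
ORDER BY [EventCount] DESC;
```

Swapping the grouping column for ItemName or WorkSpaceName (filtered to 'ViewReport' events) would surface the most-consumed reports or workspaces.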
If you want to go further (much further), you might consider investing the time in an analytical model full of Power BI and related data (e.g. Azure AD) such as the following image from the modeling view of a Power BI dataset:
In the above example, the Users table reflects a nightly PowerShell script that retrieves Azure Active Directory data (via AzureAD module) and the Workspaces table reflects another PowerShell script that utilizes the Power BI Management module. With relationships and metrics defined, you can just drag-and-drop to build visualizations against activity log data.
So with a simple PowerShell script, an SSIS task, and a destination table you could potentially be off and running with Power BI activity log data. You could (and probably will need to) go further by enhancing the data such as adding a column for grouping the activities into parent categories such as ‘Create’, ‘View’, ‘Update’, etc. You could also account for UTC time and provide a time dimension to support drilldown into the specific hours of a day. Maybe I’ll follow up this blog post with these details or this content could be included in a book next year – we’ll see.
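As a sketch of the parent-category idea mentioned above, a CASE expression (or a computed column, or a small mapping table) could derive the category from the activity name. The grouping rules below are assumptions for illustration, not a complete mapping of Power BI activity types:

```sql
-- Derive a parent category from the activity name (illustrative rules only)
SELECT
    [Activity],
    CASE
        WHEN [Activity] LIKE 'View%'   THEN 'View'
        WHEN [Activity] LIKE 'Create%' THEN 'Create'
        WHEN [Activity] LIKE 'Edit%'
          OR [Activity] LIKE 'Update%' THEN 'Update'
        WHEN [Activity] LIKE 'Delete%' THEN 'Delete'
        ELSE 'Other'
    END AS [ActivityCategory]
FROM dbo.PowerBIActivityLog;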
Thanks for visiting Insight Quest and feel welcome to Subscribe (click Follow) to be notified of future blog posts.
Hi Brett, another great blog post.
Do you have a link from which I could then create the “external PowerShell script which contains the credentials of a Power BI admin account”?
Good to hear from you and hope all is well. I should probably have added a bit of detail or reference materials for the authentication piece as I’ve used this in a few past posts. The approach is called ‘dot sourcing’: https://ss64.com/ps/source.html
With dot sourcing, the variables defined in the PowerShell script (C:\BIAdminScripts\AdminCreds.ps1) become available in the current scope. Therefore, you can store the credential values you use across multiple scripts in this one ‘AdminCreds’ script (e.g. $PBIAdminUPN = “GilbertQ@abc.onmicrosoft.com”) and then just reference these variables for authentication like the example in this blog post. If/when the password for the Power BI admin user changes you only then need to revise the value of the $PBIAdminPW variable in the AdminCreds script for all dependent scripts to still work. In the same script I’ve stored variables used for authenticating to Azure AD via service principal such as application IDs, thumbprints, and tenant Ids.
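As a minimal sketch of this pattern (the credential values are placeholders, and the PSCredential construction is one reasonable way to pass these variables to Connect-PowerBIServiceAccount, not necessarily the exact code in my scripts):

```powershell
# --- AdminCreds.ps1: central credential variables (placeholder values) ---
$PBIAdminUPN = "admin@yourtenant.onmicrosoft.com"
$PBIAdminPW  = "YourPasswordHere"   # consider Azure Key Vault for production

# --- Calling script: dot source the credential file, then authenticate ---
. C:\BIAdminScripts\AdminCreds.ps1

$SecurePW   = ConvertTo-SecureString $PBIAdminPW -AsPlainText -Force
$Credential = New-Object System.Management.Automation.PSCredential ($PBIAdminUPN, $SecurePW)
Connect-PowerBIServiceAccount -Credential $Credential
```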
Azure Key Vault (https://azure.microsoft.com/en-us/services/key-vault/) would likely be the best option for securely storing and reusing credential values, but this is a relatively new service that might not be immediately available for a given project. Ultimately, the dot sourcing approach allows you to centrally store credential values (and potentially other common values) rather than hard coding these values in each script.
Hi Brett, I am well thanks.
Thanks for those additional details.
What I meant to ask was do you have the actual AdminCreds.ps1 example file that I could look at?