Parallel Power BI Scripts

PowerShell 7.0 was made generally available last month and one of it’s new features is pipeline parallelization for the ForEach-Object cmdlet. Scripts running on PowerShell 7 or higher can simply specify the new Parallel parameter (ForEach-Object – Parallel) to significantly reduce the duration of these scripts.

Although there are some options for avoiding looping over Power BI workspaces such as the GetGroupsAsAdmin REST API and the Include parameter that was recently added to the Get-PowerBIWorkspace cmdlet, if you’re still looping over workspaces or other objects either via scheduled processes or in ad hoc scenarios, you might use this blog as some motivation to consider moving over to PowerShell 7.

Test Script and Results

To compare scripts with and without the Parallel parameters I accessed a Power BI tenant (as a PBI admin) with 1,545 active workspaces, 1,909 datasets in these workspaces, and 2,226 reports in these workspaces. The requirement of the test script was simple and realistic – retrieve all of these artifacts and write the data to three CSV files (one for each artifact type) including the workspace ID for the reports and datasets. (Including the workspace ID makes it possible to define relationships and write queries against this data)

You can download and compare/test both scripts from my GitHub repository. (The .ps1 files are formatted, unlike the code blocks below)

One of the two scripts used the ForEach language keyword for the datasets and reports (which cannot be parallelized currently) like the following sample:

$Datasets = ForEach ($WS in $ActiveWorkspaces)
    {
        $WSID = $WS.Id
        Get-PowerBIDataset -Scope Organization -WorkspaceId $WSID | Where-Object {$_.Name -ne "Report Usage Metrics Model" -and $_.Name -ne "Dashboard Usage Metrics Model"} |`
        Select-Object Id,Name,ConfiguredBy,DefaultRetentionPolicy,AddRowsApiEnabled,Tables,WebUrl,Relationships,Datasources,DefaultMode,IsRefreshable, `       IsEffectiveIdentityRequired,IsEffectiveIdentityRolesRequired,IsOnPremGatewayRequired,TargetStorageMode,ActualStorage, `
        @{Name="WorkspaceID";Expression={$WSID}},@{Name="DateRetrieved";Expression={$RetrieveDate}}
    }

The script that utilized this (non paralell) approach took 28 minutes to complete. The other script, which uses the new Parallel parameter from PowerShell 7, completed in under 8 minutes – 71% faster. The parallel script (of course) uses the ForEach-Object cmdlet like the following:

$Datasets = $ActiveWorkspaces | ForEach-Object -Parallel {$WSID = $_.Id; Get-PowerBIDataset -Scope Organization -WorkspaceId $WSID | `
    Where-Object {$_.Name -ne "Report Usage Metrics Model" -and $_.Name -ne "Dashboard Usage Metrics Model"} | `
    Select-Object Id,Name,ConfiguredBy,DefaultRetentionPolicy,AddRowsApiEnabled,Tables,WebUrl,Relationships,Datasources,DefaultMode,IsRefreshable, `   IsEffectiveIdentityRequired,IsEffectiveIdentityRolesRequired,IsOnPremGatewayRequired,TargetStorageMode,ActualStorage, `
    @{Name="WorkspaceID";Expression={$WSID}},@{Name="DateRetrieved";Expression={$RetrieveDate}}}

It’s important to note that the ForEach-Object cmdlet requires piped input like the $ActiveWorkspaces variable in the above example. It’s easy to get them confused but the ForEach-Object cmdlet is completely distinct from the ForEach keyword used in the first example above.

The performance impact for your scenario will of course vary based on the size of the tenant being accessed, threads available (I have 4 cores on my machine) and whether you use the optional -ThrottleLimit parameter. If you’re an admin over a very large Power BI deployment and your script retrieves not just datasets and reports but also dashboards, dataflows, and data sources then switching to the parallel method via PowerShell 7 could be quite the benefit to nightly Power BI admin loads and general admin analysis scenarios. (For a bit of context on ‘nightly admin loads’, you might read Power BI Admin datasets)

Moving Forward

PowerShell 7 is not ‘PowerShell Core’, it’s now the GA and supported version of PowerShell with backward compatibility to Windows PowerShell modules and many new features to come in upcoming releases. The ability to run PowerShell 7 on multiple platforms (Mac, Linux) along with the roadmap makes it the clear choice for new PowerShell script development.

Thus, though I became very comfortable with Windows PowerShell and the Windows PowerShell ISE, moving forward I’m using VS Code as my primary script editor for working with PowerShell 7 and higher.

PBI World Tour: New York City

I’ll be presenting at next month’s Power BI World Tour event in New York City regarding the development of dynamic paginated reports. This will be a virtual event so if you’re interested in this topic and/or others being presented you can just log in like we’re all doing with everything else at the moment.

Wrapping Up

Thanks for visiting Insight Quest and I hope you found this post or something from the archive useful. If you’re interested in getting notified of future blog posts, you can click the ‘Follow’ button on the home page.

6 comments

  1. The chokepoint for my tenant (15,000 workspaces, 20,000 reports, 15,000 datasets) has always been retrieving all data sources for each dataset. That’s been my white whale as I’ve never been able to get any method (PS, API calls, Power Automate, etc) to return the information. Will this method help with extracting all datasources from every dataset?

    Like

    1. Hi Jordan,
      Thanks for sharing these details. Yes, this method is applicable to retrieving data sources as well. In the case of data sources, you’re iterating over the dataset objects rather than workspaces but the same ForEach-Object -Parallel approach via PowerShell 7+ should work to deliver much better performance. I’ve tweaked the blog to now mention data sources as well.

      Like

      1. Thanks Brett – I took a stab at it but this isn’t working – anything jump out to you that I’m missing:

        Connect-PowerBIServiceAccount

        $RetrieveDate = Get-Date
        $DatasourceFile = “G:\Power BI Artifacts\Datasources.csv”

        $Datasets = Get-PowerBIDataset -Scope Organization

        $Datasources = $Datasets | ForEach-Object -Parallel {$DSID = $_.Id; Get-PowerBIDatasource -Scope Organization -DatasetId $DSID | `
        Select-Object *, @{Name=”DatasetID”;Expression={$DSID}},@{Name=”DateRetrieved”;Expression={$RetrieveDate}} | `

        $Datasources | Export-Csv $DatasourceFile

        Like

      2. It looks like you’re missing a closing bracket after $RetrieveDate to close out the ForEach-Object. It should have three brackets at the end rather than two.

        I’ll try data sources and see if i run into issues. (I’m sure you are running PowerShell 7.0, right?)

        Like

      3. I ran the following for data sources but for some reason I didn’t get any performance benefit over the older approach:
        $DataSources = $Datasets | ForEach-Object -Parallel {$DSID = $_.Id; $WSID = $_.WorkspaceID; Get-PowerBIDatasource -Scope Organization -DatasetId $DSID -ErrorAction SilentlyContinue | `
        Select-Object Name,ConnectionString,DatasourceType,ConnectionDetails,GatewayId,DatasourceId, `
        @{Name=”WorkspaceID”;Expression={$WSID}},@{Name=”DatasetId”;Expression={$DSID}},@{Name=”DateRetrieved”;Expression={$RetrieveDate}}}

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s