The combination of Power BI and Application Insights (AI)/Log Analytics (LA) is a powerful one. These tools provide a quick, convenient, and relatively cheap way to collect and analyze telemetry on a wide variety of applications. One drawback of AI/LA is that any data query will return a maximum of 500,000 rows, which can be quite constraining in some cases. This article describes a way to work around this limit.
In this example, we’ll be working with an Application Insights instance that is being populated by the WordPress Application Insights plugin – in fact, it’s the one used on this very blog. There are a couple of ways to connect Power BI Desktop to AI data. The Power Query code is downloadable directly from Application Insights, and you can also use the Azure Data Explorer proxy address as outline in my post on the topic here. This approach will work for both methods, and for our purposes, we’ll be using the generated Power Query code approach.
To begin, access your Application Insights instance, and open the Logs window. If necessary, dismiss the “Queries” window that pops up. Next, form your query using Kusto Query Language (KQL). In our case, we want a simple dump of all rows in the “pageViews” table, so the query is simple – just pageViews.
Once we have the query the way that we want it, we select the Export button, and choose “Export to Power BI (M query). M is the name of the language that Power Query uses. Once chosen, a text file will be downloaded that contains the Power Query that we will need in Power BI Desktop.
At this point, we launch Power BI Desktop, and choose “Get Data”. Since we already have the query that we need, we will choose “Blank Query”.
Next, we name our query “Page Views”, and select the Advanced Editor. This is where we can paste in the query generated by Application Insights above. At this point, we open the file that was downloaded above, copy the contents, and paste them in this window (the top comments can be excluded).
Of note here is the value that will be automatically set for timespan. By default, this will be set to P1D, which means data will be retrieved only for the previous day. In our example above, we have changed it to show data for the past 365 days.
Selecting “Done” will load a preview of our data into Power Query. However, if we want to then load it into the data model, it will do so in a single pull, and we will be subject to the 500,000 row limit. What we need to do is break up our query into multiple queries, and Power Query lets us do this through the use of functions.
The first thing that we’ll need to do is to decide on how to segment the AI data. In our case, it is unlikely that we will have more than 500,000 page views per month, so if we performed one query per month, we should be able to retrieve all of our data. In order to do this, we need to go back to Application Insights, and form up a query that will return a list of year and month for our data. In our case, this query is:
pageViews | where timestamp > now(-365d) | summarize by Year = datetime_part('Year',timestamp), Month = datetime_part('Month',timestamp)
Note that the number of days in the where clause above should match the number of days in the larger query above. Next, export this query to Power BI, and create another query in Power Query. Leave the name as default for now. Selecting Done should return a list of years and months for your data. These values are all numbers, and Power Query recognizes them as such. However, we need to work with them as text later on , so we change their types to text.
Now we will return to our original query, and modify it so that it only returns data for a single month. Reopen the advanced editor and replace the query “pageViews” with:
pageViews | where datetime_part('Month',timestamp) == 10 and datetime_part('Year',timestamp) == 2020
The values chosen don’t matter, but they should return data, In the end, the edited code should look as follows:
Selecting done, we verify that we have data restricted to the specified month. This is where the fun begins. We are now going to turn this query into a function. To do so, we right click on our pageViews Query, and select “Create Function”
We are then presented with a dialog box that asks if we want to create the function without parameters. We can go ahead and select “Create”. We are then prompted to name the function, and we’ll call it “GetViewsByMonthAndYear”. We now need to edit the function. To do so, with the function selected in the query pane, we select the Advanced Editor once again. We then dismiss the following warning, and then we edit the function in two places. First, we need to define two variables to be passed to the function Month and Year , and then we add them to our query.
In the function declaration we add “Month as text” and “Year as text”. We then replace the explicit month and year that we originally queried for with these new variables, Month and Year. Our function code now appears as below:
Now we are ready to use our function. We select our query that contains the list of years and months, select the “Add Column” tab from the ribbon, and choose “Invoke Custom Function”. We give the new column a name “Views”, select our function from the dropdown, and then we select our column containing years and the column containing months to be passed to the function.
At this point, selecting “OK” will cause the function to be executed for each of the listed months. These are individual queries to AI, not one large one. Each query is still subject to the 500,000 row limit, but provided that no specific month exceeds that limit, all of the data will be returned.
Initially, the data is returned as a single table per day, but selecting the expand icon at the right of the column header allows us to retrieve the row values. It’s also a good idea to turn off the “Use original column name” option.
Selecting OK at this point displays all of the appropriate column values. We can then remove the “Year” and “Month” columns, as well as the original Page Views table that we used to create the function. We also need to set the data types for all of our columns because Power Query is unable to detect them using this approach.
Renaming our combined Query to Views, gives us the following result:
We still have a single table, but there is no longer a 500,00 row limit. At this point, we can load the data into the model and build our report.
Hi, really well done !
I want to ask you if it’s possible to do incremental refresh.
For example, using the method above to have all application insights logs and refresh only the 24 hours or day ?
Also, interested to see if incremental refresh can be enabled using the above method. For example, I would like to download the last 90 day history and THEN enable increment refresh (perhaps daily) to store past data and obtain new data daily and build the history.
Is it possible to enable incremental refresh after using functions to break up query?