A few months ago, I posted an article outlining how to connect Power BI to Azure Application Insights and Azure Log Analytics (jointly referred to as Azure Monitor) with Direct Query. That article describes an approach that uses the native Kusto connector to connect to the Azure Monitor instance as if it were an ADX cluster, which allows Direct Query to be used, among other things. The Power BI connection option available through the Azure Monitor UI uses the Web connector to query the respective APIs, and that connector doesn't support Direct Query.
The problem with using this connector is that it’s a bit of a hack. At the time it was written, you needed to use the old Power BI driver for Kusto to make it work, and that approach isn’t simple. Over time, it stopped working altogether for Application Insights. The ADX connector has since been updated to support connection to Azure Log Analytics (but not Application Insights) and is therefore still valid.
There is, however, another way to achieve this by using your own ADX cluster. ADX clusters allow for “cross-cluster queries” that permit tables in a database in one cluster to be joined or unioned with tables in a completely different cluster. The same proxy addresses mentioned above can be used in one of these cross-cluster queries, so that the ADX cluster simply acts as an intermediary.
To create a Power BI report that queries Azure Monitor data using Direct Query, first create a new report, and connect to data using the “Azure Data Explorer (Kusto) connector”. Enter the address of the cluster, and the name of a database within that cluster. The database itself doesn’t matter; it simply provides a scope for the query. Finally, you need to specify the query, and this is where the cross-cluster query comes into the picture. The query takes the following form:
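In general terms, and assuming the Application Insights proxy address format described later in this post, a sketch of the pattern looks like this (the angle-bracketed values are placeholders):

cluster('https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>').database('<ai-app-name>').<table-name>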
The cross-cluster query for the table named “pageViews” in an Application Insights instance named “WhitePagesLogs”, in a resource group named “MyResourceGroup”, in the subscription “71a90792-474e-5e49-ab4e-da54baa26d5d” is therefore:
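Assembled from the proxy address format above, a sketch of that query (treat it as illustrative rather than exact) is:

cluster('https://ade.applicationinsights.io/subscriptions/71a90792-474e-5e49-ab4e-da54baa26d5d/resourcegroups/MyResourceGroup/providers/microsoft.insights/components/WhitePagesLogs').database('WhitePagesLogs').pageViews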
It is worth explicitly noting that the resource name appears twice in the query – once in the cluster address, and once as the database name.
When ready, the Get data dialog box should appear as follows:
If you want to use Direct Query, don’t forget to open the Advanced Options section, and select it here.
At this point, the report can be built, and it will behave as if it were a normal ADX cluster. You can of course build more complex queries, etc., but you cannot create things like functions or materialized views, since you do not have administrative access to the engine behind Azure Monitor.
Compared to using the Power BI ADX connector directly, this approach has the advantage of being explicitly supported, and it also works with both Application Insights and Log Analytics. On the downside, there is a cost to running your own ADX cluster, although it is minimal. The cluster is simply acting as a gateway in this case, so a bare minimum of resources will suffice.
Azure Data Explorer (ADX) is a great platform for storing large amounts of transactional data. The Incremental Refresh feature (now available for Pro users!) in Power BI makes it much faster to keep data models based on that data current. Unfortunately, if you follow the standard guidance from Microsoft for configuring Incremental Refresh, you’ll quickly bump into a roadblock. Luckily, it’s not that difficult to get around.
Incremental Refresh works by setting up data partitions in the dataset in the service. These partitions are based on time slices. Once data has been loaded into the dataset, only the data in the most recent partition is refreshed.
To set this up in Power BI Desktop, you need to configure two parameters, RangeStart and RangeEnd. These two parameters must be of the Date/Time type. Once created, the parameters are used to filter the Date/Time columns in your tables accordingly and, once published to the service, to define the partitions that the data is loaded into.
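For reference, this is roughly how the two parameter queries are expressed in M (the initial values here are arbitrary placeholders):

RangeStart = #datetime(2021, 1, 1, 0, 0, 0) meta [IsParameterQuery=true, Type="DateTime", IsParameterQueryRequired=true]
RangeEnd = #datetime(2021, 12, 31, 0, 0, 0) meta [IsParameterQuery=true, Type="DateTime", IsParameterQueryRequired=true]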
When Power Query connects to ADX, all Date/Time fields come in as the Date/Time/Timezone type. This is a bit of a problem. When you use the column filters to filter your dates, the two range parameters won’t show up because they are of a different type (Date/Time). Well, that’s not a big problem, right? Power Query lets us change the data column type simply by selecting the type picker on the column header.
Indeed, doing this does in fact allow you to use your range parameters in the column filters. Unfortunately, data type conversions don’t get folded back to the source ADX query. You can see this by right-clicking on a subsequent step in the Power Query editor. The “View Native Query” option is greyed out, which indicates that the query cannot be folded.
Query folding is critical to Incremental Refresh. Without it, the entirety of the data is brought down locally so that it can be filtered, instead of the filter being applied at the data source. This would completely defeat the purpose of implementing Incremental Refresh in the first place.
The good news is that you can in fact filter a Date/Time/Timezone column with a Date/Time parameter, but the Power Query user interface doesn’t know that. The solution is to simply remove the type conversion Power Query step AFTER performing the filter in the Power Query UI.
Alternatively, if you’re comfortable with the M language, you can simply insert something like the following line using the Advanced Editor in Power Query (where CreatedLocal is the name of the column being filtered).
#"Filtered Rows" = Table.SelectRows(Source, each [CreatedLocal] >= RangeStart and [CreatedLocal] < RangeEnd),
If the filter step can be folded back to the source, Incremental Refresh should work properly. You can continue setting up Incremental Refresh using the Incremental refresh dialog in Power BI Desktop. You will likely see some warning messages indicating that folding can’t be detected, but these can safely be ignored.
Application Insights (AI) is a useful way of analyzing your application’s telemetry. Its lightning-fast queries make it ideal for analyzing historical data, but what happens when you start to bump into its limits? The default retention for data is 90 days, but that can be increased (for a fee) to 2 years. However, what happens when that’s not enough? If you query too much or too often, you may get throttled. When you start to bump into these limits, where can you go?
The answer lies in the fact that Application Insights is backed by Azure Data Explorer (ADX or Kusto). Moving your AI data to a full ADX cluster will allow you to continue using AI to collect data, and even to analyze recent data, but the ADX cluster can be sized appropriately and used when the AI instance won’t scale. The fact that it is using the same engine and query language as AI means that your queries can continue to work. This article describes a pattern for doing this.
Requirements
We’ll be working with several Azure components to create this solution. In addition to your AI instance, these components are:
Azure Data Explorer cluster
Azure Storage Account
Azure Event Hubs namespace and at least one Event Hub
Azure Event Grid
The procedure can be broken down into a series of steps:
Enable Continuous Export from AI
Create an Event Grid subscription in the storage account
Create an ADX database and ingestion table
Create an Ingestion rule in ADX
Create relevant query tables and update policies in the ADX database
Enable Continuous Export from Application Insights
AI will retain data for up to 2 years, but for archival purposes, it provides a feature called “Continuous Export”. When this feature is configured, AI will write out any data it receives to Azure blob storage in JSON format.
To enable this, open your AI instance, and scroll down to “Continuous Export” in the “Configure” section. Any existing exports will show here, along with the last time data was written. To add a new destination, select the “Add” button.
You will then need to select which AI data types to export. For this example, we will only be using Page Views, although multiple types can be selected.
Next, you need to select your storage account. First select the subscription (if different from your AI instance), and then select the storage account and container. You will need to know what data region the account is in. Once selected, save the settings.
Initially, the “Last Export” column will display “Never”, but once AI has collected some data, it will be written out to your storage container, and the “Last Export” column will display when that occurred. Once it has, you should be able to open your storage account using Storage Explorer and browse to the container to see the output. In the root of the container selected above, you’ll see a folder named with the AI instance name and the AI instance GUID.
Opening that folder, you’ll find a folder for each data type selected above (if there has been data for it). Each data type is further organized into folders named for the day and the hour. Multiple files with the .blob extension are contained within. These are multiline JSON files and can be downloaded and opened with a simple text editor.
The next step is to raise an event whenever new content is added to this storage container.
Create an Event Grid subscription in the storage account
Prior to this step, ensure that you have created, or have available, an Event Hubs namespace and an Event Hub. You will connect to this hub in this step.
From the Azure portal, open the storage account and then select the “Events” node. Then click the “Event Subscription” button at the top.
On the following screen, you’ll need to provide a name and schema for the subscription. The name can be whatever you wish, and the schema should be “Event Grid Schema”. In the Topic Details section, you will provide a topic name which will pertain to all subscriptions for this storage account. In the “Event Types” section, you select the types of actions that will fire an event. For our purposes, all we want is “Blob Created”. With this selection, the event will fire every time a new blob is added to the container. Finally, under “Endpoint Details”, you will select “Event Hubs” from the dropdown, and then you click on “Select an endpoint” to select your Event Hub.
Once created, an event will fire any time a blob is created in this storage account. If you wish to restrict this to specific folders or containers, you can select the Filters tab and create a subject filter to restrict it to specific file types, containers, etc. More information on Event Grid filters can be found here. In our case, we do not need a filter.
When ready, click the “Create” button, and the Event subscription will be created. It can be monitored from the storage account and can also be monitored in the Event hub. As new blobs are added to the storage account, more events will fire.
Create an ADX database and ingestion table
From the Azure portal, navigate to your ADX cluster and either select a database or create a new one. Once the database has been created, you need to create at least one table to store the data. Ultimately, Kusto will ingest data from the blobs added above whenever they are added, and you need to do some mapping to get that to work properly. For debugging purposes, I find it useful to create an intermediate ADX table to receive data from the blobs, and then transform the data afterward.
In this case, the intermediate table will have a single column, Body, that will contain the entirety of each ingested record. To create this table, run the following KQL query on your database:
.create-merge table Ingestion (Body: dynamic)
The dynamic data type in ADX can work with JSON content, and each record will go there. For this to work, you also need to add a mapping to the table. The mapping can be very complex, but in our case, we’re doing a simple load in, so we’re matching the entire JSON record to the Body column in our database. To add this mapping, run the following KQL command:
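A sketch of that mapping command, assuming the mapping name “RawInput” (the name referenced later in the ingestion settings) and mapping the entire JSON document (path $) to the Body column, is:

.create table Ingestion ingestion json mapping "RawInput" '[{"column": "Body", "path": "$", "datatype": "dynamic"}]'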
At this point, we are ready for an ingestion rule.
Create an Ingestion rule in ADX
From the Azure portal, open your ADX cluster, and select the “Databases” node in the “Data” section, then click on your database.
The setting that we need is “Data ingestion” in the resulting window. Selecting that takes you to the ingestion rules. Now you want to create a new connection by selecting the “Add data connection” button.
The first selection is the data connection type. The options are Event Hub, Blob storage, or IoT Hub. We need to select Blob storage. Both the Blob storage and Event Hub options connect to an Event Hub, but the difference is that with “Blob storage”, the contents of the blobs will be delivered, while selecting “Event Hub” will only deliver the metadata of the blob being added.
Once the type is selected, you give it a name, and choose the event grid to connect to (the one that you created above) and the event type. Next, you select “Manual” in the Resources creation section. Selecting “Automatic” will create a new Event Hubs namespace, hub, and event grid, and you won’t have any control over the naming of these resources. Selecting “Manual” allows you to keep it under control. Select your event grid here.
Next, select the “Ingest properties” tab, and provide the table and mapping that you created above (in our case, the “Ingestion” table and the “RawInput” mapping). Also, you need to select “MULTILINE JSON” as the data format.
Once these values are complete, press the Create button and the automatic ingestion will commence. Adding a new blob to the storage account will fire an event, which will cause ADX to load the contents of the blob into the Body column of the Ingestion table. This process can take up to 5 minutes after the event fires.
Create relevant query tables and update policies in the ADX database
Once ingestion happens, your “Ingestion” table should have records in it. Running a simple query in ADX using the table name should show several records with data in the “Body” column. Opening a record will show the full structure of the JSON contained within. If records with different schema are being imported, a query filter can be employed to limit the results to only those records.
For example, the pageViews table in AI will always contain a JSON node named “view”. The query below will return only pageView data from the ingestion table:
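A sketch of such a query, using the isnull check on the view node that is discussed below, is:

Ingestion
| where isnull(Body.view) == false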
This ingestion table can be queried in this manner moving forward, but for performance and usability reasons, it is better to “materialize” the views of this table. To do this, we create another table and set an update policy on it that will add relevant rows to it whenever the ingestion table is updated.
The first step is to create the table. In our case, we want to replicate the schema of the pageViews table in Application Insights. This is because we want to be able to reuse any queries that we have already built against AI. All that should be necessary is to change the source of those queries to the ADX cluster/database. To create a table with the same schema as the AI pageViews table (mostly), the following command can be executed in ADX:
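A sketch of that command is below. The column list here is abridged and assumed; the real Application Insights pageViews schema has many more columns, and you would want to match it exactly if you intend to reuse existing queries unchanged:

.create-merge table pageViews (timestamp: datetime, name: string, url: string, duration: real, customDimensions: dynamic, operation_Id: string, session_Id: string, user_Id: string)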
Once the table is created, we need to create a query against the Ingestion table that will return pageViews records in the schema of the new table. Without getting deep into the nuances of the KQL language, a query that will do this is below:
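A sketch of the shape of that query follows. The JSON paths into the Body column are assumptions based on the Continuous Export pageViews format and will almost certainly need adjusting against your own exported records:

Ingestion
| where isnull(Body.view) == false
| extend view = Body.view[0], context = Body.context
| project timestamp = todatetime(context.data.eventTime),
          name = tostring(view.name),
          url = tostring(view.url),
          duration = toreal(view.durationMetric.value),
          customDimensions = context.custom.dimensions,
          operation_Id = tostring(context.operation.id),
          session_Id = tostring(context.session.id),
          user_Id = tostring(context.user.anonId)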
The “where isnull(Body.view) == false” statement above uniquely identifies records from the pageViews table. This is useful if multiple tables use the same Ingestion table.
Next, we need to create a function to encapsulate this query. When we add an update policy to the pageViews table, this function will run this query on any new records in the Ingestion table as they arrive. The output will be added to the pageViews table. To create the function, it’s a simple matter of wrapping the query from above in the code below and running the command:
.create-or-alter function pageViews_Expand() {
    // the query from above goes here
}
This creates a new function named pageViews_Expand. Now that the function has been created, we modify the update policy of the pageViews table so that it runs whenever new records are added to the Ingestion table, and its output is added to the pageViews table. The command to do this can be seen below:
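A sketch of that command, assuming the table and function names used above, is:

.alter table pageViews policy update @'[{"IsEnabled": true, "Source": "Ingestion", "Query": "pageViews_Expand()", "IsTransactional": false, "PropagateIngestionProperties": false}]'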
After the next ingestion run, not only will you see records in the Ingestion table, but if there were page views, you should see the results show up in the pageViews table as well.
If you have data already in the Ingestion table that you want to bring in to the pageViews table, whether for testing or for historical purposes, you can use the .append command to load rows into the table from the function:
.append pageViews <| pageViews_Expand
Finally, if you don’t want to maintain data in the Ingestion table for very long, or not at all, you can set the retention policy on it. Data will be automatically purged from it at the end of the time limit. Setting the value to zero will purge the data immediately, and in that case, the Ingestion table simply becomes a conduit. To set the retention policy on the Ingestion table to 0, you can run the following command:
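A sketch of that command, using the shorthand retention syntax, is:

.alter-merge table Ingestion policy retention softdelete = 0d recoverability = disabled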
There are several steps involved, but once everything is wired up, data should flow from Application Insights to Azure Data Explorer within a few minutes. This example only worked with the pageViews table, but any of the AI tables can be used, although of course their schemas will be different.
Application Insights (AI) and Log Analytics (LA) from Microsoft Azure provide easy and inexpensive ways to instrument applications. Using just an instrumentation key, any application can send operational data to AI which can then provide a rich array of tools to monitor the operation of the application. In fact, the blog that you are reading uses an Application Insights plugin for WordPress that registers each view of a page into an instance of AI in my Azure tenant.
Application Insights data can be queried directly in the Azure portal to provide rich insights. In addition, the data can be exported to Excel for further analysis, or it can be queried using Power Query in either Excel or Power BI. The procedure for using Power Query can be found in this article. That approach uses the Web connector in Power Query, which can be refreshed automatically on a regular basis. The Web connector does not however support Direct Query, so the latency of the data in this scenario will be limited by the refresh schedule configured in Power BI. Any features that depend on Direct Query (Aggregations, Automatic Page Refresh) will also not work.
If you’ve worked with AI or LA, and dropped down to the query editor, you’ve been exposed to KQL – the Kusto Query Language. This is the language used by Azure Data Explorer (ADX), also known by its code name, “Kusto”. This is of course not a coincidence, as the Kusto engine powers both AI and LA.
Power BI contains a native connector for ADX, and you can configure an ADX cluster for yourself, populate it, and work with it in Power BI for both imported and Direct Query datasets. Given that ADX is what powers AI and LA, it should be possible to use this connector to query the data for AI and LA. It turns out that the introduction of a new feature known as the ADX proxy allows us to do just that.
The ADX proxy is designed to allow the ADX user interface to connect to instances of AI and LA and run queries from the same screens as native ADX clusters. The entire process is described in the document Query data in Azure Monitor using Azure Data Explorer. The document explains the process, but what we are particularly interested in is the syntax used to express an AI or LA instance as an ADX cluster. Multiple variations are described in the document, but the ones that we are most interested in are here:
For LA: https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.operationalinsights/workspaces/<workspace-name>
For AI: https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>
By substituting in your subscription ID, resource group name, and resource name, you can treat these resources as if they were ADX clusters, and query them in Power BI using Direct Query. As an example, a simple query on this blog can be formed using the ADX connector:
And the result will appear as:
The precise query is provided in the query section of the connector above.
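For illustration, a query of that kind might look something like the following (this particular query is just an assumed example, not necessarily the one used above):

pageViews
| summarize Views = count() by bin(timestamp, 1d)
| order by timestamp asc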
Once the report is built, it can be deployed to the Power BI service and refreshed using AAD credentials.
It is important to note that this method does NOT require you to configure an ADX cluster of your own. We are simply utilizing the cluster provided to all instances of AI and LA. We therefore do not have any control over performance levels, as we would in a full ADX cluster. However, if the performance is adequate (and the queries are designed appropriately), this can be a good approach for working with AI and LA data that has low-latency (near real-time) requirements.