I was recently asked by Christian Buckley what my top 2016 blog posts were. No problem, I thought. I went back through my output for the past year, pulled out the posts that I knew had generated a lot of discussion or impact, and forwarded them on. At that point he asked how many views each of those pages had received. Being a data guy, I suddenly felt like the shoemaker noticing that his children had been going barefoot.
I monitor my blog traffic with the built-in WordPress JetPack tools, StatCounter, and Google Analytics. They all work slightly differently, with StatCounter and JetPack being the most alike. I tend to rely on StatCounter for immediate stats (how many hits today, what’s popular today) and the Google stats for a longer time frame. StatCounter doesn’t persist my stats beyond a day, as I don’t have the pro version, and the JetPack stats don’t seem very extensible. Google Analytics seemed like the best place to begin, particularly because there is a pre-existing content pack for Power BI.
The Google Analytics Content Pack
I have used the Google Analytics (GA) content pack casually and for demonstration purposes since it was introduced with the Power BI launch in July 2015. It hasn’t changed much. Actually, as far as I can tell, it hasn’t changed at all. To use the content pack, you simply log into the Power BI service, select “Get Data”, select the “Services” tile, and select Google Analytics.
After you enter your credentials, selecting OAuth2 as the authentication method, Power BI will import your GA data into a data model and populate a pre-configured report. The report consists of several pages, mostly focused on visitors to the site.
There are some interesting visuals out of the box, and there are more metrics available in the data model if you want to customize the out of the box reports. At the moment, any customizations that are made in this way are not portable, and with the content pack, data is only retained for 180 days, which means that year over year comparisons are not possible. The visuals don’t appear to have been updated since initial release, which means that many of the new Power BI UI enhancements are not there, but they too can be added through customization.
Generally, if you’re going to do a lot of customization, the best tool to use is Power BI Desktop. Reports can then be reused easily and are highly portable. Luckily, in addition to the content pack, Google Analytics also exists as a data source for Power BI Desktop.
Using the Google Analytics Data Source in Power BI Desktop
When Power BI Desktop imports data from GA, it imports all the data that GA has. There seems to be no agreement on how long Google will retain this data, but in practice, GA seems to retain all data since it was originally configured. In my case, that’s a little over two years now, which is fine for my analysis. The first step is to connect to and import the correct data. Start Power BI Desktop, select “Get Data”, choose the Online Services tab and choose “Google Analytics”.
Once you authenticate, you’ll be presented with all of the sites that are monitored by Google Analytics. You’ll want to drill down and open “All Web Site Data”. GA captures an awful lot of information, and the trick is to know what to grab. Grabbing everything won’t work, as the connector only allows for 8 dimensions and measures in a single import. In my case, I am interested in the PageViews and Unique PageViews measures, and the Page, Page Title and Landing Page dimensions (all under the “Page Tracking” section). In addition, I want the Date, Hour, and Minute dimensions from the Time section.
Once selected, we select OK, and edit the query, giving it a good name like “GA Data”. Finally, we can select “Close and Apply” and the data will be added. This procedure can take a little while depending on the quantity of data.
Once loaded, we need to do a little bit of work in the data model. We imported the dates from GA, but we’ll want to do year/month/day drilldowns, as well as use textual values for month names, day names etc. For that, the tried and true method has been to build a Date table. Power BI itself will actually do some of this automatically for you behind the scenes, but a custom table gives us the ultimate in flexibility. DAX (the Power BI modelling language) makes this very easy. We create a new table by first selecting the “Modeling” tab, and then the New Table button. This allows us to create a calculated table in the formula bar. First change the name from “Table” to something meaningful like “ViewDates”, and then add the following formula:
ADDCOLUMNS (
    CALENDAR ( DATE ( 2010, 1, 1 ), DATE ( 2025, 12, 31 ) ),
    "Date As Integer", FORMAT ( [Date], "YYYYMMDD" ),
    "Year", YEAR ( [Date] ),
    "Month Number", FORMAT ( [Date], "MM" ),
    "Year Month Number", FORMAT ( [Date], "YYYY/MM" ),
    "Year Month Short", FORMAT ( [Date], "YYYY/mmm" ),
    "Month Name Short", FORMAT ( [Date], "mmm" ),
    "Month Name Long", FORMAT ( [Date], "mmmm" ),
    "Day Of Week Number", WEEKDAY ( [Date] ),
    "Day Of Week", FORMAT ( [Date], "dddd" ),
    "Day Of Week Short", FORMAT ( [Date], "ddd" ),
    "Quarter", "Q" & FORMAT ( [Date], "Q" ),
    "Year Quarter", FORMAT ( [Date], "YYYY" ) & "/Q" & FORMAT ( [Date], "Q" )
)
Adjust the beginning and end dates to suit the data in question, click the check mark, and voila, instant date table. There will be a record for every date between the beginning and end dates. It’s a good idea to adjust the properties of some of the resultant columns for display. We want to sort the Month Name Long and Month Name Short columns by Month Number, and the Day Of Week column by the Day Of Week Number column. Any additional customizations can be made as necessary.
The next step is to establish the relationship between the Date column in the GA table, and the Date field in the new calculated date table. Simply click on the relationship builder icon, then drag and drop the Date column from one table to the corresponding column on the other.
At this point, we can create a visual that shows traffic over time. We create a column chart, and add Pageviews as the Value. Then we add Year Month Short (which should be sorted by Year Month Number) to the axis, and we should see all site traffic over time. Adding Date to the axis and stripping out all the dimensions except Day allows us to drill down on days for a selected month.
Although we can see our site traffic by month, we still can’t answer Christian’s original question, which was “what were the most frequently viewed posts written in 2016?” Google Analytics has no clue when the pages were created. It’s possible to try to infer it from the earliest viewed date for a given page, but the created date is available directly in WordPress. We just need to get the WordPress data into the data model. Thankfully, that is possible through the WordPress REST Add-On.
Using the WordPress REST Add-On
REST support is available for WordPress as an add-on. The “WP REST API” is available in the add-on catalog, and on GitHub. Once installed, all WordPress content (including posts) is available through a simple HTTP GET request. This is something that’s fully supported by Power BI, and therefore all the relevant post data can be loaded into Power BI through this add-on.
From the Power BI Home tab, select Get Data, then “Web” and then use the URL required to retrieve posts. For the blog that you’re reading, it’s http://whitepages.tygraph.com/wp-json/wp/v2/posts. The query will return a list of records. However, there will only be as many records as WordPress shows by default. We need all of them. The add-on allows you to specify the number of posts per page, by adding the “per_page” parameter. Therefore, in our case, it’s http://whitepages.tygraph.com/wp-json/wp/v2/posts?per_page=50 where 50 is the desired number of items per page.
The per_page parameter is all that you need if the number of posts to analyze is 100 or fewer, but the limit of this parameter is 100. There is another parameter that can be added to the query, page=, that will specify the page number. With this, and the posts per page parameter, it’s possible to get all the posts. There are a couple of ways to implement this in Power BI.
The ideal way is to use an “M” function. With a function, you build up a query normally, and then wrap it in a parameterized function using the advanced editor, so that the page number is passed in as a parameter and used in the request URL. The function can then be called from each record of another table, thereby returning all the posts, which is exactly what we need. This approach works perfectly well in Power BI Desktop. Unfortunately, once the model and report are deployed to the Power BI service, it stops working. The Power BI service currently cannot refresh any query that uses replaceable parameters as part of the query.
The other way that this can be handled is to generate multiple queries that explicitly use the page= parameter. The number of queries necessary will be equal to the number of posts divided by 100, then rounded up to the next whole number. In my case, I have 230 posts, and therefore need 3 queries. Once created, all 3 queries can be merged into a single table. This approach is messy, and will require occasional maintenance, but it’s the only one that works for now. Let’s walk through the process.
We’ll start with the first query. As above, we use Get Data, select the Web source and enter the URL for page 1 and 100 posts per page. For this blog the URL is http://whitepages.tygraph.com/wp-json/wp/v2/posts?page=1&per_page=100. The query should show a list of 100 records. Next, we need to turn the list into a table so that it can be expanded. Click the “To Table” button in the ribbon.
Click OK to accept the defaults, and then click the small expand button in the column header (Column1). Be sure to deselect the “Use original column name as prefix” before clicking OK.
At this point, all the post metadata from WordPress should be available. You can choose to keep all or only some of the columns, but the ones that we want to be sure to keep are id, date, slug, link, and title (id and link are used later in the model). Title needs to be expanded, so we should go ahead and do that – the procedure is the same as the step above, but only the title field is returned as “rendered”. It’s a good idea to rename it to Title. Also, it’s a good idea to set the data type of the Date field to Date/Time here.
Once the query is the way we want it, we’ll want to name it something like “Posts1-100”, and then we need to set its data load properties to not load into the report. We don’t want this query to load its own data because it will only be one merge source of three, and we don’t want to store the data redundantly. To do that, we right click on the query, select Properties, and deselect “Enable load to report”. Then click OK.
We now need to duplicate this query for page 2. The easiest way to do this is by copying all the M script generated by the query builder into a new blank query, and then editing it. From the Home tab, we click on “Advanced Editor”, then select and copy all the text in the dialog box. We then close the dialog box, then select New Source – Blank Query. Once opened, we again select “Advanced Editor”, remove the default content and paste the copied text into the box. Finally, “page=1” in the URL is replaced with “page=2”.
We then save the query, name it and set the properties not to load as with the first query. We then repeat all these steps for page 3. At this point we are ready to merge the queries into our “master” query.
To merge the three queries into one, we select the “Append Queries” dropdown from the ribbon, and select “Append Queries as New”. We then select “Three or more tables”, add the three tables and select OK. Finally, we name this new query “Posts”, but this time we leave data loading enabled. This is our master table. At this point, we are ready to Close and Apply, and return to the main design surface.
This Posts table has a Date column, but it’s actually a Date/Time column. To use a date table, we need to create a new calculated column with just the date portion. With the Posts table selected, we select the Modeling tab, and then “New Column”. We then give the column a name (PostDate) and the following formula based on the Date column:
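The formula itself does not appear in the text at this point. As a minimal sketch of what such a column could look like, assuming the column is still named Date in the Posts table, the time portion can be dropped by rebuilding the value from its year, month and day parts. This is one plausible way to do it, not necessarily the formula originally used:

```dax
-- Assumed reconstruction: keep only the date portion of the Date/Time value
PostDate =
DATE ( YEAR ( Posts[Date] ), MONTH ( Posts[Date] ), DAY ( Posts[Date] ) )
```

The result has no time component, so it can be related directly to the Date column of a calculated date table.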
We also want a calculated measure to indicate the number of posts. The process is like that for a new column. We select “New Measure”, and add the following formula to the formula bar:
Posts = CountA(Posts[id])
We will be relating records in the Posts table to records in the GA table, so we need another date table to keep the relationships clean. We could calculate another table as we did above, but it’s even easier to calculate the new one based on the one already created. We simply select “New Table” and use the following formula:
PostDates = ViewDates
Next, we create the relationship between the Posts table and the PostDates table the same way that we did it for the GA table above. Now that both tables are date sliceable, we need to relate them together. In the Posts table, the Link column uniquely identifies the page, but the GA table stores only the relative address of the page, in the Landing Page column. In our case the solution is simple: we need to prepend the main part of the site address in question (in our case http://whitepages.tygraph.com) to the Landing Page. We do that by creating a new column, URL, with the following formula:
URL = "http://whitepages.tygraph.com" & 'Google Analytics Views'[Landing Page]
Finally, we relate the URL column in the GA table to the Link column in the Posts table.
At this point the model is ready for use in reports.
Building a Report
How to build a report is not the focus of this article, so I’ll just explain the steps taken here. To prepare our data model, we first need to flag the Link column in the Posts table as a URL field. To do that, select it in the UI, then select the Modeling tab. Use the Data Category dropdown control and select “Web URL”.
Next, add a new table to the report, and in the Format section, select Values, and set the “URL icon” setting to “On”.
This has the effect of displaying any column that has been flagged with the Web URL attribute as a link icon with a live hyperlink, instead of the entire, often long URL itself.
Next, we add the Title and Link fields from the Posts table, and the Pageviews field from the GA table, and then sort the table by Pageviews. Next, we add two slicer controls to the report – one bound to the Year column of the PostDates table, and the other bound to the Year column of the ViewDates table. Now by selecting 2016 from the ViewDates slicer, and 2016 from the PostDates slicer, we can see, in order with precise numbers, which posts authored in 2016 were most frequently viewed in 2016. With this, I was now able to give Christian an answer.
An answer today is one thing, but an answer next year is another altogether. This report was worth sharing, so it was worth sprucing up a bit. By taking advantage of some of the new table formatting capabilities in Power BI, and importing the chiclet slicer custom control, we are able to make a more visually appealing report. I will also occasionally use a column chart in a report and use it like a slicer when appropriate. With a little bit of formatting work, we wind up with a report that looks something like the following:
Publishing and Sharing
We’re now ready to publish this report. The easiest approach is to simply select the “Publish” button from Power BI Desktop. Select the destination, most likely your personal workspace. When publishing is complete, we can select “View in Power BI” to see the report in the service.
Having the report is one thing, but we want this report to be kept up to date. To do this, we go to the datasets section and select our dataset. In the data source credentials section, we need to set the credentials for both Google Analytics and our WordPress connection (which will display as “Web”). Even though the Web source is anonymous, we have to configure it that way in the Power BI service. Once the connections are configured they should appear in the Data source credentials section with no notices.
When we configured the WordPress data import above, we used 3 queries. That’s good for 300 posts, and my blog is currently at 238, which should be fine for a while. However, once I hit 300, I’m going to need another query. What I’m really hoping for is that by that time the Power BI service will support parameterized data sources for refresh, but either way I’ll need to modify the data source. I’m likely to forget this need about a week after I publish this post, so a reminder is a good idea. Luckily, Power BI supports data driven alerts, which is exactly what we need here.
Alerts are set on dashboard tiles that display card data. Our report has a data card showing the total number of posts. Once that card has been pinned to the dashboard, an alert can be set on it for when it reaches a specific threshold. Simply hover over the dashboard card and click on the ellipsis, then the small bell icon.
In our case, we want to be notified when the number of posts is approaching 300, so we set the condition to be above 297. Once blog post 298 is published, I will receive an email and can then act on it.
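The arithmetic behind that threshold can also be put into the model itself. The following is only a sketch: the two measure names are hypothetical, and it assumes the Posts measure defined earlier and the three 100-post queries described above.

```dax
-- Hypothetical measure: how many 100-post queries are needed to cover every post
Queries Needed = ROUNDUP ( DIVIDE ( [Posts], 100 ), 0 )

-- Hypothetical measure: posts that can still be published before a fourth query is required
Posts Until Next Query = 3 * 100 - [Posts]
```

With 238 posts, the first measure works out to 3 and the second to 62. Either one could be shown on a card and pinned to the dashboard in the same way as the total post count.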
Finally, I want to share this report with Christian so that the next time he has questions about my blog, he can just look it up for himself. When I tell him this, I’ll say that it’s so he can keep me honest, but really, I just want him to stop bugging me…
We don’t work at the same company and we use different Azure AD tenants. I could share the dashboard externally with him, but it’s even easier to share it anonymously, and anonymous sharing of this data is fine with me. Anonymous sharing of data is relatively straightforward. From the report interface, select File – Publish to web. A dialog will open asking for confirmation, and once confirmed will provide a URL that can be shared publicly. In the case of this blog’s report, you can simply click here to get the full report in a dedicated window. I can just email that link to Christian, and he’ll have the answers that he’s looking for. The beauty of anonymous sharing is that you are also given an embed code that can be added to any web page. As an example, the fully interactive report for this blog can be seen below.