
Tag: Nintex

Archiving Your SharePoint Workflow History Lists

If you’ve worked at all with SharePoint declarative workflows (the ones you create with SharePoint Designer), or with others that are based on them, such as Nintex workflows, you are undoubtedly aware of their ability to log items to the history list. These items are the ones that appear in the Workflow History section of the workflow status page.

image

What may be less commonly known is how this works. The Workflow History section is really just a view of items contained in a hidden list on the site; whenever a workflow logs an entry, an item gets created in that hidden history list. For regular SharePoint workflows, this list is located at http://yoursiteurl/Lists/Workflow History, and for Nintex workflows, you can find it at http://yoursiteurl/Lists/NintexWorkflowHistory. So, why does this matter? Well, if you need to audit what has happened with your workflows, this is where the information is contained.

There is, however, a catch. By default, SharePoint runs a timer job named “Workflow Auto Cleanup” daily that removes all of the tasks associated with a workflow, and all of the history links, for workflows that are over 60 days old. This is done for performance reasons. Unless your audit requirements only go back two months, this isn’t going to work for you.

Try doing a search for “Workflow History” and you’ll see that this has caused a number of issues (especially for those who have found out after the fact). The good news for those people is that the workflow history list isn’t actually purged (which is also bad news, as we’ll see shortly), and those links can be recreated through reporting. However, the most common guidance found on this topic is to simply disable the automatic cleanup job, as outlined in this very poorly named TechNet article.

The problem with disabling the job is that performance will suffer, potentially badly. Assume that we have an approval workflow that runs on a list that will receive 2,500 approvals annually. This is a reasonably sized list (for SharePoint 2010). Now let’s also assume that during the life of each workflow instance, 10 items get logged to the history list. This means that in a given year, 25,000 items are being logged to the history list, which is beyond even the default list view threshold for administrators (20,000 items), and would be considered a very large list.

What is needed is a way to balance the auditing requirements with the list size constraints of SharePoint. 25,000 items may be a large SharePoint list, but it’s trivial for a relational database like SQL Server. The remainder of this article discusses how to use Microsoft’s Business Intelligence tools to extract workflow history data into a data warehouse, and then safely purge it from the workflow history list. This will be a lengthy one.

Step 1 – Extract And Load

In my opinion, one of the most underutilized tools in Microsoft’s arsenal is SQL Server Integration Services (SSIS). Almost every SharePoint installation already has it (it ships with SQL Server), and very few people know about it. It is Microsoft’s ETL (Extract, Transform, and Load) tool, and it is used for taking data from source systems, performing operations on it, and loading it into a destination system, which is typically a data warehouse in the form of one or more SQL databases. This is precisely what we need to do with our Workflow History data. You can read more about SSIS here.

The problem, however, is that SSIS does not support SharePoint list data as a data source out of the box. Yes, ultimately all SharePoint data is stored in SQL content databases, but we all know that we’re supposed to stay out of there. SharePoint data should only be accessed via UI constructs, the SharePoint APIs, or the SharePoint web services.

Happily, a CodePlex project was created several years ago that adds both source and destination SSIS adapters for SharePoint list data, and yes, it works well with SharePoint 2007 data. What this project does is encapsulate calls to the SharePoint web services in SSIS data adapters. Because it uses the SharePoint web services (not the API), there is no requirement for the SharePoint bits to be installed on the machine where the package runs.

You can find an excellent tutorial on how to set this up here, so I’m not going to go into any details on that. I will, however, cover the basic steps below. First, I want to outline the logic involved.

What we want is to maintain a complete log of workflow history. We also want to keep the history in SharePoint for a period of time (60 days by default), and then purge it, knowing that it’s secure in the data warehouse. Therefore, we need to take an initial dump of the data, and then add only new items to it on each subsequent run. The design of the data warehouse will also support multiple site history lists.

The solution will consist of two tables in a SQL data warehouse (a staging table and the actual archive table; a rough sketch of both appears after the steps below). The SSIS package will perform the following steps:

  1. Empty the staging table
  2. Extract the entire Workflow History List (SP) into the staging table (SQL)
  3. Query the archive table for the most recent entry
  4. Extract all items from the staging table more recent than the entry in step 3 into the archive table, and add in a site identifier.
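
For reference, here is a minimal sketch of what the two tables might look like in T-SQL. Treat the column list as illustrative only (it will vary with whatever columns the SharePoint list source exposes), and note that the walkthrough below has the OLE DB Destination generate the staging schema for you, so you don’t actually need to create these by hand:

CREATE TABLE wfhStaging (
    ID int,
    Title nvarchar(255),
    [Description] nvarchar(max),
    Created datetime,
    Modified datetime
    -- ...plus the remaining workflow history columns
)

CREATE TABLE wfhArchive (
    ID int,
    Title nvarchar(255),
    [Description] nvarchar(max),
    Created datetime,
    Modified datetime,
    -- ...plus the remaining workflow history columns
    SiteURL nvarchar(255)    -- added by the package to identify the source site
)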

First, open up Business Intelligence Development Studio (BIDS). BIDS is really just Visual Studio with all of the SQL BI project types added, and it is normally installed when SQL Server is installed. If not, you can install it from the SQL Server media. You do not need SQL Server installed on your workstation to use it, but having a local instance does have some advantages.

From the Business Intelligence Projects section, select “Integration Services Project”, and give it a solution and project name. You’ll then be presented with the SSIS design canvas. The first thing that you’ll want to do is create two connection managers – one for SharePoint, and one for SQL. In the Connection Managers pane, right click anywhere in the window and select “New Connection”.

image

Scroll down and select “SPCRED – Connection manager for SharePoint connections”, give it a name, and select the credentials. If you use the credentials of the executing process, it will use your credentials when you test it, but the credentials of the SQL Server Agent process if you schedule it to run automatically. Alternatively, you can enter the credentials of a proxy account, which is what I typically do. Repeat this process, only this time select OLE DB and configure the connection to your SQL data warehouse database (if you haven’t already done so, you’ll need to create a SQL database to house the archive).

Next, from the Toolbox, drag a Data Flow Task onto the Design surface. Your surface should look something like below:

image

Double click on the Data Flow task, and the Data Flow window will open (you can also click on the Data Flow tab). Here, from the toolbox, drag a SharePoint List Source and an OLE DB Destination task onto the surface. Double click on the SharePoint List Source, then click in the area to the right of the SharePoint Credential Connection, and set the connection manager to the one that you created above.

image

Next, click on the Component Properties tab, and enter valid values for the SharePoint source site URL and the list name. The list name will either be WorkflowHistory for standard SharePoint workflows, or NintexWorkflowHistory for Nintex workflows.

image

Click OK. Next, grab the green arrow at the bottom of the SharePoint List Source, and connect it to the OLE DB Destination. Double click on the OLE DB Destination, and select the New button beside the “Name of the table” field. This allows us to create our staging table in the data warehouse with the appropriate schema for our Workflow History list. Once the Create Table window is open, simply change the name of the table to whatever you want (in this case wfhStaging).

image

As soon as you click OK, the table is created in SQL. Next, click the Mappings tab on the left, and confirm that all of the fields are mapped correctly from the SharePoint list to the SQL table. No changes should be required. When complete, click OK, and the data flow is ready for testing. From the BIDS Debug menu, select Start Debugging. After a pause, the process will run, and the boxes will turn yellow and then green as the process executes. If all works properly, you will see something like the screen below:

image

Both boxes turning green indicates that the process completed successfully, and there will be an indicator showing the number of rows that were transferred. You can confirm this by opening SQL Server Management Studio, selecting your data warehouse database, and running the following query:

SELECT Count(id) From wfhStaging

At this point, we need to stop our debugging process and switch back to the Control Flow tab. Given that we want to repopulate the staging table whenever we run this package, we need to first clear it at the beginning of the run. Drag an “Execute SQL Task” from the toolbox onto the design surface above or to the left of the data flow task. Use its arrow to connect it to the Data Flow task, and then double click on it. Select your OLE DB connection as its connection property and enter the following SQL (substituting your table name) as its SQL Statement:

TRUNCATE TABLE wfhStaging

image

Next, we will need to create and populate our actual archive table. To do this, drag another Data Flow task onto the design surface. Connect the output of the first Data Flow task to it, and then double click on it. Drag an OLE DB Source, a Derived Column, and an OLE DB Destination onto the design surface.

We want to be able to store the workflow history for multiple sites in the same data warehouse table. To do this, we need to add another identifier column to the schema of the workflow history list that will uniquely identify the source site. In our case we will use the relative site URL. The derived column action will add this column to each row as it is processed.

Configure the OLE DB Source to read from the staging table. Then, connect the OLE DB Source to the Derived Column action with the green arrow. Double click on the Derived Column action. Under Derived Column Name, enter the name of the new column (SiteURL). Leave the Derived Column setting as “add as new column”, and for the expression we will simply use a literal string containing the site-relative URL – in this example, the quoted literal "/SalespersonChangeRequest". When complete, the action should appear as below.

image

Click OK to close the dialog, and then connect the Derived Column action to the OLE DB Destination action with the green arrow. Double click the OLE DB Destination action and repeat the steps taken above to create the staging table, only this time you’ll create the actual archive table (wfhArchive). When you click on the Mappings tab, note that the SiteURL column has been added at the bottom. Don’t start debugging at this point, as it will run the entire package. Click back on the Control Flow tab, right click on the new Data Flow task, and select Execute Task. Just that task will run, and if you move back to the Data Flow tab, you should see that the same number of rows have been added to the archive table.

Now we need to ensure that only the new items from the staging table are moved into the archive table. To do this, we will change the select statement in the OLE DB source of the second data flow task. First, we’ll need to know the date/time of the latest record in the archive table for this site. The SQL statement for this looks like:

SELECT MAX(Modified) From wfhArchive 
Where SiteURL='/SalespersonChangeRequest'

Therefore, we can embed that statement into the select statement for our staging table. However, we still need to accommodate the case where there are no records in the archive table, in which case the above statement returns a NULL value. To deal with this, we can use the ISNULL T-SQL function, and our complete staging table select statement will be:

SELECT * FROM wfhStaging 
Where Modified > 
  ISNULL(
      (SELECT MAX(Modified) From wfhArchive 
       Where SiteURL='/SalespersonChangeRequest'
       ),
       '1900-01-01')

Translated into English, this basically says: “Find the Modified value of the most recent record with SiteURL set to /SalespersonChangeRequest. If you don’t find any, use 1900-01-01. Then, get me everything from the staging table with a Modified date more recent than that.”

Now that we have our SQL, we need to modify our OLEDB Source action. Double click on it, and then change the Data access mode from “Table or view” to “SQL command”. Then, add the select statement to the SQL command text window. At completion, the window should appear as follows:

image

Once done, click back to the Control Flow tab, and then start debugging (you can also just press the F5 key). The first Data Flow task should write all of the source records to the staging table, and the second should write none (assuming nothing has happened to the source since you did the initial extract). You can try deleting some of the most recent records from the archive table and rerunning the package – they should get replaced. (Note that only records newer than the latest remaining Modified date will be re-added, so deleting older records from the middle of the table won’t refill them.) That was step 1.
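
If you want a quick way to set up that test, something like the following (a throwaway snippet, assuming the table and site URL used above) deletes the last week’s worth of archived records for the site, which the next package run should then reload:

DELETE FROM wfhArchive
WHERE SiteURL = '/SalespersonChangeRequest'
  AND Modified > (SELECT DATEADD(day, -7, MAX(Modified))
                  FROM wfhArchive
                  WHERE SiteURL = '/SalespersonChangeRequest')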

Step 2 – Schedule the package

Now that we have our package, we want it to run periodically (usually nightly). We do this by deploying the package to the server, and then scheduling it to run with the SQL Server Agent.

To deploy the package, we first need to create a deployment utility for it. To do this, select the project in the Solution Explorer pane, and then select Project – Properties from the menu. The Configuration Properties window opens. In the left pane, select the Deployment Utility tab, and ensure that the CreateDeploymentUtility property is set to True.

image

Also – take note of the DeploymentOutputPath value.

Next, build the project; this creates the deployment utility, which is stored in a subfolder of the package project. You can find the folder of the project by selecting the project in the Solution Explorer pane and examining its FullPath property in the Properties pane. Open the project path in Windows Explorer, and then navigate to the DeploymentOutputPath noted above. In that folder, you’ll find a file named yourprojectname.SSISDeploymentManifest. When ready to deploy, double click on it, and the Deployment wizard will start.

The deployment wizard is straightforward and self-explanatory. You’ll want to select “SQL Server deployment” on the first screen, then the SQL Server that you wish to deploy to (usually (local)), and select a location for the Package Path (the root is likely fine). Once the wizard is complete, you are ready to schedule the package.
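
As an aside, if you prefer the command line, the dtutil utility that ships with SQL Server can copy a package file to the server as well – a hypothetical example, assuming the package file and deployed name are both WorkflowHistoryArchive:

dtutil /FILE "C:\Projects\WorkflowHistoryArchive\WorkflowHistoryArchive.dtsx" /DestServer "(local)" /COPY SQL;WorkflowHistoryArchive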

Open up SQL Server Management Studio, and connect to the destination server. If the SQL Server Agent service is not running, start it (you’ll want to make sure that it is set to start automatically). Expand the SQL Server Agent node, and then expand the Jobs node. Right click on Jobs, and select New Job.

image

Give the Job a name, then click on the Steps tab. Click the New button to create a new step, and give it a name. In the Type dropdown, select “SQL Server Integration Services Package”. In the General section, select the SQL server that holds the package, and then use the ellipsis in the Package field to select the package you deployed above.

image

Next, click the Schedules tab, click the New button, give the schedule a name, and select when you want the job to run. Save the schedule, and then save the job (click OK). Your job should now appear in the Jobs folder. To test it, right click on it and select “Start Job at Step”. The job will run, and you will see its progress in a dialog.

There are many options for scheduling SSIS jobs and for error handling, and I would strongly recommend investigating them.
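
For reference, here is a rough T-SQL sketch of what the New Job dialog creates under the hood in the msdb database. The job name, package name, and 2:00 AM schedule are assumptions for illustration only:

USE msdb
GO
EXEC sp_add_job @job_name = N'Archive Workflow History'
EXEC sp_add_jobstep @job_name = N'Archive Workflow History',
    @step_name = N'Run archive package',
    @subsystem = N'SSIS',
    @command = N'/SQL "\WorkflowHistoryArchive" /SERVER "(local)"'
EXEC sp_add_jobschedule @job_name = N'Archive Workflow History',
    @name = N'Nightly',
    @freq_type = 4,              -- daily
    @freq_interval = 1,          -- every day
    @active_start_time = 020000  -- 2:00 AM
EXEC sp_add_jobserver @job_name = N'Archive Workflow History'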

Step 3 – Purge the Workflow History Data

As mentioned above, the workflow cleanup job removes workflow history associations, but does not actually delete the items from the list, allowing that list to grow large. If you use Nintex, there’s a Nintex command that will take care of this for you:

NWAdmin.exe -o PurgeHistoryListData -siteUrl urlToSite
 [-workflowName workflowName] [-days daysSinceLastActivity]
 [-lastActivityBefore datetime DateFormat] [-state All|Running|Completed|Cancelled|Error]
 [-deletedLists] [-clearAll [-workflowItemId id -workflowListName "list name"]] [-verbose]
 [-reportOnly] [-batchSize numberDefaultIs500]
 [-pauseAfterBatch] [-maxItemsToDelete number]
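
For example, a cautious first run that only reports on what would be purged (the URL and values here are placeholders) might look something like this:

NWAdmin.exe -o PurgeHistoryListData -siteUrl http://yoursiteurl -days 120 -state Completed -reportOnly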

This command is run on a front-end server. To keep things up to date, it would need to be scheduled. However, if you’re using out-of-the-box workflows, there is no equivalent command. You could just access the history list and remove old data, but since SharePoint has built-in tools for this, I recommend using them. These features are contained in the Information management policy settings of any list.

Open the workflow history list (your site URL + Lists/WorkflowHistory or Lists/NintexWorkflowHistory). Open the List Settings page, and select “Information management policy settings” in the Permissions and Management section. If you don’t see this option, you may need to enable the relevant features. In the content type policies, select the Workflow History content type, and then select “Enable Retention”. Once enabled, you will be able to select “Add a retention stage”.

The retention stage is what we will use to delete the workflow history items (which, given the name, is somewhat counter-intuitive, don’t you think?). Date Occurred is when the event occurred, so it is likely our best time indicator, and I would suggest a period of time at least double that of the automatic cleanup task, which uses 60 days. Finally, we want the item to be deleted at this point, so we select “Permanently Delete” from the Action dropdown. When complete, the stage will appear as follows:

image

Once we save our policy, the expired items will be deleted the next time the timer jobs run.

And that’s all there is to it!

Now that we’ve taken the data out of SharePoint, it’s no longer readily available to end users. If this is important, we will need to build some Reporting Services reports and integrate them back into the appropriate locations in SharePoint. This will (hopefully) be the subject of an upcoming post.


Using Nintex Workflow and SharePoint to Create InfoPath Forms on a Schedule

A common customer requirement that we have come across involves periodic reporting. Basically, a number of departments or individuals are required to fill out a form on a periodic basis, and that form must be run through an approval workflow of some sort. This scenario can easily be accommodated with Nintex Workflow and SharePoint.

The solution will require at least two lists:

Reporters – this list will contain all of the entities that are required to do the reporting, along with parameters that will drive the destination approval workflow. We will use a custom SharePoint list for this purpose.

Reports – this will be a forms library that will contain the submitted forms, along with the approval workflow itself. Technically, this does not need to be a forms library with InfoPath 2010, but if the form is to contain embedded sections, it will need to be.

In summary, a workflow will be scheduled to run on all of the items in the Reporters list on a periodic basis. This workflow will create new items in the Reports library, and set a number of their values. The creation of these forms will initiate another workflow that will notify the reporter that the report is due, allow them to enter form data, and then move the form through an approval process. This walkthrough will cover only the basics, but all of the constituent items can be elaborated upon.

Step 1 – Create and publish a form

I’m going to assume that you know how to work with InfoPath forms in SharePoint already, but if not, there is a good walkthrough on this topic here. There are a few differences that you need to take into account when you want workflows (or data views) to be able to populate form data, so I’ll outline the steps required here. We’ll be working with the following simple form:

image

Not only will we want to promote the two highlighted fields into the SharePoint library, but we’ll want our automated workflow to fill them out for us ahead of time. The expense detail fields are repeating fields, and they can’t be promoted, except in aggregate. To do this, we’ll first need to publish our form to our SharePoint library. We’ll want to add or change our promoted fields, so we can’t use Quick Publish at this point. From the File menu, click Publish, and select SharePoint Server.

image

Follow the prompts for publishing, but stop at the field promotion screen. Click Add to add a field to be promoted.

image

From here, you can select the fields that you want to promote. The critical part is that you must select the “Allow users to edit data in this field by using a datasheet or properties page” option. This allows direct editing of form data from SharePoint, and consequently, workflow.

image

Not all InfoPath field types support this capability, so you will need to work within these limits.

Complete the publishing wizard and test the form. Once you are happy with the result, it’s time to move to step 2.

Step 2 – Modify the form template to allow automated creation

Workflow processes, including Nintex workflows, can create list items, documents, etc. This is also true of InfoPath forms. However, due to the way that an InfoPath form works, it’s not as simple as creating a new item using the built-in template. With InfoPath, the standard template that gets published by the Designer is an XSN file, whereas the finished product is an XML file. Basically, we need to create a finished product and use it as the template instead of the Designer-generated XSN. We will want to keep the XSN around for future modifications, though.

This step is also only pertinent if you’re using a forms library. If you’re just using InfoPath as a form for a SharePoint list, you can skip this step.

First, create a new item using your form. Immediately save or submit your form, depending on how you’ve built it. In my case, I save it and call it “ExpenseReportBlank”. Once the form is closed, you will see a new entry in the library. Now, you want to download it to your local file system. To do so, hover over the form, select the dropdown, and from the “Send To” item, select Download.

image

The next step is to make this file available to the library as a template. To do that, we need to access the “Forms” folder for the library. The easiest way to do that is by opening the library in Explorer view, navigating to the Forms folder, and copying the file into it.

image

Once it’s there, we can set it as the template for the library. To do so, click on Library Settings in the ribbon, and then Advanced Settings. From there, edit the Template URL to point to your new template. Once you do that, you need to click OK, and then come back into Advanced Settings; doing so “registers” the template. Once that’s done, you need to set the “Allow management of content types” option to Yes, otherwise the form will not open in the browser.

image

Click OK, and return to the library. Test out your new template to make sure that all is well. If you’re happy with your form, we’re ready for the automation.

Step 3 – Add the workflow specific columns

At this point, we want to add in all of the columns that will drive our workflow. In a simple approval, you might want to use submitter, approver, and status. The submitter and approver fields will be “Person or Group” fields, and status will be a choice field consisting of submitted, approved, and rejected as possible choices.

The final list of columns in our example can be seen below:

image

Of course, the exact columns that you will need will vary with your workflow. We won’t be developing the actual approval workflow here today, but these are typically what you’d want at a minimum, and what we’ll be populating automatically.

Step 4 – Create the “Reporters” List

For the next step, we will create the list that will drive the scheduled report creation, using the custom list template. The list will contain fields for submitter and approver. The Title field will be renamed to Employee in order to capture the full display name of the employee, as opposed to the account name (there are other ways to accomplish this, but this is simple for our purposes).

image

Next, create at least one entry in the list. We’ll use this for testing.

Step 5 – Build the Form Creation Workflow

The workflow to create the form is relatively straightforward. It needs to accomplish three things in our example:

  1. Get the current date to be used for our Week Ending field
  2. Build a form title that will be unique (Concatenate Employee and Date)
  3. Create a new Report in the reports library

The complete workflow appears below:

image

The first three actions are just variable assignments and text manipulation, but the last action is the one that creates the report and is worth opening:

image

Once the workflow is built, go ahead and run it. If everything is working, you will see a new report in the Expense Reports library.

If you have an approval workflow defined to start when items are created in the destination library, it will be running. The last thing we need to do now is to schedule the workflows.

Step 6 – Schedule the Workflows

Nintex supports the scheduling of workflows. Essentially, you pick a time for a workflow to start on an individual list item. This is done on an item-by-item basis, and needs to be performed for each item in the source list.

From the “Reporters” list, hover over one of the configuration items, click the dropdown and select “Schedule Workflows”.

image

You will be presented with a screen showing the scheduled workflows for that item (likely empty). Click the “Add Schedule” button, fill out the schedule form, and click Save.

image

In this case, our workflow “Create New Report” is scheduled to run every Saturday at 2 PM indefinitely. This will create a new weekly report for each employee that is configured in the Reporters list and, if an approval workflow is attached to the Reports library, will kick off a review and approval cycle.


How to Replace the Built in SharePoint Approval Workflow

We use Nintex Workflow all of the time. It compresses project timelines, and reduces, if not eliminates, the need to develop custom code. One of the problems that we run into, though, is that the Approval workflow that comes out of the box appears to be hard-wired to the SharePoint approval process.

As it turns out, it doesn’t need to be. Christian from iLoveSharePoint has a great solution that lets us replace the built-in workflow with a Nintex workflow. It would also work for workflows built with SharePoint Designer. Rather than rewriting it, I’ll just link to it here:

http://cglessner.blogspot.com/2009/01/publish-major-version-declarative.html

I will add two things to it. First, Christian has some sample code that won’t work if you just copy and paste it, due to line wrapping, etc. Here’s what the code should look like:

# The workflow must already be associated with the target list
$workflowFileName = "Nintex Test.xoml"
$targetListTitle = "MyListName"

# Fired when an item in the site's Workflows list is updated;
# $item and $web are supplied by the event receiver context
function ItemUpdated()
{
    # Only react when the workflow's .wfconfig file changes (i.e. the workflow is republished)
    if($item.DisplayName -eq "$($workflowFileName).wfconfig")
    {
        # Read the workflow configuration XML and pull out the template's BaseID
        $binFile = $item.File.OpenBinary()
        $wfConfig = [xml][System.Text.Encoding]::UTF8.GetString($binFile)
        $baseId = [Guid]$wfConfig.WorkflowConfig.Template.BaseID

        # Find the matching association on the target list and make it the default approval workflow
        $library = $web.Lists[$targetListTitle]
        $assoc = $library.WorkflowAssociations.GetAssociationByBaseId($baseId)
        $library.DefaultApprovalWorkflowId = $assoc.Id
        $library.Update()
    }
}

Second, he mentions one way to get the GUID of the Workflows list. You can also get it by navigating to the list with SharePoint Designer, drilling down into your declarative workflow, opening the workflowname.xoml.wfconfig.xml file, and looking for the DocLibID tag.


I’ll Be At the Microsoft World Partner Conference Next Week

The Microsoft World Partner Conference (WPC) is being held in Washington next week. It’s a different sort of conference, and I almost elected not to go this year. In the past I’ve found the sessions to be a little light (but then, I am a techie, and this is a business conference), but this year it’s a solid lineup.

I’m looking forward to meeting a few of the Product Managers, particularly around the Azure and BPOS space. My spider sense is telling me that very interesting things are about to happen here, and my company is very well positioned to take advantage of the cloud, and the new emphasis on it from Microsoft.

It will also be very good to catch up with our friends from Nintex, who are not only fun, but are always working on interesting things.

If you want to follow the goings-on, you can follow the #wpc10 hashtag on Twitter. I’ll probably be more active than normal on Twitter (I can be found at @diverdown1964), and I may even try my hand at liveblogging, but don’t count on it.

If you’re going to be there and you run into one of the many John Whites in attendance, I’ll be the tall one. Please say hi!
