Fast data management in a limited CDS world

Fast data management in a limited CDS world

In May 2019 Dynamics 365 CE/CDS enacted some new throttling mechanisms that have caused some headaches for anyone wanting to manage a lot of data in CDS (I will refer to Dynamics 365/CDS as just CDS below). There are several different throttles but the one that has cause me most trouble is the concurrency throttle. Kingswaysoft will release support for handling this in the next release and you can also request a special version from them if you ask nicely. In the meanwhile this post can give you some help on how to work as fast as possible using application user mulitplexing and a loop with a 5 min wait to make sure that the throttles are reset.

The new throttling on the main CDS API, as described here: https://docs.microsoft.com/en-us/dynamics365/customer-engagement/developer/api-limits needs to be carefully considered when doing heavy data manipulations in the CDS. One of my customers has a large system with numerous integrations of which the most data heavy are the Marketing Automation systems and the booking systems. And yes, this is Business to consumer.

With the new per GB pricing, keeping the database as small as possible has become an essential task and using the bulk delete just doesn’t work for large data loads, at the time of writing this article. I do hope that Microsoft increase the speed of it so that it does become more useful but currently its speed is somewhere around 1-2 records per second.

The bulk delete also has limitations on that it can only base it selections on a query, i.e. a FetchXML. Often this is not enough, for instance when you want to remove “All emails except those that have any connection to either a case or a contact which has a case”.

For these reasons I almost always opt for using SSIS with Kingswaysoft connectors to CDS when working with complex data management. This article will be on how to get some performance now that there is tougher throttling to take into consideration.

User multiplexing

As the throttling is measured on a “per user”, one trick is of course to use multiple users and spread the load over all these users. You can, of course use normal users, but that will cost you licenses so the smart person will of course use application users instead. If you don’t know how to create application users in Dynamics 365, check it out here: https://docs.microsoft.com/en-us/dynamics365/customer-engagement/admin/create-users-assign-online-security-roles#create-an-application-user . In the example below, I will be using four different application users, one as the source account and three as destinations. The reason for this is that it is typically easier to read several thousand rows per request, but seldom efficient to do batch creates/writes/deletes of more than 10-20.

To do this with SSIS/Kingswaysoft you should start by setting up the connections. In this case, the four CDS/CRM connections and use the OAuth auth-type like below. 

As you might want to have several packages in the same project and have them share the connections, it may be a good idea to use project connections. I also use an Azure SQL db for logging any errors. Previously I used to use CDS but now with the throttling, that is not such a good idea as the error itself might be throttling and hence the error can cause an error. Writing to some target that you know will not fail is hence a good idea for logging errors. When you are done with the connections, it should look something like this:

Now it is time to build the actual flow. If you’d normally have a Source and a Target, it will now look something like the image below, which I will explain.

First of all, the Premium Derived Column creates a new column which simply contains the row number. It will look something like this: 

I like to use the components that are available in the Productivity pack from Kingswaysoft, and this Premium Derived Column is one of these. In this case I think it is actually equal if you use IncrementalValue() or RowIndex(). I think you can create this logic with a normal Derived Column too, it just has less features.

Next we need to create a Conditional split that divides the rows evenly between the three destination components. This is done using the mathematical operator modulus which is written using the “%”-sign. For those that didn’t study this in school, it simple means “the rest” in a division. For instance 5%3=2, if you divide 5 by 3 you will get 1 and a rest of 2. What we will do, is assign RowNr%3 == 0 to Case 1, RowNr%3 == 1 to Case 2 and the rest to Case 3. That should divide them evenly. It looks like this: 

You then create the three destination components. I typically create one first, copy it and change it, as that is faster. Make sure that you set the Connection Manager to the three different Target Connections.

I also recommend that you fiddle a bit with the batch size and the number of threads and test out which gives the best results for you and the entity and action you are working on. There is no one answer here. I would typically start at Batch 10, Threads 16.

Tuning DataFlow property settings

If you back out to the Control Flow view and right click on the Data Flow you have created, there are some other interesting setting you can twirk.

DefaultBufferMaxRows – 10 000

DefaultBufferSize – 10 485 760 (10MB)

EngineThreads – 10

These can also be tuned to allow for the Data Flow to handle more rows, more memory and use more parallell threads which of course will make it faster (if that is the bottle neck, typically not when working with Dynamics)

What I have found is changing the maxrows to 100k, the buffer size to 100 MB and engine threads to 32 will not hurt but you can find several other blog articles specializing in SSIS that discuss this.

Crude throttle handler

What I have noticed is that many of my Dataflows simple seem to grind to a halt after 400-600k rows read from Dynamics. Not sure if it the read or write part that is causing this but what I figured is that probably the most pragmatic way of solving this would be to create a loop that runs a data flow that is limited in the number of records, typically 400k, wait 5 minutes then iterate. Smartest version is of course to have a control variable which checks to see when when there are no more rows and then breaks the loop, simpler version is to just loop n number of times to cover the amount of data you are trying to move, ie. number of rows per iteration x number of iterations. It would look something like the picture to the left.

If you would like to refine the loop a bit to make it more automatic, create a variable of type Int, for instance RowCount, set the initial value to be 10 or something different from 0. Then set the EvalExpression to “@RowCount > 0”. After this add a RowCounter control to the Data Flow and connect this to the variable RowCount. When the Data Flow runs and returns 0 rows, it will run to the end, the EvalExpression will evaluate to “False” which will cause it to break.

Using this technique, I am able to remove several million records in just a few hours. With one of these jobs I managed to remove 20 GB of structured data in less than two days (no attachments or similar, just records). By adding more application accounts and of course both to the source and particulary to the destination side, you can increase the speeds you are getting.

I do also advise you to be on the lookout for Kingswaysofts new version which I think will come soon, and do as I, make sure to always download both the Dynamics and Productivity Pack. I have read that there are great things coming to the productivity pack!

How to reduce SubscriptionTrackingDeletedObject table in CDS/Dynamics 365 – Updated!

How to reduce SubscriptionTrackingDeletedObject table in CDS/Dynamics 365 – Updated!

One of my customers is a B2C customer with a very large online database exceeding 500 GB. With a very active Marketing automation tool interated, we generate a lot of data in Dynamics 365 CE which after defined retention periods needs to be removed. This has caused some side effects, that a table called SubscriptionTrackingDeletedObject has become very large. This article will describe how to set a configuration to reduce its size. UPDATED – Based on some new learning and information from Microsoft this article has now been updated!

We often monitor the Organizational Insights, and now lately the brand new capacity feature that can be found in the left hand menu in https://admin.powerplatform.microsoft.com, if you have a CDS/Dynamics 365 CE instance.

An interesting table started growing rapidly and we had no clue what this was, and I had during my now 15 years of working with Dynamics 365 never seen it. It was called SubscriptionTrackingDeletedObject. When I came back from my Swedish summer vaccation, it had grow to over an amazing 181M records. Time to fix this.

First thing, as usual is of course to google it (yes, it is a verb, get used to it). All I found was this somewhat informative post by my good friend Chris Cognetta who is an ace with infrastructure issues.

http://cognettacloud.net/2016/06/21/crm-database-log-growth-issue/ 

However, it seemed that they just truncated the table, and we were online so that was a bit tricky, to say the least. I was at this time a bit upset that Microsoft were taking up around 50GB of space for my customer without giving me any way of managing that, or having any direct use of it. I counted to ten and called Microsoft Support.

After a few emails back and forth, the excellent support technician at Microsoft informed me that there is actually a setting in the infamous super secret setting tool with the Star Trek-sounding name OrgDBOrg (it is pronounced “Org-D-Borg” in case you ever get stuck in Dynamics trivia). The setting is called ExpireSubscriptionsInDays. I will quote the support technician in what this table is used for and if anyone has any more information, please leave a comment.

“The SubscriptionTrackingDeletedObject table is the table that logs records for number of days before deleting inactive subscriptions as well as timed out deletion services.”

I am not sure for which purpose. If it is in regards to GDPR or some restore mechanism. I would like to know though. Default value for this i 90, which means that these logs will be stored for 90 days. The minimum they can be set to is 1. As I am currently not entirely sure what these logs are used for, I would not recommend you set them to 1, but I did set my customers to 5, hoping that this is not going to come back with a vengance.

We have during the day seen a dramatic drop in the amount of records in this table, with about 30M but and it is still ongoing, but hard to measure as there is a delay in the capacity measurement the Powerplatform admin portal.

Update! The size of the table fell to 160M rows but never below this so after some further discussions with Microsoft support they did some more investigation into this subject and came back with the following recommendation:

1. Reduce the value gradually from 90 to 60 and then on
2. Never go below 15

There is however, another related setting called ExpireChangeTrackingInDays which is located just next to the ExpireSubscriptionsInDays. This is defaulted to 30. We reduced this to 15. 

Based on these recommendations we tried 60 days and this resulted in a most dramatic drop to around 20M rows. – End of update

 

So, how do you do this? First, download the OrgDbOrgSettings tool and install it in the instance where you are having issues. Check out these links below for that:

https://github.com/seanmcne/OrgDbOrgSettings/blob/master/readme.md

https://github.com/seanmcne/OrgDbOrgSettings/releases

A word of advice regarding OrgDbOrg; don’t think that you are Captain Kirk and go flying off into the Beta Quadrant and beam every single setting just because you can. It won’t make your system better, rather the opposite. Make really sure on what you are doing and don’t even trust a blog article like this, read the KB-article linked in the tool and make up your own mind. It is a powerful tool, like a jackhammer.

After you have installed the OrgDbOrgSettings tool, you can see it and open it by clicking the display name.

Then just find the “ExpireSubscriptionsInDays” – Press “Edit” and change to whatever you would like it to be. You will typically have to confirm to save it to Dynamics 365 CE/CDS

With that done you should just have to wait for the magic to be done.

As far as I have understood these two settings and this table is used to indicate how long changes and deletes are stored in this table and related tables for integrating systems to be able to read. This can, for example be Data Export Service, the old Dynamics for Outlook client etc. Hence reducing the numbers to, for instance 15 (the lowest recommended number by Microsoft) can result in some changes not being propageted to these integrated systems in the case that the integrations or just an offline client being offline for more than 15 days. And I also got the feeling that setting it to 5 was below some internal threashold and hence wasn’t really supported despite the fact that it says in OrgDBOrg that the lowest value is 1.

Anonymizing data in UAT/Dev – GDPR

Anonymizing data in UAT/Dev – GDPR

On the eve of GDPR what could be more fitting than a post on GDPR. I think everyone is probably deadly tired of all the consent emails and I think that they will probably even have reached our friends in the US and Asia by now.

This article relates to legal matters on GDPR and are based on my personal interpretations and are not to be viewed as legal advice.

One interesting thing that has to be considered in relation to GDPR is how to handle personal information in non-production environments/instances. Microsoft have made it painfully easy with the instance manager to be able to copy the production instance but do you really have the right to use your customers personal information in a development, UAT or staging environment? Do you have legal support for that? I think that would be a very hard argument to make? Have you gotten your customers explicit consent for using their personal data for that purpose? Probably not. Hence, if you are planning on keeping the instance/environment for more than 30 days, you will need to remove all personal information from to stay within the boundaries of GDPR. I have found that using SSIS with Kingswaysoft and the Anonymization component in the productivity toolkit is very useful. I will in this rather lenghty article describe how I have used it to set up an anoymization script.

First of all you need to download SSDT and Kingswaysoft Dynamics 365 and Productivity Packs

Then start a new Business Intelligence -> Integrations Services project.

Then start by right clicking in the empty field at the bottom where it says “Connection managers” and choose “New connection…”

In the dialog that shows up choose “DynamicsCRM”

You should now see a dialog showing the connection settings to Dynamics.

Connection settings for your instance

Choose the right settings for your instance. Test your connection at the bottom when you are done to make sure it works. Make extra sure you are not connecting to your production environment, wouldn’t want to anonymize that!

When this is done, it is time to make your first Data Flow Task. Work in SSIS is divided into two parts, Control Flow and Data Flow. The control flow is the orchestration, which tells SSIS in which order everything is to be run. If you want thing to run in parallel, just have to boxes next to each other, if you want one Data Flow to run before the other, drag the arrow from the first to the second. It is also possible to have entire “Sequence containers” which can hold several components and make sure they execute before moving to the next stage.

Let’s start by dragging one new Data Flow from the Toolbox on the left hand side to the Control Flow work pane. Then double click it. This will open it up.

You will now see that the tab at the top has changed to “Data Flow” from the previous “Control Flow”. In the “Data Flow” view you will also have a different set of toolbox components available.

In the “Data Flow” you will control a single data flow. For instance the anonymization of Contact.

Start by dragging the Dynamics CRM Source from the toolbox (on the left) to the workspace (the big pane in the middle. Then double click it. – Before looking at the details of the source component, I like using FetchXML when building queries and of course the best way to build queries with FetchXML is using FetchXMLBuilder in XrmToolBox (thanks Jonas Rapp and Tanguy Touzard for all your work!) but if that is too much heavy lifting (it really isn’t), the easiest way to get a FetchXML query is to make an advance find query and export it with the “Download FetchXML” button in the top right hand side of the ribbon of the Advanced Find query builder. So let’s say we have decided the following fields in Contact are personal information and need to be anonymized:

The column editor in advaced find – don’t use composite fields like “fullname” or Address1

Downloading the FetchXml, and setting the Source component in SSDT (Visual Studio) will make it look like this:

I have set the “Connection Manager” to “Target” as we are using the same source and target (reading and writing to the same system.

I am leaving batch size as 2000. Seems to work well. Don’t reduce it too much, remember the API limit of 60 000 calls per 5 min period.

If you would like to try it a bit, you can set the “Max rows returned” to for instance “10” and then try it out a bit to see that it isn’t going crazy.

Source type I like as FetchXML – remember that you can have FetchXMLs with data from several entities which can make queries a lot easier than trying to match the data with lookups in SSIS.

I always try to read all data in UTC and write it in UTC which in most cases makes it correct. But make sure you understand how timezones work if you need to fiddle with this.

Also, don’t include more columns than you need. It will just make your script slow. After adding the FetchXML or changing it, it will try to parse it and read the meta data from Dyn365/CRM. Hence there might be a slight delay. You can check what data you will output from this component by clicking on “Columns” on the left hand side.

When done, press “OK” to go back to the “Data Flow” pane.

Now add a “Data anonymization” component and drag the arrow from the Source component to the Anonymizer. Then double click the anoymizer.

You should see something like this, where I have set anonymization settings for the different columns. By default it will say “Ignore” on all columns.

Try out the different anonymization types. Some are more generic than others. When done, click Ok and go back the data flow pane. Add a Dynamics CRM Destination and drag the blue arrow from the anonymizer (blue arrows are the normal data output, red arrows are error output) to the Dynamics CRM Destination component. Then double click it. The view in the data flow should look something like the picture below.

Dyn365 Soruce -> anoymizer -> writing to Dyn365
Dyn365 destination component – set values where the arrows are

When setting up the destination component, there are a few things to consider:

  • In this case we are always doing updates – hence set the action to update. It is faster than upsert.
  • You have to set the Destination Entity.
  • If you write data to Dyn365 with high latency, no batching and no threads you will be able to update at about 2-3 records per second. With low latency, correct batch setting and multi threading, I have been able to get up to 300 records per second. Very dependant on entity. Hold the mouse of the blue “i” just after the “Enable Multithread Writing” for some deep end tips from the scholars at Kingswaysoft.
  • Error handling is recommended to be directed at a file or some other output where you can monitor it. If you do nothing about it, and you get an error, it will break the flow and stop. You can control error handling of the destination component by clicking on the “Error handling” tab on the left hand side. Remember that all types of exceptions thrown by Dyn365 you will get here as well, like missing rights, disabled records etc.
A normal problem that needs to be handled is that records are deactivated and deactivated records cannot be changed, they have to be opened first, then changed, then re-closed.
This is an example of a dataflow handling this:
Data flow which splits the deactivated records to the left, adds two special columns to reactivate them, writes an update with only the statecode & statusreason to the record and then merges the two data streams and writes the original values, which recloses the ones that were closed from the beginning

This is how the conditional split is defined – if statecode is 0, send them to the output called “open” otherwise send them to the output called closed

Then you can test run your data flow, by right clicking the data flow pane, and pressing “Execute Task”

And when you have assemble an larger control flow – like for instance this:

A control flow with several dataflow being disabled – all in sequence.

you can execute the entire flow by clicking the green plus sign in the ribbon.

Gustaf Westerlund
MVP, Founder and Principal Consultant at CRM-konsulterna AB
www.crmkonsulterna.se

Deleting a lot of records fast

Deleting a lot of records fast

A quick one today…

Needed to delete a couple of million records for a customer and the natural thing was to use the Bulk Deletion service, well, I turned it on and it was extremely slow. Only got about 10 records/s which would cause the entire delete to take over a week. I have checked with Microsoft and this is not a bug, but it is working as designed and is not designed to be super fast. According to Microsoft bulk deletion jobs are put on the async queue on low priority to allow other more important jobs higher prio.

And a favorite quote of mine from Purvin Patel of Microsoft Does a dump truck need to outrace a Ferrari?” – and I think that the answer to that question is: it depends. Sometimes it does.  

Personally I would sometimes like it to be as fast as possible when removing a lot of records.

I also checked to see how fast the deletion would be with SSIS and Kingswaysoft. Used the following settings:

  • VM about 5 ms from the Dynamics 365 instance (important that it not be too far, use an Azure VM for this)
  • Used 64 threads
  • Used Execute Multiple batching with 10 (cannot use more that 10 if you are using a lot of threads, ie more than 2)
  • VM has 8 virtual cores and 32 GB memory
  • Loading in batches of 2000. Only loading the id-column, as that is all that is needed.

With this setup, I got somewhere around 345 records deleted per second. Which is a tad more than 34x faster than the bulk delete.

So, want to delete a lot of stuff, maybe Bulk Delete is not the way to go. Not yet anyway, let’s hope Microsoft makes it faster!

(this post was updated on Feb the 9:th 2018)

Gustaf Westerlund
MVP, Founder and Principal Consultant at CRM-konsulterna AB
www.crmkonsulterna.se

CRM 2011 Online and datawarehousing

I attended a half day presentation by Microsoft yesterday on the subject of Dynamics CRM 2011 and integrations. It seems that many parterners are a bit hesistant to integrating Dynamics CRM 2011 Online and are promoting the on-premise versions when the issues of integration are important.

Microsoft did a good job in showing that they do have sufficient technology to handle integrations to Dynamics CRM 2011 Online including the Azure based technology ACS – Access Control System that together with Active Directory Federation Services 2.0 can be used to allow Dynamics CRM 2011 Online to use a local AD as authenticator. It is probably not as easy as setting up a normal on-premise solution but at least it can be done and there hopefully are some whitepapers or Youtube-clips on how it is done.

One of the issues that were discussed was the issue of integrations in reagards to Business Intelligence and data warehousing. This is usually done using SQL Integration Service (SSIS) with direct SQL communication. Their suggestion on how to address this issue was to use the new OData interface that exists in Dynamics CRM 2011. I am no SQL expert but I do believe that this being a standardized protocol, integrations will be possible. However, there will still be issues with performance as the amounts of data that need to be transfered are quite large and the OData protocol is still a strict pull-protocol which does not allow for trigger-based updates.

Another issue that I asked about was the licensing issue in regards to Dynamics CRM 2011 Online. Take the following example: A large company of about 40 000 employees has about 500 people activly working in with Dynamics CRM 2011 Online for SalesForceAutomation. A data warehouse is created and data is integrated from many different systems, ERP, production systems, quality control systems and CRM. CRM being the customer data master. SharePoint is used as the global Intranet platform and some of the data from the data warehouse is published on the SharePoint portal to all employees. For instance our currently 10 most important customers. So, the 39 500 employees are only viewing a minute part of the customer data, and it indirecty, via the data warehouse, originates from the CRM system. What licenses in CRM are required for these users?

In the case of an on-premise installation, the best licensing option for Dynamics CRM is the Application Platform Agreement (APA) that is sort of a “free-for-all pass” which is negotiated with Microsoft. However, in the Online environment there is no correlating licensing agreement to the APA. I explicitly asked Micrsoft how this was to be licensed and their answer is that 40 000 separate users licenses are required for Dynamics CRM 2011 Online, making it a rather impossible option, in other words forcing the company to either an on-premise solution, removing some of the data from the data warehouse/Intranet or moving to another CRM supplier. Neither of which is in line with showing off the power of Microsofts cloud services.

I hope this is just a temporary flaw since it does limit Microsofts business opportunities with larger companies and I would think it is probable that some similar agreement form that matches the on-premise APA will be introduced.

On the other hand, the External Connector license is not required at all for CRM Online (or SPLA) making it even more interesting for smaller CRM customers as customer/event/portal integrations are more and more common.

Gustaf Westerlund
CEO, Chief Architect and co-Founder at CRM-konsulterna AB

www.crmkonsulterna.se