On the eve of GDPR what could be more fitting than a post on GDPR. I think everyone is probably deadly tired of all the consent emails and I think that they will probably even have reached our friends in the US and Asia by now.
This article relates to legal matters on GDPR and are based on my personal interpretations and are not to be viewed as legal advice.
One interesting thing that has to be considered in relation to GDPR is how to handle personal information in non-production environments/instances. Microsoft have made it painfully easy with the instance manager to be able to copy the production instance but do you really have the right to use your customers personal information in a development, UAT or staging environment? Do you have legal support for that? I think that would be a very hard argument to make? Have you gotten your customers explicit consent for using their personal data for that purpose? Probably not. Hence, if you are planning on keeping the instance/environment for more than 30 days, you will need to remove all personal information from to stay within the boundaries of GDPR. I have found that using SSIS with Kingswaysoft and the Anonymization component in the productivity toolkit is very useful. I will in this rather lenghty article describe how I have used it to set up an anoymization script.
First of all you need to download SSDT and Kingswaysoft Dynamics 365 and Productivity Packs
Then start a new Business Intelligence -> Integrations Services project.
Then start by right clicking in the empty field at the bottom where it says “Connection managers” and choose “New connection…”
In the dialog that shows up choose “DynamicsCRM”
You should now see a dialog showing the connection settings to Dynamics.
|Connection settings for your instance
Choose the right settings for your instance. Test your connection at the bottom when you are done to make sure it works. Make extra sure you are not connecting to your production environment, wouldn’t want to anonymize that!
When this is done, it is time to make your first Data Flow Task. Work in SSIS is divided into two parts, Control Flow and Data Flow. The control flow is the orchestration, which tells SSIS in which order everything is to be run. If you want thing to run in parallel, just have to boxes next to each other, if you want one Data Flow to run before the other, drag the arrow from the first to the second. It is also possible to have entire “Sequence containers” which can hold several components and make sure they execute before moving to the next stage.
Let’s start by dragging one new Data Flow from the Toolbox on the left hand side to the Control Flow work pane. Then double click it. This will open it up.
You will now see that the tab at the top has changed to “Data Flow” from the previous “Control Flow”. In the “Data Flow” view you will also have a different set of toolbox components available.
In the “Data Flow” you will control a single data flow. For instance the anonymization of Contact.
Start by dragging the Dynamics CRM Source from the toolbox (on the left) to the workspace (the big pane in the middle. Then double click it. – Before looking at the details of the source component, I like using FetchXML when building queries and of course the best way to build queries with FetchXML is using FetchXMLBuilder in XrmToolBox (thanks Jonas Rapp and Tanguy Touzard for all your work!) but if that is too much heavy lifting (it really isn’t), the easiest way to get a FetchXML query is to make an advance find query and export it with the “Download FetchXML” button in the top right hand side of the ribbon of the Advanced Find query builder. So let’s say we have decided the following fields in Contact are personal information and need to be anonymized:
|The column editor in advaced find – don’t use composite fields like “fullname” or Address1
Downloading the FetchXml, and setting the Source component in SSDT (Visual Studio) will make it look like this:
I have set the “Connection Manager” to “Target” as we are using the same source and target (reading and writing to the same system.
I am leaving batch size as 2000. Seems to work well. Don’t reduce it too much, remember the API limit of 60 000 calls per 5 min period.
If you would like to try it a bit, you can set the “Max rows returned” to for instance “10” and then try it out a bit to see that it isn’t going crazy.
Source type I like as FetchXML – remember that you can have FetchXMLs with data from several entities which can make queries a lot easier than trying to match the data with lookups in SSIS.
I always try to read all data in UTC and write it in UTC which in most cases makes it correct. But make sure you understand how timezones work if you need to fiddle with this.
Also, don’t include more columns than you need. It will just make your script slow. After adding the FetchXML or changing it, it will try to parse it and read the meta data from Dyn365/CRM. Hence there might be a slight delay. You can check what data you will output from this component by clicking on “Columns” on the left hand side.
When done, press “OK” to go back to the “Data Flow” pane.
Now add a “Data anonymization” component and drag the arrow from the Source component to the Anonymizer. Then double click the anoymizer.
You should see something like this, where I have set anonymization settings for the different columns. By default it will say “Ignore” on all columns.
Try out the different anonymization types. Some are more generic than others. When done, click Ok and go back the data flow pane. Add a Dynamics CRM Destination and drag the blue arrow from the anonymizer (blue arrows are the normal data output, red arrows are error output) to the Dynamics CRM Destination component. Then double click it. The view in the data flow should look something like the picture below.
|Dyn365 Soruce -> anoymizer -> writing to Dyn365
|Dyn365 destination component – set values where the arrows are
When setting up the destination component, there are a few things to consider:
- In this case we are always doing updates – hence set the action to update. It is faster than upsert.
- You have to set the Destination Entity.
- If you write data to Dyn365 with high latency, no batching and no threads you will be able to update at about 2-3 records per second. With low latency, correct batch setting and multi threading, I have been able to get up to 300 records per second. Very dependant on entity. Hold the mouse of the blue “i” just after the “Enable Multithread Writing” for some deep end tips from the scholars at Kingswaysoft.
- Error handling is recommended to be directed at a file or some other output where you can monitor it. If you do nothing about it, and you get an error, it will break the flow and stop. You can control error handling of the destination component by clicking on the “Error handling” tab on the left hand side. Remember that all types of exceptions thrown by Dyn365 you will get here as well, like missing rights, disabled records etc.
A normal problem that needs to be handled is that records are deactivated and deactivated records cannot be changed, they have to be opened first, then changed, then re-closed.
This is an example of a dataflow handling this:
|Data flow which splits the deactivated records to the left, adds two special columns to reactivate them, writes an update with only the statecode & statusreason to the record and then merges the two data streams and writes the original values, which recloses the ones that were closed from the beginning
|This is how the conditional split is defined – if statecode is 0, send them to the output called “open” otherwise send them to the output called closed
Then you can test run your data flow, by right clicking the data flow pane, and pressing “Execute Task”
And when you have assemble an larger control flow – like for instance this:
|A control flow with several dataflow being disabled – all in sequence.
you can execute the entire flow by clicking the green plus sign in the ribbon.
MVP, Founder and Principal Consultant at CRM-konsulterna AB
Related to my last post, on working with the API quickly, Microsoft have now released official
documentation that they will, effectivly March the 19:th start limiting the number of API-calls per instance that is allowed to stop what is called “noisy neighbour” problems.
First of all, read the full article here: https://docs.microsoft.com/en-us/dynamics365/customer-engagement/developer/api-limits
Let’s break this down a bit, 60 000 calls per 5 minutes, translates to about 200 calls per second. If you break this, you will start getting exceptions, until the 5 minute period has ended. You are expected to back off, and essentialy handle this. That is the short version. Read the full article for more details.
Update: George Doubinski, a friend of mine and one of the brains of CRM Tip of the Day made me aware of the fact that the limit is per user. I will update the article below on what this means.
What does this mean? Is this a problem?
For most organizations, no, at least that I work with, I not even close to breaking this. If they are using some integration tools like Kingswaysoft or other tools which enable multithreaded integrations, but generally do not need that kind of data throughput then you might temporarily be shut down, but it should self heal after some time, as after each 5 minute time span, you will get another 60 000 requests. That could probably quite easily be fixed by checking the settings of the integration tool. Update: Also, if you integrate each system using a separate accont, you do not risk one system temporarily blocking many other systems from integrating to Dynamics 365. If you are using normal users, this will of course entail a certain license cost, why I generally recommend using app users for integrations, if possible. And after this, you should have one app user for each integrating system.
However, there are some organizations where I forsee issues, and these are organizations which have combinations of any of the following criteria:
- Third party products which, like Marketing Automation, (ClickDimensions, FreshRelevance, SalesForce Marketing Cloud) which have not had time, or got this in their scope yet, and have large amounts of data that they integrate into Dynamics 365. Update: Especially if the user they are using to integrate, the service user, is a normal user, either used by a normal user, or shared with integrations with other systems.
- Legacy Code that has been upgrade to new SDK but uses inefficient architecture – can for example have issues with using ExecuteMultiple which in the article above is described as the recommended best practice. Typically for the reason that the architecure of the code, would require major rewriting to allow for ExecuteMultiple. Update: In this case I strongly recommend looking at using a dedicated user for this specific integration, to isolate any limiations set on that user.
- Organizations with multiple heavy integrations to Dynamics 365. Will be hard to control that the sum total does not exceed 60k per second, and handle back-off in a controlled way. The only reasonable way would probably be to rewrite the integrations to use a proxy or queue instead like Azure Service Bus Queues to integrate and have a single integration interface. Probably a lot easier to write in a blog article than to do in real life. Update: This was an incorrection deduction from my part, as it is not based on the sum total, but on the sum per user, this is not a risk unless many integrations use the same user for integration which I do not recommend.
- Organizations with complex heavy integrations with thousands of lines of integration code that need to be redesigned, rewritten, tested and deployed before March 19:th. And there is no way to test it as there is no TAP/Beta program for this “Feature”. Update: This is still very relevant. Even such a small change as changing the integrating user for an external system should be thoroughly tested and for larger implementations that can be hard to do before March 19.
I see is a typical B2C organization running Dynamics 365 with a marketing automation addon with email tracking and webtracking. They also have a very time critical integration of orders to be able to handle any incidents. Even if the order integration in itself does not reach the limits, it is not unforseeable that a mailblast, especially a good mail blast, to which many customers read the emails click the links, go their site, check their offers and start ordering, would cause a surge of traffic on the Marketing Automation integration – Dynamics 365 API. This of course depends on the settings of this, but perhaps it is critical that all events be tracked to Dynamics. With a mailblast to let’s say 1 Million recipients, quickly hitting the 60 k/5 min limit would happen. When this happens, this would also block all orders from going to Dynamics, causing an effective stop for working with any new incidents in the system.
Update: This is, of course, only relevant if both systems are integrating using the same user. Don’t. However, the marketing automation system above, would hit the limit fast anyway and if the supplier of this system didn’t have time to update their product/service then it would handle this incorrectly. I recommend checking integrating systems and try to turn down the verbosity of what they are writing to Dynamics 365. Then after March 18 when we see how this falls out in detail, you can test a more verbose setting in a test environment, and then see how that falls out.
For small and medium companies with low complexity working mainly with B2B. I don’t see that much of a problem. Larger companies with complex integrations, large databases, integrations to webtracking, email tracking which often will be B2C companies which have higher levels of automation and larger databases of customers, will probably have larger problems with this and need to start think about this right now.
We need to come back to this subject post March 19, to see how this will really work. But I think the real problem will be for the larger orgs with many and heavy integrations.
I would be really glad to hear your views on this like I got Georges’.
MVP, Founder and Principal Consultant at CRM-konsulterna AB
The Excel import functionality in Dynamics CRM is quite good. Version 7.1 (2015 Spring Wave) introduced some new features, like finally being able to work with xlsx-files which helps out a lot.
Generally we recommend using ETL-tools like our favorite KingswaySoft in SSIS, and sometimes the only way to migrate really complex data is to write your own code, but it usually isn’t necessary (if you disagree, please leave a comment below!).
However, both ETL and especially code, takes some training to get going with and some of our customers and others I have spoke to have expressed a will to handle migration themselves which makes the Excel import the best candidate to work with as most people have experience with Excel.
First of all, my recommendation is to complete the customizations in the target system. That way, when you download the Import templates, you will get all the fields that are shown on the “standard” (fallback) form.
|Excel Account Import Template (Swedish) from CRM 2016
As of CRM 2015 Spring wave (7.1) the Excel import template will be formatted as a excel table, which makes it easier to work with, I think.
During the work with migration of data, you can correct data in one of three places.
1. In the source system.
2. In the Excel files while transfering it.
3. In the target system/after migration
Typically the time when the live migration needs to be done, is as short as possible. As everyone needs to stop working in the old system, migration needs to be done, and then everyone can work in the new system. Hence it is typically done during evenings, nights or weekends, depending on which type of business you are in.
Due to this, you need to make sure that when you are doing the real live migration, you need to make sure that it will be as controlled and as smooth as possible. This requires you to first set up the migration and then test it properly. Dry runs are therefore a must unless you are CRM Rambo.
If you plan steps in the migration in the above described stage (2) – during the transfer, you will need to re-do these every time you do a dry run and when you do the live migration. If possible to change the data in the source system instead, for instance remove duplicates, you reduce risk and speed up the migration process.
Generally it is a really good idea to try to clean up you data before migrating it. Many are the CRM system owners who have understood quite a lot about how their users actually use the system, when the are to migrate the data and really start digging their head into it.
Some of the common problems when migrating with Excel are:
- Complicated to handle GUIDs – ETL tools often have ways to migrate the GUID from a source system, very useful if moving from CRM Onprem to CRM Online.
- Cannot handle updates/upserts easily. Can however upload several files at once instead. For instance, contacts have Parent accounts, and Accounts have Primary Contacts. Just make a zip with both xlsx-files and upload. CRM will figure out that they are interdependant.
- Duplicates. Duplicates. Duplicates. So you switched off the duplicate detection for accounts and thought you’d be fine? Well, not so much, lookup fields to account, for instance, by default, will use the primary field, which is “name” and if you have several accounts with the same name, the import function will not know which to select and you will get an error. Typical fields where this is a problem are:
- Contact – Parent Customer / Company (pointing to account)
- Opportunity – Potential Customer / Account (pointing to account)
- Opportunity – Contact
- Account – Primary Contact
- Account – Originating Lead
- Contact – Originating Lead
- Opportunity – Originating Lead
- Only active? Are you only going to import active records? Will that work? Do active records have relationships to inactive records? For instance, you might have an active account that has an inactive primary contact. Or active accounts with inactive parent accounts. If only moving the active accounts and active contacts, the import will fail as the target system will not find the primary contact.
- Activities. The activity parties, in other words the “to”, “cc”, “bcc”, “required”, “optional” etc. fields on activities are handled by a rather complex mapping entity called activity party. There is some support for migrating this with Excel but it is not very easy, so if you do have a lot of activities to migrate, I would strongly recommend using an ETL tool with support for activities.
- If you add a column to the CRM 2016 Excel Template, despite having the Display name in CRM, it will not be automatically mapped like the other columns as there is underlying data to handle it. It is no big issue as the import mechanism usually interprets this correctly anyway. Do note that the Display name mapping is case sensitive and CRM does allow for duplicates in fields having the same Display name, but that doesn’t make importing easier.
Some tricks that I use
- Generate an import template for the entity at hand, let’s say account from the target system, then remove all the columns you don’t want to import. Remove the entire column. Save this. Verify the template by filling in one or two rows by hand and importing. Make sure that the lookups match data in the system.
- Export data from the source system in the same order to match the import template, if possible, into Excel. Then, verify that all columns are identical (ie. picklist are the same etc.) and then copy-as-values from the source excel sheet to the target excel sheet without the header columns.
- Verify that the data in the target excel sheet is correct, that for instance there are cities in the city column, not street names.
- Then import the data into CRM. The first couple of times you will get errors. If you don’t you are very very lucky. Migrating data is complicated, as data is seldom clean and well ordered. So, when it is done, go to the imports section in CRM and go through the errors and try to understand what it is trying to say. Typically a specific lookup cannot be found or it has multiple values. My suggestion is to fix it in the source data (1 in the list above) and not try to do it in the Excel sheet (2). Also plan for making several test runs of migration. Write down the steps you need to do when doing it live, so you have your plan set.
- When importing data with lookups to data where there are lot’s of duplicates in the primary field, do note that you can change the mapping of the lookup field so that it mapps not on the primary field but on some other field instead. For instance, if there are several accounts with the same name but all accounts have different account numbers, you can use account number to indicate parent customer on a contact instead and then change the mapping when importing.
- Make sure that if you do change data in Excel and use formulas, copy and paste-as-values back to the cells/columns as the import does not support formulas. (not sure if the new CRM 2015 Spring Wave/2016 does, havn’t tested, but I wouldn’t provoke it.)
- All picklist values need to exist. Using the import templates is a good tool here as they include the possible values. But you can paste incorrect data into the cells anyway and then import it where you will get errors on the rows with the incorrect data.
- Decide what is a good enough result before. Is 99% good enough? That would mean that if you have 100 000 accounts, 1000 accounts are not imported… so maybe that is not good enough. Remember Pareto’s law, that the last 20% take 80% of the work. So fixing the last 1% can be a real pain, why it is sometimes easier just do fix it manually after import.
- Zip files with inter-dependencies. Like account-contact.
- You can increase the max file upload size. but the max size for a file after uncompression is still 20 MB, if I am not mistaken. This means that you need to think about how you upload your data if you have large amounts of data. ETL tools might be a better option here as well.
- After getting errors and getting an import of let’s say 50%, instead of trying fix the last 50% and importing those, I recommend deleting all and going back from the start. As you need to plan for doing many trial runs before doing a live migration, this is the only way to make sure that you get the source data better and better.
If you have done your own imports and migrations with Excel, I am sure you have some tips of your own, please share them below! I do moderate all comments to avoid spam.
MVP, Founder and CTO at CRM-konsulterna AB