With Release Wave 2 2022, the old advanced find will be removed and the new advanced find experience will be its replacement. However, there are many old goats like me out there that still want to use the old advanced find. I think one of the main advantages of the old advanced find is that it is not bound to the tables/entities of the specific app, which makes it easy to have a look at things that are a bit more hidden from normal view. The new experience will require you to, in essence, create an admin app with all the required tables. Another option is of course to use Jonas Rapp's FetchXML Builder to find whatever you need, but that has another level of geek to it that might not fit all superusers out there. (Although I think they really would like it if they learnt how.)
So, with the following URL you can still access the old advanced find, and you can hence just bookmark it and have it easily available. How long this will be the case, I cannot say, but probably a year or so. And do send your feedback on the new experience to Microsoft so that they can make it as good as it should be.
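(As a hedged illustration only: the commonly referenced pattern is the one below, where yourorg is a placeholder for your own environment's host name. Verify it against your own environment before relying on it.)

```
https://yourorg.crm.dynamics.com/main.aspx?pagetype=advancedfind
```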
Have you ever tried creating bulk deletes in Dataverse/Dynamics 365? It is still a very old interface, and it is hard to control the exact definition of what is to be deleted as you cannot see the actual FetchXML that is being used. It is also hard to see existing recurring bulk delete jobs and which FetchXML they are using. Based on this, my colleague Ebba Linnea Nilsson and I decided to make our first plugin for XrmToolBox (XTB). We are now proud to announce that it is released and available for free (as usual in XTB). We have also identified a bug in the platform related to this, which I will describe below.
The bulk delete functionality in Dataverse, and hence in Dynamics 365 CE, is an essential function for many organizations. This is especially true since GDPR was introduced and there is a strong legislative requirement to remove personal data whose storage cannot be justified. There are also other reasons why the bulk delete functionality is becoming more and more important, and those are based on the capacity costs that can be incurred on a Power Platform tenant. Firstly, just storing data, especially database storage in Dataverse, has a non-trivial cost at $40/GB. There is a lot of value per GB, so it might not be justified to call it too expensive, but removing unnecessary data is definitely something that can be worthwhile, especially for larger implementations. I personally work with a customer in the travel retail industry which has millions of customers and some tables with 40M+ records. There are also a lot of integrations and automations causing a lot of data to be created. Data that at some point needs to be removed, as all data should be, or there should at least be a conscious decision not to remove it, and a reason why. As you might be aware, if you have read my other articles on this blog, I have previously used SSIS and Kingswaysoft to remove massive amounts of data. However, now that the API Entitlements will be introduced (6 months after the report for API Entitlements is made Generally Available), we need to become more and more restrictive in using the APIs for massive data management, like deletes. Hence, we try to move as much as possible to the built-in bulk delete.
Working with the built-in bulk delete functionality is a bit sad. It is very old and you have to click through a wizard-like experience to set jobs up. But the most limiting factor is that you cannot see the actual FetchXML of an active recurring bulk delete, and you cannot input FetchXML directly into a bulk delete.
Having worked with this wizard, you might also have noticed something a bit off. If you create a view which shows all contacts that have no activities, it will work as expected when used as a view in the system. However, if you try to use this view in a bulk delete, the query will be "simplified" and that part of it removed. My thought was that this was a limitation in the UI, based on the fact that it is very old and hasn't gotten any love for many years (decades). My assumption was hence that creating a bulk delete via the API would allow me to create bulk deletes based on FetchXML that you couldn't even input from the UI. These were the reasons for us starting to create this plugin, and it was so useful that I used it in debug mode for several weeks before finalizing it and publishing it.
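To make the problem concrete, here is a minimal sketch of the kind of query we are talking about, and a naive way to spot the part that gets thrown away. The FetchXML is only an illustrative "contacts with no activities" query (standard schema names), and the check is an assumption of how you could detect the issue, not the plugin's actual code:

```python
import xml.etree.ElementTree as ET

# Illustrative FetchXML: "all contacts that have no activities", expressed with a
# left outer join to activitypointer and a null check on the joined entity.
fetch_xml = """
<fetch>
  <entity name="contact">
    <attribute name="fullname" />
    <link-entity name="activitypointer" from="regardingobjectid" to="contactid"
                 link-type="outer" alias="act" />
    <filter>
      <condition entityname="act" attribute="activityid" operator="null" />
    </filter>
  </entity>
</fetch>
"""

# Naive check for the construct that the bulk delete wizard/platform strips:
# any link-entity with link-type="outer" is a warning sign.
root = ET.fromstring(fetch_xml)
outer_joins = [le.get("name") for le in root.iter("link-entity")
               if le.get("link-type") == "outer"]
if outer_joins:
    print("Warning: this query uses outer joins that bulk delete may silently drop:",
          outer_joins)
```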
Now we have released the first version, and you can download it directly from XTB. I would like to give a huge thanks to Jonas Rapp, who helped us out a lot, both with the connection to his tool FetchXML Builder and with the general setup of the plugin and the details of getting it approved as a plugin for XTB.
If you have any suggestions, comments or otherwise, leave them on the GitHub repo https://github.com/crmgustaf/BulkDeleteManager or down below. We already have a bunch of stuff we want to do. Ah, yes, and the bug we found: it seems that the outer join, which was a rather recent addition to FetchXML, is not supported by the actual platform. Hence the UI and the platform match in that respect. Just to make it clear, what happens is that you input a FetchXML saying "all contacts with no activities" or something like that, and the platform will simplify it to "all contacts", which is not really what you want.
As it is supported to create bulk delete jobs via the API, I do think that this can still be seen as a bug, as there is no clear documentation on it, nor any check when creating a bulk delete job with a FetchXML that will be incorrectly parsed. My suggestion is hence to implement the new FetchXML parser in the bulk delete functionality, at least on the platform side. With the current setup, it is very possible that bulk deletes are created that remove a lot more than what was initially intended, which can be very damaging to any organization. And from a GDPR perspective, this type of query is rather common, as it might be defined that contacts with no cases may only be stored for 2 years, but those with cases for 10. To remove the ones without cases, you would phrase the query as "remove all contacts with no cases created more than 2 years ago". This would then cause all contacts created more than 2 years ago to be removed, regardless of whether they have a case or not.
To inform users of this, we have added a warning every time a new bulk delete job is about to be saved. Hopefully Microsoft will fix this soon.
Previously, integrations between systems have been rather specific. For instance, someone has built an integration between Dynamics 365 and Mailchimp or something similar. The introduction of Logic Apps and Power Automate has, however, at least from a Microsoft ecosystem perspective, changed this quite a lot. Now, when someone asks for an integration, it is usually preferable to build and use custom or standard connectors in Power Automate / Logic Apps instead. This allows for more flexibility in the integration, with the downside that the actual logical flow of the integration still has to be built, which can be a daunting task in itself in some cases.
Hence, when we at CRM-Konsulterna started a conversation with one of the larger digital signature companies in Sweden, Verified, about helping them create an integration to Dynamics/Power Platform, we suggested that we actually not build an integration at all, but rather a connector for Logic Apps/Power Automate. This connector has now been released, and we are very proud of it and hope that it can be of great use to the many companies that want to build integrations to Verified.
And as Logic Apps/Power Automate are more generic solutions, this actually enables integrations to any of the other 600+ systems for which there are connectors nowadays. Hence, someone who wants to build an integration to Salesforce or even SurveyMonkey can now do this using Power Automate or Logic Apps.
One of the things that needs to be mentioned with regards to Power Automate is, however, that there sometimes are issues with the execution of Flows. This is not particularly common, but it does happen. It has also been improved over time by Microsoft, but we do feel that it is important to consider this fact when building integrations with Power Automate. So, for instance, make sure the logic both triggers on a particular event and has a "clean-up" job which runs periodically and catches any records which for some reason weren't processed correctly.
I don't think this should discourage the use of Power Automate or Logic Apps, only that you need to make the logic a bit more robust.
Another interesting issue, which can arise because Power Automate is so easy to work with, is that people with no programming experience start making Flows. This might not be a problem for a simple Flow, but with more complex logic you typically have to consider fringe cases, what we handle as exceptions in programming, for instance when there is no data in a field where data is expected. These can cause a Flow execution to break or do unforeseen things. Hence, it is important to have a good process for monitoring Flows and to have experts look at them to notice any potential problems, especially when they grow in complexity, compliance requirements or use.
With this, I hope you have fun with Power Automate and create some amazing Flows to help you in your everyday life! And thank you to all the amazing colleagues who made the launch of this connector possible.
First of all, the most important fact is that the request entitlement for non-licensed users has been upped from 100k to 500k for all Dynamics 365 Enterprise and Professional licenses. On top of that, 5k requests are added per Dynamics USL, capped at 10M. Hence, a small org with 5 Dynamics Pro users will get 500k + 5x5k = 525k requests per 24h. A large org with 1,000 Enterprise users will get 500k + 1,000x5k = 5.5M. A larger org with 10k Enterprise Sales users will be capped at 10M requests.
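To make the arithmetic above explicit, here is a small sketch of the calculation (the numbers are the ones mentioned above; the function itself is just an illustration, not an official calculator):

```python
def non_licensed_request_entitlement(dynamics_usls: int,
                                     base: int = 500_000,
                                     per_usl: int = 5_000,
                                     cap: int = 10_000_000) -> int:
    """Accumulated 24-hour request entitlement for non-licensed users/service principals."""
    return min(base + dynamics_usls * per_usl, cap)

print(non_licensed_request_entitlement(5))       # 525 000   (5 Pro users)
print(non_licensed_request_entitlement(1_000))   # 5 500 000 (1 000 Enterprise users)
print(non_licensed_request_entitlement(10_000))  # 10 000 000 (hits the cap)
```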
Normal, paid users have also been changed a bit. An Enterprise or Pro license is entitled to 40k requests. Note that this does not include Team Member licenses, which are entitled to 6k.
The capacity add-on has also been changed to include 50k requests instead of the previous 10k. Whether the price is still $50, I don't know at this time; if so, the price per request has in effect been reduced to 20% of what it was. I will get back to this later.
This change is good, as it will probably mean that the majority of customers do not exceed their entitlement. There will probably still be a few larger customers or complex solutions that will exceed it, and I do suggest that you talk to your partner and your Microsoft account manager to try to arrange something.
There are still some issues that I think need addressing:
How can ISVs like ClickDimensions, which by nature are rather verbose, be able to bundle requests?
Larger corporate/global tenants with multiple instances are still punished by this model and would benefit from splitting the one large tenant into several smaller ones. But that makes things a lot more complex from an IT perspective, and isn't the point that it is great to keep them all in one?
Licensing is still rather complex from a capacity perspective, and that might scare customers. I have talked to customers that have chosen Salesforce for this reason alone.
There is more to be discussed regarding this, but I wanted to give my perspective as soon as I could and also shed some light on the subject. I will be back on it.
On the page in Microsoft Docs where they discuss API service protection, there is, towards the end of the page, a part which gives some recommendations. Some are great, like the recommendation to use many threads and to remove the affinity cookie, but when I read it I really reacted to the recommendation that batching shouldn't be used. That just didn't rhyme with my experience of doing heavy data loads to Dataverse. So I thought I might just test whether it was true or not by creating a simple script in SSIS with Kingswaysoft. My result: using batching compared to not using it gives more than a 10x performance increase. Continue reading to understand more about how I tested this and for some deeper analysis.
Parameters and Excel
The first thing I did was to create an Excel sheet for storing all the results. I really did have to think about the different parameters that could affect the result, so I chose the following columns:
Dataload – how many records were processed. This needed to be fairly large to make sure that the 5-minute throttling window was passed.
Operation – Different Dataverse operations take different amounts of time. For instance, creates are typically rather fast, but deletes, depending on the table, can be a lot slower, as the platform might execute cascading deletes based on one single delete. For instance, if you remove a contact with 100 tasks connected to it and the regarding relationship is set to "parental" or "cascade delete", it will actually remove all 100 tasks. If it is set to "remove link", the platform has to make an update to each of the tasks, removing the link. There are also special operations like merge which are rather complex.
Table – There is a large difference between the different tables. Some of the OOB tables have a lot of built-in logic, while really small non-activity custom tables can be a lot quicker to create, update or delete.
Threads – How many threads were used.
Batch – The size of the batches being used.
Duration / Duration (ms) – Duration is where I input the duration as a normal time. I added a calculated column for the corresponding number of milliseconds.
Time per record (ms) – This is the duration in ms divided by the total number of records. During this first test I always used 100,000 records as the dataload, but it could be interesting in the future to see the differences between different dataloads, all else being equal. This is also the main output from this test.
Strategy – It is possible to have different strategies. In this first version I just ran everything at once, hence I called the strategy "All at once". Different strategies might be "5 on, 5 off", meaning that you design the script to run superfast for 5 minutes, the throttling window, then stop and do nothing for 5 minutes, and then loop this (see the sketch after this list). It is not always possible to use that kind of strategy, but for massive deletes of, for instance, marketing list members (which cannot be removed with bulk delete) it might be an option.
API – There are currently two APIs that can be used: the new Web API, which uses JSON payloads, and the older SOAP API, which uses XML payloads. It stands to reason that the smaller JSON payload should cause the Web API to be faster than the corresponding SOAP API. However, the payloads are also compressed in transit, which might make these differences smaller than expected. There is also a server-side aspect to this, as the APIs will run through different parts of the code on the server side, which could affect the performance.
No of columns – How many columns are being sent to the API. Of course there would be a difference between sending a create message with 3 columns and one with 30, hence this is a relevant parameter. It is still a bit rough, as there is also a huge difference between a boolean column, a 2,000-character nvarchar and a lookup. This could also be something to refine later.
Existing records – How many records existed in the system prior to running this. I am not sure if this makes any difference; in other words, everything else being equal, would it take more time to write 100k records to a system with 0 records or to one with 10M records? As I don't know, and cannot rule it out, I added it.
Latency (ms) – Daniel Cai, founder of Kingswaysoft, always recommends that the SSIS script with Kingswaysoft be run "as close as possible to the Dataverse". That does, in other words, imply that the latency to the server affects the performance. To measure this, I used diag.aspx from the computer running the script.
Location – Which geo the instance is located in. This is more for general information; the latency is really the important factor here. The throughput might also have some effect if you are using a really bad line to Dataverse; I was using a wired 1 Gbit line. In this test I was using an instance I got hold of as an MVP, which is located in the US, and my own desktop computer at home (an AMD Ryzen 9 3900X 12-core processor at 3.79 GHz with 32 GB of memory). Hence the latency was rather high and not in line with Daniel Cai's recommendations. That is also something to investigate further.
No of users – As I, and some others in the community, have described, throttling is done on a per-user and per-front-end-server basis. Hence, utilizing several service principals/application users can effectively multiply the throughput. In this test I used just one.
Instance type – It is well known that sandbox instances do not have the same performance as production instances. If you find Microsoft support on a happy day and you are working with a larger (in number of licenses) instance, you might also get them to relax the throttles a bit, especially if you mention that you are doing a migration. As these factors strongly affect the performance of large dataloads, I did have to add this. During this test I was using a non-enhanced production instance, in other words a production instance on which no throttles had been relaxed.
DB Version – The final parameter that I thought might affect this is the actual version of the Dataverse instance. As improvements, and God forbid sub-optimal "improvements", can cause enhancements or degradations of performance, this is necessary to document.
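As for the "5 on, 5 off" strategy mentioned above, a minimal sketch of that loop could look like the following. The send_some_records function is a placeholder for whatever load you are generating, and the 5-minute figure corresponds to the throttling window discussed above:

```python
import time

RUN_SECONDS = 5 * 60    # run at full speed for the 5-minute throttling window
PAUSE_SECONDS = 5 * 60  # then stay quiet for the same amount of time

def send_some_records():
    """Placeholder: push one chunk of work (e.g. one batch of deletes) to Dataverse."""
    pass

def run_five_on_five_off(total_loops: int) -> None:
    for _ in range(total_loops):
        deadline = time.monotonic() + RUN_SECONDS
        while time.monotonic() < deadline:
            send_some_records()    # hammer the API while "on"
        time.sleep(PAUSE_SECONDS)  # do nothing while "off"
```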
For the setup of the create tests in SSIS with the Kingswaysoft add-ons, I used a Data Spawner (from the Productivity Pack) to generate the data and then sent it directly to the CDS Destination.
And the CDS Destination config
After each run, I checked the log from SSIS to see how long the entire process took. Since I have a computer with many threads and, for this case, enough memory, it is my perception that most of the threads allocated were also actually used.
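To illustrate what the thread and batch size parameters correspond to outside of SSIS, here is a rough, hypothetical Python sketch of the same idea: chunk the records into batches and let a pool of worker threads send them in parallel. In my tests the actual sending was done by Kingswaysoft's CDS Destination, so send_batch below is only a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

THREADS = 20
BATCH_SIZE = 20

def send_batch(batch):
    """Placeholder for one batched request against Dataverse
    (e.g. an ExecuteMultiple/$batch call carrying BATCH_SIZE records)."""
    pass

def run(records):
    # Split the full dataload into batches of BATCH_SIZE records each.
    batches = [records[i:i + BATCH_SIZE] for i in range(0, len(records), BATCH_SIZE)]
    # Send the batches on THREADS parallel workers.
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        list(pool.map(send_batch, batches))

# 100 000 dummy records, similar in size to the dataload used in the test.
run([{"firstname": f"Test {n}"} for n in range(100_000)])
```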
What are the results? This is a picture of the Excel sheet:
As you can see I did try both Create and Delete operations, but the results are rather obvious.
20 threads/20 per batch, for both create and delete, took around 45 minutes
Reducing to 16/10 made only a minor difference – 48 minutes
Microsoft's recommendation of not using batching, i.e. 20 threads/1 per batch, took over 10 h, both for delete and create
Using only 1 thread with a batch size of 1 was more or less the same as using 20/1
1 thread with 20 in every batch (1/20) took almost 5 h, which is around half the time of 1/1 or 20/1
I think the results clearly show that Microsoft Docs are currently incorrect in their recommendation not to use batching. Perhaps they will update this soon. From an entitlement perspective, one needs to understand the additional cost of the "batch unpacking" request that is made. With 20 in every batch, this is an overhead of 1/21, but if you were to lower the batch size to 4, it would be 1/5. Hence, using as large a batch as you can without losing performance is generally what I would recommend.
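To put numbers on that overhead for the 100,000-record dataload used in this test (following the assumption above that each batch adds one extra "unpacking" request on top of the operations it contains):

```python
import math

def requests_consumed(records: int, batch_size: int) -> int:
    # Each batch costs its operations plus one extra request for the batch envelope.
    return records + math.ceil(records / batch_size)

for batch_size in (20, 4):
    total = requests_consumed(100_000, batch_size)
    overhead = math.ceil(100_000 / batch_size) / total
    print(f"batch size {batch_size:>2}: {total:,} requests, overhead {overhead:.1%}")
    # batch size 20: 105,000 requests, overhead 4.8%  (i.e. 1/21)
    # batch size  4: 125,000 requests, overhead 20.0% (i.e. 1/5)
```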
As I have implied in this article, there are a lot of other parameters to investigate in the API. I have a hunch that a create with 10 lookups, compared to 10 text fields, will also make a significant difference, but I will need to test it.
Also, do consider the request timeout. When working with complex and large batches, one request may take quite some time. You will know if it is exceeded, however, as a timeout exception will be returned. Note that some records in that batch may have been written anyway; it is just that your client wasn't waiting around for the answer.
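As a minimal, hypothetical illustration of that behaviour with plain HTTP (the URL, token and payload below are placeholders, and the timeout value is just an example): the client-side timeout only controls how long you wait, not whether the server finishes the work.

```python
import requests

batch_body = "..."  # placeholder: a previously built multipart $batch payload

try:
    resp = requests.post(
        "https://yourorg.api.crm.dynamics.com/api/data/v9.2/$batch",  # placeholder org URL
        headers={
            "Authorization": "Bearer <token>",                        # placeholder token
            "Content-Type": "multipart/mixed;boundary=batch_1",
        },
        data=batch_body,
        timeout=120,  # seconds the *client* is willing to wait for an answer
    )
    resp.raise_for_status()
except requests.exceptions.Timeout:
    # The call timed out on our side, but some or all records in the batch may
    # still have been written on the server - verify before blindly retrying.
    print("Timed out - check which records actually made it before re-running.")
```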
I do also encourage others to try out other parameters in the API to see what is really optimal from many different aspects. From a mathematical perspective, this can really be seen as a multidimensional surface on which we are attempting to find the highest points. I have now started this journey, and I hope it was an interesting read. Please leave a comment if you have any experience to share or just want to comment.