Incorrect API Optimization recommendations by Microsoft

On the page in Microsoft Docs where they discuss API service protections, there is, towards the end of the page, a part which gives some recommendations. Some are great, like the recommendation to use many threads and to remove the affinity cookie. However, when I read it I really stumbled over the recommendation that batching shouldn't be used. That just didn't rhyme with my experience of doing heavy data loads to Dataverse. So I thought I would test whether it was true or not by creating a simple script in SSIS with KingswaySoft. My results: using batching gives more than a 10x performance increase compared to not using it. Continue reading to understand more about how I tested this and for some deeper analysis.

Parameters and Excel

The first thing I did was to create an Excel sheet for storing all the results. I really had to think about the different parameters that could affect the result, and I chose the following columns:

Dataload – how many records. This needed to be large enough to make sure that the 5-minute throttling window was passed.

Operation – Different Dataverse operations take different amounts of time. For instance, creates are typically rather fast, but deletes, depending on the table, can be a lot slower, as the platform might execute cascading deletes based on one single delete. For instance, if you remove a contact with 100 tasks connected to it through the regarding relation set to “parental” or “cascade delete”, it will actually remove all 100 tasks. If set to “remove link”, the platform has to make an update to each of the tasks, removing the link. There are also special operations like merge which are rather complex.

Table – There is a large difference between the different tables. Some of the OOB tables have a lot of built-in logic, and really small non-activity custom tables can be a lot quicker to create, update or delete.

Threads – How many threads were used.

Batch – The size of the batches being used.

Duration / Duration (ms) – Duration is where I input the duration as a normal time. I added a calculated column to convert it to the corresponding number of milliseconds.

Time per record (ms) – The duration in ms divided by the total number of records; for example, 100 000 records in 45 minutes (2 700 000 ms) is 27 ms per record. During this first test, I always used 100 000 records as the dataload, but it could be interesting in the future to see the differences between different dataloads, with all else being the same. This is also the main output from this test.

Strategy – It is possible to have different strategies. In this first version I just ran everything at once, hence I called the strategy “All at once”. A different strategy might be “5 on, 5 off”, meaning that you design the script to run superfast for 5 minutes, the throttling window, then stop and do nothing for 5 minutes, and loop this. It is not always possible to use that kind of strategy, but for massive deletes of, for instance, marketing list members (which cannot be removed with bulk delete) it might be an option, as the sketch below illustrates.
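
To illustrate the “5 on, 5 off” idea, here is a minimal sketch in Python. It is a sketch under assumptions, not a definitive implementation: load_batch is a hypothetical placeholder for whatever actually pushes a batch to Dataverse, and a real script would also run many threads during the “on” phase.

    import time

    ON_SECONDS = 5 * 60    # run flat out for the 5-minute throttling window
    OFF_SECONDS = 5 * 60   # then rest for 5 minutes to let the limits reset

    def load_batch(batch):
        # hypothetical placeholder: send one batch to Dataverse here
        pass

    def run_five_on_five_off(batches):
        remaining = list(batches)
        while remaining:
            window_end = time.monotonic() + ON_SECONDS
            # "on" phase: push batches as fast as possible for 5 minutes
            while remaining and time.monotonic() < window_end:
                load_batch(remaining.pop(0))
            if remaining:
                # "off" phase: do nothing so the sliding window can clear
                time.sleep(OFF_SECONDS)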

API – There are currently two APIs that can be used: the newer Web API, which uses JSON payloads, and the older SOAP API, which uses XML payloads. It stands to reason that the smaller JSON payload should make the Web API faster than the corresponding SOAP API. However, the payloads are typically also compressed in transit, which might make these differences smaller than expected. There is also a server-side aspect to this, as the APIs run through different parts of the code on the server side, which could affect the performance.
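
As a small illustration of the Web API's JSON payload, here is a hedged Python sketch of a single create request against the standard account table. The organization URL and bearer token are placeholders; the SOAP API would express the same logical request as a considerably larger XML envelope.

    import json
    import urllib.request

    ORG_URL = "https://yourorg.crm.dynamics.com"  # placeholder organization
    TOKEN = "..."  # placeholder OAuth bearer token

    def create_account(name):
        # POST a small JSON payload to the Web API
        req = urllib.request.Request(
            url=f"{ORG_URL}/api/data/v9.2/accounts",
            data=json.dumps({"name": name}).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {TOKEN}",
                "Content-Type": "application/json",
                "OData-MaxVersion": "4.0",
                "OData-Version": "4.0",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            # the URL of the created record comes back in OData-EntityId
            return resp.headers.get("OData-EntityId")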

No of columns – How many columns are being sent to the API. Of course there would be a difference between sending a create message with 3 columns and one with 30, hence this is a relevant parameter. It is still a bit rough, as there is a huge difference between setting a boolean column, a 2 000-character nvarchar or a lookup. This could also be something to refine in later tests.

Existing records – How many records existed in the system prior to running this. I am not sure if this makes any difference; in other words, everything else being equal, would it take more time to write 100k records to a system with 0 records or one with 10M records? As I don't know, and cannot rule it out, I added it.

Latency (ms) – Daniel Cai, founder of KingswaySoft, always recommends that the SSIS script with KingswaySoft be run “as close as possible to the dataverse”. That in other words implies that the latency to the server affects the performance. To measure this, I used diag.aspx from the computer running the script.
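
If you want a rough sanity check of the latency from the machine running the script, a minimal Python sketch like the one below can help. The host name is a placeholder, and diag.aspx remains the more thorough tool; the TCP handshake takes roughly one network round trip, so the best of a few connects is a usable approximation.

    import socket
    import time

    def tcp_connect_ms(host, port=443, samples=5):
        # take the best of a few TCP connects as a rough latency estimate
        best = float("inf")
        for _ in range(samples):
            start = time.monotonic()
            with socket.create_connection((host, port), timeout=10):
                pass
            best = min(best, (time.monotonic() - start) * 1000)
        return best

    print(tcp_connect_ms("yourorg.crm.dynamics.com"))  # placeholder host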

Location – Which geo the instance is located in. This is more for general information; the latency is really the important factor here. The throughput might also have some effect if you are using a really bad line to the dataverse; I was using a wired 1 Gbit line. In this test, I was using an instance I got hold of as an MVP, which is located in the US, and my own desktop computer at home (an AMD Ryzen 9 3900X 12-core processor at 3.79 GHz with 32 GB of memory). Hence the latency was rather high and not in line with Daniel Cai's recommendations. It is hence also something to investigate further.

No of users – As I, and some others in the community, have described, throttling is applied per user and per front-end server. Hence, utilizing several service principals/application users can effectively multiply the throughput, as sketched below. In this test I used just one.
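
As a minimal sketch of that idea, assuming you already hold one authenticated client per application user (the client objects and their load_batch method are hypothetical placeholders):

    from itertools import cycle

    def load_with_many_users(batches, clients):
        # round-robin the batches over several application users, since
        # throttling is evaluated per user and per front-end server
        pool = cycle(clients)
        for batch in batches:
            next(pool).load_batch(batch)  # placeholder client call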

Instance type – It is well known that sandbox instances do not have the same performance as production instances. If you find Microsoft support on a happy day, and you are working with a larger instance (in number of licenses), you might also get them to relax the throttles a bit, especially if you mention that you are doing a migration. As these factors strongly affect the performance of large dataloads, I had to add this. During this test I was using a non-enhanced production instance, in other words, a production instance on which no throttles had been relaxed.

DB Version – The final parameter that I thought might affect this is the actual version of the dataverse instance. As improvements, and God forbid, suboptimal “improvements”, can enhance or degrade performance, this is necessary to document.

SSIS/KingswaySoft setup

To set up the create tests in SSIS with the KingswaySoft add-ons, I used a Data Spawner (from the Productivity Pack) to generate the data, and then sent this directly to the CDS Destination.

And the Data Spawner config:

And the CDS Destination config:

The main changes made in this case were to the parameter “Batch size” (set to 20 in the picture above) and the number of threads to use (also set to 20 in the picture above).

After each run, I checked the log from SSIS to see how long the entire process took. As I have a computer with many threads and, for this case, enough memory, my perception is that most of the threads allocated were also used.

Results

What are the results? This is a picture of the Excel sheet:

As you can see, I tried both Create and Delete operations, and the results are rather clear:

  • 20 threads/20 per batch, for both create and delete, took around 45 minutes
  • Reducing to 16 threads/10 per batch made only a minor difference – 48 minutes
  • Microsoft's recommendation of not using batching, i.e. 20 threads/batch size 1, took over 10 h, both for delete and create
  • Using only 1 thread with batch size 1 was more or less the same as using 20/1
  • 1 thread with 20 in every batch (1/20) took almost 5 h, which is around half the time of 1/1 or 20/1

I think the results clearly show that Microsoft Docs is currently incorrect in its recommendation not to use batching. Perhaps they will update this soon. From an entitlement perspective, one needs to understand the additional cost of the “batch unpacking” request that is made. With 20 in every batch, this is an overhead of 1/21, but if you lowered the batches to 4, it would be 1/5. Hence using as large a batch as you can, without losing performance, is generally what I would recommend.
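
To make that arithmetic concrete, here is a tiny Python helper using the “one extra counted request per batch” model described above (an unbatched request carries no such envelope):

    def batch_overhead(batch_size):
        # a batch of N records is counted as N + 1 requests against the
        # entitlement: N for the contents plus 1 for the batch envelope
        return 1 / (batch_size + 1)

    print(f"{batch_overhead(20):.1%}")  # 1/21, about 4.8 %
    print(f"{batch_overhead(4):.1%}")   # 1/5, exactly 20.0 %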

As I have implied in this article, there are a lot of other parameters to investigate in the API. I have a hunch that a create with 10 lookups, compared to one with 10 text fields, will also make a significant difference, but I will need to test it.

Also, do consider the request timeout. When working with complex and large batches, one request may take quite some time. You will know when you exceed it, however, as it will return a timeout exception. Note that some records in that batch may have been written anyway; it is just that your client wasn't waiting around for the answer.
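
A hedged Python sketch of guarding against this, in the same urllib style as the create example above (the timeout value is an assumption you would tune to your batch sizes):

    import socket
    import urllib.error
    import urllib.request

    def send_with_timeout(request, timeout_seconds=120):
        try:
            with urllib.request.urlopen(request, timeout=timeout_seconds) as resp:
                return resp.read()
        except (socket.timeout, urllib.error.URLError) as exc:
            # a timeout does not necessarily mean the batch failed on the
            # server: some records may have been written anyway, so verify
            # what landed before blindly retrying the whole batch
            print(f"request timed out or failed: {exc}")
            raise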

I also encourage others to try out other parameters in the API, and to see what is really optimal from many different aspects. From a mathematical perspective, this can really be seen as a multidimensional surface where we are attempting to find the highest points. I have now started this journey, and I hope it was an interesting read. Please leave a comment if you have any experience to share or just want to comment.

xRMVirtual Presentation

Image: Charlotte S H Jensen – Flickr: Maskiner på Brede Værk, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=20868496

Ever thought about the fact that there nowadays are so many ways you can create logic in CRM?
Or did you start with callouts in CRM 3, then learn plugins in CRM 4, and have stuck with those since, I mean, why change?

If you are interested in discussing this and listening to my views, join me on September the 27th at 9:00 PST/18:00 CET, when I will be presenting on xRMVirtual on this topic.

Hope to see you there!

Gustaf Westerlund
MVP, Founder and CTO at CRM-konsulterna AB
www.crmkonsulterna.se

Non-breaking Space in SSRS

I have been working on creating classical reports in SSRS for a customer lately, and there are some tricks that you sometimes need to employ. Today I found one, so this article will be short: a non-breaking space in SSRS can be produced with the expression “=chr(160)”.

White Sands – a sort of white space – broken by some tracks.

Whether you use it inside an IIF or just concatenate it onto the end of an expression with “& chr(160)” is up to you. It is at least good to know.

That was a short post, and the tension is building towards the 1st. If you don't know why, check back on the 1st and you'll find out.

Gustaf Westerlund
MVP, Founder and CTO at CRM-konsulterna AB
www.crmkonsulterna.se

CRM UG Summit last week – Avoiding crashed presentations

For those of you who follow me on Twitter (@crmgustaf), you probably noticed that last week I was in St. Louis, US for the annual CRM User Group Summit. It was a very nice event with lots of interesting presentations from a lot of different presenters: Microsoft employees, MVPs, users with great stories of their experiences, and of course the CRM UG team led by Tony Stein (@tonysteinND), who did a great job of arranging the entire summit!

I was invited to hold three presentations solo and one together with George Doubinski and Blake Scarlavai.

I will blog about the first three at a later date; this post will mainly be about the joint one I had, or was supposed to have, with George and Blake. It was a developer jam session where we were to discuss some of our favorite tools. My absolute favorite tool is Visio – not really a developer tool, but great for structuring the architectural work before you start hacking away – so I wanted to show some of the different uses I have for Visio. I had neatly placed everything in a folder, so that I had all the files ready, in my OneDrive… and I had marked it to be available offline… but OneDrive did not agree with me, and the Internet connection was sadly very bad, so the audience just got to see the loading dialog for OneDrive…

My colleague Rickard and I usually joke about the fact that you have to sacrifice a goat to the God of Demo before a demo or a presentation, otherwise something is bound to break. We have found that there nowadays are a lot of clips on YouTube which will do just fine and are friendlier to goats. We have also found that just talking about it is actually better for sensitive stomachs than watching the clips.

In this case, I think I should not have trusted OneDrive so completely. Or I should have had a backup on a USB stick or similar. Everything actually did turn out all right in the end, as George is a full-blooded pro, so I just cued him and he spun off like a Duracell Bunny.

My advice to you is hence to always have a backup plan, as you might not always have your own George Doubinski or some other Duracell Bunny to jump in.

If you want to get hold of George or Blake to be your Bunny, their Twitter accounts are below.

George: @georgedude   
Blake: @bscarlav 

Gustaf Westerlund
MVP, CEO and owner at CRM-konsulterna AB
www.crmkonsulterna.se

Recording of data modeling session

Today, well soon yesterday, I held the presentation for www.XrmVirtual.com on data modeling in Dynamics CRM. It is a bit hard to know if people liked it, but some of the comments were very positive, so I think it was appreciated.

If you were unable to attend, you can find the recording here: http://t.co/69AMqzQPUW, and if you have any questions regarding it, please feel free to ask them in the comments below. Do note that I moderate all comments to avoid spam, which is why it may take some time until your comment is published.

Gustaf Westerlund
MVP, CEO and owner at CRM-konsulterna AB
www.crmkonsulterna.se

XrmVirtual Presentation on Tuesday

On Tuesday I am doing my first presentation for www.xrmvirtual.com. The presentation will be on data modeling in Dynamics CRM, and I will be discussing general aspects of this as well as specific Dynamics CRM considerations that are important when modeling data.

Areas that I will be discussing include concept exploration and definition, how these relate to entity maps, and some of the peculiar aspects of Dynamics CRM that you need to take into consideration when modeling data. I have one hour, and I am sure I will fill it to the brim. If not, make sure to prepare any questions. I will post a link to the recording of the presentation here when it is done, and any questions on the subject can of course be discussed here as well.

It is scheduled to be held on Tuesday the 11th of March at 12:00 EST / 17:00 CET. A recording will be made available later if you are unable to join live.

To join, please use this LiveMeeting link: https://www.livemeeting.com/cc/usergroups/join?id=PPQ8P9&role=attend

Gustaf Westerlund
MVP, CEO and owner at CRM-konsulterna AB
www.crmkonsulterna.se