Some people might have heard about an industry best practice that you should never have custom columns (fields) on the systemuser table (entity) in dataverse. Is this true and why so? This article is based on my understanding of how the inner workings of dataverse works and hence what you need to think about when designing your application to not unintentionally create an application that destroys your environments performance. In short, be careful about adding custom columns to the systemuser and if you do, only add fields that have static data, ie data that doesn’t often change. Let me describe this in more detail.

First of all, I would like to give credit to a lot of this to my friend and former Business Application MVP Adam Vero, who described this in detail for me, I have also discussed this with other people and since had it confirmed but not actually seen it documented as such, why it might not be fully official. I do, however, not see any problems with people understanding this, rather the opposite.

Dataverse is an application platform that has security built into it as an integral part, there are security roles, system users, teams and business units that form the core pieces of the security in the system. As the system will often need to query data from these four tables, it has a built in “caching” functionality that per-environment loads these four tables and precalculates them into an in-memory table for easy and fast access. This is then then stored in-memory for as long as the data in these four tables is kept static, in other words, nothing is changed, no updates, no creates, no deletes.

What could then happen if you add a column to the systemuser table? Well, that depends. If this column is a column that you set when the user is created and then never change that, that isn’t a problem, as this wouldn’t affect the precalculated in-memory table. However, if the data of these columns are constantly being changed, like for instance, if you add a column called “activities last 24h” and then create a Flow which every time an email, appointment etc. is created it will increment this by one per day and reset it every night for every user. Then every time, this writes to any user, the precalculated in-memory table will be flushed and recalculated before it can be used again causing a severe performance hit that can be very hard to troubleshoot.

How would you create a solution for a the “activities last 24h”, as described above then? Well, I would probably create a related entity called userstatistics with a relationship to systemuser. In this case it could even be smarter to have a 1:N relationship to this other entity as you could then have many userstatistics per user and measure differences in activities day by day.

But wouldn’t the NLB:s (Network Load Balancer) make this irrelevant as each environment is hosted together with many others? Well, I cannot, due to NDA talk about the details of how the NLBs actually work for the online environements, but I can say this, no, it is still relevant. The NLB will make it so for performance reasons.

As for teams, it is only the owner teams that count in this equation, the access teams are only being used for sharing or other types of grouping and hence never part of this pre-calculation.

And the savvy person would then of course realize that the multiplied size of:

systemusers x owner teams x business units x security roles

Does make up the size of this pre calculated table and for large implemenations, this can give indications of where performance can start to make a difference as every time a user does not have organizational level privilige, the system has to go through the entire table to check what is right. And then of course the POA. But that table is story for another day and another article.

Just final word. The platform is constantly shifting and even though this was true and probably still is true, there might be changes going on or that have happened that I am unaware of, that have changed how this works. If I hear of this, I will let you know.