Identify Storage Engine and Formula Engine bottlenecks with new SSAS XEvents

Undoubtedly, the big news of Analysis Services 2012 is the introduction of the new Tabular model and its language DAX.

However, there are some interesting news also in the multidimensional model – I just mentioned one of these, the SSAS Storage String limit, in my previous post, but I will write a specific post about it -, including the introduction of Extended Events (XEvents).

Three months ago, Chris Webb wrote this interesting post where you can find all the information to start to play with Extended Events.

Today, I will show how to use Extended Events as an alternative to use Profiler to monitor the activities of our MDX queries.

In particular, we will see how to determine the Storage Engine and Formula Engine usage by our MDX query, and we will compare the solution that uses Extended Events with the more traditional one using the Profiler.

Some background info

Optimizing the performance of our cubes and MDX queries can be a very challenging activity, not only because it requires a strong knowledge of MDX and the SSAS Engine, but also because – usually – it is an iterative and time consuming activity.

If you want to start with a very complete overview of this complex activity, I strongly recommend having a look at these two precious posts by Chris Webb:

http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1

http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part2

As perfectly described in the above posts, the performance tuning activity of an MDX query typically start determining the Storage Engine and Formula Engine usage to identify which one – or both! – could be the bottleneck.

That is exactly what we are going to do!

Step 1: using XEvents to trace MDX query information

First step is to create the XEvent trace using XMLA command.

As Chris showed in his post about XEvents, the create command is very simple and allows us to declare the events we need to collect with the trace.

For our purpose these are the minimal events we need to trace:

  • EventClass = 10 (Query End)
  • EventClass = 11 (Query Subcube)

and this is an example of the complete XMLA command:

In the command, you can see the list of events to trace – in yellow – and the path of the “.xel” file to output to – in green.

After executing it, we can easily verify that our XEvent trace is running using this query:

Here’s the result on my machine:

where you can also immediately discriminate – looking at the [Type] column – XEvent traces from the other traces.

Step 2: execute MDX queries to collect data

Now that our trace is running, it is a simple matter of execute our MDX queries to collect data for further investigations.

Note:    remember to clear the SSAS cache before execute the queries, otherwise time values collected for the Storage Engine could be lowered by the use of the cache.

Step 3: stop XEvents trace

Even if we could to start to analyze the data when XEvents trace is running, is a good practice to stop it and it can be done using a XMLA command like this:

 

Note:    before to execute the command, be sure the TraceID is the correct one! J

We can also verify that our trace has stopped querying again the “discover_traces” DMV as seen in the Step 1.

Step 4: extracting data about FE and SE duration

Now that we have collected all the necessary data, we simply need to do few calculations to retrieve the information about time spent in Storage Engine and Formula Engine by our queries.

In fact, with the collected data, we are able to obtain:

a)      Query Duration

Is the value of the column [Duration] for rows with [EventClass] = 10 (“Query End” event). Since this event is fired only one time per query – when the query ends – we are sure that every row represents a single query.

b)      Storage Engine Duration

For every query, it is the sum of the value of the column [Duration] for rows with [EventClass] = 11 (“Query Subcube” event). Since this event indicates the usage of the Storage Engine, here we could find zero, one or many rows per query.

Note:    even if this value is not accurate – because we should consider parallelism – , it can be considered adequate to our purpose.

c)      Formula Engine Duration = Query Duration – Storage Engine Duration

But how could we retrieve these values from our trace file?

Well, one of the big advantages of using XEvents is that we can easily query the trace file using this System Stored Function: sys.fn_xe_file_target_read_file

This is a simple query to obtain all the rows from the trace file:

And the following is an example of the result in Management Studio:

As you can see, our collected data are stored in the [event_data] field as XML text.

Hence, we can convert the information in XML data and modify the query – using XQuery and the Value() method – to focalize on data we need for our calculations:

Note:    since XEvent traces usually split the result in more than one file, we can easily query all trace files using the keyword “*” in the name of the file.

     In the above example, we queried all files generated from our trace, in this way:

that means “Read all files, whose name begin with ‘XEventMDXQueryTrace_’ and whose extension is ‘.xel’, contained in the folder ‘C:\ExtendedEvents\Traces’.”.

The above query produced this result on my machine:

 

Note:    to make T-SQL code more readable, and – at the end – to merge all steps in a single usable T-SQL statement, I will use one or more specific CTE for every step involved in our calculations.

Step 4a: retrieve Query Duration value

As stated above, Query Duration value is the simplest to obtain.

In fact, we only need to filter all the rows of the trace file with this condition [Event] = ‘QueryEnd’

Here an example of such a query and its result:

Every row represents a single query and it is uniquely individuates by its [ConnectionID] value.

The value in the [Duration] column is exactly the Query Duration value we need for our calculations.

Note:    as you can see, the trace also collected information about queries executed to retrieve traces information. Those rows will be removed in our final query.

Step 4b: retrieve Storage Engine Duration

The next step is to retrieve information about the time spent by the Storage Engine.

Also this value is very simple to obtain, because we only need to filter all the rows of the trace file with the condition [Event] = ‘QuerySubcube’ and, for every query – identified by its [ConnectionID] -, summarize the value of the [Duration] column as shown below:

The result has only two columns: the [Storage Engine Duration] that is the aggregated value we need and the [ConnectionID] that is the field that we will use later – in the complete version of the query – to join to data that we extracted in the Step 4a.

Step 4c: calculate Formula Engine Duration

In the last step we simply need to subtract the value retrieved in the Step 4b (Storage Engine Duration) from the value retrieved in the Step 4a (Query Duration).

We can easily do it using the [ConnectionID] field to join the data retrieved by the two queries:

and this is the result of the above query:

You can download here a XMLA command to collect also these events:

  • EventClass = 12  (Query Subcube Verbose)
    • EventSubclass = 22 (Non-cache data)
  • EventClass = 60 (Get Data From Aggregations)

and a query that gives you some other information like:

  • Percentage of time spent in Formula Engine
  • Percentage of time spent in Storage Engine
  • Number of Storage Engine queries
  • Number of Aggregations read

XEvents vs Profiler trace

The same result can also be obtained using a Profiler trace, but there are two main reasons that make me prefer XEvents:

1)      XEvents is a light weight performance monitoring system that uses very few performance resources

2)      Queries can be directly executed against XEvents trace files, while Profiler trace need to be loaded   into a table

On the other hand, Xevents is available only starting from SSAS 2012.

Anyway, if you are interested to try to use Profiler trace instead XEvents, you should take a look at this post by Bojan Penev before.

Conclusions

In this post I’ve tried to show you how to use the new SSAS XEvents feature to collect and querying basic – but very useful – information of MDX queries.

Starting from the proposed example, it should be quite simple to implement a more complete solution.

Hope you enjoy it.

SSAS – String Storage Calculator

Usually, in an OLAP cube, measures are numeric values and strings are used primarily as descriptions of dimension members.

In Analysis Services, strings are stored in separate structures – called “string stores” – because they have dynamic records size, and are less efficient – much less efficient – than numeric fields.

Furthermore, every string has an overhead of 12 additional bytes for management – so even a 1-byte string takes more room than an 8-byte numeric field.

For these two reasons alone we should limit the use of strings in our projects – that means we should use them only as “user friendly labels” for all dimension members – so, whenever possible, we should prefer integer values.

However, the worst thing about strings in Analysis Services – at least before SQL Server 2012 – is that the string store has an absolute maximum size of 4GB.

This limit applies to every single attribute of every dimension and, more precisely, to any of the following types of string file stores:

  • *.ksstore – Used to store strings representing keys of attribute members
  • *.asstore – Used to store the names of members
  • *.bsstore – Used to store the blobs of members

When one of these files exceeds the 4GB size limit, processing will fail and you will receive – usually after a long time – a message like this:

This is a “physical” limit of the engine, hence there is no workaround and to avoid this problem, you can try one or more of these potential solutions:

  • Do not use strings as key columns for attributes with a large number of members
  • Shorten attributes names
  • Shorten translations of attributes names
  • Switch to a ROLAP dimension

What I am interested to underline here, is that we can avoid spending a lot of time processing a big dimension only to discover at the end that the engine is not able to complete the process because of this problem.

In fact, we can easily estimate the storage needed by our dimension using this simple calculation:

                Total storage = regular storage + overhead storage

Where

                Regular storage = (members * characters * translations * 2)

                Overhead storage = (members * translations * 12)

And

                Members = number of members in our dimension

                Characters = number of characters in the string

                Translations = number of translations including the default language

                2 = constant value for conversion to Unicode characters

                12 = constant value for strings overhead

Here an example:

                2,000,000 members

               x 94 characters per member name

               x 10 translations

               x 2 for conversion to Unicode characters

               +

               2,000,000 members

               x 10 translations

               x 12 bytes overhead per string

               =

                4,000,000,000 bytes

               (We can approximate this value to the maximum limit even if the real value is

               1GB = 1,0243 bytes = (1,024 * 1,024 * 1,024) bytes = 1,073,741,824 bytes

               4GB = 1,073,741,824 * 4 = 4,294,967,296 bytes)

From the above calculations, it is easy to understand that, with a member name that is 94 characters long and 10 translations, our dimension is limited to 2 million members.

Well, the formula is not so complicated, but surely, it would be easier to have a simple utility to calculate the estimated storage occupation of our dimension attributes.

That was exactly my thought a few days ago when I had to do it a lot of time in a very big project.

Therefore, I decided to develop a very simple application during my – very little indeed – spare time.

I called it “SSAS – String Storage Calculator” and you can download it here:

SSAS - String Storage Calculator
SSAS – String Storage Calculator

The tool is very simple and I do not want to offend anyone’s intelligence trying to explain what is obvious!🙂

However, if anyone has any question about it, feel free to ask me.

Three more things:

1)      I have checked the correctness of the above formula by creating a dimension – similar to the one of the example – with these values:

    • 20 million members (instead of 2 million)
    • 94 characters names
    • 1 translation (instead of 10)

The estimated size is exactly 4GB and the real size of the *.asstore file was very close to this limit as you can see in this picture:

2)      The 4GB limit also apply to DistinctCount measures

3)      In SQL Server 2012 the 4GB limit has been removed *but* you have to keep worrying about it because, by default, the limit is still present, so you need to explicitly remove it by activating the “scalable string storage” option for every attribute that may exceed 4GB – and you should do it only for attributes that really need it.

Hence, the SSAS String Storage Calculator will be useful even developing solution in SSAS 2012!

In conclusion, this utility cannot solve the problem for you, but it can help you to save a lot of time identifying the problem before it happens.

I hope you enjoy it!