SQL Server – Similarities Between T-SQL and MDX

learningmdxsql serverssast-sql

I have used T-SQL (CTEs, windowed functions, recursive CTEs, large data, table-valued functions, APPLY, PIVOT, stored procedures, etc.), but have never queried a Microsoft SSAS cube.

Are there similar concepts in MDX for the following T-SQL concepts?

PIVOT
UNPIVOT
Windowed functions
Recursive Common Table Expressions

Are there any pitfalls for a T-SQL user using MDX?

I am hoping there are some common analogies between T-SQL and MDX, that will help someone good at T-SQL transition to MDX.

It would also be great if there was a guide for common mistakes that are made when coming from T-SQL into MDX.

Best Answer

General notes

First of all, T-SQL and MDX are 2 completely different beasts. SQL is intended to query tabular relational data, while MDX is intended to query multidimensional data.

I'll start with addressing your question about common mistakes. In my opinion the most common mistake is to try and apply SQL knowledge to MDX. You really need to get into the multidimensional mindset and forget about thinking in relational structures.

For example, you don't query columns and rows but you slice dimensions and measures. See for example my answer here.

Pretty much the only thing in MDX that resembles SQL is the fact that you have a SELECT and a FROM and a WHERE clause but MDX doesn't necessarily return rows and columns but returns "axes", of which you can have more than 2.

Addressing some of your points will probably show you why SQL concepts don't map to MDX concepts:

Pivot and unpivot

You pretty much don't do this in MDX. If you look at the MDX select syntax you'll see an example like this (returning in this case a 2 dimensional result):

SELECT   
    [Date].[Calendar].[First8Months2003] ON COLUMNS,  
    [Product].[Category].Children ON ROWS  
FROM  
    [Adventure Works]  
WHERE  
    [Measures].[Order Quantity]

Say, you want to pivot that resultset, you just swap the axes around:

SELECT   
    [Product].[Category].Children ON COLUMNS,  
    [Date].[Calendar].[First8Months2003] ON ROWS  
FROM  
    [Adventure Works]  
WHERE  
    [Measures].[Order Quantity]

Bear in mind that the COLUMNS and ROWS axes are just shorthand names, you could also do:

SELECT   
    [Product].[Category].Children ON 0,  
    [Date].[Calendar].[First8Months2003] ON 1  
FROM  
    [Adventure Works]  
WHERE  
    [Measures].[Order Quantity]

and add a third axis and a fourth axis if you wish to do so (and if your client supports rendering that), so the result isn't a table but rather a multidimensional cellset.

Obviously, pivot and unpivot wouldn't keep up when you have more than 2 axes in your result set.

Recursive CTE's

Since MDX doesn't return "tables", there is no such thing as table expressions. You can have recursion in your calculated measures if that is something you need.

When you encounter the WITH keyword, it's defining Query-Scoped Calculated Members.

Window functions

Window functions also have no use in MDX, since they are used to calculate things that are already calculated in your cube and calculated measures. The OVER clause is pretty much the dimension members you put on your axes.

Take for example a running total. In SQL you would write something along the lines of

SELECT SUM(internet_sales) OVER (ORDER BY date) AS running_total
FROM sales;

However, in MDX you would have a calendar dimension and a sales amount measure, and your MDX would look something like:

WITH MEMBER [Measures].[Sum from start]
AS
SUM ({NULL:[Date].[Calendar].CurrentMember},
[Measures].[Internet Sales Amount])
SELECT {[Measures].[Internet Sales Amount],[Measures].[Sum from start]} ON 0 ,
[Date].[Calendar].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]

Conclusion

As you can see, trying to solve MDX problems by applying T-SQL principles, you will likely run into problems.

If you want to learn MDX, you need to learn MDX as if it were a completely new language and not a new syntax, because that's what it is, a different language and not a SQL dialect.

Related Solutions

Differences Between SQL I/O Sim and SQL IO Tools

SQLIO is a generic tool for testing your disk subsystem. It allows you to specify whether to test random/sequential, data block sizes, queue depth, threading, etc.

SQLIOSim is a tool that tries to emulate the patterns that SQL Server would expose to your system.

Usually I'll use SQLIO to test the subsystem when benchmarking for raw specs. Once satisfied with my SQLIO results I'll run SQLIOSim to get a real world run through of the disk subsystem.

You might want to see Brent Ozars tutorial on SQLIO, which also mentions SQLIOSim: http://www.brentozar.com/archive/2008/09/finding-your-san-bottlenecks-with-sqlio/

Sql-server – need suggestions to improve view performance

Views are typically not implemented for performance. And while you currently cannot implement explicit indexed views (which are just views that SQL Server maintains for you), you can certainly maintain facts manually yourself.

For example, you mention that you currently calculate "whether someone is dead or not" using three CTEs and a CASE expression (sorry, to be pedantic, it's not a statement).

Instead of referencing this set of CTEs every time the view is accessed, why not put that fact in a table (potentially along with other facts that have to be calculated per user), and calculate that periodically in the background? So maybe every 5 minutes (that is just a SWAG, you'll have to determine what's appropriate), you run a SQL Server Agent job that re-populates the table based on what it currently knows is the truth. Now the view just has to reference the table that is the output of that script, instead of calculating it over and over again while the users wait. So for example:

CREATE TABLE dbo.PersonProperties
(
  PersonID INT PRIMARY KEY REFERENCES dbo.Persons(PersonID),
  IsDead BIT NOT NULL DEFAULT 0
);

Now the job can simply merge that table with the results of the CTE, and then the view can include a reference to that table which simply pulls the BIT column along with a join on the PK. This should be MUCH less expensive at query time that re-evaluating all of that logic every time.

To minimize blocking (e.g. when users are accessing the view at the same time the job is running), you can implement what I call "schema switch-a-roo" and which I blogged about here:

http://www.sqlperformance.com/2012/08/t-sql-queries/t-sql-tuesday-schema-switch-a-roo

So instead of locking resources on the expensive query throughout the operation, the only blocking that happens is when the metadata switch actually takes place.

This works as long as you can afford some brief periods where the data is not accurate. You can tighten up that window to be pretty narrow, but there is always a chance that a person will die in between and for a brief moment a query will return that they are still alive. If you can't afford this, then you make it a part of the process that first introduces that fact to the database to make sure the CTE reflects that immediately and the new table also reflects it immediately.

Still not good enough? The flag a user as "dirty" the second a change for them comes in. The view can union or left join with the stale data for users that are "clean" and go after live data only for the users that are "dirty."