SQL Server Analysis Services – SSAS

Tags: ssas, ssis

Do you know what the best practices would be for SSAS and real-time reports?

I am not sure whether we should be pulling data off our live database or using SSIS to move data periodically into a data warehouse database before reprocessing the cube.

Any advice on what the best approach is would be greatly appreciated.

Best Answer

Operational vs. analytic reports

Ad-hoc reporting against an operational system is a bad idea, so the answer depends on what your real-time requirements are.

  • In most cases the actual real-time requirement is for a handful of operational reports. These are usually not aggregate or statistical in nature, and tend to be exception reports, status reports or to-do lists. They should not typically hit large amounts of data, but rather select the specific records that match the report's criteria.

    Operational reports are almost always tied to a specific process and never analytic in nature. Usually, analytic reports can be run off a periodic load and do not need up-to-date data. There are a few exceptions, such as market data feeds, but these are unusual cases.

    In this case you should do a periodic load to a data mart, data extract or data warehouse system for the analytic reports and build the operational reports as bespoke, tuned reports. Run them off a replicated database instance if possible to reduce the transient load on the operational system database.

  • Some applications do need real-time or near-real-time (low-latency) data to be loaded into an analytic system. A couple of examples of this type of requirement are market data analytic systems and accounts reporting systems. In the former case the system is used by traders to run statistical analytics on the data; in the latter case the accountants typically need to be able to prepare and enter a journal and then report on it immediately.

    These are really operational reports in disguise, and tend to have their SLA tied to the source system. One thing these systems have in common is that they tend to have a relatively straightforward model for the underlying data.

  • You can front certain operational databases with a ROLAP tool that queries the underlying schema. It is rarely a good idea to do this with an operational system database, as such schemas tend not to lend themselves to efficient aggregate queries.

    I have done this on one occasion with an accounts system and a report model; in theory a cube could also have been used. Oracle also supplies a Discoverer-based ad-hoc reporting tool for Oracle Financials that works similarly. The native data model of an accounting system is actually quite close to a snowflake schema, so you can more or less get away with this, but the reports tend to be quite slow and they still bog down the system when it comes under load. (See the sketch after this list for the sort of flattened view you might put over such a schema.)

  • If you really want ad-hoc analytics on a low-latency source you could build a low-latency ETL process that populates a data mart. This sort of thing is more fiddly and complicated than a traditional ETL process, so you don't want to do this for applications where there is no genuine requirement. You can put a cube with a leading ROLAP partition on this and store the historical data as MOLAP. I have been involved in building a system that works like this.
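To illustrate the earlier point about fronting an accounts schema with a ROLAP tool or report model, a flattened reporting view is often all that is needed to present the near-snowflake shape of a general ledger as a fact-like row set. The T-SQL below is only a sketch: the table and column names (dbo.JournalLine, dbo.Account, dbo.Period) are hypothetical and stand in for whatever your accounts system actually uses.

    -- Minimal sketch: flatten a hypothetical GL schema into a fact-like view
    -- that a ROLAP tool or report model can query. All names are made up.
    CREATE VIEW rpt.vw_GLTransactions
    AS
    SELECT
        jl.JournalLineID,
        jl.PostingDate,
        p.PeriodID,
        p.PeriodName,
        a.AccountCode,
        a.AccountName,
        a.AccountCategory,
        jl.Amount
    FROM dbo.JournalLine AS jl
    JOIN dbo.Account     AS a ON a.AccountID = jl.AccountID
    JOIN dbo.Period      AS p ON p.PeriodID  = jl.PeriodID;

Note that a view like this does nothing to fix the underlying performance problem: the reports still run against the operational schema and will still bog it down under load.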

Low-latency analytic systems

Depending on your data source, you can use a change data capture (CDC) mechanism or a polling mechanism to identify new transactions. This process can then trickle the data into the data mart. If you want really low latency and you control the source database, you can implement triggers that push transactions out to the data mart, although this option may place significant load on your source system.
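As a rough illustration of the polling option, the sketch below uses a rowversion watermark on a hypothetical source table to trickle new and changed rows into a staging table in the data mart. Every object name here (dbo.JournalLine, mart.LoadWatermark, mart.FactJournalStaging) is an assumption made for the example; SQL Server's built-in change data capture feature is the alternative if it is available to you.

    -- Hedged sketch of a polling trickle load driven by a rowversion watermark.
    -- Assumes dbo.JournalLine carries a rowversion column (RowVer) and that the
    -- mart.* objects exist; all names are hypothetical, and cross-database or
    -- linked-server naming is omitted for brevity.
    DECLARE @LastVer binary(8), @CurrentVer binary(8);

    SELECT @LastVer = LastRowVersion
    FROM   mart.LoadWatermark
    WHERE  TableName = 'dbo.JournalLine';

    SET @CurrentVer = @@DBTS;   -- highest rowversion value issued in this database

    -- Pick up everything inserted or updated since the last poll.
    INSERT INTO mart.FactJournalStaging (JournalLineID, AccountID, PostingDate, Amount)
    SELECT JournalLineID, AccountID, PostingDate, Amount
    FROM   dbo.JournalLine
    WHERE  RowVer > @LastVer AND RowVer <= @CurrentVer;

    -- Advance the watermark so the next poll starts where this one left off.
    UPDATE mart.LoadWatermark
    SET    LastRowVersion = @CurrentVer
    WHERE  TableName = 'dbo.JournalLine';

Run from a SQL Agent job every minute or so, this keeps the mart roughly one poll interval behind the source.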

Leading ROLAP partition on a cube

A low-latency cube can be built with a hybrid storage architecture. Leading data (e.g. for the current open accounting period) is served by a ROLAP partition, which issues queries against the underlying relational data. Closed or historical data is managed through MOLAP partitions that carry aggregations. This gives you near-real-time data without issuing relational queries over large data volumes, plus the performance benefit of aggregations on the historical data.
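To make the partitioning scheme concrete, the partition source queries might look like the sketch below. The mart.FactJournal and mart.DimPeriod names are hypothetical, and the storage mode of each partition (ROLAP for the leading partition, MOLAP with aggregations for closed periods) is set on the partition in SSAS, not in the SQL itself.

    -- Leading partition (StorageMode = ROLAP in SSAS): only the current open
    -- accounting period, so queries against it hit the relational mart directly.
    SELECT f.*
    FROM   mart.FactJournal AS f
    JOIN   mart.DimPeriod   AS p ON p.PeriodKey = f.PeriodKey
    WHERE  p.IsOpen = 1;

    -- Historical partition (StorageMode = MOLAP, with aggregations): one closed
    -- period per partition, processed once when the period is closed.
    SELECT f.*
    FROM   mart.FactJournal AS f
    WHERE  f.PeriodKey = 201912;   -- e.g. the key of a closed period

When a period is closed, the leading partition's slice moves on and the newly closed period gets its own MOLAP partition and a one-off process.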

This approach has a few caveats. Pure ROLAP dimensions substantially restrict the features that can be used on the related measure groups (e.g. no aggregations other than SUM or COUNT). If you expect incremental updates to dimensional data you are better off with MOLAP dimensions and an incremental update process triggered from the job that updates the underlying dimension table; processing the dimension incrementally is quite efficient.
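A hedged sketch of that incremental dimension load is shown below: a MERGE keeps a hypothetical mart.DimAccount table in step with the source account master, and the same job would then issue a ProcessUpdate against the corresponding SSAS dimension (for instance from a SQL Server Agent Analysis Services Command step). All object names are invented for the example.

    -- Incrementally maintain the dimension table from the source account master.
    -- Table and column names are hypothetical.
    MERGE mart.DimAccount AS tgt
    USING (SELECT AccountID, AccountCode, AccountName, AccountCategory
           FROM   dbo.Account) AS src
       ON tgt.AccountID = src.AccountID
    WHEN MATCHED AND (tgt.AccountName     <> src.AccountName OR
                      tgt.AccountCategory <> src.AccountCategory) THEN
        UPDATE SET tgt.AccountName     = src.AccountName,
                   tgt.AccountCategory = src.AccountCategory
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (AccountID, AccountCode, AccountName, AccountCategory)
        VALUES (src.AccountID, src.AccountCode, src.AccountName, src.AccountCategory);

    -- The job then runs ProcessUpdate on the SSAS account dimension so new and
    -- changed members are picked up without reprocessing the whole cube.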

Testing this type of system is somewhat fiddly, as you have to make sure your change data capture or incremental load mechanism is working correctly. You will probably need to build a test harness that can post transactions into the source system in order to run unit tests.
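For instance, a minimal harness, reusing the hypothetical names from the earlier sketches, could post a known transaction into the source and assert that it shows up in the mart after a polling cycle:

    -- Hypothetical unit test: post a synthetic transaction, wait one polling
    -- cycle of the trickle load, then check that it reached the mart.
    -- Assumes JournalLineID is an identity column; all names are made up.
    DECLARE @TestLineID int;

    INSERT INTO dbo.JournalLine (AccountID, PostingDate, Amount)
    VALUES (1001, GETDATE(), 123.45);
    SET @TestLineID = SCOPE_IDENTITY();

    WAITFOR DELAY '00:00:30';   -- allow one polling cycle to run

    IF NOT EXISTS (SELECT 1
                   FROM   mart.FactJournalStaging
                   WHERE  JournalLineID = @TestLineID)
        RAISERROR('Trickle load did not pick up test transaction %d.', 16, 1, @TestLineID);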