SQL Server – Where and how to store the weird datamart

Tags: data-warehouse, database-design, performance, sql-server, ssrs

I could really use some help here.

Here's my situation.

I'm building a datamart in SQL Server 2005, which will furnish a single report (for now). We have 26 clients (medical organizations) that will be using this report. Each client will have between 1 and 2,000 users accessing it at any given time (most likely averaging in the 50s, but it needs to scale). There will be 26 versions of the same RDL, one for each client. Each one will access its own respective client database.

The interface for said report is SSRS 2012. This is our first 2012 report – the rest are still 2005 – and the RDL will access a 2005 database. I am not using OLAP; the SSRS report runs stored procedures and views.

We built a virtual server and installed SQL 2012 on it to host the report (this virtual server and the existing 2005 one live on the same physical machine, in case that matters). Nothing else will be running on the SQL 2012 virtual server.

These are the facts about the environment…

Our system is not OLTP-heavy: with one exception, which I'll describe below, it's all reads (except for the ETL, of course).

We have one client-facing database per client, 26 altogether. In these we store transactional data, rolled-up data, some report-ready flat tables, and a ton of T-SQL code that crunches numbers when clients pull up reports in SSRS. I'll refer to these databases as "operational" because, for me, they will essentially function as ODSs.

The operational databases are loaded by a horrific ETL process (clients have varying load schedules – usually monthly or weekly). I'll be building a mini ETL process (hopefully not so horrific) to populate the datamart from those operational databases.
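
To give a sense of what I mean by "mini ETL," here's a rough sketch of the kind of incremental load step I have in mind; all object names (ClientODS, ClientDDS, FactEncounter, LoadBatchAudit, ModifiedDate) are placeholders, not our actual schema:

    -- Hypothetical incremental load from one operational (ODS) database into its
    -- client datamart. Assumes a LoadBatchAudit table that records when the last
    -- load finished and a ModifiedDate column on the source rows.
    DECLARE @LastLoad DATETIME;

    SELECT @LastLoad = ISNULL(MAX(LoadFinished), '19000101')  -- first run: load everything
    FROM   ClientDDS.dbo.LoadBatchAudit;

    INSERT INTO ClientDDS.dbo.FactEncounter
            (EncounterKey, PatientKey, PhysicianKey, EncounterDate, ChargeAmount)
    SELECT  e.EncounterKey, e.PatientKey, e.PhysicianKey, e.EncounterDate, e.ChargeAmount
    FROM    ClientODS.dbo.Encounter AS e
    WHERE   e.ModifiedDate > @LastLoad;

    -- Record the watermark for the next run.
    INSERT INTO ClientDDS.dbo.LoadBatchAudit (LoadFinished)
    VALUES (GETDATE());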

All of the datamart's dimensions are conformed, but due to HIPAA restrictions, some of them (like Physician and Patient) can't be stored in a central database, and neither can the fact table. So there will need to be 26 versions of the same fact and dimension tables, one for each client.

There is a real-time component to our system. Physicians and nurses can enter transactional data via our site, and all reports need to reflect the changes immediately. As for the datamart, these edits will only affect the fact table. For what it's worth, this is why I decided not to use SSAS. I know incremental processing is really quick, but it just feels like too many moving parts.

I plan to create a trickle-in fact table with a view that combines it with the main fact table. Again, I'll need 26 of these. A pared-down version of my new ETL process will need to run on each user edit.
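
To make that concrete, here's a minimal sketch of the trickle-in arrangement; the table and column names are made up for illustration:

    -- Hypothetical: FactEncounter is loaded by the scheduled ETL;
    -- FactEncounter_Trickle receives rows whenever a user edits data on the site.
    CREATE TABLE dbo.FactEncounter_Trickle
    (
        EncounterKey  INT           NOT NULL,
        PatientKey    INT           NOT NULL,
        PhysicianKey  INT           NOT NULL,
        EncounterDate DATETIME      NOT NULL,
        ChargeAmount  DECIMAL(18,2) NOT NULL
    );
    GO

    -- Reports query the view, so they always see the batch-loaded rows plus
    -- anything trickled in since the last full ETL run.
    CREATE VIEW dbo.vFactEncounter
    AS
    SELECT EncounterKey, PatientKey, PhysicianKey, EncounterDate, ChargeAmount
    FROM   dbo.FactEncounter
    UNION ALL
    SELECT EncounterKey, PatientKey, PhysicianKey, EncounterDate, ChargeAmount
    FROM   dbo.FactEncounter_Trickle;
    GO

The idea is that the pared-down per-edit ETL would write only to FactEncounter_Trickle, and the full ETL would empty it after folding those rows into the main fact table.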

Here are my questions…

  1. Where should I store the 26 sets of datamart tables?
    • On a dedicated 2005 server, away from the SSRS server and the operational databases?
    • On the same server as the operational databases but in dedicated DDS databases?
    • Within the operational databases themselves?
    • On the SQL 2012 reporting server?
    • On the moon? Other?
  2. Where should I store the 26 trickle-in fact tables?
    • In the same database as the main fact tables?
    • On the same server as the operational databases but in dedicated DDS databases?
    • Within the operational databases themselves? This seems logical to me given that they'll need to be unioned at runtime…
  3. Should I create a central database for non-sensitive dimensions?
    • Maybe create a cloning process to copy them to the individual DDSs? (Rough sketch below.)
    • Or is it simpler to just have 26 of these darned things as well?
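
For question 3, this is roughly what I mean by a cloning process – a 2005-compatible upsert from a hypothetical central database into one client's DDS (DimDate, CentralDW, and ClientDDS are placeholder names):

    -- Hypothetical refresh of a non-sensitive conformed dimension in one client
    -- DDS from a central copy. Written as UPDATE-then-INSERT so it runs on
    -- SQL Server 2005 (no MERGE) and doesn't disturb rows already referenced by facts.
    UPDATE tgt
    SET    tgt.CalendarDate = src.CalendarDate,
           tgt.FiscalPeriod = src.FiscalPeriod
    FROM   ClientDDS.dbo.DimDate AS tgt
    JOIN   CentralDW.dbo.DimDate AS src
           ON src.DateKey = tgt.DateKey;

    INSERT INTO ClientDDS.dbo.DimDate (DateKey, CalendarDate, FiscalPeriod)
    SELECT src.DateKey, src.CalendarDate, src.FiscalPeriod
    FROM   CentralDW.dbo.DimDate AS src
    WHERE  NOT EXISTS (SELECT 1
                       FROM ClientDDS.dbo.DimDate AS tgt
                       WHERE tgt.DateKey = src.DateKey);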

With all of these questions, I’m concerned about good design practice, but mostly about the performance of the report and the performance of the ETL that will need to run upon user edit.

I hope all of this made sense. I would very much appreciate any feedback!

Best Answer

My initial thought would be two databases per client: one for the datamart and one for the ODS. The ODS would be in full recovery mode, as I don't think you can recreate the data there without retrieving it again from the upstream sources at your clients, if at all. The datamart could be in simple recovery mode, since all of its source data comes from the ODS and can be reloaded from scratch if needed, though the trickle-in fact table complicates that. I'm inclined to move the trickle-in fact table into the ODS so you have better options for enforcing RI and other data-cleansing needs before the data goes to the datamart (via a DML trigger or a separate scheduled ETL, perhaps). Also, if you can make the datamart database read-only between ETL cycles, you'll benefit because SQL Server won't have to manage locks during report requests.
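
If it helps, something along these lines could wrap each datamart load (ClientODS and ClientDDS are placeholder database names, and the READ_ONLY switch assumes no report sessions need to stay connected while it runs):

    -- One-time setup: keep the ODS in full recovery, run the datamart in simple.
    ALTER DATABASE ClientODS SET RECOVERY FULL;
    ALTER DATABASE ClientDDS SET RECOVERY SIMPLE;

    -- At the start of each ETL cycle, open the datamart for writes,
    -- rolling back any straggling report connections.
    ALTER DATABASE ClientDDS SET READ_WRITE WITH ROLLBACK IMMEDIATE;

    -- ... run the load from ClientODS into ClientDDS here ...

    -- When the load finishes, flip it back to read-only so SQL Server
    -- doesn't have to manage locks for report queries between loads.
    ALTER DATABASE ClientDDS SET READ_ONLY WITH ROLLBACK IMMEDIATE;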

I don't think a central schema/DB for the non-sensitive content is a good option. From a scalability perspective, keeping each client's data structures in their own containers gives you the option to migrate a client to faster disks or another host if needed, without impacting the other clients or incurring a longer downtime. From a business-logic perspective, if a particular client needs a business rule change that impacts the schema, will a central DB be able to accommodate it without negative impact? Also on the scalability front, think of the risk that locks taken on a shared table by one client impact the other clients connected concurrently.