Sql-server – ETL, Production, Development — Should these be different servers? Instances

etlsql serverssis

You can say I've become an "accidental DBA" as people have left this company and I'm managing many aspects of the BI system now suddenly.

My predecessor left quite the mess regardless but I'm trying to think of the best way forward in terms of hardware architecture and logical partitioning.

We have a SQL-Server Data Warehouse. We have ETL scripts (on the same server). We don't have a Test-Production split for database development.

My questions:

  1. Should the ETL scripts (SSIS and Pentaho PDI) be located on the same server as the data warehouse?

I understand this is a complicated question. My understanding is this.

Pros for separate ETL server: Contrary to belief, SSIS and SQL Server don't play nicely via memory management and SQL Server plays best by itself. Scalability and optimization generally favor a separate ETL environment as ETL processes have entirely different needs (far fewer backup needs, far less storage needs).

Cons: Well, additional hardware costs. Additional SQL Server license costs in the case of SSIS (an additional one if separated) so for Standard edition it's an extra $900 and for enterprise and additional $13k or whatever. ALSO, our data warehouse is only used during the day (by developers) – ETL is only run overnight – there is little overlap. Only potentially backup processes may interfere. So maybe the 'competition' aspects are overblown and resource-pooling (for more CPU hardware, which our company IS FRUGAL when it comes to hardware) is advantageous.

  1. Same question but for Test and Production environments.

What exactly is the point of an "instance" separation, as opposed to a database (one lower) or server (one higher) separation?

I guess they have wider configuration options, but — well even servers can be running on the same physical hardware/ computer. What are the pros of separation Test/ Prod databases via 2 servers or just 2 instances? I take it with 2 instances you're sacrificing configuration for simplicity?

Best Answer

There are probably a hundred reasons if we think about it really hard, but there are really just a few reasons that are so important that we never bother thinking of others:

  • Resources: If you haven't had a situation yet were someone did something on a non-production server that sucked up all the memory, CPU, or I/O, you will eventually have that if you continue to work in IT long enough. You don't want this happening in production. Yeah, the developers won't do any testing at night when the production ETL processes are running--until they are up against a deadline. Even if you use resource governing, separate instances--whatever, again, once you've been around long enough (my apologies if you feel that you have), you'll still not want them running on the same box.

  • Security: Ideally you want to have Active Directory groups for any access that doesn't use a SQL login. If your development and production are on the same instance, then all of the developers have some access to the production instance. Ideally you want to allow only as few people as possible the ability to even log in to the production instance

  • Changes: You are limited to what changes can be tested if all environments use the same system. Can't test a new service pack or cumulative update if everything's on the same instance.

  • The Future: If you have separate systems for each environment, you will be much more likely to be able to accommodate future needs. For example, isolating non-production systems from production systems is becoming A Thing for security purposes. So workstations can talk to dev and prod, but dev and prod servers are prohibited from communicating. Can't do this if it's all on the same system.

  • Reduced Probability of Accidents: If you have "dev" in the name of every server, database, SQL login, etc. that is in the development environment, you are much less likely to have someone accidentally run something in production. Lots of horror stories out there when credentials were the same name and then it just took a server name typo to bring down production.

Using multiple instances would certainly be a step in the right direction, but you still have risk of resource contention, inability to test impact of OS updates, security changes to OS, different driver versions, etc.

The same principles apply as to whether to run the SSIS jobs on the same server. Ideally, you would have a separate SSIS server so that you avoid the resource contention--you seem to have a good grasp of that. Whether it's worth it for you organization to pay for the additional hardware and software licenses--no way for us to tell. Present it to management, and if they decline then you are covered if things go badly due to too many things running on one system.