SQL Server – Database Engine and SSISDB Architecture Advice

clusteringhigh-availabilitysql serversql-server-2017ssis

I'm seeking some architecture advice involving the database engine and Integration Services components of SQL Server 2017. We have an existing SQL Server environment which is due for an upgrade and has historically been used in a less than ideal way. With an upgrade to a new SQL Server environment now on the horizon, as a systems administrator I want to use this opportunity to rethink the existing architecture to provide a more robust flexible environment going forward.

The Current Environment

A SQL Server 2012 Enterprise instance running on a 2-node Windows Failover Cluster. The database engine is clustered and DTS packages are stored in a clustered volume and these packages are executed as jobs via the SQL Server Agent. They do not use the explicit Integration Services component

The Problem

Staff who administer the data warehouse and create/manage the DTS packages have traditionally insisted on working directly on the clustered server nodes via RDP. This creates access issues with the systems admins who are responsible for the servers themselves. The data staff also require a variety of software tools which has led to persistent disk space issues over time. Additionally, many of these software tools are not cluster-aware and are only installed on one node, effectively leaving us with a crippled cluster right from the start. The result is that any time there is a planned or unplanned failover, there is pressure to get the original primary node back online asap because most jobs are failing because they rely on local software which is not cluster-aware.

A Proposed Solution

Install the database engine alone on its own cluster with no RDP access allowed by any staff. They will receive just the SQL Server clustered instance endpoint URL. Install the Integration Services component on a separate server and propose they begin using the SSISDB catalog to store their packages instead of the file system. My understanding is that there are many advantages to using the SSISDB.

Questions

  1. This only shifts the RDP access issue to another server. Because the data staff collaborate in real-time on packages and rely on all the same software tools, they argue they must have direct access to the server in order to provide a shared identical environment. My belief is that there must be a more modern design approach where they can use their development tools on their local workstations and connect to SSIS using the SSMS, where the jobs will be published and scheduled/executed. Any software that packages rely on (ftp, ssh, perl, python, etc) would be installed and managed by server admins on the SSIS. Since packages would be stored in the SSISDB, there would no longer be a need for the shared storage which has traditionally housed the package files and this should remove the need for any direct RDP access. Is my belief logical or is there still some argument for data developers to have direct server access?
  2. Now the SSISDB becomes a single point of failure since the packages are no longer part of the failover cluster as they traditionally have been. How would I build high availability into this component? I've read about SSIS Scale Out features but that seems more about balancing load rather than fault tolerance. I've begun reading about AlwaysOn availability groups, could that be a solution?

Any and all design advice is much appreciated. Thanks.

Best Answer

This is a culture change as much as it a process change, so having management support you will be important, and ultimately shield you from some level of technical resistance and aid in broadcasting how this strategic shift will improve the lives of everyone.

Before we get into the specific case of the SSIS Catalog, let’s look at an overview of your options to better arm you with the knowledge to communicate why developers should change their system:

  1. File system
    • Individual Package are stored on any folder, any location.
    • No security, files can be modified by any user or process without prior knowledge and no process natively exists to prevent this from occurring.
    • Not part of the regular backup schedule. System snapshots are not a viable backup and do not prevent accidentally modifications that can cause irreparable damage.
    • Must manually organize packages and no ability to group packages outside physical location and/or documentation
    • Forced to use Configuration Files to parameterize values
  2. SSIS Package Store SSIS Package Store
    • Packages by default are stored specific folder “C:\Program Files\Microsoft SQL Server\<SQL Version Compiled>\DTS\Packages”. Can be moved.
    • Limited security; can be modified by elevated permissions
    • Not part of any backup strategy outside system snapshots (again, NOT A VALID BACKUP SOLUTION)
    • Can organize packages under Folders
    • Cannot use roles to restrict access to packages from the SSIS Package Store
  3. SQL Server (msdb) IS Package Store (msdb)
    • Packages are stored in msdb and can be exported to a file-share.
    • Security is tied to access to msdb.
    • System msdb is part of a valid backup solution.
    • Cannot be added to an AlwaysOn solution
    • Folders can be used to categorize
    • Can use Roles to manage permissions in 2016+ that avoid using sysadmin. Integration Services Roles (SSIS Service) - MS Docs
  4. SSISDB Catalog SSIS Catalog
    • Packaged are stored exclusively inside SSISDB that is encrypted with a certificate
    • Security is further strengthened by separate permissions and roles (see above link).
    • SSISDB is part of a valid backup solution
    • SQL Server 2016+ has added support for AlwaysOn! SQL Server 2016 HA Series Part 2 – SSIS and Availability Groups – Blogs Microsoft
    • Mature management hierarchy that includes Parameters, Environments Variables.
    • Offers Full integration with Visual Studio
    • SQL Agent Jobs no longer directly use configuration file references or actual passwords (hope no one is doing this anyways) from being scripted out from SSMS.

Ultimately, you can decide what path will work for your organization. SSIS Catalog is one of several methods, so getting your management to support you in this endeavor will go a long way to making changes. After describing an example scheme for deployment, the rest of this post will highlights some Visual Studio Management and SSIS Catalog Tips.

An example scheme could be the following:

  • All new efforts will use [Visual Studio] 2017 or 2015 for backwards compatibility and SQL Server Data Tools - (Download SQL Server Data Tools (SSDT) - docs.microsoft.com). o note in VS 2019 they added this as a marketplace component)
  • projects will be saved in the code management folder (GitHub folder) so efforts are versioned or they will use the SSIS Catalog on a DEV environment, though ideally use the GitHub Repository.
  • Package components are installed on each node for Dev/Cert environments first. The Enterprise DBA assists in gathering these downloads in a single file-share for ease of access/ transportation to target clustered servers and ensures they are on each target server.

BONUS SECTION:

Visual Studio Management Tips:

  1. Your Package depends on both the Deployment Version (must match Integration Services of SQL Server instance)

enter image description here

  • And the 32-bit/64-bit configuration depending on the components used. enter image description here

    1. In order to use the Parameters, you must either user a package “parameter” (not variables) or Project Parameter. You an use these to environmentally control what connections are used.

enter image description here

  1. You can connect and deploy to your target location (it doesn’t HAVE to use the SSISDB)

enter image description here enter image description here

SSISDB Catalog Management Tips:

1) You can Restore Versions with a click of a button to undo negative changes or even revert forwards!

enter image description here enter image description here

2) You can manage project parameters, validate, control versions from SSMS. No VS required.

enter image description here

  • Configure the Environment Parameters from here:

enter image description here
enter image description here

  • Passwords, unlike in the SQL Server Agent jobs, are securely protected inside the SSISDB database.

Example:

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'<name of step>', 
        @step_id=1, 
        @cmdexec_success_code=0, 
        @on_success_action=3, 
        @on_success_step_id=0, 
        @on_fail_action=2, 
        @on_fail_step_id=0, 
        @retry_attempts=0, 
        @retry_interval=0, 
        @os_run_priority=0, @subsystem=N'SSIS', 
        @command=N'/ISSERVER "\"\SSISDB\Replication\SSISDB_Replication\SSISDB_Replication.dtsx\"" /SERVER <ServerName> /ENVREFERENCE 3 /Par "\"$ServerOption::LOGGING_LEVEL(Int16)\"";1 /Par "\"$ServerOption::SYNCHRONIZED(Boolean)\"";True /CALLERINFO SQLAGENT /REPORTING E', 
        @database_name=N'master', 
        @flags=0

3) Each SQL Server Agent Job then uses these parameters after you select the environments set.

  • Pick the Package Source SSIS Catalog.

enter image description here

  • Set the environment for your variables.

enter image description here

  • In most cases, the parameters will automatically be set.

enter image description here

4) Check what actually went wrong inside SSMS without resorting to RDPing into your server.

enter image description here

5) See performance between runs that uses real statistics!

enter image description here

Am I missing anything? Feel free to make this a community answer.