In a VS environment, I've always used database projects to implement the update scripts. I tend to use unimaginative names like "DatabaseUpdate17.sql" or "PriceUpdateFebruary2010.sql" for my scripts. Keeping them as database projects lets me tie them to Team Server tasks and bugs (and to code reviews, if we did them). I also include in each database (that I have authority over) a table dedicated to collecting schema changes:
CREATE TABLE [dbo].[AuditDDL](
[EventID] [int] IDENTITY(1,1) PRIMARY KEY NOT NULL,
[EventData] [xml] NULL, -- what did they do
[EventUser] varchar(100) NOT NULL, -- who did it
[EventTime] [datetime] DEFAULT (getdate()) -- when did they do it
)
GO
Well, that takes care of 3 of the 6 Ws.
CREATE TRIGGER [trgAuditDDL]
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
INSERT INTO AuditDDL(EventData, EventUser)
SELECT EVENTDATA(), original_login()
GO
I include an insert statement to log the beginning of a patch as well as the end of a patch. Events happening outside of patches are things to look into.
For example, a "begin patch" insert for "patch 17" would look like:
INSERT INTO [dbo].[AuditDDL]
([EventData]
,[EventUser])
VALUES
('<EVENT_INSTANCE><EventType>BEGIN PATCH 17</EventType></EVENT_INSTANCE>'
,ORIGINAL_LOGIN())
GO
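The matching "end patch" insert has the same shape; the exact EventType text below is just an assumed convention mirroring the begin marker:
INSERT INTO [dbo].[AuditDDL]
([EventData]
,[EventUser])
VALUES
('<EVENT_INSTANCE><EventType>END PATCH 17</EventType></EVENT_INSTANCE>'
,ORIGINAL_LOGIN())
GO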
Since it also catches when indices are rebuilt, you'll need to run the following every month or so to clear out those events:
DELETE FROM AuditDDL
WHERE [EventData].exist('/EVENT_INSTANCE/EventType/text()[fn:contains(.,"ALTER_INDEX")]') =1
GO
DELETE FROM AuditDDL
WHERE [EventData].exist('/EVENT_INSTANCE/EventType/text()[fn:contains(.,"UPDATE_STATISTICS")]') =1
GO
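When it's time to review what happened, a query along these lines (a sketch; pull whatever pieces of the XML you care about) answers the who, what, and when at a glance:
-- Recent schema changes: who did what, and when.
-- The patch marker rows have no TSQLCommand element, so CommandText is NULL for them.
SELECT EventID,
       EventTime,
       EventUser,
       [EventData].value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(100)') AS EventType,
       [EventData].value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'varchar(max)') AS CommandText
FROM dbo.AuditDDL
ORDER BY EventID DESC
GO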
An earlier version of this answer was posted on Server Fault.
In a SOX- and PCI-DSS-compliant environment, you will never have access to the production servers, so the scripts need to be clear and tested beforehand. The comments at the top of each update script list the new tables, stored procedures, functions, etc., as well as the ones being modified. If data gets modified, explain what is being modified and why.
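As a sketch, a header along these lines does the job (the object names are made up for illustration; the layout is just a convention):
-- Patch 17 - PriceUpdateFebruary2010.sql
-- New:      dbo.PriceHistory (table), dbo.usp_GetPriceHistory (proc)
-- Modified: dbo.Prices (new column EffectiveDate), dbo.usp_UpdatePrice (proc)
-- Data:     back-fills Prices.EffectiveDate from the import date so February
--           price changes report against the correct period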
A secondary problem is that you sometimes end up needing to manually coordinate changes if two separate tasks change the same database object. This may just be the way it is but it still seems like there should be some automated way of "flagging" these issues or something.
I've never come across a tool that lets us track this automatically. Previous employers used a principle of "database owner": one and only one person who is personally in charge of the database. This person won't be the only developer working against that database; rather, all changes have to go through them. This has worked reasonably well to keep changes from colliding and damaging each other.
The question is quite open to discussion with so few details. What kind of calculations are performed, and how intensively? What kind of application server are you using? What amount and kind of data comes in and goes out? What's the acceptable latency between the application server and the database, and between clients and the application server? Consequently, are there any requirements to keep the application server in-house, and will it need to be clustered? Will there be any business logic implemented in the database as stored procedures and stored functions? If so, will it be in the native SQL dialect, or in an external language like C/C++/Python/Java, etc.?
I won't be able to single out and suggest a provider or strategy, due to the lack of details and because (for the last few years) I've mostly been managing DB servers deployed in-house or on dedicated servers in the cloud.
But first of all, I strongly suggest you phase out MySQL.
Having the power of Oracle or the ease of use of MSSQL would be nice, of course, but I won't push such a proprietary system with its (possibly) high licensing cost.
Instead, I strongly suggest moving towards PostgreSQL, especially for its stability and scalability.
But the strategy to pick (will it be a rented service, a platform, or rented dedicated servers? Will there be a cluster, and if so, which kind? How many nodes should the cluster have, and what network topology should be implemented?) all depends on the details.
It's hard to suggest much more than "build a cluster of PostgreSQL DB servers, consisting of a couple of dedicated machines, hosted by a trusted infrastructure provider" with so few details. And maybe: "Keep an acceptable amount of business logic in stored functions and procedures if the application server will reside in-house and the network latency between it and the DB cluster will be an issue."
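To illustrate what "business logic in stored functions" amounts to, here is a minimal PL/pgSQL sketch (table and column names are made up): the aggregation runs next to the data, so only a single number crosses the network to the application server.
-- Sum the outstanding balance for one customer inside the database
CREATE OR REPLACE FUNCTION outstanding_balance(p_customer_id integer)
RETURNS numeric AS $$
DECLARE
    v_total numeric;
BEGIN
    SELECT COALESCE(SUM(amount_due - amount_paid), 0)
      INTO v_total
      FROM invoices
     WHERE customer_id = p_customer_id;
    RETURN v_total;
END;
$$ LANGUAGE plpgsql STABLE;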
Maybe you need more, or maybe you don't even need all that hassle.
And if, by an unlucky decision, you end up sticking with MySQL, do not even think about using MyISAM.
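If you are stuck there, at least check which tables still use it and move them to InnoDB; a sketch (the table name is made up):
-- List tables still on MyISAM in the current schema
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND engine = 'MyISAM';

-- Convert one of them (this rebuilds the table, so plan for the downtime)
ALTER TABLE orders ENGINE = InnoDB;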
Failover is easy to do. That just requires shared storage and a couple of servers.
SQL Server doesn't support scale-out to multiple nodes. However, a properly designed database with only a couple of thousand users should be able to run on a dual-chip server pretty easily. My company's systems support about 35k transactions per second on a quad-chip server (with 6 cores per chip).
Like any database server, you'll want to use separate LUNs for each part of the system: one for data, one for indexes, one for transaction logs, and one for tempdb.
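A rough sketch of what that looks like at database-creation time; the drive letters, file names, and sizes stand in for your actual LUN layout, and the tempdb move only takes effect after SQL Server is restarted:
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_Data, FILENAME = 'E:\Data\Sales_Data.mdf'),
FILEGROUP Indexes
    (NAME = Sales_Indexes, FILENAME = 'F:\Indexes\Sales_Indexes.ndf')
LOG ON
    (NAME = Sales_Log, FILENAME = 'G:\Logs\Sales_Log.ldf')
GO

-- Nonclustered indexes are then created ON the Indexes filegroup,
-- and tempdb is relocated by modifying its existing files:
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'H:\TempDB\tempdb.mdf')
GO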
If you can't afford a SAN (you'll be looking at a six-figure purchase for the SAN alone), then you should look at database mirroring instead. Mirroring, when used with a witness server, can give you automatic failover in the event of a failure, and it doesn't require shared storage. The separate volumes for storage should still be used.
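Once the mirroring endpoints are set up and the database has been restored on the mirror WITH NORECOVERY, the session itself is only a few statements (server names here are placeholders):
-- On the mirror server:
ALTER DATABASE Sales SET PARTNER = 'TCP://principal.example.com:5022'
GO
-- On the principal server:
ALTER DATABASE Sales SET PARTNER = 'TCP://mirror.example.com:5022'
GO
ALTER DATABASE Sales SET WITNESS = 'TCP://witness.example.com:5022'
GO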
Before you can design your high-availability solution (clustering, mirroring, log shipping, etc.), you need to figure out what you are trying to protect against. How much data are you willing to lose? How long can you afford to be down during a failover?