SQL Server – Dealing with large log files

performance, query-performance, sql-server, sql-server-2008

I'm new to the database world and have recently started working with a large database with several tables containing mostly varchar text and integers. The two largest tables have roughly 50 million and 25 million rows. The database contains about 350,000 ID numbers for people, and I often need to retrieve information about all individuals, which involves joins to both of my very large tables (they're one-to-many relationships). These queries are also temporal in nature and use the BETWEEN operator to find events that happen within a certain time frame. Some of these queries take 10-15 minutes to run (I'm still learning and am trying new indexes to see if I can improve performance). While running one particular query my machine ran out of RAM and froze, and I had to reboot. Even after restarting I was unable to detach the database, drop connections, or delete my log files to get rid of the database (which was stuck in recovery mode).

I booted into safe mode to delete the .mdf and .ldf files and saw that my log file was twice the size of the data file. If I routinely need to run queries that return aggregate, temporal information on tables of the sizes above, is there anything I can do to prevent log bloat? Also, I know SQL Server eats resources for lunch, but what kind of specs would a computer need to run a query like the one below against tables of that size? (It takes an hour on my local machine.)
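
For reference, something like this shows the file sizes and how much of the log is actually in use from inside SQL Server (the database name is a stand-in for mine):

--log size and percent of log space in use, for every database on the instance
dbcc sqlperf(logspace);

--physical size (in MB) and growth settings for each file of one database
use [YourDatabase];
select name, type_desc, size * 8 / 1024 as size_mb, growth, max_size
from sys.database_files;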

EDIT: This database is static in nature and will not have anything added to it. It is also only used by one user: me. I'm not sure what recovery model it had; I don't have that PC in front of me at the moment.
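
(When I'm back at that machine, something like this should show the recovery model; nothing here is specific to my database:)

--show the recovery model for every database on the instance
select name, recovery_model_desc
from sys.databases;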

query:

--my real database field names are not this generic
;with x
as
(   --the first CTE finds the number of different classes a person qualified for and lists a number for a given date
    select  rx.pat_id
           ,c.cal_date
           ,COUNT(distinct rx.class) as distinctClass
           from rx
           inner join Calendar as c
               on c.cal_date between rx.fill_date and rx.script_end_date
           group by rx.pat_id,c.cal_date

),y
as
(   --the second CTE gives an interval number to each date so the max and min date values can be grouped by grp_nbr to give a time span
    select x.pat_id
          ,x.cal_date
          ,c2.id-ROW_NUMBER() over(partition by x.pat_id,x.distinctClass
                 order by x.cal_date) as grp_nbr
          ,x.distinctClass
          from x
          inner join Calendar as c2 
          on c2.cal_date=x.cal_date
)
--this groups the max and min dates with a particular grp_nbr and the number of different classes a person qualifies for
select y.pat_id,min(cal_date),max(cal_date),MIN(distinctClass)
    from y
    group by y.pat_id,grp_nbr
    order by MIN(distinctClass) desc
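
The indexes I've been experimenting with look something like this; the column choices are my own guesses at what the query touches, not something I know to be right:

--covering index for the range lookup from rx into the calendar (guessed columns)
create nonclustered index IX_rx_pat_fill_end
    on rx (pat_id, fill_date, script_end_date)
    include (class);

--narrow index so the calendar lookups by date don't scan the whole table
create nonclustered index IX_Calendar_cal_date
    on Calendar (cal_date)
    include (id);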

Best Answer

If the database is truly static in nature, putting it into read-only mode would prevent log growth. However, the queries that are causing the growth would then fail, so this might not actually be what you are looking for.
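
For completeness, the switch would look something like this (using the same placeholder database name as below):

USE [master]
GO
ALTER DATABASE [YourDatabase] SET READ_ONLY WITH ROLLBACK IMMEDIATE
GO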

Otherwise, put the database into the SIMPLE recovery model. That way the transaction log will be truncated routinely (at each checkpoint). Under the FULL recovery model, the log will keep growing until it is backed up by a log backup.

USE [master]
GO
-- ROLLBACK IMMEDIATE rolls back any open transactions so the change does not wait on them
ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE WITH ROLLBACK IMMEDIATE
GO
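
Once the database is in simple recovery, you can also shrink the already-bloated log file back down. A sketch, assuming the log's logical name follows the usual YourDatabase_log pattern; check sys.database_files for the real name:

USE [YourDatabase]
GO
-- find the logical name of the log file first
SELECT name, type_desc FROM sys.database_files;
-- then shrink the log to roughly 1 GB (target size is in MB)
DBCC SHRINKFILE (N'YourDatabase_log', 1024);
GO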