SQL Server – Dealing with large log files

performance, query-performance, sql-server, sql-server-2008

I'm new to the database world and have recently started working with a large database with several tables containing mostly varchar text and integers. The two largest tables have roughly 50 million and 25 million rows. The database contains about 350,000 ID numbers for people, and I often need to retrieve information about all individuals, which involves joins to both of my very large tables (they're one-to-many relationships). These queries are also temporal in nature and use the BETWEEN operator to find events that happen within a certain time frame. Some of these queries take 10-15 minutes to run (I'm still learning and am trying new indexes to see if I can improve performance). While running one particular query my machine ran out of RAM and froze, and I had to reboot. Even after restarting I was unable to detach the database, drop connections, or delete my log files to get rid of the database (which was stuck in recovery mode).

I booted into safe mode to delete the .mdf and .ldf files and saw that my log file was twice the size of the data file. If I routinely need to run queries that return aggregate, temporal information on tables of the sizes above, is there anything I can do to prevent log bloat? Also, I know SQL Server eats resources for lunch, but what kind of specs would a computer need to run a query like the one below against tables of that size? (It takes an hour on my local machine.)
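
For reference, something like this shows the file sizes and how much of the log is actually in use from inside SQL Server (the database name is a stand-in for mine):

--log size and percent of log space in use, for every database on the instance
dbcc sqlperf(logspace);

--physical size (in MB) and growth settings for each file of one database
use [YourDatabase];
select name, type_desc, size * 8 / 1024 as size_mb, growth, max_size
from sys.database_files;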

EDIT: This database is static in nature and will not have anything added to it. It is also only used by one user: me. I'm not sure what recovery model it had; I don't have that PC in front of me at the moment.
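
(When I'm back at that machine, something like this should show the recovery model; nothing here is specific to my database:)

--show the recovery model for every database on the instance
select name, recovery_model_desc
from sys.databases;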

query:

--my real database field names are not this generic
;with x
as
(   --the first CTE finds the number of different classes a person qualified for and lists a number for a given date
    select  rx.pat_id
           ,c.cal_date
           ,COUNT(distinct rx.class) as distinctClass
           from rx
           inner join Calendar as c
               on c.cal_date between rx.fill_date and rx.script_end_date
           group by rx.pat_id,c.cal_date

),y
as
(   --the second CTE gives an interval number to each date so the max and min date values can be grouped by grp_nbr to give a time span
    select x.pat_id
          ,x.cal_date
          ,c2.id-ROW_NUMBER() over(partition by x.pat_id,x.distinctClass
                 order by x.cal_date) as grp_nbr
          ,x.distinctClass
          from x
          inner join Calendar as c2 
          on c2.cal_date=x.cal_date
)
--this groups the max and min dates with a particular grp_nbr and the number of different classes a person qualifies for
select y.pat_id,min(cal_date),max(cal_date),MIN(distinctClass)
    from y
    group by y.pat_id,grp_nbr
    order by MIN(distinctClass) desc
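
The indexes I've been experimenting with look something like this; the column choices are my own guesses at what the query touches, not something I know to be right:

--covering index for the range lookup from rx into the calendar (guessed columns)
create nonclustered index IX_rx_pat_fill_end
    on rx (pat_id, fill_date, script_end_date)
    include (class);

--narrow index so the calendar lookups by date don't scan the whole table
create nonclustered index IX_Calendar_cal_date
    on Calendar (cal_date)
    include (id);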

Best Answer

If the database is truly static in nature, putting it into read-only mode would prevent log growth. However, the queries that are causing the growth would then fail, so this might not actually be what you are looking for.
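
For completeness, the switch would look something like this (using the same placeholder database name as below):

USE [master]
GO
ALTER DATABASE [YourDatabase] SET READ_ONLY WITH ROLLBACK IMMEDIATE
GO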

Otherwise, put the database into the SIMPLE recovery model. That way the transaction log will be truncated routinely (at each checkpoint). Under the FULL recovery model, the log will keep growing until it is backed up by a log backup.

USE [master]
GO
-- ROLLBACK IMMEDIATE rolls back any open transactions so the change does not wait on them
ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE WITH ROLLBACK IMMEDIATE
GO
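
Once the database is in simple recovery, you can also shrink the already-bloated log file back down. A sketch, assuming the log's logical name follows the usual YourDatabase_log pattern; check sys.database_files for the real name:

USE [YourDatabase]
GO
-- find the logical name of the log file first
SELECT name, type_desc FROM sys.database_files;
-- then shrink the log to roughly 1 GB (target size is in MB)
DBCC SHRINKFILE (N'YourDatabase_log', 1024);
GO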