You can shrink an MDF file using:
USE [Databasename]
GO
DBCC SHRINKDATABASE(N'Databasename')
GO
and you can shrink an LDF file using:
USE [Databasename]
GO
DBCC SHRINKFILE(Databasename_Log, 1)
GO
For what you want to do, I would recommend the following (which is pretty much what you were thinking).
1> Create history tables for the historic data you have - keep the schemas as similar as possible. Split up by some logical grouping (such as year/month) based on how they are going to be queried (say you need to report within a month/year as well as across all data). Do not worry about the table size of the splits unless they are getting into the TB range (your DBMS should handle it); just make sure they are appropriately indexed for the queries that need to be run. You should consider putting these onto a different disk from the active data if performance is an issue.
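As a rough sketch of step 1 (the table and column names here are assumptions for illustration, not from your schema), a month-split history table mirrors the active table's structure and gets its own indexes:

```sql
-- Hypothetical active table: data(id, created_at, payload).
-- One history table per month, with the same schema as the active table.
CREATE TABLE data_Jan12 (
    id INT NOT NULL PRIMARY KEY,
    created_at DATETIME NOT NULL,
    payload VARCHAR(255) NULL
);

-- Index chosen to match the queries that will hit the archive.
CREATE INDEX IX_data_Jan12_created_at ON data_Jan12 (created_at);
```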
2> Create a routine to move data from the active table to the relevant historic table. Run this periodically. As good practice, rebuild the indexes on the table the data was removed from, and update the table statistics. The easiest way to do this is to write a SQL script.
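The move routine in step 2 can be as simple as an insert-then-delete inside a transaction, followed by the index and statistics maintenance. This is a sketch using the hypothetical table names from above; adjust the column list and date range to your schema:

```sql
BEGIN TRANSACTION;

-- Copy the January 2012 rows into the history table...
INSERT INTO data_Jan12 (id, created_at, payload)
SELECT id, created_at, payload
FROM data
WHERE created_at >= '2012-01-01' AND created_at < '2012-02-01';

-- ...then remove them from the active table.
DELETE FROM data
WHERE created_at >= '2012-01-01' AND created_at < '2012-02-01';

COMMIT TRANSACTION;

-- Tidy up the active table afterwards.
ALTER INDEX ALL ON data REBUILD;
UPDATE STATISTICS data;
```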
3> Consider the reporting you want to do. If you want to only have to deal with one table when writing queries, create a view that unions the archived tables together. Create indexes on all the tables to suit the view. This way, if you want all the data, select from the view; if you want data from a specific year/month, query that table. The view will look something like:
create view view_all_data as
select 'Jan12' as month, a.* from data_Jan12 a
union all
select 'Feb12' as month, b.* from data_Feb12 b
....
I am assuming here that the system is not a heavily used transactional system and that you have windows of low usage in which to run the analysis queries. If you need to maintain high levels of performance, you may want to do the above in a separate database (on separate hardware) and port across the new data from backups.
Best Answer
Deleting data is easy from the syntax point of view, but that is not all there is to it.
DELETE FROM myTable WHERE someDate < deletionDate
might give you a few nasty surprises. Constraints might block you from doing a delete. A textbook example is a foreign key constraint between customer and order tables: deleting a customer is not allowed as long as there are orders, because otherwise you would end up with orders not linked to any customer. To overcome this kind of limitation, you must understand the database schema and start deleting from the right table first (or rely on the constraint to use
ON DELETE CASCADE
). Don't underestimate the performance effects either. If you are lacking proper indexing (or have over-extensive indexing), deleting records might cause a serious performance hit by generating lots of unnecessary index updates and/or full table scans. In addition, write locks will cause overall performance hits for any operations on the affected tables.
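A minimal sketch of the delete order under a customer/order foreign key (table and column names are assumptions for illustration):

```sql
-- Child rows must go first, or the foreign key blocks the delete.
DELETE FROM orders
WHERE customer_id IN (SELECT id FROM customers
                      WHERE last_active < '2012-01-01');

-- Now the parent rows can be removed.
DELETE FROM customers
WHERE last_active < '2012-01-01';
```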
Transaction log management is a common gotcha. Pruning a lot of data will generate a lot of transaction log activity and your log files will go sky high. As a best practice, break the delete operation into batches. Aaron's blog has a detailed discussion of the issue.
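Batching keeps each transaction, and therefore the log growth, small. A common SQL Server pattern looks like this (the batch size of 5000 is an assumption to tune for your system; `myTable`, `someDate` and `@deletionDate` are from the snippet earlier):

```sql
-- Delete in chunks until no matching rows remain.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM myTable
    WHERE someDate < @deletionDate;

    SET @rows = @@ROWCOUNT;
END
```

In the SIMPLE recovery model the log space is reused between batches; in FULL recovery you still need log backups running for the space to be reclaimed.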