The most likely scenario, based on the test JNK did, is transactions that roll back but still cause the identity seed to increment.
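You can see this behavior with a minimal repro (table and column names here are hypothetical) — a rolled-back insert still consumes its identity value:

```sql
-- Hypothetical demo table
CREATE TABLE dbo.identity_demo
(
    id INT IDENTITY(1,1) PRIMARY KEY,
    payload VARCHAR(10)
);

INSERT dbo.identity_demo (payload) VALUES ('first');    -- gets id = 1

BEGIN TRANSACTION;
    INSERT dbo.identity_demo (payload) VALUES ('oops'); -- takes id = 2
ROLLBACK TRANSACTION;                                   -- the row is undone...

INSERT dbo.identity_demo (payload) VALUES ('second');   -- ...but this gets id = 3

SELECT id, payload FROM dbo.identity_demo;              -- rows 1 and 3; id 2 is a gap
```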
The best route is to trace (profile) the activity so you can catch the offending transaction(s). Here's a starting point for using Profiler:
http://msdn.microsoft.com/en-us/library/ms181091.aspx
When you're starting to learn tracing, err on the side of tracing less than you think you need. Tracing is invasive, and you can imperil a DB instance if you go after too much information.
In this case you're looking for transaction-based events. Feel free to post back if you have specific questions while setting up your trace. Note that you'll probably want to log to files unless you can leave the GUI up overnight; logging to files is preferred overall unless you're trying to catch something in real time.
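If you're on SQL Server 2008 or later, an Extended Events session is a lighter-weight alternative to Profiler for this kind of overnight capture. This is only a sketch — the session name and file path are placeholders, and you may want different events or predicates for your workload:

```sql
-- Server-side capture of transaction activity (session name and
-- output path are placeholders; adjust to your environment).
CREATE EVENT SESSION capture_transactions ON SERVER
ADD EVENT sqlserver.sql_transaction
(
    ACTION (sqlserver.session_id, sqlserver.sql_text)
)
ADD TARGET package0.event_file
(
    SET filename = N'C:\traces\capture_transactions.xel'
);

ALTER EVENT SESSION capture_transactions ON SERVER STATE = START;

-- ...let it run overnight, then:
ALTER EVENT SESSION capture_transactions ON SERVER STATE = STOP;
```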
The easiest way would be to create a new table: insert the 30,000 rows you want to keep, then drop the old table and rename the new one.
BEGIN TRANSACTION;

-- New table; the IDENTITY seed restarts at 1
CREATE TABLE dbo.copy_of_foo
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    col2 ..., ...
);

-- Copy only the rows you want to keep
INSERT dbo.copy_of_foo (col2, ...)
SELECT TOP (30000) col2, ...
FROM dbo.foo
ORDER BY ...;

-- Swap the new table in for the old one
DROP TABLE dbo.foo;
EXEC sp_rename N'dbo.copy_of_foo', N'foo', N'OBJECT';

COMMIT TRANSACTION;
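After the swap you can sanity-check that the seed restarted; IDENT_CURRENT returns the last identity value generated for the named table (assuming the insert above copied exactly 30,000 rows, this should report 30000):

```sql
-- Last identity value assigned for the renamed table
SELECT IDENT_CURRENT(N'dbo.foo') AS last_identity_value;
```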
You may need to nest the ORDER BY and add OPTION (MAXDOP 1) if you want to ensure that you get the 30,000 most recent rows and that the IDENTITY values are assigned in order from oldest to newest. Hopefully you have some other column, such as a date/time column, that can identify the 30,000 rows you want to keep:
INSERT dbo.copy_of_foo (col2, ...)
SELECT col2, ...
FROM
(
    -- Inner query picks the 30,000 newest rows
    SELECT TOP (30000) col2, ...
    FROM dbo.foo
    ORDER BY date_time_column DESC
) AS x
-- Outer ORDER BY makes the IDENTITY values assign oldest-to-newest
ORDER BY date_time_column
OPTION (MAXDOP 1);
Keep in mind, though, that if someone reported an exception yesterday with row #42,000,564, good luck finding it today. As I suggested in the comment, maybe you shouldn't be exposing these otherwise-meaningless surrogate identifiers to users in the first place.
Best Answer
Your colleague is wrong.
The solution won't scale, and the UDF isn't safe under concurrency (for the same reason as this). And how would you deal with multi-row inserts? That approach would require a UDF call per row.
And migrating to another RDBMS doesn't happen often in real life... by that logic you may as well skip SQL Server now, use sequences on Oracle, and hope you never migrate away.
Edit:
Your update states that moving data is for refreshing non-production databases.
In that case, you ignore the identity columns when refreshing. You don't compromise your implementation to make non-prod loading easier. Or use temp tables to track the identity value changes.
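If you do need to preserve the production key values during a refresh, SET IDENTITY_INSERT lets you load explicit values into an identity column. A minimal sketch — the table, column, and database names here are hypothetical:

```sql
-- Allow explicit values in the identity column during the load
SET IDENTITY_INSERT dbo.target_table ON;

INSERT dbo.target_table (id, col2)
SELECT id, col2
FROM source_db.dbo.source_table;

SET IDENTITY_INSERT dbo.target_table OFF;

-- Otherwise, simply omit the identity column and let it regenerate:
-- INSERT dbo.target_table (col2)
-- SELECT col2 FROM source_db.dbo.source_table;
```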
Or use process: we refresh our test system every night from production, which avoids the issue entirely (and ensures our prod backup can actually be restored, too).