Try turning on Read Committed Snapshot Isolation (RCSI), but be aware that this will put increased pressure on your tempdb, which should ideally sit on its own set of dedicated physical spindles.
There are two 'snapshot' levels available from SQL Server 2005 onwards: READ COMMITTED SNAPSHOT does optimistic reads and pessimistic writes, whereas SNAPSHOT ISOLATION performs optimistic reads and optimistic writes. I suggest you try RCSI.
Enabling Row Versioning-Based Isolation Levels
To alter this setting, you need to switch the database to single-user mode to ensure there are no queries in flight (the change would otherwise block or fail):
-- Kick out all other connections, rolling back any open transactions
ALTER DATABASE dbname
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
-- Enable Read Committed Snapshot Isolation
ALTER DATABASE dbname
SET READ_COMMITTED_SNAPSHOT ON;
GO
-- Re-open the database to all users
ALTER DATABASE dbname
SET MULTI_USER;
GO
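Once that completes, it is worth confirming the change took effect. A minimal check, assuming your database is named dbname; note that full SNAPSHOT ISOLATION is enabled separately via ALLOW_SNAPSHOT_ISOLATION:

-- Verify the row-versioning settings for this database
SELECT name,
       is_read_committed_snapshot_on,
       snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'dbname';
GO
-- If you wanted full SNAPSHOT ISOLATION instead, enable it like so:
ALTER DATABASE dbname
SET ALLOW_SNAPSHOT_ISOLATION ON;
GO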
I would not trust rsync to copy the ldf and mdf files for user or system databases, nor would I trust anything I hacked together (VSS or otherwise) in a production environment. SQL Server is very fussy about when (and if) things get written to the ldf and mdf files. Software like rsync that wasn't designed with that in mind might not get it right, especially if it doesn't understand that the ldf and mdf files need to be treated as an interrelated system of files. If that software doesn't get things right, nothing may be noticed until you fail over, try to go live, and find your databases flagged as suspect due to what SQL Server sees as data corruption. Even worse, "getting it right" might be dependent on how much load is on the system, so you might not find the problem in a lightly-loaded test environment.
I have seen enough examples of people who thought they were replicating their files but were not. They were left with corrupt files at the recovery site and inaccessible backup files at the primary site, which meant no database for them at all.
In short, you are setting yourself up for trouble.
If you had some sort of block-level replication technology like EMC's SRDF, or were looking at shared-nothing clusters, that might be different. Those technologies interface with SQL Server and the clustering services that Windows provides in a way that ensures your writes are safe and your files will (or at least should) stay consistent.
If my only disaster recovery option was a remote site that was normally down, I'd use log shipping and make sure that I had all of the pieces to restore the database(s) on the remote site. If you can't make the built-in log shipping do that, writing your own isn't that hard. I've probably written three or four (simplistic) log shipping systems from scratch in the last 14 years.
At a minimum, the key things you need are:
A full backup needs to be taken and copied over to the remote site.
Your tlog backups need to be taken and copied over to the remote site.
You need an automated method to restore that full backup and any relevant tlog backups. Ideally, this should be simple enough for someone else to do and/or simple enough for you to figure out at 3 AM when your primary server fails and you are half asleep. (A minimal sketch of the commands involved follows this list.)
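To make those pieces concrete, here is a minimal sketch of the T-SQL a hand-rolled log shipping system boils down to. The database name dbname and all file paths are placeholders, and a real system would generate the log backup file names and apply them in sequence:

-- On the primary (database must be in the FULL recovery model):
-- take a full backup, then periodic transaction log backups
BACKUP DATABASE dbname
TO DISK = N'\\remotesite\backups\dbname_full.bak'
WITH INIT;
GO
BACKUP LOG dbname
TO DISK = N'\\remotesite\backups\dbname_log_001.trn';
GO
-- On the remote site: restore the full backup without recovering,
-- then apply each log backup in order, also without recovering
RESTORE DATABASE dbname
FROM DISK = N'D:\backups\dbname_full.bak'
WITH NORECOVERY;
GO
RESTORE LOG dbname
FROM DISK = N'D:\backups\dbname_log_001.trn'
WITH NORECOVERY;
GO
-- At failover time, bring the database online
RESTORE DATABASE dbname WITH RECOVERY;
GO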
When you have an event, bringing up the other server will take longer because you will have so much manual work to do. That means this isn't as good as simply implementing regular log shipping. You will need to test this periodically as well.
(Of course, you need to worry about other things too: jobs, packages, login and user synchronization, changing DSNs on the web servers during a failure, and so on. If you have a large environment and a serious disaster, like the loss of your primary data center, you will be trying to do all of this while the IIS guys, file server guys, network guys, and whatever-else guys are trying to bring their stuff up too.)
If it were me, I would be agitating for a warm standby server at the remote site. That would make (standard, out-of-the box) log shipping easier and database mirroring possible. It sounds like you have tried that.
Best Answer
While this doesn't answer your question about tracing locks by date range, all sorts of things can have this effect on the database.
I would first check whether the problem is environmental. Use Windows Perfmon to see what the system resources are doing at the time. For example, I have seen this type of thing occur when SQL Server checkpoint operations take an overly long time to complete. That particular issue can normally be solved by configuring a "recovery interval" of 1 minute or so.
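If a long-running checkpoint does turn out to be the culprit, the setting is changed via sp_configure. A minimal sketch ('recovery interval' is an advanced option, hence the first step):

-- Expose advanced options so 'recovery interval' can be set
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
-- Target a recovery interval of roughly 1 minute
EXEC sp_configure 'recovery interval', 1;
RECONFIGURE;
GO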
To be sure of what's going on, though, I would set up Windows Perfmon counters to monitor CPU usage, memory usage, and the disk queue lengths of any disks on which your DB / log files reside. In my experience (high-volume OLTP), disk queue lengths should be no more than 2 for any disk. You can normally configure a separate server to log these counters to disk every 30 seconds or so, so that the monitoring doesn't adversely affect your DB server's performance. The number of concurrent connections and locks can also be monitored in this way.
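If you would rather sample connections and locks from inside SQL Server itself, the DMVs available from SQL Server 2005 onwards can complement Perfmon; a minimal sketch:

-- Active requests per database
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) AS active_requests
FROM sys.dm_exec_requests
GROUP BY database_id;
GO
-- Current locks broken down by lock mode
SELECT request_mode,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
GROUP BY request_mode
ORDER BY lock_count DESC;
GO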