20,000/60 = 333 records per minute is not that rate where you worry about inserts, even if they are burst. Of course it depends on your hardware, but since your table is rather huge, you need something big. So I would defenitely create 1 or 2 indexes (you can omit clustered) to speed up your queries.
At a higher insert rate you probably go with partitioning and split PAGELATCH contention between partitions. Or you could use a variation of GUID clustered primary key. There is a lot of arguing here and you may want to check Tom Kejser blog post here.
Also there is no shame if you go below 3NF. People create highly denormalized data shops just for reporting purposes. Since you have enterprise edition you could use CDC (Change Data Capture) technique and create SSIS package to update reporting table, at a time you need fresh data.
This is completely expected. I've talked about this before but I'll recap it for you.
There could be multiples reason you are seeing this behavior. The two that most people are confused or inaccurate of how it works are how availability groups synchronize data and how queries on readable secondary replicas work.
Please note the below is ONLY pertinent for SQL 2012 and 2014, and to some extent 2016.
How Data Synchronization Works (brief overview)
There are two types of replica synchronization, synchronous and asynchronous. The way the data synchronization happens in both is exactly the same. The way that SQL Server behaves, though, is different. When synchronous is used we wait for the data to be HARDENED on the secondary replicas. This means it only needs to be acknowledged that it was successfully written to the log, not that the log block shave been successfully redone. Asynchronous does not wait for the status message and just continues.
Thus it is entirely working properly, however there seems to be a misunderstanding as to how it works. AGs ship log blocks, not transactions, thus the entire transaction may not be shipped together and may not even be redone yet.
Querying Readable Secondary Replicas
When you run a query on a secondary replica, the read committed (default) isolation level is silently mapped to snapshot isolation under the covers... whether or not you have SI or RCSI enabled for the database.
Since snapshot works by row versioning and is consistent from the beginning of the transaction, you may not be able to see new data as it comes in per how snapshot isolation works. This is entirely working as intended.
The other point to keep in mind is that just because you have the acknowledgement that the data was hardened to the log on the synchronous secondary replica does not mean that the REDO thread has redone those log blocks yet. Thus, just because you have an acknowledge on the primary doesn't mean REDO has completed on it, only that it has been hardened to the log. Additionally, if your redo thread is blocked it may take a long time (or never) to not be blocked and thus your redo queue size will grow.
In the end I can totally expect the behavior you are witnessing, however it doesn't mean that it isn't synchronous. It just means it doesn't necessarily work the way it was believe or thought to. Hopefully this clears the confusion.
Best Answer
Looks like the opposit is true: the implicit conversion takes the offset into the equasion, but the cast/convert functions do not.
Comparing this (deducted 7 hours from @dt) results in:
Did some more investigation, and stumbled on this article.
"When you convert from datetime2 or datetimeoffset to date, there is no rounding and the date part is extracted explicitly. For any implicit conversion from datetimeoffset to date, time, datetime2, datetime, or smalldatetime, conversion is based on the local date and time value."
So when you want to treat '2014-07-07 09:49:33.000' and '2014-07-07 09:49:33.000 +07:00' as equal, your only option is to make an explicit conversion via cast or convert. Since implicit conversions would only work when your server's timezone offset happens to be the same as the offset specified in @dto.