I know it's an old Question but this might still help searchers and it's a problem that pops up every now and then.
The main reason why you are hitting a performance ceiling without you seeing any resource bottleneck is because you've reached the limit of what is possible to process within one session single thread. The loop isn't processed in parallel, but all inserts are done serially.
In my case, it takes 36 seconds to insert 3 million rows. That means 36/30000000 = 0.000012 seconds per row. That's pretty fast. On my system, it simply takes 0.000012 to go through all the steps that are necessary.
The only way to get it done faster is start up a second session in parallel.
If I start 2 sessions in parallel both doing 15 million inserts. Both of them finish in 18 seconds. I could scale out more, but my current test setup is hitting 95% cpu with two parallel session, so doing 3 would skew the results since I would hit a CPU bottleneck.
If I start 2 parallel session both inserting 3 million rows, they both finish in 39 seconds. so that is now 6 million rows in 39 seconds.
Okay, that still leaves us with the NETWORK_IO wait showing up.
The NETWORK_IO waits are added by the fact that you are using extended events to trace them. In my case the insert takes 36 seconds (on avg). When using the extended event way (from the link above in the very first comment) this is what is registered:
Wait Type Wait Count Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
NETWORK_IO 3455 68808 68802 6
PAGEIOLATCH_SH 3 64 64 0
PAGEIOLATCH_UP 12 58 58 0
WRITE_COMPLETION 8 15 15 0
WRITELOG 3 9 9 0
PAGELATCH_UP 2 4 4 0
SOS_SCHEDULER_YIELD 32277 1 0 1
IO_COMPLETION 8 0 0 0
LATCH_SH 3 0 0 0
LOGBUFFER 1 0 0 0
You can see that 68 seconds of NETWORK_IO is registered. But since the insert loop is a single threaded action that took 36 seconds, this can't be. (Yes, multiple threads are used, but the operations are serial, never in parallel, so you can't acummulate more wait time than the total duration of the query)
If I don't use extended events but just the wait stats DMVs on a quiet instance (with just me running the insert) I get this:
Wait Type Wait Count Total Wait Time (ms) Total Resource Wait Time (ms) Signal Resource Wait Time (ms)
SOS_SCHEDULER_YIELD 8873 0.21 0.01 0.20
PAGEIOLATCH_UP 3 0.02 0.02 0.00
PREEMPTIVE_OS_AUTHENTICATIONOPS 17 0.02 0.02 0.00
PAGEIOLATCH_SH 1 0.00 0.00 0.00
So the NETWORK_IO you were seeing in the extended events log, wasn't related to your insert loop. (If you wouldn't turn nocount on, you would have massive async network IO waits, +1 Martin)
However I don't know why the NETWORK_IO show up in the extended event trace. Sure the writing out to a async file target of the events accumulates ASYNC_NETWORK_IO, but surely this is all done on a differenent SPID then the one we are filtering on. I might ask this as a new question myself)
The data you are looking to compress is that sent over the wire via TDS. There is some minor compression here but nowhere near the type of compression you get with page/row compression, backup compression or ColumnStore compression.
It has been asked for before:
http://connect.microsoft.com/SQLServer/feedback/details/412131/enable-network-compression-compress-tds-stream
http://connect.microsoft.com/SQLServer/feedback/details/377479/wan-compression-option
The items are still open, so maybe there is some hope. There is no way to control this via the connection string that I've ever seen.
In the meantime there are some products that claim to do this, e.g.
http://www.nitrosphere.com/products/nitroaccelerator/
http://toonel.net/tcpany.htm
You can also potentially configure the network between your SQL Server and the application servers to support compression (and other things like encryption) but you are beyond my scope here, and I'm not sure if this would be supported by every single feature of SQL Server.
And to be honest, I'm not convinced this is the place you want to focus on optimizing. Compressing this stream might actually slow things down and outweigh the benefits of sending fewer bytes. I'd rather plunk the money down on better network connectivity between server and client(s) than to spend time investing in this type of work and testing whether it has any actual benefits - and not being able to do that until afterward. From from 10/100 to gig fiber has a known and predictable impact on network I/O.
I am not sure about the format of the bytes sent over the wire; you will have to set up some kind of packet sniffer for that (or maybe someone has already done that and will chime in).
As for the impact of compression, unless you are on Fusion-IO or other high-end SSD-type solutions, you are almost certainly I/O bound currently, and not CPU-bound. So as long as you have CPU overhead, you should see faster performance with compression enabled (but this won't change network performance, since the data is uncompressed before transmission). I say that knowing nothing about your servers, your application, your data or your usage patterns - you could very well have an edge case where compression actually hurts performance, or where the data just isn't a good candidate for good compression ratios.
Best Answer
You can also take a look at this whitepaper written by the SQL CAT team. Take note of who all reviewed that document too, it is very well written.
The whitepaper will explain that since we are talking about data compression there is some data that will compress better than other. I believe the section on Application Workload has information for some of your questions, is goes over the performance implications found when using data compression. My advice is going to be to test it. That is really the only way you will find out for sure if it will benefit or hurt your application/system.
A snippet from the whitepaper points out some data that does not benefit from data compression:
Columns with numeric or fixed-length character data types where most values require all the bytes allocated for the specific data type
Not much repeating data
Repeating data with non-repeating prefixes
Data stored out of the row
FILESTREAM data