Sql-server – MongoDB Master and Slave Replication – Windows or Linux

linuxmongodbperformancesql-server-2008windows

I have recently moved my MS SQL Server Database to MongoDB (2.0) on Windows Server 2008. Its containing 2 databases with following stats. (both databases get around 100 reads/writes per second). These writes are performed by a .Net application running on the same Database server.

> db.stats()
{
        "db" : "FirstDB",
        "collections" : 112,
        "objects" : 94020788,
        "avgObjSize" : 62.86065307174409,
        "dataSize" : 5910208136,
        "storageSize" : 14340792320,
        "numExtents" : 1277,
        "indexes" : 110,
        "indexSize" : 4279800784,
        "fileSize" : 27837595648,
        "nsSizeMB" : 16,
        "ok" : 1
}

> db.stats()
{
        "db" : "SecondDB",
        "collections" : 10,
        "objects" : 2926198,
        "avgObjSize" : 158.5025346883567,
        "dataSize" : 463809800,
        "storageSize" : 720027648,
        "numExtents" : 49,
        "indexes" : 8,
        "indexSize" : 115248896,
        "fileSize" : 2080374784,
        "nsSizeMB" : 16,
        "ok" : 1
}

I have found that MongoDB can hold all my data with required read/write speed. Now I need experts assistance to enforce reliability to my single-server (Database + application) setup;

But, during a period of 1 month, I have experienced following MongoDB-On-Windows Issues;

  • Unexpected shutdown resulting as corrupt collections
  • MongoDB not releasing disk space for deleted objects (currently its around 28-GB and growing)
  • Increased number of Page Faults
  • No control over data in RAM (unable to pre-load required data in RAM and remove after few minutes)
  • Frequent Database Errors like: Unable to read data from the transport connection…

What would be the best rescue setup for me towards Reliability/Backup/Performance;

  1. Move MongoDB to Linux (provided that my .Net application using this Database will reside on Windows Server within same LAN)? OR
  2. Setup a 2-servers replica set with Master on Windows and Slave on Linux ? OR
  3. What you suggest… ?

Edit-1:

> db.serverStatus()
{
        "host" : "OWNEROR-GTPD0H9",
        "version" : "2.2.2",
        "process" : "mongod",
        "pid" : 5972,
        "uptime" : 2246315,
        "uptimeMillis" : NumberLong("2246314871"),
        "uptimeEstimate" : 1418073,
        "localTime" : ISODate("2013-02-01T18:20:48.371Z"),
        "locks" : {
                "." : {
                        "timeLockedMicros" : {
                                "R" : NumberLong(1009609910),
                                "W" : NumberLong("41166641284")
                        },
                        "timeAcquiringMicros" : {
                                "R" : NumberLong("233444382285"),
                                "W" : NumberLong("26675293061")
                        }
                },
                "admin" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong("20598738435"),
                                "w" : NumberLong(0)
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong("681905051780"),
                                "w" : NumberLong(0)
                        }
                },
                "local" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong(7054206),
                                "w" : NumberLong(0)
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong(323140436),
                                "w" : NumberLong(0)
                        }
                },
                "FirstDB" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong("40372661580"),
                                "w" : NumberLong("54059509747")
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong("769458843030"),
                                "w" : NumberLong("357708107433")
                        }
                },
                "SecondDB" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong("25053503869"),
                                "w" : NumberLong("395081595504")
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong("69429086729"),
                                "w" : NumberLong("9163194312205")
                        }
                }
        },
        "globalLock" : {
                "totalTime" : NumberLong("2246314871000"),
                "lockTime" : NumberLong("41166641284"),
                "currentQueue" : {
                        "total" : 54,
                        "readers" : 52,
                        "writers" : 2
                },
                "activeClients" : {
                        "total" : 2,
                        "readers" : 0,
                        "writers" : 2
                }
        },
        "mem" : {
                "bits" : 64,
                "resident" : 69,
                "virtual" : 34059,
                "supported" : true,
                "mapped" : 16793,
                "mappedWithJournal" : 33586
        },
        "connections" : {
                "current" : 114,
                "available" : 19886
        },
        "extra_info" : {
                "note" : "fields vary by platform",
                "page_faults" : 631285910,
                "usagePageFileMB" : 6848,
                "totalPageFileMB" : 49132,
                "availPageFileMB" : 34274,
                "ramMB" : 24567
        },
        "indexCounters" : {
                "note" : "not supported on this platform"
        },
        "backgroundFlushing" : {
                "flushes" : 37430,
                "total_ms" : 86130745,
                "average_ms" : 2301.115281859471,
                "last_ms" : 3853,
                "last_finished" : ISODate("2013-02-01T18:19:49.233Z")
        },
        "cursors" : {
                "totalOpen" : 227,
                "clientCursors_size" : 227,
                "timedOut" : 125,
                "totalNoTimeout" : 226
        },
        "network" : {
                "bytesIn" : 46807928165,
                "bytesOut" : 24400717839,
                "numRequests" : 462799358
        },
        "opcounters" : {
                "insert" : 92590009,
                "query" : 92755757,
                "update" : 183285338,
                "delete" : 7489,
                "getmore" : 1004,
                "command" : 94208472
        },
        "asserts" : {
                "regular" : 0,
                "warning" : 0,
                "msg" : 0,
                "user" : 97431,
                "rollovers" : 0
        },
        "writeBacksQueued" : false,
        "dur" : {
                "commits" : 17,
                "journaledMB" : 0.139264,
                "writeToDataFilesMB" : 0.033338,
                "compression" : 2.2571516556184057,
                "commitsInWriteLock" : 0,
                "earlyCommits" : 0,
                "timeMs" : {
                        "dt" : 3291,
                        "prepLogBuffer" : 0,
                        "writeToJournal" : 8,
                        "writeToDataFiles" : 1,
                        "remapPrivateView" : 38
                }
        },
        "recordStats" : {
                "accessesNotInMemory" : 3942359,
                "pageFaultExceptionsThrown" : 15956,
                "FirstDB" : {
                        "accessesNotInMemory" : 163718,
                        "pageFaultExceptionsThrown" : 6931
                },
                "SecondDB" : {
                        "accessesNotInMemory" : 3778641,
                        "pageFaultExceptionsThrown" : 9025
                },
                "admin" : {
                        "accessesNotInMemory" : 0,
                        "pageFaultExceptionsThrown" : 0
                },
                "local" : {
                        "accessesNotInMemory" : 0,
                        "pageFaultExceptionsThrown" : 0
                }
        },
        "ok" : 1
}

Best Answer

This is quite an involved question, but I will give a brief answer on each. You should try these out, then ask a more specific question (like give details about RAM availability and utilization, with working data set size to diagnose page faulting for example).

Un-expected shutdown resulting as corrupt collections

You don't list your full version, but make sure you are on a later version of 2.0 at least and preferably 2.2 (2.2.3 is in release candidate status as of writing this). There were multiple Windows related fixes later in the 2.0 branch and several of them dealt with making the Windows build less susceptible to hard crashes (which can lead to corruption).

MongoDB not releasing disk space for deleted objects (currently its around 28-GB and growing)

This has been covered in numerous other posts and discussions - see my answer here:

https://stackoverflow.com/questions/13390160/does-mongodb-reuse-deleted-space

Increased number of Page Faults No control over data in RAM (unable to pre-load required data in RAM and remove after few minutes)

As mentioned above, page faulting and data fitting into RAM requires more details - how much data do you have, what is your working set size, do your indexes fit into memory etc. There are also numerous discussions on the mongodb-user Google group as well as Stack Overflow on this, and how to track this in MMS - a bit of searching for common approaches here is advised before you submit a new question. You should also note that page fault reporting in Windows can be tough to interpret thanks to the OS reporting "soft" page faults (already in memory but not owned/touched by a process) and "hard" page faults (actually paged from disk) in pretty much the same way.

A note about pre-loading, you can do this with the touch command in 2.2+, for indexes, data or both (see the linked page for options).

Frequent DB Errors like: Unable to read data from the transport connection...

Again, this will require logs, an idea of the relative load on the system etc. More detail definitely required - you could just be hitting a timeout or similar because you are overloading the DB. I would advise a separate question if these errors persist after you resolve other problems

Move MongoDB to Linux (provided that my .Net application using this DB will reside on Windows Server within same LAN)?

More people use MongoDB on Linux than Windows so it is more extensively tested and used by the community on that platform, but there is significant usage on Windows and large installations. Moving OS will not necessarily solve or automatically fix issues. If your data will not fit in RAM on Windows, it will likely not fit on Linux either for example, nor will it change how MongoDB reuses deleted space.

Setup a 2-servers replica set with Master on Windows and Slave on Linux ? OR

You should always run a replica set (minimum primary, slave, arbiter), regardless of the OS. It's just a good idea in general, but particularly if you are seeing some of the issues you describe. While in theory you can mix OS like you mention, I would recommend picking one or the other and just using that.