MongoDB db.collection.validate() Missing Output and EBUSY Error

Tags: data validation, errors, mongodb, mongodb-3.2

Logs of the successful run:

# grep conn103640 mongodb.log | head -10
2017-01-12T11:29:51.492+0000 I ACCESS   [conn103640] Successfully authenticated as principal tz6z on admin
2017-01-12T11:31:20.954+0000 I COMMAND  [conn103640] CMD: validate example.example
2017-01-12T11:31:20.970+0000 W STORAGE  [conn103640] verify() returned EBUSY. Not treating as invalid.
2017-01-12T11:46:26.307+0000 I INDEX    [conn103640] validating index example.example.$_id_
2017-01-12T11:46:26.608+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 100
2017-01-12T11:46:26.802+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 200
2017-01-12T11:46:26.964+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 300
2017-01-12T11:46:27.208+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 400
2017-01-12T11:46:27.389+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 500
2017-01-12T11:46:27.636+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 600

# grep conn103640 mongodb.log | tail -10
2017-01-12T13:03:27.090+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 2700
2017-01-12T13:03:27.101+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 2800
2017-01-12T13:03:27.112+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 2900
2017-01-12T13:03:27.123+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 3000
2017-01-12T13:03:27.133+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 3100
2017-01-12T13:03:27.146+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 3200
2017-01-12T13:03:27.158+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 3300
2017-01-12T13:03:27.181+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 3400
2017-01-12T13:03:27.198+0000 I STORAGE  [conn103640] WiredTiger progress WT_SESSION.verify 3500
2017-01-12T13:03:27.951+0000 I COMMAND  [conn103640] command example.$cmd command: validate { validate: "example", full: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:536 locks:{ Global: { acquireCount: { r: 3, w: 1 } }, Database: { acquireCount: { r: 1, w: 1 } }, Collection: { acquireCount: { r: 1, W: 1 } } } protocol:op_command 3884708ms

The first run fails (the EBUSY message appears in the errors array):

{
        "ns" : "example.example",
        "nrecords" : 3087895,
        "nIndexes" : 3,
        "keysPerIndex" : {
                "example.example.example.$_id_" : 3087895,
                "example.example.example.$files_id_1_n_1" : 3087895,
                "example.example.$files_id_1" : 3087895
        },
        "indexDetails" : {
                "example.example.$_id_" : {
                        "valid" : true
                },
                "example.example.$files_id_1_n_1" : {
                        "valid" : true
                },
                "example.example.$files_id_1" : {
                        "valid" : true
                }
        },
        "valid" : true,
        "errors" : [
                "verify() returned EBUSY. Not treating as invalid."
        ],
        "ok" : 1
}

Why was "verify() returned EBUSY. Not treating as invalid." reported? Why does this error happen and what does it mean?

A second run of the same command completes without any errors:

> db.example.validate(true)
{
        "ns" : "example.example",
        "nrecords" : 3087895,
        "nIndexes" : 3,
        "keysPerIndex" : {
                "example.example.$_id_" : 3087895,
                "example.example.$files_id_1_n_1" : 3087895,
                "example.example.$files_id_1" : 3087895
        },
        "indexDetails" : {
                "example.example.$_id_" : {
                        "valid" : true
                },
                "example.example.$files_id_1_n_1" : {
                        "valid" : true
                },
                "example.example.$files_id_1" : {
                        "valid" : true
                }
        },
        "valid" : true,
        "errors" : [ ],
        "ok" : 1
}

We use the 3.2.11 release in a sharded cluster setup. A lot of the output fields described in the docs are missing for us.

For example

  • validate.deletedSize
  • validate.deletedCount
  • validate.keysPerIndex
  • validate.firstExtentDetails

Why are these fields missing?

Here are the collection stats:

> db.getCollection('example').stats()
{
        "ns" : "example.example",
        "count" : 3093574,
        "size" : 41265087190,
        "avgObjSize" : 13338,
        "storageSize" : 44227588096,
        "capped" : false,
        "wiredTiger" : {
                "metadata" : {
                        "formatVersion" : 1
                },
                "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_max=15,merge_min=0),memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
                "type" : "file",
                "uri" : "statistics:table:example/collection/3--7826890430442575271",
                "LSM" : {
                        "bloom filter false positives" : 0,
                        "bloom filter hits" : 0,
                        "bloom filter misses" : 0,
                        "bloom filter pages evicted from cache" : 0,
                        "bloom filter pages read into cache" : 0,
                        "bloom filters in the LSM tree" : 0,
                        "chunks in the LSM tree" : 0,
                        "highest merge generation in the LSM tree" : 0,
                        "queries that could have benefited from a Bloom filter that did not exist" : 0,
                        "sleep for LSM checkpoint throttle" : 0,
                        "sleep for LSM merge throttle" : 0,
                        "total size of bloom filters" : 0
                },
                "block-manager" : {
                        "allocations requiring file extension" : 569147,
                        "blocks allocated" : 1199781,
                        "blocks freed" : 609087,
                        "checkpoint size" : 44227502080,
                        "file allocation unit size" : 4096,
                        "file bytes available for reuse" : 69632,
                        "file magic number" : 120897,
                        "file major version number" : 1,
                        "file size in bytes" : 44227588096,
                        "minor version number" : 0
                },
                "btree" : {
                        "btree checkpoint generation" : 20780,
                        "column-store fixed-size leaf pages" : 0,
                        "column-store internal pages" : 0,
                        "column-store variable-size RLE encoded values" : 0,
                        "column-store variable-size deleted values" : 0,
                        "column-store variable-size leaf pages" : 0,
                        "fixed-record size" : 0,
                        "maximum internal page key size" : 368,
                        "maximum internal page size" : 4096,
                        "maximum leaf page key size" : 2867,
                        "maximum leaf page size" : 32768,
                        "maximum leaf page value size" : 67108864,
                        "maximum tree depth" : 5,
                        "number of key/value pairs" : 0,
                        "overflow pages" : 0,
                        "pages rewritten by compaction" : 0,
                        "row-store internal pages" : 0,
                        "row-store leaf pages" : 0
                },
                "cache" : {
                        "bytes currently in the cache" : 14869343433,
                        "bytes read into cache" : NumberLong("1459320699070073"),
                        "bytes written from cache" : 29038851892,
                        "checkpoint blocked page eviction" : 0,
                        "data source pages selected for eviction unable to be evicted" : 4135,
                        "hazard pointer blocked page eviction" : 1009,
                        "in-memory page passed criteria to be split" : 6665,
                        "in-memory page splits" : 3325,
                        "internal pages evicted" : 27664,
                        "internal pages split during eviction" : 92,
                        "leaf pages split during eviction" : 3640,
                        "modified pages evicted" : 587435,
                        "overflow pages read into cache" : 0,
                        "overflow values cached in memory" : 0,
                        "page split during eviction deepened the tree" : 0,
                        "page written requiring lookaside records" : 0,
                        "pages read into cache" : 6996594,
                        "pages read into cache requiring lookaside entries" : 0,
                        "pages requested from the cache" : 27857673,
                        "pages written from cache" : 1184917,
                        "pages written requiring in-memory restoration" : 2,
                        "unmodified pages evicted" : 4714196
                },
                "cache_walk" : {
                        "Average difference between current eviction generation when the page was last considered" : 0,
                        "Average on-disk page image size seen" : 0,
                        "Clean pages currently in cache" : 0,
                        "Current eviction generation" : 0,
                        "Dirty pages currently in cache" : 0,
                        "Entries in the root page" : 0,
                        "Internal pages currently in cache" : 0,
                        "Leaf pages currently in cache" : 0,
                        "Maximum difference between current eviction generation when the page was last considered" : 0,
                        "Maximum page size seen" : 0,
                        "Minimum on-disk page image size seen" : 0,
                        "On-disk page image sizes smaller than a single allocation unit" : 0,
                        "Pages created in memory and never written" : 0,
                        "Pages currently queued for eviction" : 0,
                        "Pages that could not be queued for eviction" : 0,
                        "Refs skipped during cache traversal" : 0,
                        "Size of the root page" : 0,
                        "Total number of pages currently in cache" : 0
                },
                "compression" : {
                        "compressed pages read" : 320144,
                        "compressed pages written" : 61726,
                        "page written failed to compress" : 1085090,
                        "page written was too small to compress" : 38101,
                        "raw compression call failed, additional data available" : 0,
                        "raw compression call failed, no additional data available" : 0,
                        "raw compression call succeeded" : 0
                },
                "cursor" : {
                        "bulk-loaded cursor-insert calls" : 0,
                        "create calls" : 1682,
                        "cursor-insert key and value bytes inserted" : 28740274252,
                        "cursor-remove key bytes removed" : 4341036,
                        "cursor-update value bytes updated" : 0,
                        "insert calls" : 2151614,
                        "next calls" : 9257008,
                        "prev calls" : 1,
                        "remove calls" : 1085226,
                        "reset calls" : 7027759,
                        "restarted searches" : 4058,
                        "search calls" : 3792524,
                        "search near calls" : 0,
                        "truncate calls" : 0,
                        "update calls" : 0
                },
                "reconciliation" : {
                        "dictionary matches" : 0,
                        "fast-path pages deleted" : 0,
                        "internal page key bytes discarded using suffix compression" : 2041847,
                        "internal page multi-block writes" : 12480,
                        "internal-page overflow keys" : 0,
                        "leaf page key bytes discarded using prefix compression" : 0,
                        "leaf page multi-block writes" : 9106,
                        "leaf-page overflow keys" : 0,
                        "maximum blocks required for a page" : 160,
                        "overflow values written" : 0,
                        "page checksum matches" : 1261346,
                        "page reconciliation calls" : 626355,
                        "page reconciliation calls for eviction" : 4518,
                        "pages deleted" : 586775
                },
                "session" : {
                        "object compaction" : 0,
                        "open cursor count" : 22
                },
                "transaction" : {
                        "update conflicts" : 0
                }
        },
        "nindexes" : 3,
        "totalIndexSize" : 143032320,
        "indexSizes" : {
                "_id_" : 47726592,
                "files_id_1_n_1" : 50769920,
                "files_id_1" : 44535808
        },
        "ok" : 1
}

Is it correct that db.collection.validate() does not repair anything, and only reads the data?

Best Answer

Why was "verify() returned EBUSY. Not treating as invalid." reported? Why does this error happen and what does it mean?

This is a generic warning that a WiredTiger file could not be read because it was being used exclusively by another operation (for example, if a checkpoint happened to update a file at the same time as it was being validated). The "Not treating as invalid" message is rather opaque, but it is trying to describe "not a success, but not an error either" (the file was busy). This should be a transient error, and the validate command will likely succeed if you retry. I created SERVER-27670 in the MongoDB issue tracker to make this message more user-friendly.
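If the warning does show up again, a simple approach is to re-run the validation and check the errors array in the result. A minimal mongo shell sketch (using the example.example collection from your question):

// Run a full validation; retry once if the transient EBUSY warning appears
var result = db.example.validate(true);
if (result.errors.length > 0) {
    // e.g. "verify() returned EBUSY. Not treating as invalid."
    print("retrying after transient warning: " + result.errors);
    result = db.example.validate(true);
}
printjson(result);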

We use the 3.2.11 release in a sharded cluster setup. A lot of the output fields described in the docs are missing for us.

Validate output details will vary based on the storage engine, MongoDB server version, and configuration. The common and storage-engine-specific validate output fields are better described in the MongoDB 3.4 documentation, but the 3.2/3.0 documentation hasn't been updated yet (I created DOCS-9766 to do so).

I can see keysPerIndex (which is a common field) in your sample output. The extent and deleted/freelist details are specific to the MMAPv1 storage engine, so they aren't included since you are using WiredTiger.
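One quick way to confirm which storage engine a mongod is running is via serverStatus from the shell; the MMAPv1-only validate fields will never appear under WiredTiger:

// Returns "wiredTiger" on this deployment, so MMAPv1-only validate fields
// (firstExtentDetails, deletedCount, deletedSize) are not expected
db.serverStatus().storageEngine.name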

Is it correct that db.collection.validate() does not repair anything, and only reads the data?

Validate behaviour is storage-engine specific, but in general validate walks the data structures and does not attempt to repair any errors. As of MongoDB 3.4, the one exception to validate being read-only is the WiredTiger storage engine: some WiredTiger collection statistics (e.g. document counts) may be inaccurate after an unclean shutdown and will be corrected when the collection is validated. The collection stats are metadata rather than structural changes to the data files.
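For example, the fast count reported by stats() comes from that metadata, so it can drift after an unclean shutdown until a validation corrects it. A minimal sketch:

// Count from collection metadata; can be stale after an unclean shutdown
db.example.stats().count
// A full validation walks every document and index and corrects the stored count
db.example.validate(true)
// The metadata-based count should now reflect the records actually scanned
db.example.stats().count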

If you need to repair/salvage data, the relevant command is repairDatabase. In general you should consider this a last resort, as many forms of data corruption are not repairable, and the end outcome is salvaging whatever data can still be read successfully. For production deployments, the first recourse for fixing a corrupt database should be re-syncing from a healthy member of the same replica set.
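For reference, a sketch of how repairDatabase is invoked from the shell (make sure you have a backup and enough free disk space before attempting it):

// Last resort: salvages whatever is still readable; it does not restore lost data
db.getSiblingDB("example").repairDatabase()
// equivalent command form
db.getSiblingDB("example").runCommand({ repairDatabase: 1 })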