MongoDB – Choosing a shard key and friendly URL IDs for MongoDB

mongodb

I have decided to use MongoDB as the database for a web application. However, I am having some difficulty getting started, and I hope you can help me with a few questions.

I am developing my application in ASP.NET with MongoDB as the back end. I intend to start with a single server plus one replica, but I want to build it right so that I won't have problems sharding the database in the future if I have to.

One of my biggest problems is choosing the right shard key and friendly URLs for my website.

I have a folders collection, with files as an embedded collection inside each folder document. Each user can create any number of folders and add files to them. Each folder belongs to one user. What is the best shard key for this type of collection?
Many queries will be by user, fetching a folder and its items from the folders collection by the folder's unique id. I will also use the id in the URL to get the folder and its files, e.g. mywebsite.com/folder/[the_id_of_the_folder]

I will also use paging in my application, so I need to query the data (also in a sharded environment) and get, for example, the last 10 records, or page 2 with 10 records, all ordered by the last time they were inserted/updated.

  • So my first question is: what is the best shard key to use on a single machine, considering that I will have to shard in the future?
  • Does the shard key have to be the primary unique id of the document in MongoDB?
  • How can I generate more user-friendly URLs? I prefer a numerical value instead of a GUID (is there an option to convert it?)

Help will be much appreciated, as I am stuck and cannot continue until I solve this.

Best Answer

Before you go further, you need to answer a few questions:

  • how do you represent files within folders in the database
  • how do you represent folders
  • do you have relations between folders (parent -> child)
  • how often do you expect to create folders and files
  • how often do you update existing files in folders, and how many files do you update at a time

Based on your answers, you can choose either a write-optimized schema or a read-optimized schema. A write-optimized schema holds many very small entries, or lets you update in place with built-in operators such as $inc. A read-optimized schema generally uses larger documents, like the layout you described. In your scenario you could easily have something like this (assuming all folders are at the same level):

{ "userid" : "email or id",
  "folders" : [
     { "folder1" : [ "file1", "file2" ] },
     { "folder2" : [ "file3", "file4" ] }
  ]
}

But with this schema it gets quite complicated if you need to link a folder to a parent folder. Either way, since most of your queries are by user, it is obvious that userid is the shard key.
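To make the shard-key choice and the paging requirement concrete, here is a minimal sketch in Python that only builds the plain documents one would pass to a MongoDB driver. It assumes the layout from the question (one document per folder, each carrying a userid field) rather than the one-document-per-user example above. The namespace mydb.folders, the updated_at field and both function names are hypothetical, chosen purely for illustration.

```python
# Sketch only: builds plain dicts, it does not talk to a MongoDB server.
# "mydb.folders", "updated_at" and the function names are assumed for illustration.

def shard_collection_command(namespace="mydb.folders", key_field="userid"):
    """Document for the shardCollection admin command, range-sharded on key_field."""
    return {"shardCollection": namespace, "key": {key_field: 1}}


def page_query(userid, page, page_size=10):
    """Filter, sort, skip and limit for one page of a user's folders, newest first."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        # Equality on the shard key lets the router target a single shard.
        "filter": {"userid": userid},
        # Most recently updated first (assumes an updated_at timestamp field).
        "sort": [("updated_at", -1)],
        # Page 1 skips nothing, page 2 skips one page, and so on.
        "skip": (page - 1) * page_size,
        "limit": page_size,
    }


if __name__ == "__main__":
    print(shard_collection_command())
    print(page_query("user-42", page=2))
```

With a driver such as pymongo, the first document would typically go to client.admin.command(...) and the second would map onto collection.find(filter).sort(...).skip(...).limit(...). Because every page query filters on userid, it stays on the shard that holds that user's folders. An index covering userid and the sort field would presumably be needed for this to stay fast.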