MongoDB – Improving MongoDB read throughput for a tiny database

mongodb, performance

I am given to understand that MongoDB will essentially perform as if it is pulling records from memory when the working set is small. I wrote a simple MongoDB test program that inserts a single record into a collection with an indexed primary key plus one other field, then uses findOne to repeatedly read that field by key.

The read throughput I am getting with many threads is just ~14K/s on my 2-core laptop, which is better than, say, MySQL, but this throughput still seems awfully low given that a Java HashMap gives me a read throughput of nearly ~2 million/s. Shouldn't I be getting performance comparable to a completely in-memory map? What else does MongoDB really have to do for a read-only workload with a tiny database? Do I need to change any settings from the MongoDB defaults?
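For reference, the in-JVM baseline I am comparing against is just a tight loop over a HashMap. A minimal sketch of that measurement (class and method names are mine for illustration; the absolute numbers will vary by machine and JIT warm-up) looks like this:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapReadBaseline {

    // Reads one key from a HashMap in a tight loop for ~200 ms and
    // returns the observed read throughput in operations per second.
    public static long measureOpsPerSecond() {
        Map<String, String> map = new HashMap<>();
        map.put("primaryIDKey", "some value");

        long start = System.nanoTime();
        long ops = 0;
        String sink = "";
        while (System.nanoTime() - start < 200_000_000L) { // ~200 ms
            sink = map.get("primaryIDKey");
            ops++;
        }
        long elapsed = System.nanoTime() - start;

        // Use the read value so the JIT cannot eliminate the loop body.
        if (sink == null) {
            throw new AssertionError("unexpected null value");
        }
        return ops * 1_000_000_000L / elapsed;
    }

    public static void main(String[] args) {
        System.out.println("HashMap reads/s = " + measureOpsPerSecond());
    }
}
```

On my laptop this reports throughput in the millions per second, which is where the ~2 million/s figure above comes from.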

I have just one "document" that has a string primary key and small string field "some value".

Test code I scratched together that gives me 15-20K/s on a 2-core machine. You would need the org.json and MongoDB driver jars to run it.

import java.net.UnknownHostException;
import java.text.DecimalFormat;
import java.util.concurrent.ScheduledThreadPoolExecutor;

import org.json.JSONException;
import org.json.JSONObject;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.DuplicateKeyException;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.util.JSON;

@SuppressWarnings("javadoc")
public class MongoSmallWorkingSetRead {
private static ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(
        8);

private static long initTime = System.currentTimeMillis();
private static int count = 0;

private static synchronized int incrCount() {
    return ++count;
}

private static synchronized int getCount() {
    return count;
}

private static synchronized void reset() {
    count = 0;
    initTime = System.currentTimeMillis();
}

private static void testReadRate(String dbName, String collectionName,
        String primaryID, String fieldKey) throws UnknownHostException,
        JSONException {
    MongoClient mongoClient = new MongoClient("localhost");
    DB db = mongoClient.getDB(dbName);
    String primaryIDKey = "primaryIDKey";
    db.getCollection(collectionName).createIndex(
            new BasicDBObject(primaryIDKey, 1),
            new BasicDBObject("unique", true));

    JSONObject json = new JSONObject();
    json.put(fieldKey, "some value");
    json.put(primaryIDKey, primaryID);

    DBCollection collection = null;
    db.requestStart();
    try {
        db.requestEnsureConnection();

        collection = db.getCollection(collectionName);
        DBObject dbObject = (DBObject) JSON.parse(json.toString());
        try {
            collection.insert(dbObject);
        } catch (DuplicateKeyException e) {
            // throw new RecordExistsException(collectionName, primaryKey);
            // suppress it as it's expected
        } catch (MongoException e) {
            e.printStackTrace();
        }
    } finally {
        db.requestDone();
    }

    db.requestStart();
    db.requestEnsureConnection();

    // test read speed
    try {
        DBObject dbObject = null;
        BasicDBObject query = new BasicDBObject(primaryIDKey, primaryID);
        BasicDBObject projection = new BasicDBObject().append("_id", 0)
                .append(fieldKey, 1);
        int frequency = 10000;
        do {
            dbObject = collection.findOne(query, projection);

            if (incrCount() % frequency == 0) {
                System.out.println(dbObject);
                System.out.println("op/s = "
                        + new DecimalFormat().format(getCount() * 1000.0
                                / (System.currentTimeMillis() - initTime)));
                if (getCount() > frequency * 20) {
                    System.out
                            .println("**********************resetting************************");
                    reset();
                }
            }
        } while (true);
    } catch (Exception e) {
        System.out.println("Lookup failed: " + e);
    } finally {
        // balance the requestStart() above exactly once (it was previously
        // called on every loop iteration without a matching requestStart)
        db.requestDone();
    }
}

public static void main(String[] args) throws Exception {
    if (args.length == 3) {
        for (int i = 0; i < executor.getCorePoolSize(); i++) {
            executor.submit(new Runnable() {
                public void run() {
                    try {
                        testReadRate("dbName", args[0], args[1], args[2]);
                    } catch (UnknownHostException | JSONException e) {
                        e.printStackTrace();
                    }
                }
            });
        }
    } else {
        System.out.println("Usage: "
                + MongoSmallWorkingSetRead.class.getSimpleName()
                + " <collectionName> <primaryIDKey> <fieldKey>");
    }
}
}

db.stats() output:

db.stats()
{
"db" : "node",
"collections" : 0,
"objects" : 0,
"avgObjSize" : 0,
"dataSize" : 0,
"storageSize" : 0,
"numExtents" : 0,
"indexes" : 0,
"indexSize" : 0,
"fileSize" : 0,
"ok" : 1
}

Best Answer

You have a major mistake in your code: MongoClient creates a connection pool, so even in large applications it is usually a singleton. You should have it as a global (static) variable, initialize it once in main, and reuse it in each runnable. That is perfectly fine, since MongoClient is thread-safe.

Another thing to keep in mind is that although the single document surely is in the working set and hence should be in RAM, your application still needs to communicate with mongod. Each query is translated into MongoDB's wire protocol and sent to the server, where it is executed and the matching documents are identified (in this case only one, though that is not necessarily known before execution); the result is then sent back to the client and translated from the wire protocol into Java objects. This is obviously going to be slower than a plain in-JVM access with no match conditions.

Finally, let's do some maths. At roughly 15,000 reads/s:

    1 s / 15,000 reads ≈ 0.067 ms ≈ 67 µs per document read, including the network round trip

which, even without taking the overhead of the unnecessary MongoClients into account, is pretty fast in my book.
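Spelling that arithmetic out as code (a trivial helper of my own, just to make the unit conversion explicit):

```java
public class PerOpLatency {

    // Converts an observed throughput (operations per second) into the
    // average latency per operation in microseconds.
    public static double microsPerOp(double opsPerSecond) {
        return 1_000_000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        // ~15K reads/s from the question works out to roughly 67 µs per read.
        System.out.println("µs per op at 15K ops/s = " + microsPerOp(15_000));
    }
}
```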