MongoDB Performance Tuning

MongoDB's best feature is not its performance but its simple and flexible data model. So, let's say you build a prototype: you concentrate on the big picture, the product itself, and ignore details like performance and database indexes.

Later you deploy your product into the wild, users come, and it starts to get slow. You need to add indexes, and to do that you need to know the data usage patterns. Finding them manually by searching the codebase is tedious and not very productive. Thankfully, MongoDB has a Profiler: all you need to do is enable it, and it will give you the details about slow queries and which indexes you need to add.
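For example, in recent MongoDB versions the profiler can be configured in the YAML config file instead of via command-line flags (option names may vary slightly between versions; the threshold value below is just an illustration):

```yaml
# mongod.conf: profile only slow operations, where "slow" means
# anything taking longer than 100 ms.
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
```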

I like this approach very much because it fits iterative, lean development well: you always concentrate on what matters most at the moment. At first, the most important thing is to experiment with the product and its features without being distracted by performance issues, and MongoDB's flexible data model is very handy for that. Later, once the product is in production, you can use the Profiler to zoom in on fine-grained performance details.

Example

I created a web crawler. It works without performance problems for tens of sites, but with hundreds it starts to slow down.

All I needed to do was enable the MongoDB Profiler with mongod --profile=1 (it can also be set in the config file), run the crawler for a while to collect database usage statistics, and then query the most recent slow operations:

db.system.profile.find().limit(3).sort({ts : -1}).pretty()

...
"ns" : "db.urls",
"query" : {
    "query" : {
        "id" : "http://some-url..."
    },
    "orderby" : {
        "id" : 1
    }
...

It showed that there are lots of slow queries against db.urls by id. That is easy to fix by adding an index: db.urls.ensureIndex({id: 1}).
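You can confirm the index is actually picked up with explain(). A hypothetical mongo shell session (the URL is made up; the exact explain output depends on the MongoDB version):

```javascript
db.urls.ensureIndex({id: 1})
db.urls.find({id: "http://example.com/"}).explain()
// Older versions report a "cursor" such as "BtreeCursor id_1" when the
// index is used, versus "BasicCursor" for a full scan; newer versions
// show an "IXSCAN" stage instead of "COLLSCAN".
```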

After this update the application is faster, but still slow. Let's run the crawler a bit longer and take a look at the new statistics.

...
"ns" : "db.urls",
"query" : {
    "$query" : {
        "siteId" : "some-site.com",
        "state" : "discovered",
        "random" : {
            "$gte" : 0.8306238979566842
        }
    },
    "orderby" : {
        "siteId" : 1,
        "state" : 1,
        "random" : 1
    }
}
...

There are lots of slow queries using the siteId, state, and random attributes (needed for fetching random URLs during graph traversal), and this too is easy to fix with a compound index: db.urls.ensureIndex({siteId: 1, state: 1, random: 1}). After this update there seems to be no more slowness; the crawler works fast.
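The random-fetch pattern behind that query is worth spelling out: every URL document stores a precomputed random value, and "pick a random URL" becomes "first document with random >= r, sorted by random", so the compound index can answer it. Here is a sketch of the idea in plain Node.js, with no database, hypothetical data, and an assumed wrap-around fallback for when no document lies above r:

```javascript
// Pick a "random" URL for a site in a given state, mimicking
// find({siteId, state, random: {$gte: r}}).sort({random: 1}).limit(1)
// with a wrap-around to random < r when nothing matches above r.
function pickRandomUrl(urls, siteId, state, r) {
  const candidates = urls
    .filter(u => u.siteId === siteId && u.state === state)
    .sort((a, b) => a.random - b.random);
  return candidates.find(u => u.random >= r)   // first doc at or above r
      || candidates.find(u => u.random < r)    // wrap-around fallback
      || null;                                 // nothing in this state
}

// Hypothetical documents, as they might look in db.urls.
const urls = [
  { id: "http://a.com/1", siteId: "a.com", state: "discovered", random: 0.21 },
  { id: "http://a.com/2", siteId: "a.com", state: "discovered", random: 0.74 },
  { id: "http://a.com/3", siteId: "a.com", state: "processed",  random: 0.95 },
];

console.log(pickRandomUrl(urls, "a.com", "discovered", 0.5).id); // http://a.com/2
console.log(pickRandomUrl(urls, "a.com", "discovered", 0.9).id); // wraps to http://a.com/1
```

In the real crawler, r would be a fresh Math.random() per fetch, and the query itself would run against MongoDB with the compound index doing the filtering and ordering.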

I stopped after adding those two indexes. Sure, I could add more and speed things up further, but the crawler already meets its performance requirements, and more indexes would just add complexity and cost time for nothing valuable in return. If, as the data grows, the crawler starts to slow down again, we can repeat these steps and tune its performance further.

I really like this approach: it lets you concentrate on the right things at the right time.