CouchDB and ORMs
Written on March 29, 2009 by george
Alex gave a good introductory talk on CouchDB at Scotland on Rails. Towards the end of the talk he gave an overview of the current Ruby plugins/gems available for interfacing with CouchDB, one of which was my own CouchFoo. Alex’s opinion was that any ORM for CouchDB should be as thin as possible, just wrapping the Ruby to JSON object translation. I raised my opinion in the question section at the end by saying that I didn’t agree and thought the ORM should match the level of functionality available in ActiveRecord. This sparked a debate, both in the talk and via Twitter, about the best approach for a CouchDB ORM to take. As a result I agreed to write this blog post to outline my views.
CouchDB is a document-oriented database with an HTTP interface, amongst other features. When I first started using it I played with the database a lot via simple interactions through curl. In the same way that I feel it is important to know SQL before using any higher level API to store and retrieve objects in a relational database, I feel it is important to understand how CouchDB works before using a library to interact with it. As with most areas of computing you will find a range of opinions over what level you should interact with the database at – there are the purists who like to write the SQL by hand for each database query performed, and those who are willing to sacrifice a bit of performance (maybe not having the optimum query run each time) for the time efficiencies realized whilst developing. I align quite well with the Rails mantra on this one – I’m willing to sacrifice perfect SQL each time for the efficiency gains made whilst developing. Part of Alex’s argument was that you should be as close to the database as possible because the Ruby to JSON conversion is much smaller than the Ruby to SQL conversion. Whilst I don’t disagree that it’s important to know how CouchDB works, I do disagree on the level at which any Ruby library should sit. I’m happy to pay a small price in terms of extra Ruby code executed because I want as clean a DSL as possible.
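To make the "thin layer" position concrete, here is a minimal sketch of what such a layer boils down to: a document is a plain Hash and the mapping is JSON serialisation in both directions. The helper names `to_document_json` and `from_document_json` are made up for illustration and are not taken from any of the gems discussed; no HTTP request is made.

```ruby
require "json"

# Hypothetical helpers: the whole "mapping layer" is JSON serialisation.
def to_document_json(attributes)
  JSON.generate(attributes)
end

def from_document_json(json)
  JSON.parse(json)
end

# A document is just a Hash with string keys.
post = { "_id" => "post-1", "type" => "post", "title" => "CouchDB and ORMs" }

# This string is what would travel in the body of an HTTP request.
json = to_document_json(post)
puts json

# Parsing the JSON gives back an equal Hash.
puts from_document_json(json) == post
```

Everything beyond this (validations, finders, associations) is what a thicker library adds on top.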
Whilst developing with CouchDB I tried all the existing Ruby libraries, and as I worked through them I ran into several issues. After using ActiveRecord’s save and find methods it was particularly annoying to use a library that used different method names for the same conceptual operations (e.g. get instead of find). This wasn’t a major issue of course; I just forked the library and made changes. But as time went on there were features that I missed from ActiveRecord. Validations, callbacks, finders and associations were the prime contenders. Then dynamic finders and named scopes got added to the list. In the end changing the existing libraries became so much work that I decided to start with ActiveRecord and work from there.
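As a rough illustration of the kind of layer described above, the sketch below puts ActiveRecord-style `find` and `save` names, one validation and one callback on top of a lower-level `get`/`put` store. It is not CouchFoo's actual code or API: `MemoryStore` and `Post` are hypothetical, and the store is an in-memory Hash standing in for the database.

```ruby
# In-memory stand-in for a document database; no HTTP involved.
class MemoryStore
  def initialize
    @docs = {}
  end

  def put(id, doc)
    @docs[id] = doc
  end

  def get(id)
    @docs[id]
  end
end

# Hypothetical model layer with ActiveRecord-style method names.
class Post
  STORE = MemoryStore.new

  attr_reader :id, :title, :errors

  def initialize(id:, title:)
    @id = id
    @title = title
    @errors = []
  end

  # `find` wraps the store's lower-level `get`; returns nil when missing.
  def self.find(id)
    doc = STORE.get(id)
    doc && new(id: id, title: doc["title"])
  end

  # Validation: a title must be present.
  def valid?
    @errors = []
    @errors << "title can't be blank" if title.to_s.strip.empty?
    @errors.empty?
  end

  # Save only when valid, running the before-save callback first.
  def save
    return false unless valid?
    before_save
    STORE.put(id, { "title" => title })
    true
  end

  private

  # Callback: normalise whitespace before the document is written.
  def before_save
    @title = title.strip
  end
end

puts Post.new(id: "post-1", title: "  CouchDB and ORMs  ").save
puts Post.find("post-1").title
puts Post.new(id: "post-2", title: "").save
```

Even this toy version shows why the feature list grows quickly once you start down this path.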
Of the features in ActiveRecord, associations are perhaps the most controversial when it comes to whether they should apply to document-oriented databases. The argument goes that if you’re trying to use associations you don’t understand how CouchDB should be used. I disagree on this point – a simple counter-argument is a document that allows comments. Those comments could be stored inline in the document itself or in separate documents that hold a reference to their parent. This is an association whichever way you look at it. Which approach you decide to use will depend on your application and its characteristics. Incidentally, Alex’s gem did a great job of this, letting the user specify in the association whether they wanted the object stored inline or not. This has since been removed from his gem but is something that’s definitely on the TODO list for CouchFoo.
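The two storage options can be sketched as plain document shapes. The field names (`type`, `post_id`, `comments`) and the two lookup helpers are assumptions chosen for illustration, not a schema that CouchDB or any gem requires.

```ruby
# Approach 1: comments stored inline in the parent document.
inline_post = {
  "_id" => "post-1",
  "type" => "post",
  "title" => "CouchDB and ORMs",
  "comments" => [
    { "author" => "reader-a", "body" => "First comment" },
    { "author" => "reader-b", "body" => "Second comment" }
  ]
}

# Approach 2: each comment is its own document that references its parent.
post = { "_id" => "post-1", "type" => "post", "title" => "CouchDB and ORMs" }
comments = [
  { "_id" => "comment-1", "type" => "comment", "post_id" => "post-1",
    "author" => "reader-a", "body" => "First comment" },
  { "_id" => "comment-2", "type" => "comment", "post_id" => "post-1",
    "author" => "reader-b", "body" => "Second comment" }
]

# Inline lookup: read the embedded array (empty when there is none).
def inline_comments(post_doc)
  post_doc.fetch("comments", [])
end

# Referenced lookup: select the comment documents pointing at this post.
def referenced_comments(post_doc, all_docs)
  all_docs.select do |doc|
    doc["type"] == "comment" && doc["post_id"] == post_doc["_id"]
  end
end

puts inline_comments(inline_post).size
puts referenced_comments(post, comments).size
```

Both lookups answer the same question, "which comments belong to this post?", which is the sense in which either layout is an association.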
For me CouchDB lends itself well to two distinct domains. Firstly, domains where documents are used – that is, objects where the fields stored to the database change depending on the object. Secondly, domains where you wish to take advantage of some of CouchDB’s features not present (or poorly implemented) in relational databases – an HTTP interface, fantastic scaling ability due to bi-directional replication, and its schema-free nature (see this excellent article on FriendFeed’s experience with MySQL) are just a few that spring to mind. People may use CouchDB for the second set of criteria even though their database design could be considered quite structured, and I fully expect this group of people to grow as CouchDB reaches 1.0. However, that wasn’t why I wrote CouchFoo; my project fell into the first domain. Whilst I provided a way to use ActiveRecord’s higher level API, I also provided access to a database object that allows simple storage and retrieval of documents by id. If that is all the functionality you require then I would expect CouchRest to be a better choice. However, I believe in reality you will quickly find you need to add validations to a field, or maybe add an association or two. And as soon as you start on that slope I believe CouchFoo to be a better choice.
Ultimately I created CouchFoo because I missed the richness of the ActiveRecord API. Whilst I don’t believe my library will be perfect for everyone, it has received a lot of good feedback. To paraphrase DHH, I didn’t create the perfect framework for everyone else, I created it for me. I only hope that other people find it useful.




Even though I wouldn’t want to use an ORM to work with couchdb, I think you have some valid points in your argumentation. I really think usage patterns can (and should) be different in different cases. In lucene-land similar different solutions emerged… and they seem to co-exist… something about using the best tools for the given problem!
I prefer to have CouchDB handle validations. Also many apps tend to cross languages so any business logic in your Ruby model will be lost when connecting from JS or Python.
george, thanks for this post. a few things i would like to outline:
you are talking about ORMs – that stands for “object relational mapper” and couchdb is not a relational database, i.e. there can’t be an ORM for couchdb. i know it’s just a small thing but i think getting the words right is important.
when you compare using activerecord as an abstraction on top of sql to using couchfoo as an abstraction to couchdb i think there is one important point you are missing: while activerecord allows you to generate some 99% of all the queries you will ever need this doesn’t work with couchfoo. you can only generate very basic and minimal map/reduce functions automatically and as soon as you start overusing that (by using couchdb like a relational database with loading relations and joining them) you very quickly start working against couchdb. i’m not talking about losing a bit of performance here and there (btw. i love how activerecord abstracts away sql for me) but about missing the whole concept of how couchdb works.
to your document/comments example: suppose you want to show a list of documents with the number of comments on each. what you would do with activerecord is to load the documents first and then load all the comments to all the documents in order to count them (you would optimize this at some point and do a custom sql query that counts for you but that is something activerecord doesn’t help you with). now if you insist on doing this in couchdb the same way you are missing the fact that with more direct access to the couchdb API this would have been trivial and most efficient from the start without even thinking about optimizations. by keeping to the activerecord way you create yourself a class of problems you shouldn’t even have.
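For readers unfamiliar with the approach being described, the counting case can be sketched as a map/reduce pass: the map step emits the parent id with a value of 1 for every comment, and the reduce step sums the values per key. Real CouchDB views are normally written in JavaScript and run by the server; the Ruby below only emulates the idea in memory, and the document fields and method names are assumptions made for illustration.

```ruby
# Sample documents; the "type" and "post_id" fields are assumptions.
docs = [
  { "_id" => "post-1", "type" => "post" },
  { "_id" => "post-2", "type" => "post" },
  { "_id" => "c1", "type" => "comment", "post_id" => "post-1" },
  { "_id" => "c2", "type" => "comment", "post_id" => "post-1" },
  { "_id" => "c3", "type" => "comment", "post_id" => "post-2" }
]

# Map step: emit one [key, value] pair per comment, keyed by its post.
def map_comment(doc)
  doc["type"] == "comment" ? [[doc["post_id"], 1]] : []
end

# Reduce step: sum the values emitted for a single key.
def reduce_sum(values)
  values.sum
end

# Emulates a grouped view query: one reduced row per distinct key.
def comment_counts(docs)
  emitted = docs.flat_map { |doc| map_comment(doc) }
  emitted.group_by(&:first).transform_values do |pairs|
    reduce_sum(pairs.map(&:last))
  end
end

comment_counts(docs).each do |post_id, count|
  puts "#{post_id}: #{count}"
end
```

In CouchDB itself the equivalent view would be queried with grouping enabled, so the counts come back per post in a single request without loading the comments.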
@alex – Using the example on http://www.cmlenz.net/archives/2007/10/couchdb-joins CouchFoo currently uses the 2nd approach. I’m going to add support for inline associations (1st approach) as there are still some circumstances where this is useful.
The 3rd approach though is still a pattern and something that could be added for associations (indeed I intend to do this). In the same way the more you use SQL you notice patterns and can abstract around them to make it quicker to use via an ORM, I believe the same happens with CouchDB. No abstraction can cover all cases though and that’s why you can define views manually (in the same way you can define SQL queries manually).
Personally I prefer to have the majority of cases handled for me and manage the exceptions myself although this does depend on the user having a knowledge of how CouchDB works.
I don’t see the appeal in using couchdb if your application depends on a schema that is very structured and relational. For example, with posts to comments, this seems like a perfect match for a relational database + ActiveRecord, so why not use that?
There are some great uses for couchdb, but I don’t think “a HTTP interface, fantastic scaling ability due to bi-directional replication, and schema free nature” are reason enough (in most cases) to try and map the functionality of a relational database onto a document oriented database.
You shouldn’t try mapping the functionality of a relational database onto a document oriented database. That makes no sense. If you have relational data, use a relational database.
But most web data isn’t really relational. It is made relational to fit into a relational database, because that’s what we’re used to. Post + Comments? Post document + array of comments. Inline. Who needs relations here?
@chris: I think a blog (Post + Comments) is the perfect example of relational data. A post also has an author, so does a comment. A post may belong to a category. It may have a handful of tags. How is that not relational?