ActiveRecord allows you to serialize objects into text columns through YAML. This seems useful but in my experience is under-used. One of the primary reasons for this is it’s not possible to use the data that the object encapsulates without the ruby model. For example it’s not possible to find on the contents of that object or for that matter, modify the object with languages that lack YAML support. With CouchDB all data is stored in JSON so this is not an issue.
The project I wrote CouchFoo for used complex ACLs and I wanted to encapsulate this all in an object rather than use several many-many relationships and construct an ACL object based on their contents. So how do you do this with CouchFoo? Simple: any object can be assigned as a property in a CouchFoo model as long as it has a .to_json method and a class .from_json method. The methods do what you’d expect, for example:
class DataObjectAttributeList
  attr_accessor :attributes
  # Constructs the object from JSON
  def self.from_json(json)
    DataObjectAttributeList.new(json)
  end
  # Converts the object to JSON
  def to_json
    @attributes
  end
  def initialize(initials = {}, *args)
    @attributes = initials
  end
end
This is just a simple example storing a hash but the structure could be as complex as you’d like. In the future I plan to add inline associations to CouchFoo, so rather than have a one-to-many association where the many are accessed via a second database query you could have the objects stored as part of the parent contents. Performance wise, this is normally much more efficient (although not in all situations – eg heavy write and low read).
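As a slightly richer illustration of the same .to_json / .from_json contract, here is a minimal sketch using only Ruby's standard json library. The AccessControlList class and its allow and allowed? methods are hypothetical names made up for this example, not part of CouchFoo, and the sketch assumes from_json may be handed either a JSON string or an already-parsed hash.

```ruby
require "json"

# Hypothetical ACL value object; not part of CouchFoo itself.
class AccessControlList
  attr_reader :rules

  # rules maps a role name to the list of actions it may perform
  def initialize(rules = {})
    @rules = rules
  end

  # Rebuilds the object from a JSON string or an already-parsed hash
  def self.from_json(json)
    data = json.is_a?(String) ? JSON.parse(json) : json
    new(data)
  end

  # Serialises the rules hash to a JSON string
  def to_json(*args)
    @rules.to_json(*args)
  end

  # Grants an action to a role and returns self so calls can be chained
  def allow(role, action)
    (@rules[role.to_s] ||= []) << action.to_s
    self
  end

  def allowed?(role, action)
    @rules.fetch(role.to_s, []).include?(action.to_s)
  end
end

acl = AccessControlList.new.allow(:editor, :read).allow(:editor, :write)
restored = AccessControlList.from_json(acl.to_json)
puts restored.allowed?(:editor, :write)
puts restored.allowed?(:guest, :write)
```

Because the whole rule set travels as one JSON value, there is no need for join tables to rebuild the ACL on every load.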
CouchDB is an excellent database, designed especially for distributed applications. To quote the official site:
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
Along with the knowledge that it’s written in Erlang, you know it’s going to be a winner in the future.
For one of my current freelance projects I needed to store data in a document fashion – ie unstructured. This made CouchDB an ideal candidate. There were several ruby gems available: CouchPotato, CouchREST, ActiveCouch and RelaxDB. Each offered its own benefits and its own challenges. After hacking with each I couldn’t find a library I was happy with. So I started with ActiveRecord and modified it to work with CouchDB. And so CouchFoo was born.
I ended up with a gem that mirrors ActiveRecord in all but a few minor places. In particular:
CouchDB is schema free so property definitions for the document are defined in the model (like DataMapper)
:select, :joins, :having, :group, :from and :lock are not available on find or associations as they don’t apply (locking is handled as conflict resolution at insertion time)
:conditions can only accept a hash and not an array or SQL. For example :conditions => {:user_name => "Georgio_1999"}
:offset is less efficient in CouchDB – there’s more on this in the rdoc
:order is applied after results are retrieved from the database. Therefore :order cannot be used with :limit without a new option :use_key. This is explained fully in the quick start guide and CouchFoo#find documentation
:include isn’t implemented yet but the finders and associations still accept the option so you won’t need to make any code changes
By default results are ordered by document key. The key uses a UUID scheme so these don’t auto-increment and are likely to come out in a different order to insertion. default_sort can be used on a model to sort by create date by default and overcome this
validates_uniqueness_of has had the :case_sensitive option removed
Because there’s no SQL there’s no SQL finder methods
Timezones, aggregations and fixtures are not yet implemented
The price of index updating is paid when next accessing the index rather than the point of insertion. This can be more efficient or less depending on your application. It may make sense to use an external process to do the updating for you – see CouchFoo#find for more on this
On that note, occasional compacting of CouchDB is required to recover space from old versions of documents and keep performance high. This can be kicked off in several ways (see quick start guide)
The RDoc for the gem contains more details on each of these differences, new features that I added, a quick start guide and additional areas of responsibility to think about when using CouchDB (in particular performance).
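The interaction between :order and :limit noted in the list above can be illustrated without CouchDB at all. The following is a rough plain-Ruby sketch, assuming the database returns rows in key order and applies :limit before the library sorts them; the documents and their keys are made up for illustration.

```ruby
# Made-up documents, listed in the order a key-ordered database would return them
docs = [
  { :id => "1a", :number => 27 },
  { :id => "4c", :number => 3 },
  { :id => "9f", :number => 12 }
]

# :limit applied by the database, :order applied afterwards in Ruby
limited_then_sorted = docs.first(2).sort_by { |d| d[:number] }

# What you usually want: sort on the property first, then take the limit.
# An index keyed on :number (the idea behind :use_key) gives this ordering.
sorted_then_limited = docs.sort_by { |d| d[:number] }.first(2)

p limited_then_sorted.map { |d| d[:number] }
p sorted_then_limited.map { |d| d[:number] }
```

The first result contains 27 even though 12 is smaller, because the document with number 12 had already been cut off by the limit before any sorting took place.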
As a quick overview, basic operations are the same as ActiveRecord:
class Address < CouchFoo::Base
property :number, Integer
property :street, String
property :postcode # Any generic type is fine as long as .to_json can be called on it
end
address1 = Address.create(:number => 3, :street => "My Street", :postcode => "secret") # Create address
address2 = Address.create(:number => 27, :street => "Another Street", :postcode => "secret")
Address.all # = [address1, address2] or maybe [address2, address1] depending on key generation
Address.first # = address1 or address2 depending on keys so probably isn't as expected
Address.find_by_street("My Street") # = address1
As key generation is through a UUID scheme, the order can’t be predicted. However you can order the results by default:
class Address < CouchFoo::Base
property :number, Integer
property :street, String
property :postcode # Any generic type is fine as long as .to_json can be called on it
property :created_at, DateTime
default_sort :created_at
end
Address.all # = [address1, address2]
Address.first # = address1 or address2, sorting is applied after results
Address.first(:use_key => :created_at) # = address1 but at the price of creating a new index
Note that there’s an optimisation that will order results by created_at if there are no conditions so in the above case, the default_sort wasn’t required. However when using with conditions it will be required so it makes sense to use at all times.
Conditions work slightly differently:
Address.find(:all, :conditions => {:street => "My Street"}) # = [address1], creates index on :street
Address.find(:all, :conditions => {:created_at => "sometime"}) # Uses same index as :use_key => :created_at
Address.find(:all, :use_key => :street, :startkey => 'p') # All streets from p in alphabet, reuses the index created 2 lines up
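To make the index behaviour above a little more concrete, here is a minimal plain-Ruby sketch of an index keyed on :street: a list of [key, document] pairs kept sorted by key and scanned from a start key. It is an analogy only, not how CouchFoo or CouchDB is implemented, and build_index and scan_from are hypothetical helper names.

```ruby
# Builds a sorted list of [key, doc] pairs, similar in spirit to a CouchDB view
def build_index(docs, property)
  docs.map { |doc| [doc[property], doc] }.sort_by { |key, _doc| key }
end

# Returns every doc whose key sorts at or after startkey
def scan_from(index, startkey)
  index.drop_while { |key, _doc| key < startkey }.map { |_key, doc| doc }
end

addresses = [
  { :street => "queens road" },
  { :street => "abbey lane" },
  { :street => "park street" }
]

# Built once, then reusable for both equality lookups and range scans
street_index = build_index(addresses, :street)
p scan_from(street_index, "p").map { |doc| doc[:street] }
```

This is why a condition on :street and a :use_key => :street scan can share one index: both are just different ways of reading the same sorted key list.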
As well as providing support for people using relational databases, CouchFoo attempts to provide a library for those wanting to use CouchDB as a document-orientated database:
Document.number_ordered(:limit => 75) # Will get the last 75 documents in the database ordered by number, street attributes
Associations work as expected but you must remember to add the properties required for an association (we’ll make this automatic soon):
class House < CouchFoo::Base
has_many :windows
end
class Window < CouchFoo::Base
property :house_id, String
belongs_to :house
end
There’s a few bits left to tidy up (as noted in the readme) but generally speaking it’s now ready for use by others. Grab it on github and feel free to fork and send me pull requests.
And now to do something I’ve not been doing a lot of lately: spend some more time on the Couch…
With the recent announcement that Rails and MERB will merge and my preference for DataMapper I decided to plug datamapper into rails for my next freelance project. The theory goes this should make the upgrade path to Rails 3 a lot simpler!
It’s currently possible to use DataMapper with Rails, heck even DHH himself commented so, but it’s not quite as easy as using ActiveRecord. After a quick Google I only ran into questions about how to do it, no howto guide. So I set out to make my own – it really was quite simple in the end:
sudo gem install addressable data_objects do_mysql
# do_mysql can be changed for do_postgres or do_sqlite3 as appropriate
sudo gem install dm-core dm-more
In the dm-more github repository there’s a folder called rails_datamapper which is a plugin for rails to add datamapper support. This doesn’t install with the dm-more gem so it’s a case of cloning the git repository and copying the folder to your rails project:
Then edit your project environment.rb file and add the following lines:
# Load the required gems in the correct order
config.gem "addressable", :lib => "addressable/uri"
config.gem "data_objects"
config.gem "do_mysql"
config.gem "dm-core"
# Make datamapper load first as some plugins have dependencies on it
config.plugins = [ :rails_datamapper, :all ]
# Remove ActiveRecord if you no longer need it
config.frameworks -= [ :active_record ]
The connection to the database will be made by the rails_datamapper plugin using your database.yml configuration file. You’ll need to use a slightly different format for datamapper:
development:
:repositories:
:adapter: mysql
:database: opnli_dev
Or alternately you can specify your own initializer and forgo the rails plugin:
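An initializer along those lines might look like the sketch below. DataMapper.setup is the dm-core call that registers a connection; the file name config/initializers/datamapper.rb, the datamapper_uri helper and the connection details are assumptions for illustration, and the setup call is guarded so the snippet also runs where dm-core isn't loaded.

```ruby
# config/initializers/datamapper.rb (hypothetical file name)

# Hypothetical helper: builds a connection URI from a database.yml-style hash
def datamapper_uri(config)
  credentials = config[:username] ? "#{config[:username]}:#{config[:password]}@" : ""
  host = config[:host] || "localhost"
  "#{config[:adapter]}://#{credentials}#{host}/#{config[:database]}"
end

# Example settings only; replace with your own
settings = {
  :adapter  => "mysql",
  :database => "opnli_dev",
  :username => "app",
  :password => "secret"
}

uri = datamapper_uri(settings)
puts uri

# Register the default repository only when dm-core has been loaded
DataMapper.setup(:default, uri) if defined?(DataMapper)
```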
The only real gotcha in using datamapper is some rails plugins assume you’re using ActiveRecord. Hopefully this won’t be the case in the future, but for now you’ll need to get forking!
For those readers of my blog who don’t live in the rails world I highly recommend checking out Git, a distributed version control system. It has been big in the rails world since early this year for several good reasons:
It has distributed and offline functionality
Making and merging branches is a breeze – encouraging you to try experiments in branches
It uses much less space than alternatives, such as Subversion, and only has one .git folder at the base of your project
It’s in active development with constant releases of new features (but stable enough to be used for the linux kernel)
The terminology is slightly different from subversion and friends but once you’ve got used to it you never look back!
Merb was very quick to jump on the git bandwagon and rails followed not much later. Practically this made distributed development a hell of a lot easier, but it also had some nice knock on effects. Patching is now a lot quicker too – you simply fork the project, make a fix and inform the admin who can then choose to merge back into the master (if they see fit). It’s made the process for fixing bugs a hell of a lot quicker.
Soon after git came along the fantastic github.com followed, making it easy to host remote repositories. And so to the reason for me writing this post – github just launched git pages where you can upload your own page to front your repositories. It’s a neat idea and naturally is all managed through a git repository. You simply create your site in a repository, push to github and the deployment is automatic. Although it’s only simple HTML pages, it’s a great proof of concept of other things that could be possible. My effort can be found here which, following the git ethos, I just forked from somebody else.
Yesterday I spoke at and attended Ruby Manor. It was a grassroots conference with the attendees determining the agenda and the organising duo aiming to keep costs to an absolute minimum. So successful were they in fact, as one twitterer aptly points out, that for an amazing £12 you got a series of excellent talks, no annoying sponsorship and £500 behind the bar at the end of the night. Given even the cheapest of conferences aimed at freelancers clocks in well over the £100 mark, this really was a fantastic achievement.
On a personal note I presented on nanite which is a background processing solution for rails and merb. It was quite a technical talk and I was trying to get a lot into the 30-minute timeslot, so it felt a little rushed. Nevertheless most people seemed to grasp the concept and the live demo at the end went really well. For those that missed the event I’m sure the videos will be online soon, but in the meantime there’s excellent coverage on this blog and the slides from my presentation are available on slideshare or below:
I’m George Palmer and rowtheboat.com is my personal blog. I’m the founder of 5ft Shelf and a freelance developer living and working in London. You can find out more about me, or hire me, on the about page