Friday, March 26, 2010

A Silly CouchDB Replication Scheme in node.js

Picking up from last night's failure to get node.js playing nicely with CouchDB replication, I start by trying it from the command line. The failure stemmed from the use of the create_target attribute, which should create a new database if it does not already exist. With curl, I find:
cstrom@whitefall:~$ curl -X POST http://localhost:5984/_replicate \
> -d '{"source":"seed", "target":"http://couch-011a.local:5984/seed", "create_target":true}'
{"error":"db_not_found","reason":"could not open http://couch-011a.local:5984/seed/"}
Ah, good to know. It is not a problem with node.js. Something in my understanding of the create_target attribute is amiss.

Switching the source and target does, however, work:
cstrom@whitefall:~$ curl -X POST http://couch-011a.local:5984/_replicate \
> -d '{"source":"http://whitefall.local:5984/seed", "target":"seed", "create_target":true}'
{"ok":true,"session_id":"cb968554905b1527e6fab6eaa0c6f754","source_last_seq":743,"history":[{"session_id":"cb968554905b1527e6fab6eaa0c6f754","start_time":"Fri, 26 Mar 2010 21:40:59 GMT","end_time":"Fri, 26 Mar 2010 21:47:15 GMT","start_last_seq":0,"end_last_seq":743,"recorded_seq":743,"missing_checked":0,"missing_found":675,"docs_read":678,"docs_written":678,"doc_write_failures":0}]}
Interesting. I learn a couple of things from this result. The first is that I need to create another seed DB. It is silly to replicate 678 documents for a test.

More importantly, I figured out how the create_target option works. Specifically, it only works when the target database resides on the server being POSTed to, though the wiki seems to imply otherwise. Speaking of the wiki, while reading it, I found that a local target is preferred. I have been doing local source / remote target. That means that I need to update couch-replicate, but first, I want to see last night's node.js replication script through.
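To make that rule concrete, here is a minimal sketch of how a "pull" replication request could be assembled so that create_target applies. The helper name pullReplicationRequest and its argument names are my own invention for illustration, not part of CouchDB or node-couchdb. The point it shows is that the POST goes to the server that will hold the target, the target is a bare database name, and the source is a full URL.

```javascript
// Hypothetical helper (not a CouchDB or node-couchdb API): describe a pull
// replication. The request is sent to the *target* host, so the target is a
// local database name and create_target can take effect.
function pullReplicationRequest(sourceHost, targetHost, db) {
  return {
    // POST to the server that should end up holding the database
    url: "http://" + targetHost + ":5984/_replicate",
    body: JSON.stringify({
      source: "http://" + sourceHost + ":5984/" + db, // remote source, full URL
      target: db,                                     // local target, name only
      create_target: true
    })
  };
}

var request = pullReplicationRequest("whitefall.local", "couch-011a.local", "seed");
console.log(request.url);
console.log(request.body);
```

The url and body it prints correspond to the curl invocation that worked above; swapping the two host arguments would reproduce the remote-target form that failed with db_not_found.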

I delete the incredibly large seed databases:
cstrom@whitefall:~$ curl -X DELETE http://couch-011a.local:5984/seed
{"ok":true}
cstrom@whitefall:~$ curl -X DELETE http://localhost:5984/seed
{"ok":true}
Using the couch_docs gem, I create a smaller seed DB:
cstrom@whitefall:~/tmp/seed$ ls
2002-08-26-grilled_chicken.json 2002-08-26.json 2002-08-26-pasta.json 2002-08-26-pesto.json _design
cstrom@whitefall:~/tmp/seed$ couch-docs push http://localhost:5984/seed -R
Updating documents on CouchDB Server...
My simple goal is to replicate a database from my localhost (whitefall.local) machine onto CouchDB VM A, and then from A to B, and B to C. At the end, VM C, which did not have a seed database at the start, should contain the newly created and replicated DB. The node.js script, now using local targets, becomes:
var
  sys = require('sys'),
  couchdb = require('node-couchdb/lib/couchdb'),
  client = couchdb.createClient(5984, 'whitefall.local'),
  clienta = couchdb.createClient(5984, 'couch-011a.local'),
  clientb = couchdb.createClient(5984, 'couch-011b.local'),
  clientc = couchdb.createClient(5984, 'couch-011c.local');

client.allDbs(function (er, data) {
  sys.puts("DBs on localhost: " + data);
});

clientc.allDbs(function (er, data) {
  sys.puts("DBs on C before replication: " + data);
});

clienta.replicate("http://whitefall.local:5984/seed", "seed", {create_target:true});
clientb.replicate("http://couch-011a.local:5984/seed", "seed", {create_target:true});
clientc.replicate("http://couch-011b.local:5984/seed", "seed", {create_target:true});

clientc.allDbs(function (er, data) {
  sys.puts("DBs on C after replication: " + data);
});
Now when I run the script I find... that the database still is not replicated to C:
cstrom@whitefall:~$ ./local/bin/node ./tmp/node-couch.js 
DBs on localhost: eee,test,seed
DBs on C before replication: test
DBs on C after replication: test
Gah!

The explanation for this failure turns out to be simple enough. Checking the log in server B, I find:
[Fri, 26 Mar 2010 22:53:18 GMT] [error] [<0.222.0>] {error_report,<0.31.0>,
{<0.222.0>,crash_report,
[[{initial_call,{couch_rep,init,['Argument__1']}},
{pid,<0.222.0>},
{registered_name,[]},
{error_info,
{exit,
{db_not_found,<<"http://couch-011a.local:5984/seed/">>},
[{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},
But checking server A, I see that the seed database is there.

My guess is that server A has not had time to finish creating / replicating the seed database before server B tries to replicate it. To verify, I add some timeouts to my node.js script:
clienta.replicate("http://whitefall.local:5984/seed", "seed", {create_target:true});

setTimeout(function () {
  clientb.replicate("http://couch-011a.local:5984/seed", "seed", {create_target:true});
}, 2000);
setTimeout(function () {
  clientc.replicate("http://couch-011b.local:5984/seed", "seed", {create_target:true});
}, 4000);

setTimeout(function () {
  clientc.allDbs(function (er, data) {
    sys.puts("DBs on C after replication: " + data);
  });
}, 6000);
Indeed, server C now contains the seed database:
cstrom@whitefall:~$ ./local/bin/node ./tmp/node-couch.js 
DBs on localhost: eee,test,seed
DBs on C before replication: test
DBs on C after replication: seed,test
This is clearly a contrived example. Were I to even try this for real in node.js, I would not be using timeouts, which would certainly fail for large databases. Still, I feel like I made progress—I figured out why last night's script failed and learned a thing or two about CouchDB replication.
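For the curious, here is a rough sketch of what a timeout-free version might look like: each hop starts only after the previous one has called back. It is not what I ran tonight. The names replicateChain and pullReplicate and the hop objects ({from, to, db}) are hypothetical, and pullReplicate uses node's built-in http module rather than node-couchdb, since I have not checked how node-couchdb reports completion. It also assumes that a non-continuous POST to _replicate does not answer until the replication is done, which the start and end times in the curl response above suggest.

```javascript
var http = require("http");

// Hypothetical helper: run the hops strictly in order. The next hop starts
// only when the previous one calls back without an error.
function replicateChain(hops, replicate, done) {
  var i = 0;
  function next(err) {
    if (err) return done(err);               // stop at the first failure
    if (i >= hops.length) return done(null); // every hop has finished
    replicate(hops[i++], next);
  }
  next(null);
}

// Hypothetical helper: pull hop.db from hop.from into hop.to by POSTing to
// the target's /_replicate, calling back once CouchDB has replied.
function pullReplicate(hop, callback) {
  var body = JSON.stringify({
    source: "http://" + hop.from + ":5984/" + hop.db,
    target: hop.db,
    create_target: true
  });
  var req = http.request({
    host: hop.to,
    port: 5984,
    path: "/_replicate",
    method: "POST",
    headers: { "Content-Type": "application/json" }
  }, function (res) {
    var text = "";
    res.setEncoding("utf8");
    res.on("data", function (chunk) { text += chunk; });
    res.on("end", function () {
      var reply;
      try { reply = JSON.parse(text); } catch (e) { return callback(e); }
      callback(reply.ok ? null : new Error(reply.error + ": " + reply.reason));
    });
  });
  req.on("error", callback);
  req.end(body);
}
```

Calling replicateChain with three hops (whitefall.local to couch-011a.local, A to B, B to C), pullReplicate as the second argument, and a final callback that lists the databases on C would replace all three setTimeout calls. Because replicateChain takes the replicate function as a parameter, it can be exercised with a stub that never touches the network.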

Day #54
