Monday, August 30, 2010

How to Crash Faye the Stupid Way

‹prev | My Chain | next›

Yesterday I got my (fab) game to the point that I could briefly lose the backend or DB without any loss of player data and only brief interruption of the action. It was a nice confirmation of the power of node.js and CouchDB.

It was also timely because I am finding that the backend crashes with a faye error when players timeout of the game:
cstrom@whitefall:~/repos/my_fab_game$ ./game.js
[INFO] Starting up...
[INFO] [init_timeouts] bob 1793694
[INFO] players.update_player_status: bob
[INFO] players.update_player_status: bob
[INFO] players.update_player_status: bob
[INFO] timeout bob!
[INFO] players.drop_player bob

net:678
throw new Error('Stream is not writable');
^
Error: Stream is not writable
at Stream._writeOut (net:678:11)
at Stream.write (net:664:17)
at Object.handshake (/home/cstrom/.node_libraries/.npm/faye/0.5.2/package/faye-node.js:1207:14)
at Object.initialize (/home/cstrom/.node_libraries/.npm/faye/0.5.2/package/faye-node.js:1042:19)
at Object.<anonymous> (/home/cstrom/.node_libraries/.npm/faye/0.5.2/package/faye-node.js:149:28)
at Object.handleUpgrade (/home/cstrom/.node_libraries/.npm/faye/0.5.2/package/faye-node.js:1854:18)
at Server.<anonymous> (/home/cstrom/.node_libraries/.npm/faye/0.5.2/package/faye-node.js:1816:52)
at Server.emit (events:43:20)
at Stream.<anonymous> (http:777:14)
at IOWatcher.callback (net:494:29)
The drop player message is being broadcast to all players, including the player that has been idling too long. I suspect that the faye channel in the dropped player's browser is being closed before the backend can finish sending messages. I would like to see if things might be made a bit more graceful.

I could try to track this down in at least two ways: (1) try to reproduce to see if there is a pattern or (2) look at the code to see what is going on in there. I feel lazy when I play with the browser to reproduce, especially when I have a perfectly good stack trace telling me where things are going wrong. Besides, the last time I started playing, I felt ridiculous when no pattern emerged and a cursory code examination made it obvious where the problem was. So I opt for (2) today...

Checking out the handleUpgrade method, I find that the error is originating from the first line (when the websocket is initialized):
handleUpgrade: function(request, socket, head) {
    var socket = new Faye.WebSocket(request, head),
self   = this;

socket.onmessage = function(message) {
try {
var message = JSON.parse(message.data);
self._server.process(message, socket, function(replies) {
socket.send(JSON.stringify(replies));
});
} catch (e) {}
};
}
I also notice that errors generated on the reception of the message are swallowed without note—exceptions when initializing sockets should be similarly non-breaky (though I believe that they ought to be noted, not silently ignored).

Since this is happening under websockets, I wonder if something similar will occur in long-polling. To test this out, I set the player timeout to 10 seconds, then play / idle in Firefox 3 (which does not have websockets). That seems to behave itself in the backend:
[WARN] {"error":"not_found","reason":"deleted"}
[INFO] players.add_player: bob
[WARN] {"error":"not_found","reason":"deleted"}
[INFO] players.update_player_status: bob
[INFO] players.update_player_status: bob
[INFO] players.update_player_status: bob
[INFO] players.update_player_status: bob
[INFO] timeout bob!
[INFO] players.drop_player bob
Aside from needing to ignore warnings from CouchDB about previously deleted players, I look to be good here.

So I try the same thing in Chrome and find... that it works. I allow my player to idle timeout 10+ times and nothing. Gah!

Burned again. I am not sure which approach it best when crashes occur. I ought to start keeping a log to see which is the most likely approach to get quick resolution.

For now, I am stuck with a very hard to reproduce error. While pondering what to do next (simply restarting is starting to look better and better), I end up staring at this for a long time:



Stupid alerts. I really ought to put up a modal or just set a flash message / redirect to the login screen. Distracted from my original train of thought, I click OK and... crash the server.

Again, I think, "stupid alerts". They lock the browser, blocking it from processing events like faye notifications. I suspect that the faye socket timed out in the interim, then tried to reconnect/upgrade. Since the browser was blocking, nothing happened. Clicking OK freed the browser to respond, but the stream was already defunct. Or something like that.

I believe that I will call it a night there. Tomorrow, I will remove the alert dialog. This seems like a good opportunity to play with mustache.js and sessions in express. After that, I will return to this issue of timing out faye sockets. This feels like an edge case (how many people are stupid enough to use alert() dialogs in 2010?). Still, it is fertile ground for learning a bit more about websockets and faye's internals so I will be back.


Day #211

No comments:

Post a Comment