Friday, August 5, 2011

Mobi from Epub with Kindlegen


Last night, I stumbled upon the proper incantation of asciidoc's a2x command to insert cover images into epubs:
a2x -f epub -d book -a docinfo -v book.asc
The key here is the -a docinfo switch, which tells asciidoc to read a file named book-docinfo.xml (or, alternatively, just docinfo.xml). In my case, the docinfo file contains:
<mediaobject role="cover">
  <imageobject>
    <imagedata fileref="images/cover.jpg" format="JPG"/>
  </imageobject>
  <textobject><phrase>The SPDY Book</phrase></textobject>
</mediaobject>
This is how I was able to get git-scribe to generate epubs with cover images.
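
To confirm that the cover really made it into the epub's package metadata, something like the following Python sketch could help. It relies on the EPUB 2 convention of a `<meta name="cover" content="..."/>` entry that points at a manifest item id. The function names and the `OEBPS/content.opf` path are assumptions for illustration; the actual OPF location is whatever `META-INF/container.xml` in the epub says it is.

```python
import xml.etree.ElementTree as ET
import zipfile

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def cover_href(opf_xml):
    """Return the manifest href of the declared cover image, or None."""
    root = ET.fromstring(opf_xml)

    # EPUB 2 convention: <meta name="cover" content="ID"/> names a manifest item id.
    cover_id = None
    for meta in root.iterfind("opf:metadata/opf:meta", OPF_NS):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
            break
    if cover_id is None:
        return None

    # Resolve that id to the file it refers to.
    for item in root.iterfind("opf:manifest/opf:item", OPF_NS):
        if item.get("id") == cover_id:
            return item.get("href")
    return None


def cover_href_from_epub(epub_path, opf_name="OEBPS/content.opf"):
    """Hypothetical helper: opf_name is an assumed path, not read from container.xml."""
    with zipfile.ZipFile(epub_path) as epub:
        return cover_href(epub.read(opf_name))
```

Calling `cover_href_from_epub("book.epub")` should then return the path of the cover image relative to the OPF file, or `None` if no cover is declared.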

I wonder if I can use that epub to generate a nice-looking mobi (including the cover image). Let's give it a try with Amazon's kindlegen:
➜  output git:(master) ✗ kindlegen book.epub -o spdybook_from_epub.mobi

**************************************************
* Amazon.com kindlegen(Linux) V1.2 build 33307 *
* A command line e-book compiler *
* Copyright Amazon.com 2011 *
**************************************************

Info(prcgen): Added metadata dc:Title "The SPDY Book"
Info(prcgen): Added metadata dc:Creator "Chris Strom"
Info(prcgen): Parsing files 0000019
Info(cssparser): @rules other than @import and @charset are not supported.
....
Info(prcgen): Resolving hyperlinks
Info(prcgen): Building table of content URL: /tmp/mobi-0K4Jyz/OEBPS/toc.ncx
Info(pagemap): No Page map found in the book
Info(prcgen): Computing UNICODE ranges used in the book
Info(prcgen): Found UNICODE range: Basic Latin [20..7E]
Info(prcgen): Found UNICODE range: General Punctuation - Windows 1252 [2018..201A]
Info(prcgen): Found UNICODE range: Letter-like Symbols [2100..214F]
Info(prcgen): Found UNICODE range: Arrows [2190..21FF]
Info(prcgen): Building MOBI file, record count: 0000064
Info(prcgen): Final stats - text compressed to (in % of original size): 049.01%
Info(prcgen): The document identifier is: "The_SPDY_Book"
Info(prcgen): The file format version is V6
Info(prcgen): Saving MOBI file
Info(prcgen): MOBI File successfully generated!
Hrm... that seems OK. For the most part. I am not sure what the deal is with the CSS rules. Similarly, "No Page map"? I wonder what that is about.
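
One way to see which CSS rules trigger that cssparser message would be to scan the epub's stylesheets for @rules other than `@import` and `@charset`. This is only a rough regex-based sketch, not a real CSS parser: the function name is made up for illustration, and an `@` inside a string or URL would produce a false positive.

```python
import re

# The two @rules that the kindlegen message says are supported.
SUPPORTED = {"import", "charset"}
AT_RULE = re.compile(r"@([A-Za-z-]+)")


def unsupported_at_rules(css_text):
    """Return the sorted names of @rules other than @import and @charset."""
    # Drop /* ... */ comments so commented-out rules are not reported.
    without_comments = re.sub(r"/\*.*?\*/", "", css_text, flags=re.DOTALL)
    names = {m.group(1).lower() for m in AT_RULE.finditer(without_comments)}
    return sorted(names - SUPPORTED)
```

Feeding it the text of each `.css` file from the unzipped epub would list candidates such as `media` or `page` to look at more closely.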

First, I check that the output looks OK:



The formatting is still kinda nice. And, no, Scott Chacon is not on the front cover, so there's that.

Now that I think about it, there could be more spacing between paragraphs. The bullet points are on the same line as the associated text (though there may be too much leading whitespace on some bullet points).

A big win is the footnotes, which now appear at the end of chapters like in normal mobi books (in the git-scribe mobi, they had been inserted inline).

What is definitely not working is the table of contents. It seems that the "No Page map" message was in earnest. If I unzip the source epub and check out OEBPS/toc.ncx, it sure seems like there is a page map:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<ncx:ncx xmlns:ncx="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <ncx:head>
    <ncx:meta name="cover" content="cover"/>
    <ncx:meta name="dtb:depth" content="-1"/>
    <ncx:meta name="dtb:totalPageCount" content="0"/>
    <ncx:meta name="dtb:maxPageNumber" content="0"/>
  </ncx:head>
  <ncx:docTitle>
    <ncx:text>The SPDY Book</ncx:text>
  </ncx:docTitle>
  <ncx:navMap>
    <ncx:navPoint id="id395944" playOrder="1">
      <ncx:navLabel>
        <ncx:text>The SPDY Book</ncx:text>
      </ncx:navLabel>
      <ncx:content src="index.html"/>
      <ncx:navPoint id="id395330" playOrder="2">
        <ncx:navLabel>
          <ncx:text>Copyright</ncx:text>
        </ncx:navLabel>
        <ncx:content src="pr01.html"/>
      </ncx:navPoint>
      <ncx:navPoint id="id421493" playOrder="3">
        <ncx:navLabel>
          <ncx:text>History</ncx:text>
        </ncx:navLabel>
        <ncx:content src="pr02.html"/>
...
And the chapters themselves seem to contain reasonable headers:
<div class="chapter" title="Chapter 1. A Case For SPDY">
  <div class="titlepage">
    <div>
      <div>
        <h1 class="title">
          <a id="chapter_a_case_for_spdy"/>
          Chapter 1. A Case For SPDY
        </h1>
      </div>
    </div>
  </div>
...
Ah, a little research tells me that the page-map thing was just a red herring. That is apparently for page numbering in the new Kindle software. That's a nice-to-have, but I am not going to worry about that today.
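
For checking what the NCX navigation map actually contains without reading the XML by eye, a small Python sketch along these lines might do. The function name and the `(depth, label, src)` tuple shape are my own choices for illustration; the only thing taken from the epub is the `ncx` namespace and element names seen in toc.ncx.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def nav_points(ncx_xml):
    """Return (depth, label, src) for every navPoint in the navMap, in document order."""
    root = ET.fromstring(ncx_xml)
    entries = []

    def walk(parent, depth):
        # Only direct children, so nested navPoints get their own depth.
        for point in parent.findall("ncx:navPoint", NCX_NS):
            label = point.findtext(
                "ncx:navLabel/ncx:text", default="", namespaces=NCX_NS
            ).strip()
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src") if content is not None else None
            entries.append((depth, label, src))
            walk(point, depth + 1)

    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is not None:
        walk(nav_map, 0)
    return entries
```

Given the contents of OEBPS/toc.ncx, printing each entry indented by its depth gives a quick outline of what a reader's table of contents ought to show.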

Ugh. So do I work with the existing git-scribe mobi and try to make it more like the epub (e.g. with proper footnotes) or do I work with the mobi generated from the epub and try to get the formatting correct and a TOC?

Dunno that I have a good answer for what's best. I will ruminate on that overnight and pick back up with one of those two approaches tomorrow.


Day #105
