Wednesday, August 10, 2011

Reverse Engineering a Mobi Start Page


Today, I continue my efforts to get the start page for the mobi version of SPDY Book working just so. And by "working just so", I mean actually working. Currently, a new reader of SPDY Book on the Kindle is greeted by the first page in the book. I would rather start readers past that page, the table of contents, and the acknowledgements, so that SPDY Book opens directly at the Introduction.

I know this is possible because I have seen other books do it. I have been trying to solve this by reading documentation and trying things out. I am giving up on that. Instead I am going to try to reverse engineer an existing mobi in an attempt to apply lessons I learn to SPDY Book's mobi.

The first thing I notice when I do this is that the mobi is actually an epub:
$ ls
etype  META-INF  mimetype  OEBPS
$ cat mimetype 
application/epub+zip%  
I have the same thing in my epub output:
➜  book.epub.d git:(master) ✗ ls
META-INF  mimetype  OEBPS
➜  book.epub.d git:(master) ✗ cat mimetype 
application/epub+zip%    
I copy the epub directory into a mobi equivalent and create the etype file (no idea if this is necessary):
➜  output git:(master) ✗  cp -a book.epub.d book.mobi.d
➜  output git:(master) ✗ cd book.mobi.d
➜  book.mobi.d git:(master) ✗ ls
META-INF  mimetype  OEBPS
➜  book.mobi.d git:(master) ✗ cp mimetype etype
Next, I attempt to manually create a toc.html file that points to just the actual contents of the book. I pull in the git-scribe / a2x generated NCX table of contents, stripping out subsections (e.g. Chapter 2, section 3):
<h1>The SPDY Book</h1>
<div class="toc">
<p>Table of Contents</p>


<ncx:navPoint id="id435783" playOrder="1">
<ncx:navPoint id="id435199" playOrder="5">
<ncx:navLabel>
<ncx:text>Introduction</ncx:text>
</ncx:navLabel>
<ncx:content src="pr04.html"/>
</ncx:navPoint>
<ncx:navPoint id="id435420" playOrder="9">
<ncx:navLabel>
<ncx:text>1. A Case For SPDY</ncx:text>
</ncx:navLabel>
<ncx:content src="ch01.html"/>
</ncx:navPoint>
<ncx:navPoint id="id481317" playOrder="13">
<ncx:navLabel>
<ncx:text>2. Your First SPDY App</ncx:text>
</ncx:navLabel>
<ncx:content src="ch02.html"/>
</ncx:navPoint>
<ncx:navPoint id="id481802" playOrder="18">
<ncx:navLabel>
<ncx:text>3. SPDY and the Real World</ncx:text>
...
Thanks to a bit of Emacs macro work, I convert the above to:
<h1>The SPDY Book</h1>
<div class="toc">
<p>Table of Contents</p>

<div>
<span class="chapter"><a href="pr04.html">Introduction</a></span>
</div>
<div>
<span class="chapter"><a href="ch01.html">1. A Case For SPDY</a></span>
</div>
<div>
<span class="chapter"><a href="ch02.html">2. Your First SPDY App</a></span>
</div>
<div>
<span class="chapter"><a href="ch03.html">3. SPDY and the Real World</a></span>
...
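For anyone who would rather script this step than record an Emacs macro, here is a minimal Python sketch of the same conversion. It is not what I actually ran. The function names (`chapter_links`, `toc_html`), the `start_at` parameter, and the sample NCX (including the `index.html` and `pr03.html` entries and the `id` values) are made up for illustration. It assumes the chapters are navPoints nested directly under a single book-level navPoint, as in the excerpt above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Standard NCX namespace, bound to the same "ncx" prefix used in the listing.
NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def chapter_links(parent, start_at=None):
    """Return (href, label) pairs for navPoints directly under `parent`.

    Nested navPoints (sections inside a chapter) are ignored. If `start_at`
    is given, entries before the one with that href are dropped.
    """
    links = []
    for point in parent.findall("ncx:navPoint", NS):
        label = point.find("ncx:navLabel/ncx:text", NS).text
        href = point.find("ncx:content", NS).get("src")
        links.append((href, label))
    if start_at is not None:
        hrefs = [href for href, _ in links]
        links = links[hrefs.index(start_at):]
    return links


def toc_html(title, links):
    """Render the links in the same div/span layout as the hand-built toc.html."""
    lines = [f"<h1>{escape(title)}</h1>", '<div class="toc">',
             "<p>Table of Contents</p>", ""]
    for href, label in links:
        lines += ["<div>",
                  f'<span class="chapter"><a href="{escape(href)}">'
                  f"{escape(label)}</a></span>",
                  "</div>"]
    lines.append("</div>")
    return "\n".join(lines)


# Made-up sample data shaped like the NCX excerpt above.
SAMPLE_NCX = """<ncx:ncx xmlns:ncx="http://www.daisy.org/z3986/2005/ncx/">
<ncx:navMap>
<ncx:navPoint id="book" playOrder="1">
<ncx:navLabel><ncx:text>The SPDY Book</ncx:text></ncx:navLabel>
<ncx:content src="index.html"/>
<ncx:navPoint id="ack" playOrder="2">
<ncx:navLabel><ncx:text>Acknowledgments</ncx:text></ncx:navLabel>
<ncx:content src="pr03.html"/>
</ncx:navPoint>
<ncx:navPoint id="intro" playOrder="5">
<ncx:navLabel><ncx:text>Introduction</ncx:text></ncx:navLabel>
<ncx:content src="pr04.html"/>
</ncx:navPoint>
<ncx:navPoint id="ch01" playOrder="9">
<ncx:navLabel><ncx:text>1. A Case For SPDY</ncx:text></ncx:navLabel>
<ncx:content src="ch01.html"/>
<ncx:navPoint id="ch01s01" playOrder="10">
<ncx:navLabel><ncx:text>A section</ncx:text></ncx:navLabel>
<ncx:content src="ch01.html#s1"/>
</ncx:navPoint>
</ncx:navPoint>
</ncx:navPoint>
</ncx:navMap>
</ncx:ncx>"""

root = ET.fromstring(SAMPLE_NCX)
book = root.find("ncx:navMap/ncx:navPoint", NS)
links = chapter_links(book, start_at="pr04.html")
toc = toc_html("The SPDY Book", links)
print(toc)
```

Starting the list at `pr04.html` mirrors what I did by hand: everything before the Introduction is left out of toc.html.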
Now, I need to add the table of contents to the Open Packaging Format file:
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
...
<manifest>
<item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="htmltoc" media-type="application/xhtml+xml" href="toc.html"/>
...

</manifest>
<spine toc="ncxtoc">
<itemref idref="cover" linear="no"/>
    <itemref idref="htmltoc" linear="yes"/>
...
</spine>
<guide>
<reference href="cover.html" type="cover" title="Cover"/>
    <reference href="toc.html" type="toc" title="Table of Contents"/>
</guide>
</package>
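The three OPF additions (manifest item, spine itemref, guide reference) could also be scripted. The following is a rough Python sketch, not part of git-scribe. The function name `add_html_toc` and the sample OPF are hypothetical, and it assumes the cover is the first spine entry and that a guide element already exists, as in the file above.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
# Keep the OPF namespace as the default namespace when serializing.
ET.register_namespace("", OPF)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def add_html_toc(opf_xml, href="toc.html", item_id="htmltoc"):
    """Return OPF XML with an HTML TOC added to manifest, spine, and guide."""
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF}}}manifest")
    spine = root.find(f"{{{OPF}}}spine")
    guide = root.find(f"{{{OPF}}}guide")  # assumed to be present

    ET.SubElement(manifest, f"{{{OPF}}}item",
                  {"id": item_id,
                   "media-type": "application/xhtml+xml",
                   "href": href})

    # Assumption: the cover is the first itemref, so the TOC goes second.
    itemref = ET.Element(f"{{{OPF}}}itemref",
                         {"idref": item_id, "linear": "yes"})
    spine.insert(1, itemref)

    ET.SubElement(guide, f"{{{OPF}}}reference",
                  {"href": href, "type": "toc", "title": "Table of Contents"})
    return ET.tostring(root, encoding="unicode")


# Made-up, trimmed-down OPF in the shape of the file above.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
<manifest>
<item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
<item id="cover" media-type="application/xhtml+xml" href="cover.html"/>
<item id="ch01" media-type="application/xhtml+xml" href="ch01.html"/>
</manifest>
<spine toc="ncxtoc">
<itemref idref="cover" linear="no"/>
<itemref idref="ch01"/>
</spine>
<guide>
<reference href="cover.html" type="cover" title="Cover"/>
</guide>
</package>"""

updated = add_html_toc(SAMPLE_OPF)
print(updated)
```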
With that, I am ready to rebuild my epub:
➜  book.mobi.d git:(master) ✗ zip ../book_manual.epub . -r
adding: etype (stored 0%)
adding: META-INF/ (stored 0%)
adding: META-INF/container.xml (deflated 33%)
adding: OEBPS/ (stored 0%)
adding: OEBPS/index.html (deflated 45%)
...
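One caveat about the zip command: the epub container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed. A plain recursive zip does not guarantee that, although kindlegen accepts the result here. A stricter build might look like the following Python sketch. The `build_epub` name and the placeholder file contents in the demo are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def build_epub(src_dir, epub_path):
    """Zip src_dir into epub_path with mimetype first and uncompressed."""
    src = Path(src_dir)
    mimetype = src / "mimetype"
    with zipfile.ZipFile(epub_path, "w") as epub:
        # The mimetype entry goes in first, stored without compression.
        epub.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else is added afterwards, deflated.
        for path in sorted(src.rglob("*")):
            if path.is_file() and path != mimetype:
                epub.write(path, path.relative_to(src).as_posix(),
                           compress_type=zipfile.ZIP_DEFLATED)


# Demo on a throwaway directory with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book.mobi.d"
    (book / "META-INF").mkdir(parents=True)
    (book / "OEBPS").mkdir()
    (book / "mimetype").write_text("application/epub+zip")
    (book / "META-INF" / "container.xml").write_text("<container/>")
    (book / "OEBPS" / "toc.html").write_text("<h1>The SPDY Book</h1>")

    out = Path(tmp) / "book_manual.epub"
    build_epub(book, out)

    with zipfile.ZipFile(out) as epub:
        names = epub.namelist()
        first = epub.infolist()[0]

print(names)
```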
And then convert the epub (with TOC file) to mobi format:
➜  book.mobi.d git:(master) ✗ cd ..
➜  output git:(master) ✗ kindlegen book_manual.epub

**************************************************
* Amazon.com kindlegen(Linux)   V1.2 build 33307 *
* A command line e-book compiler                 *
* Copyright Amazon.com 2011                      *
**************************************************

Info(prcgen): Added metadata dc:Title        "The SPDY Book"
Info(prcgen): Added metadata dc:Creator      "Chris Strom"
Info(prcgen): Parsing files  0000020
Info(prcgen): Resolving hyperlinks
Info(prcgen): Building table of content     URL: /tmp/mobi-gpRH2H/OEBPS/toc.ncx
Info(pagemap): No Page map found in the book
Info(prcgen): Computing UNICODE ranges used in the book
Info(prcgen): Found UNICODE range: Basic Latin [20..7E]
Info(prcgen): Found UNICODE range: General Punctuation - Windows 1252 [2013..2014]
Info(prcgen): Found UNICODE range: Letter-like Symbols [2100..214F]
Info(prcgen): Found UNICODE range: Arrows [2190..21FF]
Info(prcgen): Building MOBI file, record count:   0000076
Info(prcgen): Final stats - text compressed to (in % of original size):  046.55%
Info(prcgen): The document identifier is: "The_SPDY_Book"
Info(prcgen): The file format version is V6
Info(prcgen): Saving MOBI file
Info(prcgen): MOBI File successfully generated!
And it works!

When I open up The SPDY Book on the Kindle, the start page is the Introduction. If I "Go to…" the Beginning in the Kindle, I go to... the Introduction.

For good measure, I edit toc.html, removing the Introduction and the first two chapters. After copying the result to my Kindle, the start page is, indeed, chapter 3.

Crazy. It seems that the toc.html file is used by the Kindle to determine the start page. I could have sworn that I tried that previously when generating the mobi from HTML, but maybe there is something in the epub source that is telling the Kindle to reference toc.html.

I am a bit frustrated with this whole process, which feels more and more like a black art rather than a deterministic, well-documented procedure. I may just settle for what works and move on to more important things. But first, tomorrow, I will automate this in git-scribe.


Day #109
