01-15-2009, 06:50 AM   #1
Baraka_Guru
warrior bodhisattva
 
 
Super Moderator
Location: East-central Canada
PDF to HTML—Am I missing something?

If you don't already know, the Amazon Kindle format is based on HTML. If you want to submit books for sale through Amazon's "Digital Text Platform" (DTP), it basically tells you to convert to HTML no matter what format your source book is in:

Quote:
Digital Text Platform lets you upload and convert your content from several formats. However, for best results, we suggest that you upload your content in HTML format, as Amazon DTP converts all uploaded content into HTML first. You can export or save many documents as HTML, e.g. from inside Microsoft Word.

HTML (.html, .htm)
We recommend uploading content in HTML format, usually as a single .html file. Please note that DTP supports a number of select HTML tags, but that most complex formatting options do not translate well to the reader. By default, we suggest that you use only the supported tags for your content and avoid CSS and other formatting inside your HTML.

Zipped HTML (.zip)
If your HTML content contains images or multiple files you'll need to compress all files into a .zip file before uploading. Important: All the files in the .zip archive must be in a single folder, without any files (like images) in sub-folders. If your content is viewable in a web browser, we recommend that you save it in the web browser using the "Save As Web Page (Complete)" option or similar, which will include any images on the page. The resulting files can be then put inside a .zip file.

MobiPocket (.mobi and .prc)
Amazon DTP will handle .mobi file formatting and images very well. Please note that only unencrypted mobi files are supported.

Microsoft Word (.doc)
Digital Text Platform converts all text formatting to HTML tags and processes images automatically. We recommend that you convert Microsoft Word files to HTML inside Word and make sure the formatting looks right before uploading the already-converted HTML for best results. Please check the entry on Word conversion for further information.

Adobe PDF (.pdf)
Although PDF is supported, due to the complex nature of the format and its suitability for print or display, DTP conversion may not be ideal. See the entry on PDF conversion for more details.

Plain Text (.txt)
Digital Text Platform converts all text to HTML.

Tip: If your content exists in a format not listed above, check the program in which you created it for a "Save As…" or "Export" feature, and export into .html format for best results.
Amazon DTP Support : Supported Formats
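
The "Zipped HTML" rule quoted above (every file in a single folder, nothing in sub-folders) is easy to get wrong by hand, so here is a rough Python sketch of how one might build such an archive. The function name `build_flat_zip` and the folder layout are my own assumptions for illustration, not anything from Amazon's documentation.

```python
import zipfile
from pathlib import Path


def build_flat_zip(source_dir, zip_path):
    """Zip every file under source_dir with no sub-folders inside the archive.

    Hypothetical helper: returns the sorted list of file names that were stored.
    """
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    stored = set()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            # Skip the archive itself if it lives inside source_dir.
            if path.resolve() == target:
                continue
            # Two files with the same name would collide once flattened.
            if path.name in stored:
                raise ValueError(f"duplicate file name after flattening: {path.name}")
            stored.add(path.name)
            # arcname=path.name drops the folder part, so everything sits at one level.
            archive.write(path, arcname=path.name)
    return sorted(stored)
```

Note that flattening only moves the files. Any image reference in the HTML that points into a sub-folder (say `images/cover.jpg`) would still need to be edited to the bare file name, which this sketch does not do.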

Even if you don't convert, it says Amazon will convert to HTML anyway. As a book publisher, it is in your best interest to make sure your product is of good quality, meaning the layout is clean and there are no errors. So it's a bit scary to leave this to an automated process by just submitting files to Amazon and letting the DTP do the conversion. What I'm talking about is taking a list of digital books in PDF format and making it available on Amazon's DTP without it all going awry in the conversion process. The best scenario would be to submit clean HTML books to the DTP.

I've tried everything I know of: Adobe Acrobat, options in Quark, and a few other things. It all goes wonky. There is always something wrong in the HTML code. For example, as soon as you get to italics, for some reason there is no closing HTML tag, so the rest of the book is in italics. Page breaks get all bunged up. Text sometimes overlaps. WTF?
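
For the runaway-italics problem, one thing that might at least locate the damage is a small checker that flags inline tags left open when a paragraph ends. This is only a sketch using Python's standard `html.parser`. It assumes the converter wraps text in block tags such as `<p>`, which may not hold for every tool's output, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

INLINE_TAGS = {"i", "em", "b", "strong"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


class UnclosedInlineFinder(HTMLParser):
    """Collects inline tags that are still open at a block boundary."""

    def __init__(self):
        super().__init__()
        self.open_inline = []  # (tag, line number) for each inline tag not yet closed
        self.problems = []     # inline tags that were never closed within their block

    def flush(self):
        # Anything still open at a block boundary is reported as a problem.
        self.problems.extend(self.open_inline)
        self.open_inline = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag in INLINE_TAGS:
            self.open_inline.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag in INLINE_TAGS:
            # Close the most recent opener of the same tag, if there is one.
            for i in range(len(self.open_inline) - 1, -1, -1):
                if self.open_inline[i][0] == tag:
                    del self.open_inline[i]
                    break


def find_unclosed_inline(html_text):
    """Return (tag, line) pairs for inline tags never closed in their block."""
    finder = UnclosedInlineFinder()
    finder.feed(html_text)
    finder.close()
    finder.flush()
    return finder.problems
```

Calling `find_unclosed_inline(open("book.html").read())` (the file name is a placeholder) returns a list of tag and line-number pairs to inspect by hand. It only reports the spots and does not repair them, so it shortens the clean-up rather than replacing it.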

Am I missing something? Why is it so hard to go from one popular format to another?

I've heard Mobipocket's publishing utility (Mobipocket Creator, publisher edition) does a good job, but I haven't tried it yet because it's only available for Windows, and our office is Mac. Has anyone tried it?

(The obvious question is: Why in hell does Amazon think HTML is the best format for digital books? But this won't help any of us....)

Any advice I can get is great, but what would also help me is getting this confirmed: Is it possible to convert from PDF (or Quark, whatever) to HTML without having to re-edit or reformat the entire document (i.e., the book)?
__________________
Knowing that death is certain and that the time of death is uncertain, what's the most important thing?
—Bhikkhuni Pema Chödrön

Humankind cannot bear very much reality.
—From "Burnt Norton," Four Quartets (1936), T. S. Eliot

Last edited by Baraka_Guru; 01-15-2009 at 06:56 AM..