Tilted Forum Project Discussion Community  

01-15-2009, 06:50 AM   #1
Baraka_Guru (warrior bodhisattva, Super Moderator)
Location: East-central Canada
PDF to HTML—Am I missing something?

If you don't already know, the Amazon Kindle format is essentially HTML. If you want to submit books for sale through Amazon's Digital Text Platform (DTP), it basically tells you to convert to HTML no matter what format your book is in:

Quote:
Digital Text Platform lets you upload and convert your content from several formats. However, for best results, we suggest that you upload your content in HTML format, as Amazon DTP converts all uploaded content into HTML first. You can export or save many documents as HTML, e.g. from inside Microsoft Word.

HTML (.html, .htm)
We recommend uploading content in HTML format, usually as a single .html file. Please note that DTP supports a number of select HTML tags, but that most complex formatting options do not translate well to the reader. By default, we suggest that you use only the supported tags for your content and avoid CSS and other formatting inside your HTML.

Zipped HTML (.zip)
If your HTML content contains images or multiple files you'll need to compress all files into a .zip file before uploading. Important: All the files in the .zip archive must be in a single folder, without any files (like images) in sub-folders. If your content is viewable in a web browser, we recommend that you save it in the web browser using the "Save As Web Page (Complete)" option or similar, which will include any images on the page. The resulting files can be then put inside a .zip file.

MobiPocket (.mobi and .prc)
Amazon DTP will handle .mobi file formatting and images very well. Please note that only unencrypted mobi files are supported.

Microsoft Word (.doc)
Digital Text Platform converts all text formatting to HTML tags and processes images automatically. We recommend that you convert Microsoft Word files to HTML inside Word and make sure the formatting looks right before uploading the already-converted HTML for best results. Please check the entry on Word conversion for further information.

Adobe PDF (.pdf)
Although PDF is supported, due to the complex nature of the format and its suitability for print or display, DTP conversion may not be ideal. See the entry on PDF conversion for more details.

Plain Text (.txt)
Digital Text Platform converts all text to HTML.

Tip: If your content exists in a format not listed above, check the program in which you created it for a "Save As…" or "Export" feature, and export into .html format for best results.
Amazon DTP Support : Supported Formats
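
The zipped-HTML requirement quoted above (every file in a single folder, nothing in sub-folders) can be sketched in Python. This is only a rough illustration using the standard `zipfile` module; the function name `zip_flat` and the example paths are made up for this sketch and are not part of any Amazon tool.

```python
import zipfile
from pathlib import Path


def zip_flat(source_dir, zip_path):
    """Zip every file under source_dir into one flat archive (no sub-folders).

    Hypothetical helper: it mirrors the single-folder rule quoted above.
    """
    source = Path(source_dir)
    target = Path(zip_path)
    seen = set()
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            # Skip the archive itself if it was created inside source_dir.
            if path.resolve() == target.resolve():
                continue
            if path.name in seen:
                raise ValueError(f"duplicate file name when flattened: {path.name}")
            seen.add(path.name)
            # arcname=path.name drops the folder part, so images end up
            # next to the .html file inside the archive.
            archive.write(path, arcname=path.name)
    return sorted(seen)
```

Calling `zip_flat("book_export", "book.zip")` would return the flattened file names. Note that any `<img src="images/cover.jpg">` references in the HTML would still need to be changed to `cover.jpg` by hand or by a separate step; this sketch does not touch the markup.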

Even if you don't convert, it says Amazon will convert to HTML anyway. As a book publisher, it is in your best interests to ensure your product is of good quality: the layout is clean and there are no errors. So it's a bit scary to leave this up to an automated process by just submitting files to Amazon and having the DTP do it. What I'm talking about is taking a list of digital books in PDF format and making it available on Amazon's DTP...without it all going awry in the conversion process. The best scenario would be to submit clean HTML books to the DTP.

I've tried everything I know of: Adobe Acrobat, options in Quark, and a few other things. It all goes wonky. There is always something that goes wrong in the HTML code. For example, as soon as you get to italics, for some reason there is no closing HTML tag, and so the rest of the book is in italics. Page breaks get all bunged up. Text sometimes overlaps. WTF?
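
One way to at least locate the runaway-italics problem before uploading is to scan the converted HTML for emphasis tags that are still open when a paragraph ends. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are invented here, and the rule "italic or bold should not cross a paragraph boundary" is an assumption about book text, not something any converter guarantees.

```python
from html.parser import HTMLParser

# Assumption: these inline tags should be closed within the same paragraph.
INLINE_TAGS = {"i", "em", "b", "strong"}


class UnclosedInlineTagFinder(HTMLParser):
    """Collects inline tags that are still open at a paragraph boundary."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, (line, column)) still open
        self.problems = []   # tags that were never closed in their paragraph

    def flush(self):
        # Anything still open at a paragraph boundary is reported.
        self.problems.extend(self.open_tags)
        self.open_tags.clear()

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self.open_tags.append((tag, self.getpos()))
        elif tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            # Close the most recent matching open tag, if there is one.
            for index in range(len(self.open_tags) - 1, -1, -1):
                if self.open_tags[index][0] == tag:
                    del self.open_tags[index]
                    break
        elif tag == "p":
            self.flush()


def find_unclosed(html_text):
    """Return a list of (tag, (line, column)) for unclosed inline tags."""
    finder = UnclosedInlineTagFinder()
    finder.feed(html_text)
    finder.close()
    finder.flush()
    return finder.problems
```

Running `find_unclosed` over a converted file gives the line where each stray opening tag sits, which is usually enough to fix it by hand. It only finds the problem; it does not guess where the closing tag should have gone.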

Am I missing something? Why is it so hard to go from one popular format to another?

I've heard Mobipocket's publishing utility (Mobipocket Creator, publisher edition) does a good job, but I haven't tried it yet because it's only available in Windows...our office is Mac. Anyone try this?

(The obvious question is: Why in hell does Amazon think HTML is the best format for digital books? But this won't help any of us....)

Any advice I can get is great, but what would also help me is if I can get this confirmed: Is it possible to convert from PDF (or Quark, whatever) to HTML without having to re-edit or reformat the entire document (i.e. book)?
__________________
Knowing that death is certain and that the time of death is uncertain, what's the most important thing?
—Bhikkhuni Pema Chödrön

Humankind cannot bear very much reality.
—From "Burnt Norton," Four Quartets (1936), T. S. Eliot

Last edited by Baraka_Guru; 01-15-2009 at 06:56 AM..
01-15-2009, 10:01 AM   #2
fiatguy85 (Crazy)
Check out this on Adobe's web site. It's probably not implemented in most programs because HTML and PDF are fairly different formats.
01-15-2009, 10:15 AM   #3
Baraka_Guru (warrior bodhisattva, Super Moderator)
Location: East-central Canada
Thanks for reminding me of this. I had tried this out on a small and simple PDF and it wasn't too bad. It had converted more than half of it into all bold because the headings were bold. (I'm assuming it missed some closing bold tags in there.) This could be corrected easily enough, but I'm not sure how an entire book would go over.

The thing to consider is that our digital books have editorial breaks, tabs, and other things. Some have graphics as well. Even with text-only books, there are major issues with formatting. Think of how a novel looks with paragraph breaks and indents, italics and such, and then imagine what would happen if something—just one thing—goes wrong.

When you're dealing with a back list of about 200 titles, the prospect of having to go into each one to fiddle around on a bunch of little things seems daunting.

I'll have to experiment with this Adobe tool some more. Maybe try to put a whole book through it to see how it chews on it.

Thanks again for the reminder.
01-15-2009, 08:31 PM   #4
spindles (Mine is an evil laugh)
Location: Sydney, Australia
Do you have the source document used to create the PDF, or was it originally authored in Acrobat? For example, if the original doc was authored in Word, you might find that document is a better place to start for your HTML conversion.
__________________
who hid my keyboard's PANIC button?
01-16-2009, 04:46 AM   #5
ratbastid (Darth Papa)
Location: Yonder
PDF is meant to be an end format, not a transport format. So, you do your work in Quark or whatever, and bake it into a PDF as an end product to send off to the printer or have people download from the web. If you've ever looked in a PDF.... I mean, a lot of it is readable as plain text, but there's nothing at all simple about its format.

HTML makes very very good sense as a format for mobile books. It's the standard for web pages. Remember that it's not a FORMAT per se (at least, not the way PDF or MS Word .DOC is). It's a MARKUP LANGUAGE that the display device can interpret and render in whatever way is best for its own particular display idiosyncrasies.
01-16-2009, 08:02 AM   #6
guyy (Addict)
Location: Cottage Grove, Wisconsin
I'd rather have pdfs than html. I guess i won't be getting on-line books from amazon. Still.

As ratbastid says pdf and postscript, from which it stems, are for printer and screen output. All my writing is in pdf and ps, but i get that output from TeX and LaTeX. If i want html instead of pdf, i run my files through a different engine. Is it possible to go back to the source files?

I wonder if an html wrapper for your pdfs would work?

<html>
<body>
<!-- book.pdf is a placeholder file name -->
<a href="book.pdf">link to pdf</a>
</body>
</html>

Another possibility is to go pdf --> plain text --> html
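
The second half of that route (plain text to HTML) might look roughly like the sketch below. It assumes the text has already been exported from the PDF by some other tool and that paragraphs are separated by blank lines; the function name `text_to_html` is made up for illustration. It uses only Python's standard `html` and `re` modules.

```python
import html
import re


def text_to_html(text, title):
    """Wrap blank-line-separated paragraphs of plain text in minimal HTML."""
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Re-join lines that were hard-wrapped in the text export.
        joined = " ".join(line.strip() for line in block.splitlines())
        if joined:
            # Escape &, < and > so stray characters cannot break the markup.
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    body = "\n".join(paragraphs)
    return (
        "<html>\n<head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )
```

The output is clean but bare: italics, headings, indents, and images are already gone by the time the book is plain text, so they would have to be put back by hand afterwards.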
01-16-2009, 08:24 AM   #7
Baraka_Guru (warrior bodhisattva, Super Moderator)
Location: East-central Canada
Quote:
Originally Posted by spindles
Do you have the source document used to create the PDF, or was it originally authored in Acrobat?
The source format is QuarkXPress. For some reason, the program requires you to create the layout for HTML or print beforehand. At least this is as far as I know.

Quote:
Originally Posted by ratbastid
PDF is meant to be an end format, not a transport format. [...]
I understand this completely. What's amiss here is that many book publishers digitize their lists into PDF, but now we have Amazon asking for HTML.

Quote:
Originally Posted by ratbastid
HTML makes very very good sense as a format for mobile books. It's the standard for web pages. Remember that it's not a FORMAT per se (at least, not the way PDF or MS Word .DOC is). It's a MARKUP LANGUAGE that the display device can interpret and render in whatever way is best for its own particular display idiosyncrasies.
This I understand too. The Kindle isn't just for reading books.

Quote:
Originally Posted by guyy
I'd rather have pdfs than html. I guess i won't be getting on-line books from amazon. Still.
Yeah, I know. But Kindle users are getting their shit in HTML whether they like it or not. Just remember that there is a difference between Kindle books and other e-books Amazon offers.

Quote:
Originally Posted by guyy
As ratbastid says pdf and postscript, from which it stems, are for printer and screen output. All my writing is in pdf and ps, but i get that output from TeX and LaTeX. If i want html instead of pdf, i run my files through a different engine. Is it possible to go back to the source files?
As I said above, Quark is wonky when it comes to this HTML "thing."

Quote:
Originally Posted by guyy
I wonder if an html wrapper for your pdfs would work?

<html>

link to pdf

</html>
This brings up the conclusion I've come to. When it comes to code, Kindle thrives on simplicity. It is HTML with very limited CSS compatibility. I found this sample, posted by an Amazon DTP guy in their forum:
[Sample HTML not preserved; only its title text, "Your Title Here", survives.]

Note the CSS "standard": it is streamlined, and it uses relative text sizes rather than fixed ones, so users can control text size no matter what.
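
As a rough aid for the relative-versus-fixed point, the sketch below flags `font-size` declarations that use absolute units (px, pt, and so on) in a converted file, so they can be swapped for relative values such as `em` or percentages. The function name and the list of units treated as "fixed" are assumptions made for this illustration; it is a plain regular-expression scan, not a CSS parser, and it ignores things like the old `<font size="...">` attribute.

```python
import re

# Assumption: these units count as "fixed" for the purposes of this check.
FIXED_SIZE = re.compile(
    r"font-size\s*:\s*(\d+(?:\.\d+)?)\s*(px|pt|cm|mm|in|pc)\b", re.IGNORECASE
)


def find_fixed_font_sizes(markup):
    """Return (line_number, declaration) pairs for absolute font sizes."""
    hits = []
    for number, line in enumerate(markup.splitlines(), start=1):
        for match in FIXED_SIZE.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Each hit gives the line number and the offending declaration, which is enough to go through a book's stylesheet or inline styles and replace them one by one.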

This is the bottom line (after some more thought and research): There is no automated process for making Kindle-ready books that don't look like shit. What this means is that publishers will have to basically re-edit and/or re-jig the layouts of each and every book independently. *groan* There is currently no way around it, and this is why I'm seeing "Kindle conversion" services popping up on the Web. It's not difficult to do yourself, but it is time-consuming, especially when you have around 200 titles.

Quote:
Another possibility is to go pdf --> plain text --> html
This might be the first step of converting a PDF book into Kindle format. This I can do, but it's more or less opening up a can of worms, especially with non-fiction titles.

The thing to realize is that if we're selling these books, we must have a quality standard. The text must be readable as a book, whether it's digital or not. We can't just spit these out in txt and hope our readers are fine with it as is. I don't think it'll work on its own.

Everyone, thanks for your feedback and suggestions. I now have to go shop around for affordable Kindle conversion services.
Last edited by Baraka_Guru; 01-16-2009 at 08:44 AM..
All times are GMT -8.
