01-15-2009, 06:50 AM | #1 (permalink) | |
warrior bodhisattva
Super Moderator
Location: East-central Canada
|
PDF to HTML—Am I missing something?
If you don't already know, the Amazon Kindle format is HTML. If you want to submit books for sale through Amazon's "Digital Platform" (DP), it basically tells you to convert to HTML no matter what source your book is in:
Quote:
Even if you don't convert, it says Amazon will convert to HTML anyway. As a book publisher, it is in your best interests to ensure your product is of good quality, with a clean layout and no errors, so it's a bit scary to leave this up to some automated process by just submitting files to Amazon and having the DP do it.
What I'm talking about is taking a digital book list in PDF format and making it available on Amazon's DP...without it going all awry through the conversion process. The best scenario would be to submit clean HTML books to the DP.
I've tried everything I know of: Adobe Acrobat, options in Quark, and a few other things. It all goes wonky. There is always something that goes wrong in the HTML code. For example, as soon as you get to italics, for some reason there is no closing HTML tag, and so the rest of the book is in italics. Page breaks get all bunged up. Text sometimes overlaps. WTF? Am I missing something? Why is it so hard to go from one popular format to another?
I've heard Mobipocket's publishing utility (Mobipocket Creator, publisher edition) does a good job, but I haven't tried it yet because it's only available for Windows...our office is Mac. Has anyone tried this?
(The obvious question is: Why in hell does Amazon think HTML is the best format for digital books? But this won't help any of us....)
Any advice I can get is great, but what would also help me is if I can get this confirmed: Is it possible to convert from PDF (or Quark, whatever) to HTML without having to re-edit or reformat the entire document (i.e. book)?
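The runaway-italics problem described above can at least be detected automatically before a file is submitted. Here is a minimal sketch in Python, assuming the converter's output is ordinary HTML. The names `InlineTagChecker` and `find_unclosed_inline_tags` and the two tag lists are made up for illustration; they are not part of any Amazon, Adobe, or Quark tool.

```python
from html.parser import HTMLParser

# Inline formatting tags that should be closed before a block ends (illustrative list).
INLINE_TAGS = {"i", "em", "b", "strong"}
# Block-level tags; hitting one while an inline tag is still open signals a problem.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


class InlineTagChecker(HTMLParser):
    """Collects inline tags that are still open when a block starts or ends."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # (tag, line) for inline tags currently open
        self.problems = []   # (tag, line) for inline tags that were never closed

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag in INLINE_TAGS:
            line, _ = self.getpos()
            self.open_tags.append((tag, line))

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            # Close the most recently opened tag of the same name, if any.
            for i in range(len(self.open_tags) - 1, -1, -1):
                if self.open_tags[i][0] == tag:
                    del self.open_tags[i]
                    break
        elif tag in BLOCK_TAGS:
            self._flush()

    def _flush(self):
        # Anything still open at a block boundary is reported as unclosed.
        self.problems.extend(self.open_tags)
        self.open_tags = []

    def close(self):
        super().close()
        self._flush()


def find_unclosed_inline_tags(html_text):
    """Return a list of (tag, line_number) for inline tags left open."""
    checker = InlineTagChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

Calling `find_unclosed_inline_tags` on each converted book gives a list of tag names and line numbers to look at, which is a lot less work than proofreading the whole file. It only finds the problem; it does not repair it.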
__________________
Knowing that death is certain and that the time of death is uncertain, what's the most important thing? —Bhikkhuni Pema Chödrön Humankind cannot bear very much reality. —From "Burnt Norton," Four Quartets (1936), T. S. Eliot Last edited by Baraka_Guru; 01-15-2009 at 06:56 AM.. |
|
01-15-2009, 10:01 AM | #2 (permalink) |
Crazy
|
Check out this on Adobe's web site. It's probably not implemented in most programs because HTML and PDF are fairly different formats.
|
01-15-2009, 10:15 AM | #3 (permalink) |
warrior bodhisattva
Super Moderator
Location: East-central Canada
|
Thanks for reminding me of this. I had tried this out on a small and simple PDF and it wasn't too bad. It converted more than half of it into all bold because the headings were bold. (I'm assuming it missed some closing bold tags in there.) This could be corrected easily enough, but I'm not sure how an entire book would go over.
The thing to consider is that our digital books have editorial breaks, tabs, and other things. Some have graphics as well. Even with text-only books, there are major issues with formatting. Think of how a novel looks with paragraph breaks and indents, italics and such, and then imagine what would happen if something—just one thing—goes wrong. When you're dealing with a back list of about 200 titles, the prospect of having to go into each one to fiddle around on a bunch of little things seems daunting. I'll have to experiment with this Adobe tool some more. Maybe try to put a whole book through it to see how it chews on it. Thanks again for the reminder.
|
01-15-2009, 08:31 PM | #4 (permalink) |
Mine is an evil laugh
Location: Sydney, Australia
|
Do you have the source document used to create the PDF, or was it originally authored in Acrobat? For example, if the original doc was authored in Word, you might find that document is a better place to start for your HTML conversion.
__________________
who hid my keyboard's PANIC button? |
01-16-2009, 04:46 AM | #5 (permalink) |
Darth Papa
Location: Yonder
|
PDF is meant to be an end format, not a transport format. So, you do your work in Quark or whatever, and bake it into a PDF as an end product to send off to the printer or have people download from the web. If you've ever looked in a PDF.... I mean, it's plain text, but there's nothing at all simple about its format.
HTML makes very very good sense as a format for mobile books. It's the standard for web pages. Remember that it's not a FORMAT per se (at least, not the way PDF or MS Word .DOC is). It's a MARKUP LANGUAGE that the display device can interpret and render in whatever way is best for its own particular display idiosyncrasies. |
01-16-2009, 08:02 AM | #6 (permalink) |
Addict
Location: Cottage Grove, Wisconsin
|
I'd rather have PDFs than HTML. I guess I won't be getting online books from Amazon. Still.
As ratbastid says, PDF and PostScript, from which it stems, are for printer and screen output. All my writing is in PDF and PS, but I get that output from TeX and LaTeX. If I want HTML instead of PDF, I run my files through a different engine. Is it possible to go back to the source files? I wonder if an HTML wrapper for your PDFs would work? <html> link to pdf </html> Another possibility is to go PDF --> plain text --> HTML. |
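The last step of the PDF --> plain text --> HTML route suggested above could look roughly like this. It is a sketch only: it assumes the text has already been pulled out of the PDF by some other tool (not shown) and that paragraphs are separated by blank lines. The function name `text_to_paragraphs` is made up for illustration.

```python
import html
import re


def text_to_paragraphs(text):
    """Turn blank-line-separated plain text into a string of <p> elements."""
    # One or more blank lines mark a paragraph boundary (an assumption about the input).
    chunks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for chunk in chunks:
        # Re-join lines that were hard-wrapped during text extraction.
        joined = " ".join(line.strip() for line in chunk.splitlines() if line.strip())
        if joined:
            # Escape &, <, > and quotes so stray characters cannot break the markup.
            paragraphs.append("<p>" + html.escape(joined) + "</p>")
    return "\n".join(paragraphs)
```

Because plain text carries no italics, bold, or indents, every tag in the result is one this function wrote and closed itself. The price is that all the original formatting is gone and would have to be put back by hand.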
01-16-2009, 08:24 AM | #7 (permalink) |
warrior bodhisattva
Super Moderator
Location: East-central Canada
|
Quote:
[HTML template, title "Your Title Here"]
Note the CSS "standard." Streamlined, and it uses such considerations as relative text size rather than fixed, so users can control text size no matter what.
This is the bottom line (after some more thought and research): There is no automated process for making Kindle-ready books that don't look like shit. What this means is that publishers will have to basically re-edit and/or re-jig the layouts of each and every book independently. *groan* There is currently no way around it, and this is why I'm seeing "Kindle conversion" services popping up on the Web. It's not difficult to do yourself, but it is time-consuming, especially when you have around 200 titles.
Quote:
The thing to realize is that if we're selling these books, we must have a quality standard. The text must be readable as a book, whether it's digital or not. We can't just spit these out in txt and hope our readers are fine with it as is. I don't think it'll work on its own. Everyone, thanks for your feedback and suggestions. I now have to go shop around for affordable Kindle conversion services.
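To make the point about relative text size concrete, here is a rough sketch of wrapping a book's body markup in a page whose stylesheet uses only relative units. This is not Amazon's template: the CSS rules, the constant `STYLESHEET`, and the function `wrap_book` are assumptions made up for illustration, and which rules a given device honours is not something this sketch checks.

```python
import html

# Relative units (% and em) scale with whatever base size the reader picks;
# fixed units such as px or pt would not. These rules are illustrative only.
STYLESHEET = """\
body { font-size: 100%; line-height: 1.4; }
h1 { font-size: 1.6em; text-align: center; }
p { margin: 0; text-indent: 1.5em; }
"""


def wrap_book(title, body_html):
    """Wrap already-clean body markup in a minimal HTML page."""
    return (
        "<html>\n<head>\n"
        "<title>" + html.escape(title) + "</title>\n"
        '<style type="text/css">\n' + STYLESHEET + "</style>\n"
        "</head>\n<body>\n"
        "<h1>" + html.escape(title) + "</h1>\n"
        + body_html + "\n"
        "</body>\n</html>\n"
    )
```

Keeping the stylesheet in one place means a change to indents or heading size applies to every title in a backlist at once, though each book's body still has to be clean going in.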
Last edited by Baraka_Guru; 01-16-2009 at 08:44 AM. |