Sunday, May 5, 2013

pdf2htmlEX hits Top Trending Repos

As of May 6th, 2013, pdf2htmlEX has hit the Monthly Trending Repos at GitHub.
It has also become the top daily & weekly trending repo. Didn't see this coming!

I realized that it might be necessary to start a blog to share news, plus technical and non-technical stuff, about pdf2htmlEX. And here it is.


pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlEX), just as its name suggests, converts PDF into HTML. How does it work? Let the demos speak:

Many people wonder why they should ever convert PDF to HTML. The short answer is that they should not, because they are viewers, while this tool is designed for publishers.

This is the era of the Web. For many people, the Internet = the World Wide Web. When not at work, I rarely let my screen be occupied by any window other than a browser. What can you not do with a browser? I like web pages: they have become more and more elegant, yet powerful (to use) and simple (to compose).

Despite the development of HTML/CSS/JavaScript, what is your experience with reading PDF files online? PDF is always the first choice for anything involving printing, not to mention for LaTeX users. But once you put 'online' after it, I'd say terrible. Years ago, reading PDF online meant ugly, insecure, unstable and slow plugins that never released my keyboard & mouse focus. And now browsers have started to implement their own built-in PDF viewers -- PDF is so popular, and the plugins are so bad, that Web browsers have to do this to comfort users.

Another thing I like in web pages, but not in PDF files, is interaction. A quick example is links: on Wikipedia, you get a rather smooth flow of information as your cursor dances among the links. Not to mention all kinds of CSS/JavaScript tricks that amaze you. The key is that everything is accessible. PDF, on the other hand, is more like a black box, or an <iframe>: it does have many features, but you (the hosting web page or the browser) never know what's going on inside.


This is not a fair comparison, since PDF was never designed for this. But the idea is that web technologies are powerful enough to render PDF files, and people need this -- see Crocodoc and SlideShare. pdf2htmlEX works as a bridge, and the goal is to turn 'Everything to PDF' into 'Everything to Web'. Just imagine:

  • Your carefully designed resume can be published online with Google Analytics embedded.
  • Your slides can be shown online with all kinds of CSS/JavaScript eye candy.
  • PDF documents never make your web sites ugly.

Hopefully, some day in the future, we will not be able to tell HTML from PDF by appearance, just as we cannot tell JPEG from PNG. (Or can you?)
