Thursday, September 26, 2013

pdf2htmlEX got a logo

I managed to craft a logo with Inkscape, which is basically an emblem of "<pdf>". Perhaps it is not of much use, but I hope that it can help visualize the concept.

The images are located in the logo/ folder; all of them are licensed under CC-BY 3.0.

Friday, September 20, 2013

Preliminary support for Type 3 fonts

I'm happy to announce that preliminary support for Type 3 fonts has been added to pdf2htmlEX. For now, two simple PDFs from PDF.js are handled correctly:

https://github.com/mozilla/pdf.js/blob/master/test/pdfs/simpletype3font.pdf


https://github.com/mozilla/pdf.js/blob/master/test/pdfs/issue3188.pdf


This is one of the features I have most wanted to implement since the very beginning. Another one is generating background images in SVG, a preliminary version of which has also just been added.

Both features rely on CairoOutputDev from poppler, which in turn relies on cairo and freetype. It might actually be possible to eliminate the dependency on freetype, but I don't want to touch those files, so that merging upstream files stays easy in the future. Anyway, it seems that poppler itself depends on freetype, so it's no big deal.


To enable this feature, you need the latest source code from git. Add `-DENABLE_SVG=ON` to cmake, and `--process-type3=1` when running pdf2htmlEX.
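As a rough illustration, the two steps above could be scripted as follows. This is only a sketch: the helper names and the `source_dir` default are hypothetical, and only the two flags come from this post.

```python
def cmake_args(source_dir="."):
    # Configure the build with SVG support enabled.
    # source_dir is an assumption: the path to the pdf2htmlEX checkout.
    return ["cmake", "-DENABLE_SVG=ON", source_dir]


def pdf2htmlex_args(pdf_path):
    # Convert a PDF with Type 3 font processing turned on.
    return ["pdf2htmlEX", "--process-type3=1", pdf_path]


if __name__ == "__main__":
    # Print the command lines; pass the lists to subprocess.run() to execute them.
    print(" ".join(cmake_args()))
    print(" ".join(pdf2htmlex_args("simpletype3font.pdf")))
```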


The current idea is, for each Type 3 font, to dump each glyph into an SVG image and then combine them into a font with FontForge. It's actually inspired by FontCustom; I learned that FontForge can import SVG glyphs by reading the code of FontCustom.
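The idea can be sketched with FontForge's Python bindings. This is not the code used in pdf2htmlEX, just a minimal illustration: the function names, the use of the Private Use Area for code points, and the fixed advance width are all assumptions.

```python
PUA_START = 0xE000  # assumption: place glyphs in the Unicode Private Use Area


def plan_glyphs(svg_paths, first_codepoint=PUA_START):
    # Pair each SVG file with a code point, in sorted filename order.
    return [(first_codepoint + i, path)
            for i, path in enumerate(sorted(svg_paths))]


def build_font(svg_paths, out_path, em_size=100):
    # Imported lazily so the planning helper works without FontForge installed.
    import fontforge

    font = fontforge.font()
    font.em = em_size
    for codepoint, path in plan_glyphs(svg_paths):
        glyph = font.createChar(codepoint)
        glyph.importOutlines(path)  # read the outline from the SVG file
        glyph.width = em_size       # assumption: fixed advance width
    font.generate(out_path)         # output format follows the file extension
```

A call such as `build_font(["g0.svg", "g1.svg"], "type3.ttf")` would then produce a font with two glyphs, assuming those SVG files exist.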

Each glyph is drawn on a 100x100 canvas. Although SVG is a vector format, CairoOutputDev thickens thin strokes (for printing purposes?), which might ruin the font. There are also cases where sampled raster images are stored in the SVG file, probably because of how cairo works around the limitations of SVG. In such cases, 100x100 might not be large enough for a font.

The size is defined as GLYPH_DUMP_EM_SIZE in font.cc. I tried setting it to 1000, and the quality for `issue3188.pdf` did improve; but for some other PDF files, the values in the SVG files can become so large that FontForge complains that they cannot be stored in 16-bit fields. Or maybe this is a limitation of TTF, and I should switch to another font format.
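To see why a larger em size can trigger this complaint, here is a small sketch. It assumes coordinates are stored as signed 16-bit integers, as in TrueType glyph outlines; the sample glyph and the helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a signed 16-bit field


def scale_coords(coords_in_em, em_size):
    # Convert coordinates given in em units (1.0 = one em) to font units.
    return [c * em_size for c in coords_in_em]


def fits_int16(coords):
    # True if every coordinate fits in a signed 16-bit field after rounding.
    return all(INT16_MIN <= round(c) <= INT16_MAX for c in coords)


# Hypothetical glyph with one coordinate 40 em away from the origin,
# e.g. because of an unusual font matrix.
glyph = [0.0, 0.8, 40.0]
print(fits_int16(scale_coords(glyph, 100)))   # True: the largest value is 4000
print(fits_int16(scale_coords(glyph, 1000)))  # False: 40000 exceeds 32767
```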

However, due to the complexity of Type 3 fonts (each glyph is a mini-PDF), and especially of the font matrix, I don't have a perfect solution for every possible case. For now, let me just focus on the `average` cases.

Wednesday, September 18, 2013

Preliminary SVG support

Preliminary SVG support has been implemented, powered by CairoOutputDev from poppler.

Since CairoOutputDev is not exposed by poppler, I have to maintain a copy of a few files inside pdf2htmlEX. Also, cairo and freetype are required for this feature. This feature can be enabled or disabled with the ENABLE_SVG cmake option.
 
A new option `--bg-format` has also been added, to specify the format for the background images. Currently only 'png' and 'svg' are supported.

(This is also a test for auto forwarding blog posts to the mailing list)


Sunday, September 15, 2013

pdf2htmlEX v0.9 released

pdf2htmlEX v0.9 is released. This version includes several bug fixes, but not many new features.

Changelog:

* Lazy loading of pages
* Show font names in debug messages
* License changed
 - Additional terms for usage in online services
 - Remove GPLv2
* Bug fixes:
 - --optimize-text
 - Always use Unicode encoding for fonts
 - space width
 - disable ligature in Firefox
 - Uninitialized memory for encoding
* New options:
 --embed
 --embed-***
 --override-fstype
* Deprecated/Removed options:
 --single-html
 --remove-unused-glyph

Features planned in v0.10
- Preliminary optimization for raster images
- Preliminary support for SVG background.

Sunday, September 1, 2013

Development Log

I've been quite busy since the last article, and this will continue for a while, so please forgive me if I'm not very responsive. Feel free to poke me by email if I have not replied to your message within 3 days.


I had planned to add image optimizations in v0.9. While there are APIs in poppler to detect the area of changes, there are no such convenient APIs for outputting part of the background. I might still need to work on this; ideally, I would avoid bringing more modified poppler code into pdf2htmlEX (and then having to maintain it).

Recently there have been pull requests about key shortcuts and UI events. (Thank you!)
But they require injecting global handlers into the web pages, which might not be desirable for users who need nothing but plain HTML pages. I'm now considering creating a new mode, say a standalone mode, meaning that the user intends to use the complete package produced by pdf2htmlEX. This way we can add more UI features without affecting other users. I'm still thinking about the logic: whether this should be a new set of files (manifest and other files) or simply a switch in JS.