Behdad Esfahbod's daily notes on GNOME, Pango, Fedora, Persian Computing, Bob Dylan, and Dan Bern!

McEs, A Hacker Life
Saturday, August 13, 2005
 Pango+TeX Follow-Up

A few readers asked me to elaborate on why I think a Pango-enabled TeX is useful, how it would work, and what to expect from it. In this post I'm calling such a combination PangoTeX. To understand why PangoTeX is useful, we need to know what each of the two is good at, and what it is not.

TeX is pretty good at breaking paragraphs into lines and composing paragraphs into pages. What it is not good at is complex text layout, that is, character-to-glyph mapping that is not one-to-one, and more than one text direction. e-TeX has primitives to set the text direction, but nothing more.

Pango, on the other hand, knows nothing about pages and columns. It does break paragraphs into lines too, but not in any way to envy. Patches exist that implement the TeX h&j (hyphenation and justification) algorithm for Pango, but that doesn't matter here. What Pango is pretty good at is character-to-glyph mapping, where it implements the OpenType specification for quite a few scripts. Pango contains modules for the following scripts: Arabic, Hebrew, Indic, Khmer, Tibetan, Thai, and Syriac, and a Burmese module was recently proposed. Beyond that, it has a module to use Uniscribe on Windows, and there's also a module available on the internet (which may be integrated into Pango soon) to use the SIL Graphite engine.

So the plan is to make TeX pass streams of characters to Pango and ask it to shape them. The way XeTeX is implemented is that TeX passes to the higher-level rendering engine words of text and all it asks for is the width the word would occupy. XeTeX has backends for Apple ATSUI and ICU at this time. Pango has the advantage of abstracting OpenType in general, Uniscribe, Graphite, and hopefully ATSUI in the future. So it would be enough to only have a Pango backend.
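To make the "pass a word, ask for its width" contract concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `ShapingBackend` protocol, the `ToyLatinBackend` class, the glyph ids and the advance values are made up for illustration and are not Pango's or XeTeX's actual API. The toy backend only shows that the number of glyphs need not equal the number of characters.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Glyph:
    glyph_id: int  # index into a font, not a character code
    advance: int   # horizontal advance, in made-up font units


class ShapingBackend(Protocol):
    """Hypothetical interface a Pango-like backend would implement."""

    def shape(self, text: str) -> List[Glyph]: ...


class ToyLatinBackend:
    """Stand-in shaper: fixed advances plus a single 'fi' ligature."""

    ADVANCE = 500
    LIGATURES = {"fi": (9001, 550)}  # invented glyph id and advance

    def shape(self, text: str) -> List[Glyph]:
        glyphs = []
        i = 0
        while i < len(text):
            pair = text[i:i + 2]
            if pair in self.LIGATURES:
                gid, adv = self.LIGATURES[pair]
                glyphs.append(Glyph(gid, adv))
                i += 2  # two characters became one glyph
            else:
                glyphs.append(Glyph(ord(text[i]), self.ADVANCE))
                i += 1
        return glyphs


def word_width(backend: ShapingBackend, word: str) -> int:
    """All the typesetter asks for: the total width of the shaped word."""
    return sum(g.advance for g in backend.shape(word))
```

With this split, the line breaker only ever sees `word_width(backend, word)`, so swapping the toy backend for a real shaping engine would not change the caller.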

That level of integration is pretty much what XeTeX does, which is quite useful on its own, but that doesn't mean it should be the end of it. Much more can be done by using Pango's language/script detection features, its bidirectional handling engine, etc., such that you don't have to mark left-to-right and right-to-left runs manually. Moreover, while doing this, we would introduce the Unicode Character Database to TeX, such that (for example) character category codes would be set automatically for the whole BMP range, and you could query other properties of characters should the need arise.
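As a rough illustration of what UCD access buys, the sketch below uses Python's standard `unicodedata` module as a stand-in for the database. The mapping from general category to TeX category code (letters and marks get 11, everything else 12) is my assumption, not a rule from any existing engine, and `direction_runs` is a deliberately naive split on strong direction, not the Unicode Bidirectional Algorithm that Pango implements.

```python
import unicodedata

CATCODE_LETTER = 11  # TeX category code for letters
CATCODE_OTHER = 12   # TeX category code for "other" characters


def default_catcode(ch):
    """Assumed policy: letters (L*) and marks (M*) become catcode 11."""
    if unicodedata.category(ch)[0] in ("L", "M"):
        return CATCODE_LETTER
    return CATCODE_OTHER


def bmp_catcodes():
    """Build a default catcode table for the whole BMP in one pass."""
    return {cp: default_catcode(chr(cp)) for cp in range(0x10000)}


def direction_runs(text):
    """Naive run detection: strong R/AL characters start an rtl run,
    strong L characters start an ltr run, and anything else stays in
    the current run (ltr if there is none yet)."""
    runs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            direction = "rtl"
        elif bidi == "L":
            direction = "ltr"
        else:
            direction = runs[-1][0] if runs else "ltr"
        if runs and runs[-1][0] == direction:
            runs[-1][1] += ch
        else:
            runs.append([direction, ch])
    return [(d, s) for d, s in runs]
```

For example, `direction_runs("abc \u0633\u0644\u0627\u0645")` splits the Latin prefix (with its trailing space) from the Arabic-script word without any manual markup.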

The way Omega approached the problem of Unicode+TeX was to add a push-down automaton layer that could convert the character stream at as many stages as desired. So you could have an input layer to convert from legacy character sets to Unicode, then a complex shaping engine, and finally a conversion to the font encoding. The problem with this approach is that it's very complex, so it introduced a zillion bugs. Of course, bugs can be fixed, but then comes the next problem: duplication. The powerful idea behind having shaping information in OpenType fonts was left unused there. For each font you had to implement the shaping logic (ligatures, etc.) in an Omega Transformation Format (OTF) file. Moreover, the whole machinery was more like Apple's AAT than OpenType, which means it doesn't have any support for individual scripts: if you want to do Arabic shaping, you have to code all the joining logic in OTF. Neither did it provide Unicode character properties. If you need to know whether a character is a non-spacing mark, you have to list all NSMs in an OTF file. If you wanted to normalize the string, well, you had to code normalization in OTF, which is quite possible and interesting to code, but don't ask me about performance... Putting all these together, I believe that Omega cannot become a unified Unicode rendering engine without introducing support from outside libraries. When you do import some support from Pango and gNUicode, for example, all of a sudden you do not need all those push-down automata anymore. A charset conversion input layer that uses iconv is desirable though.
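To show how little is left to hand-code once an outside library supplies character properties, here is a small Python sketch. It is only an analogy: Python's built-in codecs stand in for iconv, and the standard `unicodedata` module stands in for a Unicode properties library. The function names are mine, and the choice of NFC as the normalization form is an assumption.

```python
import unicodedata


def is_nonspacing_mark(ch):
    """One property lookup instead of a hand-maintained list of NSMs."""
    return unicodedata.category(ch) == "Mn"


def to_unicode(raw, charset):
    """Input layer: decode a legacy charset (the role iconv would play),
    then normalize with the library rather than a custom automaton.
    NFC is an assumed choice of normalization form."""
    return unicodedata.normalize("NFC", raw.decode(charset))
```

For instance, `to_unicode(b"caf\xe9", "iso-8859-1")` yields the same string as decoding UTF-8 input in which the accent was written as a separate combining character.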

About the output layer: XeTeX generates an extended DVI and converts it to PDF afterwards, using a backend-specific extended DVI driver. We can do that with Pango+Cairo too, to write to a PDF or PS backend. Or, since Pango computes glyph strings when analyzing the text, we may not even need Pango when converting the DVI to PDF. Anyway, what I'm more interested in is extending pdfTeX directly. We don't really need DVI these days. As I said before, the assumptions that Knuth made have proved to be wrong in the new millennium. It's not as if the same DVI renders the same everywhere; no, you need the fonts. That's why the fonts used in today's TeX systems are separate from the fonts you use to render your desktop to your screen: they are isolated and packaged separately in a TeX distribution, such that almost everyone has the same set of fonts... This should change too, by having only PDF output and some kpathsea configuration. You would still need the fonts to compile the TeX sources, but the output would be portable.

To conclude, I have changed my mind about cleaning up and adding UCD support to Omega and believe that we badly need a pdfTeX+Pango engine to go with our otherwise-rocking GNOME desktop.

That's all for now. I'm very much interested to get some feedback.

Comments:
There is xmlroff (http://xmlroff.sourceforge.net/) which converts XML to PDF/PostScript, and is based on PangoPDF (http://pangopdf.sourceforge.net/).

I hope it helps.
 
This is some seriously sensible musing. PangoTex would be very cool, albeit quite ambitious. Hope something gets off the ground! :)
 
I don't know whether my comment is the kind of comment that you expect. I'm not a programmer myself, but an average GNOME user who has written his PhD dissertation (in Philosophy, about Gadamer and Plato) using LaTeX and managed to adapt it to Lambda to submit it to be published on CD-ROM (the file is here, just in case you wonder). With this background (not technical at all), the two fundamental shortcomings that I found in TeX are that it doesn't support Unicode and that it is not easy to use TT/OT fonts with TeX. My question would be whether PangoTeX will deal with such topics (i.e., would make it easier for the rest of us) or not. By the way, I need Unicode only for polytonic Greek (which is typeset like other Western European languages).
 
Pablo Rodríguez:
It's amazing that you are using polytonic Greek :)
I am from Greece myself and I encounter few locals who are interested in Greek Polytonic.

There is a recent Greek Polytonic font from http://www.ellak.gr/fonts/mgopen (MgOpen Canonica). It's a properly free font.

Also, to type Greek Polytonic in Linux (GNOME), you can follow the tips from
http://simos.info/blog/?p=342
It works "out of the box".

Also, I tried last year using DocBook XML to write Greek Polytonic and here are my results:
http://simos.info/blog/?p=288
http://www.advogato.org/person/simos/diary.html?start=4 (same page, but in English).
It shows examples using xmlroff and the Apache offerings.
 
Thanks for all the comments. Pretty encouraging.
 
More than six months after this post, what are your plans for PangoTeX?

The XeTeX documentation includes plans for portability (I guess not before releasing version 1.0).

Wouldn't it be interesting to be able to use XeTeX on Unix-like systems?
 
Six months, sigh...

I still plan to work on PangoTeX when I get a bit more free time. In the meantime I've been learning more about the internals of Pango, and following the development of both XeTeX and pdfTeX. The main problem with XeTeX is that it's not based on pdfTeX, and it's already obvious that pdfTeX is *the* TeX engine of the future.

I also think that there are parts of XeTeX that could use a better design... Anyway, I don't think they are very interested in exploring Pango. I did send a link to my post to their mailing list. So it seems that if somebody is to experiment with PangoTeX, it's got to be me, and that has unfortunately got to wait a bit more.
 
I just submitted a project proposal for the Google SoC, called 'Integrate Pango with TeX'. Maybe you don't have to make PangoTeX all by yourself after all...

XeTeX released a prototype Linux version about a week ago. No Pango support, though. And I haven't been able to play with it yet, so I have no idea how well it works.
 
Six months again, Behdad (I'm afraid... ;-))

I have been using the XeTeX implementation for Linux and I have replaced TeX with XeTeX (actually pdfLaTeX with XeLaTeX).

Do you have plans to integrate some XeTeX parts with Pango and pdfTeX in the near future?

Is LuaTeX relevant for your integration work? (I wonder whether LuaTeX will be eventually merged in pdfTeX.)
 
Hi Pablo :)

I'm planning to have a look into Pango+XeTeX during our Text Layout summit:

http://live.gnome.org/Boston2006/TextLayout

As for LuaTeX, I read it's the successor of pdfTeX, and they are adding support for 16-bit chars. That's interesting stuff too.
 
Is PangoPDF any good at doing XML Conversion? Man, I really need to learn how to do a decent job with xml conversion. The problem is that I can't find anyone to teach me. Pango works though?
 