Sunday, September 21, 2008

Glyphs to Character Codes - revisited

My original plan to map glyphs to character codes appears doomed. The main flaw in this plan is that the glyph names do not look reliable - what I mean by this is that some glyphs do not have names that can be mapped to a character code. For example, the font "Li Song Pro" has glyph names that appear to be based on glyph codes rather than character codes.
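To make the failure concrete, here is a minimal Python sketch of name-based mapping. It assumes the common convention that names such as "uni00E9" or "u1F600" encode a code point in hex. The lookup table is a tiny illustrative subset, the function name is made up for this post, and "cid01234" is a hypothetical example of a name derived from a glyph code, not a name taken from a real font.

```python
import re

# Tiny illustrative subset of a name-to-code table (assumption: a real
# implementation would use a complete glyph list instead).
KNOWN_NAMES = {"space": 0x20, "A": 0x41, "a": 0x61, "eacute": 0xE9}

# Names that spell out the code point in upper-case hex.
_UNI = re.compile(r"uni([0-9A-F]{4})")   # e.g. "uni00E9"
_U = re.compile(r"u([0-9A-F]{4,6})")     # e.g. "u1F600"


def char_code_from_glyph_name(name):
    """Return a character code for a glyph name, or None if the name
    carries no usable character information."""
    if name in KNOWN_NAMES:
        return KNOWN_NAMES[name]
    match = _UNI.fullmatch(name) or _U.fullmatch(name)
    if match:
        return int(match.group(1), 16)
    # Names built from glyph codes end up here: nothing to map.
    return None
```

A name like "eacute" or "uni00E9" resolves to 0xE9, but a name built from a glyph code returns None, and that is the case that sinks the original plan.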

At this point I took stock of what I was trying to achieve and why the character codes are necessary. Character codes are required because there are occasions where the output needs a font that is "complete", with a correct character mapping. To satisfy this, all the characters that are part of the "normal" mapping must be present, as well as any additional characters used by specific text in the output (fonts are shared across objects). Other parts of the output require partial fonts - that is, just the glyphs in a font that are actually used. In this case the character codes don't really matter. As there can be overlap, any glyphs being output that are part of the "standard character set" should be identified so that the correct character codes can be attributed to them.

Looking at it this way simplifies things. We need to identify whether glyphs are part of the standard character set. The solution that seems to present itself is this:
  1. Locate the character(s) responsible for a particular glyph
  2. Check if that character is part of the standard character set - this can be done by mapping the character to a glyph code using the font's CMAP and checking if we get the same glyph code.
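The two steps above can be sketched roughly as follows. This is only one reading of the idea, not finished code: the CMAP is modelled as a plain dictionary from character code to glyph code, the glyph codes are invented, and the function name is hypothetical. I am also assuming that a glyph produced from more than one character (a ligature, say) never counts as standard.

```python
def is_standard_glyph(cmap, source_chars, glyph_code):
    """Round-trip check for one output glyph.

    cmap         -- dict mapping character code -> glyph code (stand-in
                    for the font's CMAP table)
    source_chars -- the character(s) in the text that produced the glyph
    glyph_code   -- the glyph code that was actually output
    """
    # Assumption: a glyph built from several characters cannot be
    # reached through the CMAP by a single character code.
    if len(source_chars) != 1:
        return False
    char_code = ord(source_chars[0])
    # Step 2: map the character through the CMAP and compare.
    return cmap.get(char_code) == glyph_code


# Invented glyph codes, purely for illustration.
example_cmap = {ord("A"): 36, ord("f"): 71, ord("i"): 74}

plain_a = is_standard_glyph(example_cmap, "A", 36)        # same glyph back
alternate_a = is_standard_glyph(example_cmap, "A", 512)   # substituted glyph
fi_ligature = is_standard_glyph(example_cmap, "fi", 192)  # two characters
```

Glyphs that pass the check can be given the character code that produced them. Glyphs that fail are only needed in the partial fonts, where the character code doesn't really matter.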
