
Tofu and the NoTo font project

Perhaps you’ve seen it: some document on the web, a PDF, or a word-processing file with characters that look like little boxes of tofu. That’s what happens when the font called for, or the substitute for it, has no proper glyph for the character code. To a computer, all text is a string of character codes that index display instructions. Those display instructions can be a bitmap or a mathematical description that must be processed to produce something to put on the display device or printer, in the space allocated and within the specifications of the device.
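To make “text is a string of character codes” concrete, here is a minimal sketch in plain Python (standard library only):

```python
# A string is, to the computer, a sequence of numeric code points.
# Each code point is an index; the font supplies the glyph artwork.
text = "tofu"
codes = [ord(c) for c in text]
print(codes)  # [116, 111, 102, 117]

# Going the other way: code points back to characters.
print("".join(chr(n) for n in codes))  # tofu

# U+FFFD is the replacement character some renderers show when a
# byte sequence cannot even be decoded to a code point; tofu boxes
# appear one step later, when the font lacks a glyph for the code.
print(hex(ord("\ufffd")))  # 0xfffd
```

The code points are all the computer stores; whether you see a letter or a tofu box depends on whether some font in the fallback chain has artwork for that index.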

Google didn’t realize how big a problem this was until Android and ChromeOS became popular in worldwide markets.

The Noto font project (it’s a mashup of ‘NO more TOfu’) has been something of a labor of love, taking five years to reach its conclusion. But the result is an open source Noto font family which Google says includes “every symbol in the Unicode standard, covering more than 800 languages and 110,000 characters”.

“When we began, we did not realize the enormity of the challenge. It required design and technical testing in hundreds of languages, and expertise from specialists in specific scripts. In Arabic, for example, each character has four glyphs (i.e., shapes a character can take) that change depending on the text that comes after it. In Indic languages, glyphs may be reordered or even split into two depending on the surrounding text”.
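Those four Arabic contextual shapes are easy to see in Unicode itself: the legacy Arabic Presentation Forms-B block encodes each positional variant as its own compatibility code point (modern shaping engines instead pick the form automatically from the base letter). A quick look with Python’s standard `unicodedata` module, using the letter BEH as the example:

```python
import unicodedata

# One abstract letter, ARABIC LETTER BEH (U+0628), has four
# contextual shapes encoded in the Presentation Forms-B block.
forms = {
    "isolated": "\uFE8F",
    "final":    "\uFE90",
    "initial":  "\uFE91",
    "medial":   "\uFE92",
}
for position, ch in forms.items():
    print(position, hex(ord(ch)), unicodedata.name(ch))
```

A font like Noto Naskh Arabic must supply artwork for every one of these shapes, which is part of why the design and testing effort was so large.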

Mark Wilson: Google releases open source font Noto to eliminate the tofu problem.

Fonts on computers have always been a challenge. Consider Donald Knuth’s TeX [wikipedia], released in 1978. Adobe got into the act in 1984 when it released its PostScript page description language [wikipedia], and Apple, due in part to licensing costs, came up with TrueType in 1991. But when you get into fonts, you get into typefaces and glyphs and rendering methods and pixelated display issues. That’s the software facet. Another facet gets into presentation, readability, and human factors. What Google got into was yet another facet: the mapping of glyphs to alphabets to language constructions. Unicode was developed to help code these things, but artwork was needed to fill in a representation of each indexed glyph.
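The division of labor is worth seeing directly: Unicode assigns each character a numeric code point and a name, across every script; a font family like Noto then has to supply the artwork. Python’s `unicodedata` module shows the Unicode half of the bargain:

```python
import unicodedata

# Unicode assigns every character a code point and a name across
# all scripts; supplying a glyph for each one is the font's job.
for ch in ["A", "ß", "Ж", "ع", "あ"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Five characters, five scripts, one consistent numbering scheme, and not a pixel of artwork anywhere: that gap between the index and the drawing is exactly where tofu lives.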

That’s a long way from the character generator chip on my TRS-80 Model I, where I had to piggyback another bit of memory in order to get lower case in a 5×8 pixel array mapping. I can imagine what would happen if I tried to use that computer to map a modern font glyph onto its screen. I’m not sure it even had enough memory to store the code needed for the glyph description, much less the code to translate that description into a screen image. And how long would the processing take?
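The character-generator idea is simple enough to sketch in a few lines: each character code indexes a fixed 5×8 bitmap, and the hardware just shifts those bits out to the screen. The ‘A’ below is an illustrative pattern, not the actual TRS-80 ROM artwork:

```python
# Each character code indexes a fixed 5x8 bitmap; the low five
# bits of each byte are one row of pixels. This 'A' pattern is
# illustrative only, not the real TRS-80 character-generator ROM.
GLYPHS = {
    ord("A"): [0b01110,
               0b10001,
               0b10001,
               0b11111,
               0b10001,
               0b10001,
               0b10001,
               0b00000],
}

def render(code):
    """Render one character code as 5x8 ASCII-art pixels."""
    rows = GLYPHS[code]
    return "\n".join(
        "".join("#" if row & (1 << (4 - col)) else "." for col in range(5))
        for row in rows
    )

print(render(ord("A")))
```

Forty bits of ROM per glyph, no scaling, no hinting, no contextual shaping: compare that with an outline glyph whose curves must be interpreted and rasterized, and the memory worry above looks well founded.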