Production Type (2017), Spectral

Spectral is a parametric serif font with true italic, bold and small caps, a number of weights, and the desired Latin Extended Additional diacritics. It’s free to download and is available for use under the SIL Open Font license 1.1. Personal take: Spectral is a big advance on what’s out there, offering unprecedented typesetting flexibility, but it’s not yet clear how well it is suited for the printed page. See some informed criticism.

git: https://github.com/productiontype/Spectral


Specimen of Sanskrit text with diacritics set in Spectral (source: Vāṅmaṇḍalanamaskāraḥ, dsbcproject.org/canon-text/book/745)

Modular Infotech’s Unicode Devanāgarī fonts

High-quality Devanāgarī fonts suitable for professional typesetting are still hard to come by. One foundry producing fonts to something like the required standard is Pune-based Modular Infotech. They offer true bold faces and true italics. Refer to their specimen (published 2004, but apparently still current):

Modular Infotech Typefaces Catalog (2004:2)
Modular Infotech Typefaces Catalog (2004:2)

Since I haven’t used any of Modular Infotech’s fonts at the time of writing – they don’t come cheap – this is not yet a recommendation. Meanwhile, it’s possible to do some limited testing at fonts.com by clicking ‘TRY IT’ and typing a Unicode Devanāgarī string.

The Brill font (2011)

The Brill is a Unicode font containing most of the characters used to typeset Sanskrit in roman script. It’s based on Baskerville, which has been used widely in typesetting Brill’s books. From the official pamphlet:

The Brill.
The Brill.

It’s plain to see that it blows fonts like Gentium out of the water, though that certainly isn’t saying much. I don’t love The Brill; some of its features evoke Bulmer’s inelegant take on Baskerville, and it lacks breathing space (which was a design requirement, “allowing Brill to reduce its environmental footprint” – like that’s going to happen). Which is just as as well, because The Brill can only be used for non-commercial purposes – i.e., nowhere in a published PDF or in the browsed or printed page without prior written permission. If you agree, download it here.

Pandey’s Siddham Script in Unicode proposal (2012/8)

Anshuman Pandey. ‘Proposal to Encode the Siddham Script in ISO/IEC 10646’. ISO/IEC JTC1/SC2/WG2 N4294 L2/12-234R. PDF. 2012/08/01.

Mr. Pandey’s proposal – now no longer preliminary – promises to fill yet another gaping hole in the standard encoding of important Indic scripts. Now would be an appropriate time to comment, if you haven’t already commented.

(I would hope, at minimum, for the addition of a full set of ten digits in the final proposal. Often such basics fall through the gaps because the corpus of readily available primary material is so limited. Here‘s a nice “7-8th century” bilingual manuscript with a varṇamālā (no digits, though) which is both in good condition and readable online, thanks to the care of its Japanese custodians. Incidentally, this clearly confirms that two of the “Punctuation and ornaments” in Pandey’s Fig. 33 are ornamental final anusvāra [अं字].)

Comments should be emailed to Anshuman Pandey, whose address is given in the N4294 proposal (link above) and at the bottom of his personal website (link).

ISO/IEC JTC1/SC2/WG2 N4294 Fig.1. Proposed code chart.

Source Sans Pro: An open source Unicode font that works

A new and ambitious font, Source Sans Pro, which has glyphs in the Latin Extended Additional codeblock (required for most Indological publishing in Unicode), was released by Adobe earlier this month.

Why ambitious? Because free, open source, high quality and produced by a stalwart of design in the digital era, all at once. Its letterforms riff on News Gothic, a typeface of enduring appeal. And it comes with an inspiringly comprehensive set of weights, from Extra Light to Black, and true italics. Anyone who knows what a proper font needs to have will know how rare and remarkable this is. Although it’s optimised for user interfaces, I’ve tested it in XeTeX and found that it works superbly. Here’s a snippet of how it looks, from the specimen:

Source Sans Pro Capitals, regular weight (specimen, p.9).

What’s the catch? None other than the fact that just by using it and pulling apart the source, you might be more inclined to contribute to its development. A reason for releasing the font as open source (and hence free) is to demystify the increasingly complicated process of creating multiple-weight Unicode OpenType fonts, thereby encouraging the production and proliferation of fonts that meet contemporary standards. Open source lets all that complexity communally come to light, as Paul D. Hunt (and his commenters) reveal in Adobe’s official announcement of the font.

It’s downloadable from SourceForge.

Nepalese Script in Unicode, 2: More on JTC1/WG2 N4184

Refer to: Anshuman Pandey, ‘N4184 Proposal to Encode the Newar Script in ISO/IEC 10646’, February 29, 2012 [PDF]. Previous discussion: here.

0. On the Name ‘Newar’

The name ‘Newar’ is preferable simply because most other options can be ruled out. ‘Nepalese’ is untenable, because it falsely implies a one-to-one relationship with the present-day nation-state, even though it is accurate within a certain (historically earlier) context. ‘Newari’ is a (now deprecated) name for the language – not the script, nor anything else; ‘Nevārī’ is quite meaningless, except to some Indologists.

The proposal, as I understand it, indeed deals with the Pracalita script, but has enough hooks to allow unification with proposals for other Newar scripts, such as Bhujiṅmola – hence ‘Newar’. (NB: It is not yet clear whether unification with Rañjanā – which is, strictly speaking, Indo-Nepalese, and which has a user base that includes many non-Newars, such as Tibetans – is feasible. In any case, much of the present and previous discussion about the Pracalita script is also applicable to Rañjanā.)

1. Additional Information On Glyph Names

11442 NEWAR FINAL ANUSVARA: Although this mark originates with the m-virāma mark used by East Indian scribes, in Nepal it has multivalent significance and in many contexts has nothing to do with nasalization (often being interchangeable with 1144B NEWAR GAP FILLER). Recommendation: Minimise phonetic/semantic description in favour of graphic description – maybe NEWAR SEMICOLON for want of a better term. Classify under Punctuation or Various Signs.

11443 NEWAR SIDDHI = शुभचिं (Shrestha NS 1132:21). There is no uniform name for this mark in Newar (esp. not the neologism bhiṃciṃ), nor is siddhi/añji recommended (not just because this designation is unknown in Nepal, but because usage may also vary; confusion with NEWAR OM is common). Recommendation: NEWAR AUSPICIOUSNESS MARK or similar.

11448 NEWAR COMMA = अर्धविराम (Shrestha NS 1132:24).

11449 NEWAR DOUBLE COMMA: I now think this mark can be represented with two adjacent NEWAR COMMAs. Its usual behaviour of stacking diagonally (see Fig.3) rather than horizontally should however be specified. Recommendation: Remove from the repertoire.

1144B NEWAR HIGH SPACING DOT = अल्पविराम (ibid.).

1144C NEWAR ABBREVIATION SIGN CIRCLE = संक्षेपीकरण यानाः च्वयातःगु थासय् थुगु चिं (ibid.).

1145A NEWAR FLOWER = स्वांथें ज्याःगु चिं (ibid.).

1145C NEWAR PLACEHOLDER MARK is the line-width equivalent of the NEWAR GAP FILLER (see below). Recommendation: Change name to NEWAR LINE FILLER MARK.

2. Morphology of the Gap Filler Mark

Following comments on earlier drafts of N4184, especially those of Kashinath Tamot, it should be clarified that the primary function of 1144B NEWAR GAP FILLER is not that of indicating a break in a word (as per the previous name SANDHI MARK), but rather of filling space up to the end of a line margin. (A hyphen indeed performs a space-filling operation as well as functioning as a word-breaking mark. However, I suggest that ‘hyphenation’ be dropped from the formal description of this mark to avoid confusion.)

The purpose of this mark has been obvious enough to specialists – recently see, e.g. Ishida (2011:ix), where it is called a ‘line-filler character’, Zeilenfüllzeichen. (In fact, this mark does not fill a line – this is the function of 1145C NEWAR PLACEHOLDER MARK; rather, it fills a space of less than one full glyph-width at the end of a margin, not necessarily the end of a line.) Nonetheless, it is easily seen that the mark could be confused with, e.g., a visarga, daṇḍa or similar. In earlier discussion on the proposal, its purpose has remained unclear to the user community, perhaps due to its unstable shape. Significantly, the NEWAR GAP FILLER MARK changes according to the width of the glyph. Its behaviour may be represented as follows:

Fig.1: Morphology of the Indo-Nepalese gap filler mark.

Variations in this mark may therefore be regarded as contextual alternatives, rather than separate code points. I suggest, as per the diagram, that no more than three variants need be represented; although the glyph could conceivably incorporate four or more variations (e.g., five vertically stacked dots, at 20% character width), this is probably excessive.

Recommendation: It may be implemented as one code point with contextual alternates, or 3 or more code points corresponding to each quantum of width.

3. Swash Forms

Several glyphs may be alternatively represented with swash forms, created by extending elements of the glyph into surrounding white space. These forms do not require dedicated representation in an encoded repertoire; however, they should be included in any full description of Indo-Newar scribal culture, and font designers might want to incorporate them. Swash forms are often contextually invoked: they are used at the top line of a block of text (upward extension), but may also be seen on the bottom line (downward extension), and even more rarely at the right and left margins, and within interlinear white space. An example:

Fig.2: Swash forms in MS University of Tokyo (Matsunami) 419, f.132r.

Characters routinely represented as swash forms include:

  • 11432 NEWAR VOWEL SIGN U, 11433 NEWAR VOWEL SIGN UU, 11439 NEWAR VOWEL SIGN AI, 1143B NEWAR VOWEL SIGN AU, (superscribed) 11428 NEWAR LETTER RA, 1143D NEWAR SIGN CANDRABINDU, 1143E NEWAR SIGN ANUSVARA – upward extension;
  • 11402 NEWAR LETTER I, 11403 NEWAR LETTER II, (subscribed) 11417 NEWAR LETTER NYA, 1141D NEWAR LETTER TA, 11423 NEWAR LETTER PHA, 11425 NEWAR LETTER BHA, 11429 NEWAR LETTER LA, 1142D NEWAR LETTER SA, 1142E NEWAR LETTER HA, 1143C NEWAR SIGN VIRAMA – downward extension.

4. Revisions To Standard Forms

The following changes to standard forms are recommended – see glyphs highlighted in Fig.3, in which all glyphs have been redrawn from scratch to accord with common scribal practice. The most widespread change is that the headstroke no longer extends past the right descender (which is inconsistent with almost all scribal practice). Standard forms for VOCALIC R, VOCALIC RR, GA, SHA, dependent VOWEL SIGN II, VOCALIC R, VOCALIC RR as well as *VOCALIC L, VOCALIC LL (these should certainly be specified and named) should be altered accordingly. DIGIT ONE should also be changed in order to avoid confusion with SIDDHI.

Fig. 3. Recommended changes to forms in N4184.

5. Some Remaining Questions

5.2 Letter-Numerals: “There are at least 27 such Newar ‘letter numerals’… It may be possible to unify Newar letter-numbers with corresponding Brahmi characters.” The issue here, as far as I can see, is: which letter-numeral conjuncts differ from non-numeral conjuncts of the same letters (all differences should be specified). To put it another way: which letter-numeral conjuncts uniquely signify letter numerals, if any? Perhaps our European colleagues, with their extensive access to funding, institutional support and manuscript sources, could clarify the matter. (Don’t worry, we won’t hold our breath.)

5.3 “Should editorial marks be encoded on a per script basis or would be it reasonable to unify such marks in a pan-Indic block?” (Pandey 2012:13). Out of our hands, but if they aren’t unified, they should be included in the Newar block.

[rev 0.1: 2012/06/19]

Emerging Unicode codeblocks

Recent discussion on the proposed Newar Unicode codeblock has been met with silence (signifying disinterest, ignorance, or unqualified approval – or all three, one must assume). Those who did more than glance at the discussion would have been aware that that several areas of the Unicode codespace are expanding rapidly, many of which are going to infringe upon a far wider chunk of Asianists’ and philologists’ territories. In an age of character-constrained discourse, when just a few letters can reveal something important about you – OMG!* – and a picture tells the thousand words you don’t have the time to text, demand for emoticons, emoji and pictographs soars.

@mrJUSTINMARTIN yeah man!!! Be there in 5!

The Symbola font [download] has reasonably good coverage of the newer codeblocks. Other fonts by the designer, George Doulos, show how much work (non-Indological) classicists are putting into the codification of the premodern repertoire.

Symbola specimen, p.10, showing glyphs from the Miscellaneous Symbols and Pictographs, Emoticons, Transport and Map Symbols, and Alchemical Symbols codeblocks.

Other fonts which cover recent additions to the standard, like Quivira, are easy to find online.
Continue reading “Emerging Unicode codeblocks”

Nepalese Script in Unicode, 1: JTC1/WG2 N4184 Open Thread

Your comments are invited on a proposal to encode the script ‘prevalent’/’in vogue’ (pracalita) in Nepal since the late fourteenth century, and which since the Shah period has continued in use in the scribal and print culture of the Newars. The proposal under discussion was submitted a month ago by Anshuman Pandey to the international standards body for character sets, WG2 under JTC1 of the ISO. Download it here:

Anshuman Pandey. ‘Proposal to Encode the Newar Script in ISO/IEC 10646’. ISO/IEC JTC1/SC2/WG2 proposal N4184 [PDF]. January 5, 2012. [Supersedes N4038, ‘Preliminary Proposal to Encode the Prachalit Nepal Script’]

Anyone can submit a proposal for consideration by WG2. However, this is not a trivial process; documents need to comply with the group’s requirements, and if I observe correctly, there are very few competing complete proposals for historic scripts. No proposal has come from the Nepalese government, Newar culture having little, if any, official status in the Shah and post-Shah nation-state. The proposal under discussion (hereafter “N4184”) is that of a private individual, in collaboration with the Script Encoding Initiative at Berkeley. Mr. Pandey has graciously agreed to consider informed feedback on his proposal, which I hope will be incorporated into future documents submitted to WG2. It is in this constructive spirit that your feedback is requested; anyone may add comments via the form the end of this post.

1. Intended scope of these comments: focus on repertoire

The present discussion should focus on the completeness and accuracy of the glyph repertoire represented in the present proposal. Matters such as the proposed name and classification of the script, the description of interaction between glyphs (e.g. conjunct formation, §4.8.1), issues related to other Nepalese or Indic scripts (except where strictly relevant) and so on should notbe discussed here. If there is sufficient interest, these matters can be addressed in separate posting(s). Here I will offer some of my own preliminary, informal feedback on the proposal, on which comments are also welcome.

N4184 aims to “encode a core set of Newar characters” (p.17). This invites the question of how “core” should be defined. I will not discuss this in depth, other to say that the standard should include those characters which are most common and most useful in this form of writing. Specifically, I propose that the characters depicted in Figs.6 and 7 below should be part of the standard. This is the repertoire proposed in N4184:

Pandey 2012:24, Fig.1

Continue reading “Nepalese Script in Unicode, 1: JTC1/WG2 N4184 Open Thread”

EB Garamond: A better class of open-source font

Georg Duffner’s EB Garamond, according to its official website, “is an open source project to create a revival of Claude Garamont’s famous humanist typeface from the mid-16th century.”

It has true italics, true bold (more like semi-bold), true subscripts and superscripts, true swash caps and true small caps (including true capital ß – see Ralf Herrmann’s crystal-clear presentation on this). There are old style figures, discretionary ligatures, and work-in-progress initials. And in particular, there is coverage of the Unicode Latin Extended Additional codeblock.

This is not only actually all in a free font, but in one that looks pretty good, as the specimen [PDF] shows:

EB Garamond specimen, p.9
EB Garamond specimen: just... wow.

Although I haven’t given EB Garamond a full tryout yet, I can confirm that it works out of the box in XeTeX, which is probably the tool that can exploit its advantages to the fullest.

A caveat: EB Garamond is work in progress; Cyrillic italics, for example, are clearly provisional at the time of writing, and some outlines were updated as recently as a couple of weeks ago on github. Nonetheless, it will be good enough to set camera-ready copy for many projects as it stands; it is certainly miles ahead of the unspeakable G****** U****** and its ilk. Thankyou, Mr. Duffner.

Files

https://github.com/georgd/EB-Garamond/blob/master/otf/EBGaramond.otf?raw=true
https://github.com/georgd/EB-Granjon/raw/master/OTF/EBGaramondItalic.otf
https://github.com/georgd/EB-Granjon/blob/master/OTF/EBGaramondBold.otf

Unicode Siddham symbol

᠀ नमो वागीश्वराय ।

Indologists still haven’t moved the Devanagari code block much beyond the inadequate ISCII-1988. Meanwhile, Michael Everson’s ill-informed “Newari” (sic) proposal — the only rañjanā-lipi proposal out there — hasn’t gone anywhere since the 1990s. Today, something like the siddham symbol in Unicode 6.0 [test page] has to be found in the Mongolian code block:

U+1800 MONGOLIAN BIRGA