****************************************************************
*                         Program a2ac                         *
****************************************************************
                                             version 1, June 95
                                       Petr Olsak (Ol\v{s}\'ak)

Program a2ac (Afm To Afm and add Composites) converts afm files
(Adobe Font Metrics) to new files in the same format. While it
works, the program reads a so-called "description file", which
defines the changes we want to make. The main feature of the
program is that it adds new composites from clear descriptions
and new kern pairs from patterns.

The command line looks like:

   a2ac input.afm desc.tab output.afm

The first parameter is the name of the input afm file, the second
is the name of the description file and the third is the name of
the output afm file. The extensions (.afm, .tab) must be written;
the program does not add them automatically.

If we want a log file and do not want to see the information on
the terminal, we can write:

   a2ac input.afm desc.tab output.afm > logfile

The command line calling a2ac is usually placed in a script or
a batch file (see the "mkfnt" UNIX script in the a2ac package).

The file cscorr.tab is included in the a2ac package. It contains
definitions of composites for Czech and Slovak letters (i.e. the
output font includes the Czech and Slovak letters and appropriate
kerns for these letters). The input font has to include all the
accents needed for the composite letters. The necessary and
sufficient condition is that it includes the acute, caron, ring,
quoteright, dieresis and circumflex accents and the standard
26 letters (in lower and upper case). An Adobe font in
StandardEncoding satisfies this condition.

The output afm file includes composite characters for the whole
Czech and Slovak alphabet and the new kern information. This file
can be used as input for typesetting systems (for example TeX),
so the whole Czech and Slovak alphabet becomes available to these
systems.
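The condition on the input font can be checked before a2ac is run.
The following Python sketch is not part of the a2ac package; the
function names glyph_names and missing_glyphs are hypothetical, and
the sketch assumes that every character-metric line of the afm file
starts with "C " and carries the glyph name in an "N name" field.

```python
# Accent glyph names that the text lists as required for cscorr.tab.
REQUIRED_ACCENTS = {"acute", "caron", "ring", "quoteright",
                    "dieresis", "circumflex"}
# The 26 basic letters in lower and upper case.
REQUIRED_LETTERS = (set("abcdefghijklmnopqrstuvwxyz")
                    | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))


def glyph_names(afm_text):
    """Collect glyph names from the 'N name' field of 'C ...' lines."""
    names = set()
    for line in afm_text.splitlines():
        if not line.startswith("C "):
            continue  # not a character-metric line (assumption, see above)
        for field in line.split(";"):
            parts = field.split()
            if len(parts) == 2 and parts[0] == "N":
                names.add(parts[1])
    return names


def missing_glyphs(afm_text):
    """Return the sorted list of required glyph names absent from the afm."""
    required = REQUIRED_ACCENTS | REQUIRED_LETTERS
    return sorted(required - glyph_names(afm_text))
```

An empty list returned by missing_glyphs means that the accents and
letters named above are all present in the input afm.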
To prepare a new PostScript font for TeX you can use the script
"mkfnt" or follow the steps below. As an example, suppose that the
standard PostScript font has the afm metric named font.afm and that
the dvips program is used to convert dvi to PostScript:

1. a2ac font.afm cscorr.tab cfont.afm
   ... the metric cfont.afm is created with the whole Czech/Slovak
   alphabet.

2. afm2tfm cfont.afm -t xl2.enc -v cfont rfont
   ... the metric rfont.tfm is created (it is needed by dvips) and
   the virtual property list cfont.vpl is made according to the
   encoding definition file xl2.enc.

3. vptovf cfont.vpl cfont.vf cfont.tfm
   ... the metric (for TeX) cfont.tfm is created and the virtual
   font file cfont.vf (for the dvips driver) is made.

4. We store cfont.tfm, rfont.tfm and cfont.vf in the appropriate
   directories: cfont.tfm for TeX input, rfont.tfm and cfont.vf for
   dvips input. We add a new line to the dvips configuration file
   psfonts.map:

      rfont The-Full-Name-Of-PostScript-Font

   If the PostScript font is not resident in the PostScript RIP of
   the output device, we have to store the pfb (pfa) format of the
   PostScript font on our computer. Then the line in psfonts.map
   looks like:

      rfont The-Full-Name-Of-PostScript-Font csr10) (MAPFONT D 0 (FONTNAME csr10) (FONTCHECKSUM D 0) (FONTAT R 1.0) )

   - rename the new version of file csr10.pl to cfont.vpl
   - vptovf csfont.vpl csfont.vf csfont.tfm

The vf and tfm of csfont are created. We remove the tfm file
because the original tfm created by afm2tfm must be used for TeX.
The vf file will be read by dvi drivers without PostScript output,
so the substitute font csr10.pk will be used in that case. Use this
feature only for proofreading, not for final output(!).

More information
================

Two files play the main role in preparing a new PostScript font
for TeX: cscorr.tab and xl2.enc in the examples above. The
following text gives more information about these files.
The afm metric file describes information using symbolic names of
characters (Aacute is A with acute, for example). Each name can
appear in one of two variants. In the first, the name is bound to
a definite encoding position and to the PostScript procedure that
renders the image of the character. In the second, the name is
described as a so-called "composite character". In this case the
encoding position of the character is set to -1 and the afm stores
a description of how to build the character from elements. The
elements are usually characters of the first variant and only
symbolic names are used.

The main idea of the a2ac program is to describe all requested
composite characters in the description file. Only symbolic names
are used, so the description file is completely independent of the
encoding of the PostScript font and of the encoding used by the
typesetting system.

Program a2ac adds new composite characters to the output afm file
according to the description file. In addition, new kern pairs are
added (usually for the new composite characters). These data are
described in the description file too.

The program shows some intelligence while reading the description
file. You can declare and use so-called "variables", you can write
metric and composite information as simple expressions, and you can
add new kerns by patterns which reuse the information of similar
kern pairs (exceptions are possible).

After a2ac is used, the font is prepared for TeX in the standard
way (as for the English language). You can use afm2tfm, which reads
the converted afm file and an arbitrary *.enc file that defines the
internal TeX encoding. The result is a virtual font which includes
two kinds of information: the information for re-encoding from the
internal TeX encoding to the raw PostScript font encoding, and the
information about building the accented letters from elements. The
first kind of information comes from the *.enc file (used during
afm2tfm processing) and the second comes from the description file
(used during a2ac processing).

The behavior of the program
===========================

The program works in three steps:

1. The input file is read and its information is stored in memory.
2. The changes given by the description file are performed in
   memory.
3. The contents of memory are written to the output file.

The following operations are performed in the second step:

a) Variables are defined and their values are set.
b) New composite information is calculated.
c) Some metric corrections are performed.
d) The amount of kern data is reduced (if requested).
e) New kern pairs are defined.

The operations are performed in the order in which they are written
in the description file.

If the font parameter IsFixedPitch is true, operations of type c),
d) and e) are ignored. For example, typewriter style fonts have
IsFixedPitch=true, so kern information and different metric widths
are irrelevant for them.

Description file format
=======================

It is recommended to use the cscorr.tab file as a starting point
for creating new description files.

The description file is a text file. Each line is either a comment
line or an execute line. An execute line starts with a prefix at
the very beginning of the line (no leading spaces). A prefix
usually consists of two or three characters, see below. If a line
does not start with any prefix, it is a comment line and is
ignored. It is recommended to start every comment line with some
special (comment) character so that a comment can never match any
prefix. A space is a sufficient comment character, but the
character % or # is more suitable. Comments are not allowed in an
execute line.

Summary of prefixes:

   >> ........... variable definition (see a)
   NC, RC, !C ... composite character definition (see b)
   RWX .......... WX parameter correction (see c)
   ReduceKern ... reduction of the amount of kern data (see d)
   NK, RK ....... generation of new kern data (see e)

The order of execute lines is arbitrary. It gives the order of the
operations performed by the program. An operation can be performed
only when all symbolic names used in its execute line are known.
Symbolic names become known in two ways: they are defined in the
input afm file, or they are defined by an earlier definition of
a new composite character. For example, we have to write the
definition of the Rcaron character as a composite before we define
new kern data with the Rcaron character. If this condition is not
satisfied, the error "Undefined identifier" occurs.

a) The define-variables field
-----------------------------

The line has the following format:

   >> NameOfVariable = expression

where ">>" is the prefix, "NameOfVariable" is the variable
identifier and the "expression" has a limited syntax in comparison
with a common algebraic expression. We will call it a "limited
expression".

A limited expression is a sum of terms. Each term can be one of
the following syntactic objects:

 - a decimal number
 - a variable defined earlier
 - the function b, w, h, W or k
 - the product of a number and a variable
 - the product of a number and a function

The values of the terms are integers and the sum is computed with
integers only. The value of a variable can only be an integer.

A number can be written with a decimal point. This is reasonable
only in the product of a number with a variable or a function. The
result of the product is rounded to an integer immediately. The
product is written without any "multiply" character, i.e. the
common multiply character "*" is not allowed.

Parentheses (except those around the parameters of a function),
nesting of operations, products of two variables and fractions are
not allowed.

The identifier of a variable can include any alphanumeric character
and the character "_" (underscore). The first character must be
alphabetic. The length of an identifier is not limited. Identifiers
are case sensitive.

The function b, w, h, W or k is written as its one-character
identifier immediately followed by its argument in parentheses.
The functions b and k take two parameters separated by a comma.
A parameter is the symbolic name of a font character (for the
function b, the second parameter is an integer from 1 to 4).

The functions return the following values:

   b(char,i) ... the i-th value from the BoundingBox parameter of
                 char. More exactly:
   b(char,1) ... bottom left corner, x coordinate,
   b(char,2) ... bottom left corner, y coordinate,
   b(char,3) ... top right corner, x coordinate,
   b(char,4) ... top right corner, y coordinate.
   w(char) ..... the width of char = b(char,3) - b(char,1).
   h(char) ..... the height of char = b(char,4) - b(char,2).
   W(char) ..... the WX value: the x coordinate of the vector by
                 which the current typesetting point moves after
                 the character is rendered. The WY value is not
                 supported; it is zero for European languages.
   k(char1,char2) . the kern value of the pair char1 char2 (zero,
                 if the kern does not exist).

Examples can be found in the cscorr.tab file.

The variables CapHeight, XHeight, Ascender and Descender have known
values at the start of processing. Of course, these quantities must
be set in the input afm file.

!! Important:
Spaces can be written inside an expression only in
a define-variable line. In all other types of lines a space acts as
the delimiter of an expression, so it must not be present inside
an expression there.

b) The composite character definition
-------------------------------------

The line defines one composite character with a syntax similar to
the afm format. One of three prefixes can be used in the line:

   NC ... New Composite. If the character exists in the input afm
          (defined as a composite character or a natural
          character), the original definition takes precedence.
   RC ... Rewrite Composite. If the character exists in the input
          afm defined as a composite character, the new definition
          rewrites the old one. If the character is defined as
          a natural character, the old definition takes precedence.
   !C ... Rewrite Composite. The new definition is used in all
          circumstances.

The symbolic name of the character follows the prefix. Next comes
the number of elements (at most 10). The PCC symbol follows after
a semicolon. Next come the symbolic name of the first (main)
element of the composite character and two numeric parameters. All
objects are separated by spaces. The same information is repeated
for the other elements of the composite character; the elements
are separated by semicolons (there must be a space before and after
each semicolon). All numeric parameters (except the number of
elements) can be written as limited expressions (see above). The
expressions are separated by spaces.

You can replace the PCC symbol by one of the symbols PAC, PCT and
PAT. The symbol declares how the following numeric parameters are
interpreted. The numeric parameters give the shift of the element
with respect to the origin. The P in the symbol is a constant
letter. The first C means that the first numeric parameter
(x coordinate) is not re-interpreted and the second C means the
same for the y coordinate.

If the first C is replaced by A (axis), the element is shifted (in
the x coordinate) with respect to the axis of the main (first)
element. For example, "PAC caron 0 350" means that the axis of the
caron accent coincides with the axis of the main character (see
the first parameter, zero) and the accent is shifted up by 350.
A positive first numeric parameter shifts the element to the
right, a negative one shifts it to the left.

Now we define the "axis" of a character exactly. Take the (WX, WY)
vector of the character with its starting point at the origin. The
axis of the character crosses this vector at its midpoint at an
angle of 90+ItalicAngle degrees.
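The axis rule can be illustrated with a small Python sketch. It is
only one possible reading of the definition above, not the code of
a2ac: the function name pac_to_pcc_x is hypothetical, WY is taken as
zero, ItalicAngle is assumed to follow the afm convention (degrees
counterclockwise from the vertical, so negative for a font leaning
to the right), and rounding to the nearest integer is an assumption.

```python
import math


def pac_to_pcc_x(wx_main, wx_accent, dx, dy, italic_angle=0.0):
    """Convert a PAC x parameter into a plain PCC x shift.

    wx_main, wx_accent -- WX values of the main element and the accent
    dx, dy             -- the two numeric parameters after "PAC name"
    italic_angle       -- ItalicAngle of the font in degrees (assumed
                          negative for right-leaning fonts)
    """
    # Horizontal drift of the axis per unit of height.
    slant = math.tan(math.radians(-italic_angle))
    # x position of the main element's axis at height dy.
    main_axis_x = wx_main / 2.0 + dy * slant
    # The accent's own axis starts at the midpoint of its WX vector.
    accent_axis_x = wx_accent / 2.0
    # Shift that makes both axes coincide, plus the requested offset.
    return round(main_axis_x - accent_axis_x + dx)
```

For an upright font the result reduces to half of the difference of
the two WX values plus dx; for a right-leaning font the accent moves
further to the right the higher it is placed.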
If the second C is replaced by T (top), the element is shifted (in
the y coordinate) with respect to its top. In this case, the second
numeric parameter gives the position of the top border of the
element. For example, the value of Acutetop is used for all accents
in cscorr.tab in order to keep the accents at the same height
regardless of the different heights of the primary characters (O is
somewhat higher than E to correct an optical illusion).

If the symbol PAC, PCT or PAT is used, the program calculates the
element position and writes all numeric information to the output
using PCC.

!! Notice: Spaces are significant. No space can be removed (for
example before or after a semicolon). Two or more consecutive
spaces are interpreted as one space. You can add more spaces to
make the listing of the description file more readable.

c) The WX parameter correction
------------------------------

Typesetting systems use a parameter that shifts the current point
after each character is typeset. This parameter need not be related
to the BoundingBox of the character. In the afm format the shift of
the current typesetting point in the x direction is given by WX for
each character, and the shift in the y direction is zero for
European languages. For example, WX is converted into the TeX
metric as the width of the character box. The height and depth of
the box are calculated (by afm2tfm) from the BoundingBox
information and the vptovf program rounds them to at most 16
different values each (for depth and for height). This limitation
is a feature of the tfm format.

The WX parameter of the first (main) element is used as the WX
parameter of a new composite character. Program a2ac calculates the
new BoundingBox of the composite character from the BoundingBox
parameters of all elements, taking the minimum or maximum of the
appropriate parameters.

Copying the WX value from the first (main) element can be
problematic in some cases. For example, we have d' as one character
of the alphabet. A composite construction is needed for this
character, but the width of the result is somewhat greater than the
width of the character d itself.

Therefore, it is possible to correct the WX parameters of (usually
newly defined) characters by a line with the RWX prefix. The line
has the following format:

   RWX name expression

where "RWX" is the prefix, "name" is the symbolic name of the
character and "expression" is a limited expression which returns
the new value of WX for the named character. You can use the
W(name) function in the expression, which makes it possible to
calculate the new WX parameter from the old one.

d) Reducing the number of kern data
-----------------------------------

The line has the form:

   ReduceKern expression

If the absolute value of the kern of a kern pair is less than or
equal to the value of "expression", the information about this kern
pair is removed. The "expression" is a limited expression and
includes no spaces.

This command makes sense for zero or small kern values. Such data
are redundant and only take up space in font metrics for
typesetting systems.

It is useful to write this command at two points in the description
file: at the start (to save time and to reduce the number of newly
computed kerns) and at the end (to remove the new redundant kern
pairs). Placing the ReduceKern command only at the end is also
sufficient. It is recommended to write at least "ReduceKern 0".

e) New kerning information
--------------------------

To define new kern information you can write a line with one of
two prefixes:

   NK ... New Kern. If kern info for the given pair exists, it is
          left unchanged.
   RK ... Rewrite Kern. The new kern info can rewrite the old one.

The contents of the line can vary, see paragraphs (i) to (v) below.
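The ReduceKern rule from d) and the difference between the NK and
RK prefixes can be sketched in Python as follows. This is only an
illustration of the rules as stated above: the function names
reduce_kerns and add_kern and the dictionary keyed by (first,
second) name pairs are assumptions, not the internal data
structures of a2ac.

```python
def reduce_kerns(kerns, threshold):
    """Return a copy of kerns without the pairs whose absolute kern
    value is less than or equal to threshold (the ReduceKern rule)."""
    return {pair: value for pair, value in kerns.items()
            if abs(value) > threshold}


def add_kern(kerns, prefix, first, second, value):
    """Apply one fixed kern definition with the NK or RK prefix."""
    if prefix not in ("NK", "RK"):
        raise ValueError("unknown prefix: " + prefix)
    pair = (first, second)
    if prefix == "NK" and pair in kerns:
        return  # NK: an existing kern pair is left unchanged
    kerns[pair] = value  # new pair, or RK rewriting the old value


# Usage with made-up kern values:
kerns = {("A", "V"): -80, ("A", "b"): 0, ("T", "o"): 5}
add_kern(kerns, "NK", "A", "V", -10)   # ignored, the pair exists
add_kern(kerns, "RK", "T", "o", -40)   # rewrites the old value
kerns = reduce_kerns(kerns, 0)         # same effect as "ReduceKern 0"
```

After these calls the pair A b is gone, A V still has its original
value and T o has the rewritten one.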
(i) Fixed definition of one kern pair
-------------------------------------

The line has the form:

   prefix first second expression

where "prefix" is NK or RK, "first" is the name of the first
character of the pair and "second" is the name of the second
character. The "expression" (a limited expression without spaces)
gives the kern value.

(ii) Definition of one kern pair by another kern pair value
-----------------------------------------------------------

The line has the form:

   prefix first second : third fourth expression

where the pair of characters "first" and "second" takes the kern
value of the pair "third", "fourth" increased by the value of the
"expression". The "expression" (a limited expression without
spaces) can be omitted; in that case nothing is added.

The symbol "*" can be written instead of the name "third" and/or
"fourth". The corresponding name from "first" and "second" is
substituted ("first"->"third" and "second"->"fourth"). For example:

   NK Anew B : A *

is the same as:

   NK Anew B : A B

If we want to increase the kern value of a kern pair itself, we can
write (for example):

   RK A B : * * +c

(iii) More new kern data by pattern
-----------------------------------

The line has the same form as in (ii), but the symbol "*" is
written instead of either "first" or "second" (not both). If
"first" is replaced, "third" must be replaced too, and if "second"
is replaced, "fourth" must be replaced too. You can write "*" or
"." instead of the names. The replacing symbol ("*" or ".") must be
the same on both sides. The "*" stands for all characters and the
"." stands for the lowercase characters only.

If "first" is replaced (by "*" for example), the new kerns are
calculated from all values of kern pairs of type "*" "fourth". If
"second" is replaced, the kern pair data of type "third" "*" are
used. An example shows the algorithm better:

Suppose kern pair values are stored for the pairs A b, A C and A d,
and no other kern data of type A * exist. In this case the line

   NK Anew * : A *

is equivalent to

   NK Anew b : A b
   NK Anew C : A C
   NK Anew d : A d

and the line

   NK Anew . : A .

is equivalent to

   NK Anew b : A b
   NK Anew d : A d

To the left of the colon only one symbol "*" or "." is possible,
but to the right of the colon two such symbols are possible. The
algorithm from (ii) is performed first. For example the line

   RK Anew * : * * +c

is equivalent to

   RK Anew * : Anew * +c

i.e. all kerns of type Anew * are increased by c.

(iv) One line instead of two
----------------------------

The line

   prefix new : old expression

is equivalent to the two lines

   prefix new * : old * expression
   prefix * new : * old expression

The expression can be omitted. Then the character new takes the
same kerns as the character old has.

(v) List of names instead of a single name
------------------------------------------

At several positions you can replace the name of a character by
a list of names separated by commas. The list is enclosed in
parentheses and contains no spaces and no nested lists. The
positions where a name can be replaced by a list are summarized in
the following table. The "list" stands for a position where a list
is possible and the "single" stands for a position where a list is
prohibited. The table summarizes all syntactic constructions of
lines that define new kerns.

   (i)    prefix list list expression
   (ii)   prefix list list : single single expression
          prefix list list : single * expression
          prefix list list : * single expression
          prefix list list : * * expression
   (iii)  prefix list * : single * expression
          prefix list * : * * expression
          prefix * list : * single expression
          prefix * list : * * expression
   (iv)   prefix list : single

You can replace the "*" by "." in the (iii) part of the table. As
you can see, a list is prohibited after the colon.

The list is expanded to several lines with single names before the
algorithms (ii), (iii) and (iv) are applied. If there are two lists
in a single line, the expansion is done in two levels. For example:

   NK (A,B,C) (x,y) : * one

is expanded to six lines:

   NK A x : * one                                    NK A x : A one
   NK A y : * one                                    NK A y : A one
   NK B x : * one   and it is the same as (see ii):  NK B x : B one
   NK B y : * one                                    NK B y : B one
   NK C x : * one                                    NK C x : C one
   NK C y : * one                                    NK C y : C one

Ligatures
=========

Information about ligatures is written in the input afm file at the
end of the lines with the C prefix. For example:

   C 102 ; WX 333 ; N f ; B 20 0 383 683 ; L i fi ; L l fl ;

This information is copied unchanged to the output afm file and it
is sufficient for generating the ligtable for typesetting systems.
Indeed, the afm2tfm program (for example) reads these data and
generates the appropriate information in the ligtable of the tfm
format.

In addition, it is possible to declare new ligatures (specially for
TeX). For example the line:

   % LIGKERN hyphen hyphen =: endash ; endash hyphen =: emdash ;

in the file xl2.enc defines new ligatures for "--" and "---".

The notice about cscorr.tab file
================================

The names dquoteright and tquoteright could be used for the
characters d' and t'. We use the names dcaron and tcaron instead.
The reason is that the characters d' and t' have the uppercase
alternatives \v{D} and \v{T}, i.e. Dcaron and Tcaron respectively.
The parameter -V (for afm2tfm to make the small caps variant of
a font) does not work for the names dquoteright and tquoteright.
We use Lcaron instead of Lquoteright, because the semantics of
these accents is the same as for Dcaron, dcaron, Tcaron and tcaron.

The notice to xl2.enc file
==========================

The file defines the CSencoding vector, which is a superset of the
CS-font encoding. The CS-font encoding is a superset of the
Computer Modern text font encoding (extended according to the
ISO-8859-2 standard).

There are a few exceptions. The Computer Modern text font encoding
is not unambiguous. Some positions hold two alternative characters
depending on whether the font has ligatures (fi or downarrow, for
example) and on whether it is a roman or an italic font (dollar or
sterling). It is recommended to use two *.enc files: xl2.enc for
fonts with ligatures and xt2.enc for typewriter-like fonts. The
dollar is at position 36 in all cases and the sterling has position
number 132.

The position number 32 in the xl2.enc file is not defined because
the cross for the Polish L and l is not included in Adobe
StandardEncoding. The \L and \l themselves are included at
positions 163 and 179 respectively.

Note that the three-letter ligatures and the uppercase Greek
letters are not present in Adobe StandardEncoding, but they are
present in the Computer Modern fonts and in the CS-fonts. If you
want to typeset math with a PostScript font, you have to edit the
*.vpl file to include the uppercase Greek characters (usually from
the Symbol font). Unfortunately, typesetting math with a PostScript
font is not straightforward; some more hacks are needed (at the
plainTeX macro level, for example). The three-letter ligatures are
not used in Czech, but it is possible to use them. You can use some
hack at the *.vpl level and use the PostScript font coded by the
ExpertEncoding vector.

For more information about the Czech font encoding see [3] and
appendix F in [2].

History
=======

Version 0
   - The program was created and placed on anonymous ftp for Czech
     and Slovak TeX wizards. Only Czech documentation was written.

Version 1
   - The new format of the description file allows an arbitrary
     order of commands. The old format works if and only if the >>
     prefixes are added to the variable definition lines.
   - More formats of the kern definition line are possible ("lists"
     for example).
   - The function "k" for the kern value was introduced.
   - Some bugs were removed:
     . The instability of the unix-compiled program on input files
       in DOS format is corrected (the ^M character at the end of
       a line is ignored).
     . If the field "Composites" is not present in the input afm,
       it is created in the output.
     . If an error occurs in the input, the output is not touched.
     . The Czech documentation was corrected and (pseudo) English
       documentation was added.
     . The cscorr.tab was rewritten. Some data were added to this
       file.
References
==========

[1] Donald Knuth: Virtual fonts: More fun for grand wizards.
    TUGboat 11(1):13--23, April 1990.

[2] Petr Ol\v{s}\'ak: Typografick\'y syst\'em TeX (Typesetting
    System TeX). CSTUG 1995, 270 pages. ISBN 80-901950-0-8.

[3] Petr Ol\v{s}\'ak: \'Uvaha o fontech v CSTeXu (A Reflection
    about fonts in CSTeX). TeXbulletin 3/93 (121--131).