patgen - generate patterns for TeX hyphenation
patgen dictionary_file pattern_file patout_file translate_file
This manual page is not meant to be exhaustive. See also the Info file
or manual Web2C: A TeX implementation.
The patgen program reads the dictionary_file
containing a list of hyphenated words and the
pattern_file containing previously generated patterns
(if any) for a particular language (not a complete TeX
source file; see below), and produces the patout_file
with (previously generated plus newly generated) hyphenation patterns
for that language. The translate_file defines
language-specific values for the parameters
left_hyphen_min and right_hyphen_min used by
TeX’s hyphenation algorithm and the external
representation of the lower and upper case version(s) of all
`letters’ of that language. Further details of the
pattern generation process such as hyphenation levels and
pattern lengths are requested interactively from the
user’s terminal. Optionally patgen creates a
new dictionary file pattmp.n showing the good
and bad hyphens found by the generated patterns, where
n is the highest hyphenation level.
The patterns generated by patgen can be read by initex for
use in hyphenating words. For a real-life example of
patgen’s output, see the file containing the patterns TeX uses
for English by default. At some sites, patterns for (many) other
languages may be available, and the local tex programs may have
them preloaded.
All filenames must be complete; no adding of default extensions or path
searching is done.
The original hyphenation patterns for English, by Donald Knuth
and Frank Liang.
Maximal hyphenation patterns for English, extended by Gerard
Patterns and support for many other languages
When initex digests hyphenation patterns, TeX first
expands macros, and the result must consist entirely of digits
(hyphenation levels), dots (`.’, edge of a word), and
letters. In pattern files for non-English languages, letters are
often represented by macros or other expandable constructs. For
the purpose of patgen, these are just character sequences,
subject to the condition that no such sequence is a prefix of
another.
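To make the pattern notation concrete, here is a minimal Python sketch of how TeX's hyphenation algorithm (Liang's method) interprets patterns once they consist only of digits, dots, and letters: a digit is the level of the intercharacter position where it is written, every pattern matching part of the dot-delimited word contributes its levels, the highest level at each position wins, and an odd level permits a hyphen while an even one forbids it. The sketch assumes single-character letters (as with the default a...z translate file) and uses a handful of illustrative patterns; parse_pattern and hyphenate are hypothetical helper names, not part of patgen.

```python
DIGITS = "0123456789"


def parse_pattern(pattern):
    """Split a pattern such as ".hy3p" into its letters and its levels.

    levels[k] is the digit written just before letters[k] (levels[-1] is the
    digit after the last letter); positions without a digit get level 0.
    """
    letters = []
    levels = [0]
    for ch in pattern:
        if ch in DIGITS:
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def hyphenate(word, patterns, left_hyphen_min=2, right_hyphen_min=3):
    """Return the pieces of `word` between permitted hyphens (sketch only)."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the edges of the word
    # scores[i] is the highest level seen for the position just before padded[i].
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels is not None:
                for k, level in enumerate(levels):
                    scores[start + k] = max(scores[start + k], level)
    pieces, last = [], 0
    # A hyphen after word[:j] sits just before padded[j + 1]; the two
    # *_hyphen_min values keep hyphens away from the ends of the word.
    for j in range(left_hyphen_min, len(word) - right_hyphen_min + 1):
        if scores[j + 1] % 2 == 1:  # odd level: hyphen permitted
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces


# A few illustrative patterns that match parts of "hyphenation".
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print(hyphenate("hyphenation", PATTERNS))  # ['hy', 'phen', 'ation']
```

Note that the even levels from n2at and hena4 do not override the odd level from hen5at, because only the maximum level at each position counts.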
A dictionary file contains a weighted list of hyphenated words,
one word per line starting in column 1. A digit in column 1
indicates a global word weight (initially 1) applicable to all
following words up to the next global word weight. A digit at
some intercharacter position indicates a weight for that position only.
The hyphens in a word are indicated by `-’, `*’, or
`.’ (or their replacements as defined in the translate
file) for hyphens yet to be found, `good’ hyphens
(correctly found by the patterns), and `bad’ hyphens
(erroneously found by the patterns), respectively; when reading a
dictionary file, `*’ is treated like `-’ and `.’ is ignored.
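The dictionary format can be read with a short loop. This minimal Python sketch makes several simplifying assumptions: letters are single characters (default translate file), the hyphen characters are the default `-’, `*’, and `.’, a word ends at the first blank, and a digit in column 1 is taken to apply to the word on its own line as well as to the words that follow. Entry and parse_dictionary are hypothetical names.

```python
from dataclasses import dataclass

DIGITS = "0123456789"


@dataclass
class Entry:
    word: str           # the letters of the word, hyphens and weights removed
    hyphens: list[int]  # j in hyphens means a hyphen is wanted after word[:j]
    weights: list[int]  # weights[j] is the weight of intercharacter position j


def parse_dictionary(lines):
    """Read dictionary lines into Entry objects (simplified sketch)."""
    global_weight = 1  # the global word weight starts at 1
    entries = []
    for raw in lines:
        token = raw.rstrip("\n")
        if token and token[0] in DIGITS:
            # Column-1 digit: new global weight (assumed to cover this line too).
            global_weight = int(token[0])
            token = token[1:]
        token = token.split(" ", 1)[0]  # the word ends at the first blank
        if not token:
            continue
        letters, hyphens, local = [], [], {}
        for ch in token:
            if ch in DIGITS:
                local[len(letters)] = int(ch)  # weight for this position only
            elif ch in "-*":
                hyphens.append(len(letters))   # `*' (good hyphen) counts as `-'
            elif ch == ".":
                pass                           # `.' (bad hyphen) is ignored
            else:
                letters.append(ch.lower())
        weights = [local.get(j, global_weight) for j in range(len(letters) + 1)]
        entries.append(Entry("".join(letters), hyphens, weights))
    return entries


for e in parse_dictionary(["hy-phen-a-tion", "2ta-ble", "ba5-by"]):
    print(e.word, e.hyphens, e.weights)
```

Because `*’ and `.’ are folded back into "hyphen wanted" and "no hyphen", a pattmp.n file written by a previous run can be fed in again as a dictionary.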
A pattern file contains only patterns in the format above, e.g.,
from a previous run of patgen. It may not contain any TeX
comments or control sequences. For instance, the following is not a
valid pattern file, since it contains a TeX comment:
    % this is a pattern file read by TeX.
It can only contain the actual patterns.
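This restriction can be checked mechanically before a run. The minimal Python sketch below flags any line containing `%’, `\’, `{’, or `}’, and any line with a blank-separated token that is not a bare pattern. invalid_pattern_lines is a hypothetical helper, and the regular expression assumes the 26 default letters a...z rather than the letters of a custom translate file.

```python
import re

# One pattern: an optional leading dot, letters each optionally preceded by
# a single digit, an optional trailing digit, and an optional trailing dot.
# Assumes lowercase ASCII letters a...z only.
PATTERN_RE = re.compile(r"\.?(?:[0-9]?[a-z])+[0-9]?\.?")


def invalid_pattern_lines(lines):
    """Return the 1-based numbers of lines that are not bare patterns."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if any(c in line for c in "%\\{}"):
            bad.append(number)  # TeX comment, control sequence, or group brace
        elif not all(PATTERN_RE.fullmatch(tok) for tok in line.split()):
            bad.append(number)  # some token is not a well-formed pattern
    return bad


print(invalid_pattern_lines(["% this is a pattern file read by TeX.", ".ach4 1na"]))
```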
A translate file starts with a line containing the values of
left_hyphen_min in columns 1-2, right_hyphen_min in
columns 3-4, and either a blank or the replacement for each of the
`hyphen’ characters `-’, `*’, and `.’ in
columns 5, 6, and 7, respectively. (Input lines are padded with blanks, as for
many TeX-related programs.)
Each following line defines one `letter’: an arbitrary
delimiter character in column 1, followed by one or more external
representations of that character (first the `lower’ case
one used for output), each one terminated by the delimiter and
the whole sequence terminated by another delimiter.
If the translate file is empty, the values
left_hyphen_min=2, right_hyphen_min=3, and the 26
lower case letters a...z with their upper case
representations A...Z are assumed.
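A minimal Python sketch of reading a translate file as described above, including the defaults for an empty file. It takes the file's lines without newline characters, assumes the header's two numeric fields are present, and does not check the terminating delimiter strictly (an empty field simply ends a letter's list of representations). parse_translate is a hypothetical name.

```python
import string


def parse_translate(lines):
    """Return (left_hyphen_min, right_hyphen_min, hyphen_chars, letters).

    hyphen_chars maps '-', '*', '.' to the characters used in the files;
    each entry of letters lists one letter's representations, output form first.
    """
    if not lines:
        # Empty translate file: 2, 3, and the letters a...z / A...Z.
        letters = [[lo, up] for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)]
        return 2, 3, {"-": "-", "*": "*", ".": "."}, letters

    header = lines[0].ljust(7)           # input lines are padded with blanks
    left_hyphen_min = int(header[0:2])   # columns 1-2
    right_hyphen_min = int(header[2:4])  # columns 3-4
    hyphen_chars = {}
    for default, column in zip("-*.", header[4:7]):  # columns 5, 6, 7
        hyphen_chars[default] = default if column == " " else column

    letters = []
    for line in lines[1:]:
        if not line:
            continue
        delimiter = line[0]  # arbitrary delimiter character in column 1
        representations = []
        for field in line[1:].split(delimiter):
            if field == "":  # two delimiters in a row end the sequence
                break
            representations.append(field)
        if representations:
            letters.append(representations)
    return left_hyphen_min, right_hyphen_min, hyphen_chars, letters


print(parse_translate([" 1 2 =", " a A ", "/ä/Ä//"]))
```

In this example the first letter line uses a blank as its delimiter and the second uses `/’, and column 6 of the header replaces `*’ with `=’.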
After reading the translate_file and any
previously generated patterns from pattern_file,
patgen requests input from the user’s terminal.
It first asks for the integer values of hyph_start and
hyph_finish, the lowest and highest hyphenation levels for
which patterns are to be generated. The value of
hyph_start should be larger than any hyphenation level
already present in pattern_file.
Then, for each hyphenation level, it asks for the integer values of
pat_start and pat_finish, the smallest and largest
pattern lengths to be analyzed, as well as good weight,
bad weight, and threshold, the weights for good and
bad hyphens and a weight threshold for useful patterns.
Finally, it asks whether or not to produce a hyphenated word list
(`y’ or `Y’ vs. anything else).
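Since the answers come in a fixed order, a run can be scripted. The hypothetical helper below builds the terminal input in the order described above. It assumes that patgen reads its terminal input from standard input and accepts the numbers for one prompt separated by blanks on a single line; the parameter values in the example are illustrative, not recommended settings.

```python
def patgen_answers(hyph_start, hyph_finish, levels, hyphenate_word_list=False):
    """Build terminal input for one patgen run (sketch).

    levels holds one (pat_start, pat_finish, good_weight, bad_weight, threshold)
    tuple per hyphenation level, from hyph_start to hyph_finish.
    """
    if len(levels) != hyph_finish - hyph_start + 1:
        raise ValueError("expected one parameter tuple per hyphenation level")
    answers = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good, bad, threshold in levels:
        answers.append(f"{pat_start} {pat_finish}")  # pattern lengths
        answers.append(f"{good} {bad} {threshold}")  # weights and threshold
    answers.append("y" if hyphenate_word_list else "n")
    return "\n".join(answers) + "\n"


print(patgen_answers(1, 2, [(2, 5, 1, 1, 1), (2, 5, 1, 2, 1)], hyphenate_word_list=True))
```

The resulting string could then be passed to patgen with Python's subprocess.run(["patgen", "words.dic", "old.pat", "new.pat", "words.tra"], input=answers, text=True), where all four file names are placeholders.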
Frank Liang and Peter Breitenlohner, patgen.web.
Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977,
Stanford University Ph.D. thesis, 1983.
Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN
0-201-13447-0, Appendix H.
Frank Liang wrote the first version of this program. Peter Breitenlohner
made a substantial revision in 1991 for TeX 3. The first
version was published as the appendix to the TeXware
technical report. Howard Trickey originally ported it to