1 %#!lualatex ajt-devel-ltja
4 %%% Packages used in this paper
8 \DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipam.ttf:jfm=ujis}{}
9 \DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=ujis}{}
10 % quick hack: monospaced Japanese font by \ttfamily
11 \DeclareKanjiFamily{JY3}{\ttdefault}{}{}
12 \DeclareFontShape{JY3}{\ttdefault}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=mono}{}
14 %%% for LTXexample environment
15 \usepackage{showexpl,lltjlisting}
16 \lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em}
18 \usepackage{mflogo,booktabs}
19 \definecolor{gray10}{gray}{0.9}
21 %%% Verbatim environment
23 \CustomVerbatimEnvironment{code}{Verbatim}%
24 {numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
25 \CustomVerbatimEnvironment{codewithoutnum}{Verbatim}%
26 {xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
27 \CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}%
28 {xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize}
31 %%% Mandatory article metadata %%%
32 \title{Development of the \LuaTeX-ja package}
33 \author{Hironori Kitagawa {\normalsize 北川 弘典}}
34 \address{The \LuaTeX-ja project team}
35 \email{h\_kitagawa2001@yahoo.co.jp}
37 \keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese}
39 The \LuaTeX-ja package is a macro package for typesetting Japanese
documents under \LuaTeX. This package offers more flexible
typesetting than p\TeX, and corrects some unwanted features of p\TeX.
42 In this paper, we describe specifications, the current status and some
43 internal processing methods of \LuaTeX-ja.
46 \newcommand{\parname}[1]{\textsf{#1}}
47 \newcommand{\jstrut}{\vrule width0pt height\cht depth\cdp}
48 \newcommand{\imagfm}[1]{\ifvmode\leavevmode\fi%
49 \hbox{\fboxsep=0pt\fbox{\setbox0=\hbox{#1}\copy0\kern-\wd0
50 \smash{\vrule width \wd0 height 0.4pt depth0.4pt}}}}
53 %%% Do not forget to start with \maketitle!
56 \section{Introduction}
To typeset Japanese documents with \TeX, ASCII p\TeX~\cite{ptex} has
been widely used in Japan. There are other methods (for example, using Omega
and OTP~\cite{omega}, or the CJK package), but
these alternatives never became mainstream. On the one hand,
p\TeX\ enables us to produce high-quality documents.
On the other hand, p\TeX\ has been left behind by extensions of \TeX\
such as \eTeX\ and \pdfTeX, and by the spread of UTF-8 encoding. In
recent years the situation has improved, thanks to the development
of |ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}),
$\varepsilon$-p\TeX~\cite{eptex} by the author, and up\TeX~\cite{uptex}
by Takuji Tanaka (田中琢爾). However, continuing this approach, namely developing
an engine extension localized for Japanese, is not wise: it
requires a lot of work for \emph{each} engine, whereas \LuaTeX\ can
hook into \TeX's internal processing through Lua callbacks.
75 There were several experimental attempts to typeset
76 Japanese documents with \LuaTeX\ before. Here we cite three examples:
78 \item |luaums.sty|~\cite{luaums} developed by the author. This
79 experimental package is for creating a certain Japanese-based presentation
81 \item the \emph{luajalayout} package~\cite{luajalayout}, formerly known as the
82 \emph{jafontspec} package, by Kazuki Maeda (前田一貴). This package is based on
\LaTeXe\ and the \emph{fontspec} package.
84 \item the \emph{luajp-test} package~\cite{luajp-test}, a test package made by
85 Atsuhito Kohda (香田温人), based on articles on the web page~\cite{joylua}.
However, these packages are all based on \LaTeXe, and offer little
control over typesetting rules. It is also inefficient for several
people to develop similar packages separately. Development of the
\LuaTeX-ja package was started by the author and Kazuki Maeda, because of
93 \subsection{Development Policy of \LuaTeX-ja}
The first aim of the \LuaTeX-ja project was to implement the features (from the
`primitive' level) of p\TeX\ as macros under \LuaTeX, so \LuaTeX-ja is
strongly influenced by p\TeX. However, as development proceeded, some
technical and conceptual difficulties arose. Hence we changed the aim
of the project as follows:
101 \item\emph{\LuaTeX-ja offers at least the same flexibility of
102 typesetting that p\TeX\ has.}
We think that the ability to produce output conforming to
JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for
typesetting, or to a technical note~\cite{w3c} by W3C is not enough;
107 if one wants to produce very incoherent outputs for some reason, it
In this respect, the previous attempts at Japanese typesetting with \LuaTeX\
that we cited in the previous subsection are inadequate.
112 p\TeX\ has some flexibility of typesetting, by changing internal
113 parameters such as |\kanjiskip| or |\prebreakpenalty|, and by using
114 custom JFM (Japanese TFM).
\item\emph{\LuaTeX-ja is not a mere re-implementation or port of p\TeX;
117 some (technically and/or conceptually) inconvenient features of
118 p\TeX\ are modified.}
We describe this point in more detail in the next section.
124 \subsection{Contents of this Paper}
125 Here we describe the contents of the rest of this paper briefly. In
126 Section~2, we describe major differences between p\TeX\ and \LuaTeX-ja.
127 In Section~3, we show the current status of the \LuaTeX-ja package. In
Section~4, we describe some internal routines of \LuaTeX-ja. We hope
that the material in that section will find useful applications.
131 \subsection*{About the Project}
132 This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki
\url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage}. There is
no stable version as of Oct.\ 11, 2011; however, the development source can be
obtained from the git repository. Members of the project are as follows
137 (in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato,
138 Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda,
\section{Major Differences from \pTeX}
In this section, we look at several major differences between p\TeX\
and \LuaTeX-ja. For general information on Japanese typesetting and an
overview of p\TeX, please see Okumura~\cite{ptexjp}.
148 \subsection{Names of Control Sequences}
149 \label{ssec-csname} Since p\TeX\ is an engine modification of Knuth's
original \TeX82 engine, some primitives added by it take a form that is
very difficult to simulate with a macro. For example, an additional
152 primitive |\prebreakpenalty|$\langle\hbox{\it
153 char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in p\TeX\
154 sets the amount of penalty inserted before a character whose code is
155 $\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it
156 penalty}\rangle$, and |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$ can also be used for retrieving the value.
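
As a minimal sketch of this syntax (the character and the penalty value
are chosen only for illustration), an assignment and a retrieval in
p\TeX\ could be written as follows:
\begin{codewithoutnum}
\prebreakpenalty`」=10000          % assignment (the `=' is optional)
\message{\the\prebreakpenalty`」}  % retrieval of the value just set
\end{codewithoutnum}
The character code follows the control sequence directly, both in the
assignment and in the retrieval; this is the form that is hard to
imitate with a macro taking ordinary arguments.
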
Moreover, for some parameters, the value at the end of a
horizontal box or of a paragraph is the one in effect for the whole box or
paragraph. These parameters were implemented as additional internal
parameters in \pTeX. However, implementing these parameters in
\LuaTeX-ja is not so easy; we discuss this in
Subsection~\ref{ssec-stack}.
Because of the two problems discussed above, the assignment and retrieval
of most parameters in \LuaTeX-ja are consolidated into the following
170 \item |\ltjsetparameter{|$\langle\hbox{\it
171 name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local
173 \item |\ltjglobalsetparameter|: for global assignment. These two control
174 sequences obey the value of |\globaldefs| primitive.
175 \item |\ltjgetparameter{|$\langle\hbox{\it
176 name}\rangle$|}[{|$\langle\hbox{\it optional
177 argument}\rangle$|}]|: for retrieval. The returned value is always
181 \subsection{Line Break after a Japanese Character}
Japanese text can be broken almost anywhere, whereas
alphabetic text can be broken only between words (or by
hyphenation). Hence, p\TeX's input processor is modified so that a
line break after a Japanese character does not emit a space. However,
there is no way to customize the input processor of \LuaTeX, other than
hacking its CWEB source. All a macro package can do is modify an input line
before \LuaTeX\ begins to process it, inside the |process_input_buffer|
Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this
purpose) is appended to an input line if the line ends with a Japanese
character\footnote{Strictly speaking, it also requires that the catcode
of the end-line character is 5~(\emph{end-of-line}). This condition is
useful under the verbatim environment.}. One might jump to the conclusion
that p\TeX\ and \LuaTeX-ja treat a line break in
exactly the same way; however, they differ in that \LuaTeX-ja
decides whether to append a comment letter to the line
\emph{before} the line is actually processed by \LuaTeX.
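
For illustration, consider the following three input lines (a
hypothetical example, not taken from the \LuaTeX-ja sources), assuming
the default setting in which \emph{kana} and \emph{kanji} are treated as
Japanese characters:
\begin{codewithoutnum}
日本語の行は
改行しても空白が入らない。An English line
still yields a space.
\end{codewithoutnum}
The first line ends with a Japanese character, so the comment letter is
appended and no space should appear between `は' and `改'. The second
line ends with an alphabetic character, so the usual end-of-line space
is kept.
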
Figure~\ref{fig-linebreak} shows an example of this situation; the
command on the first line marks most Japanese characters as
non-Japanese characters. In other words, from that command onward, the
letter `あ' will be treated as an alphabetic character by
\LuaTeX-ja. One would therefore expect a space between `あ' and `y' in
the output, but the actual output in the figure has none. This is
because `あ' is still considered a Japanese character by \LuaTeX-ja
at the time when \LuaTeX-ja decides whether U+FFFFF will be added to the
216 \ltjsetparameter{jacharrange={-6}}xあ
219 \caption{A notable sample showing the treatment of a line break after a
220 Japanese character.}\label{fig-linebreak}
223 \subsection{Separation between `real' fonts and Metrics}
Traditionally, most Japanese fonts used in typesetting are not
proportional; that is, most glyphs have the same size (in most cases,
square-shaped). Hence, it is not rare that the contents of different
JFMs are exactly the same, and that the JFMs differ only in their names. For example,
|min10.tfm| and |goth10.tfm|, which are the JFMs shipped with p\TeX\ for
the seriffed \emph{mincho} family and the sans-serif \emph{gothic} family,
differ only in their |FAMILY| and |FACE|. Moreover, |jis.tfm| and
|jisg.tfm|, which are part of the \emph{jis} font metric
used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦),
are identical as binary files. Considering this situation, we
decided to separate `real' fonts from the metrics used for them in
\LuaTeX-ja. Typical declarations of Japanese fonts in the style of plain
\TeX\ are shown in Figure~\ref{fig-jfdef}.
240 \item A control sequence |\jfont| must be used for Japanese fonts, instead of |\font|.
\item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so
the |file:| and |name:| prefixes and various font features can be
used, as in line~1 of Figure~\ref{fig-jfdef}.
244 \item The |jfm| key specifies the metric for the font. In
245 Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a
246 Lua script named |jfm-ujis.lua|. This metric is the standard
247 metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf}
\item The |psft:| prefix can be used to specify name-only, non-embedded
fonts. When a PDF with these fonts is displayed, the actual fonts
used for them depend on the PDF reader.
The specification of a metric used in \LuaTeX-ja is similar to that of a
JFM (see \cite{ptexjp}): characters are grouped into several classes,
the size information of characters is specified for each class, and
glue/kern insertions are specified for each pair of classes. Although
the author has not tried it, it may be possible to develop a program that
`converts' a JFM to a metric for \LuaTeX-ja. \LuaTeX-ja offers three
metrics by default: |jfm-ujis.lua|, |jfm-jis.lua| based on the
\emph{jis} font metric, and |jfm-min.lua| based on the old |min10.tfm|.
Note that |-kern| in the font features
is important, since kerning information from the real font itself would
clash with the glue/kern information from the metric.
268 \jfont\foo=file:ipam.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt
269 \jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt
271 \caption{Typical declarations of Japanese fonts.}
275 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: the Timing}
As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing
processes are totally different from those of \TeX82. \TeX82's process is
performed as soon as a (sequence of) character is appended to the current
list. Thus we can interrupt this process by writing
|f{}irm|. However, \LuaTeX's process is \emph{node-based}, that is, the
process is performed when a horizontal box or a paragraph ends, so
|f{}irm| and |firm| yield the same output under \LuaTeX.
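
The difference can be observed with a two-line plain \TeX\ test such as
the following sketch, which assumes that the current font has an `fi'
ligature (as Computer Modern does):
\begin{codewithoutnum}
firm     % `fi' ligature under both TeX82 and LuaTeX
f{}irm   % TeX82: no ligature; LuaTeX: same output as `firm'
\end
\end{codewithoutnum}
Processing this file once with \TeX82 and once with \LuaTeX, and
comparing the second word in the two outputs, shows the behavior
described above.
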
The situation for Japanese characters is more complicated.
The glues (and kerns) needed for Japanese
typesetting can be divided into the following three categories:
290 \item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue},
293 \item Default glue between a Japanese character and an alphabetic
294 character (\emph{xkanjiskip}, for short), usually 1/4 of
295 full-width (\emph{shibuaki}) with some stretch and shrink for
296 justifying each line.
\item Default glue between two consecutive Japanese characters
(\emph{kanjiskip}, for short). The main purpose of this glue is to
enable breaking lines almost everywhere in Japanese texts. In most
cases, its natural width is zero, with some stretch/shrink for
justifying each line.
In p\TeX, these three kinds of glue are treated differently. A JFM glue
is inserted when a (sequence of) Japanese character is appended to the
current list, as is the case for alphabetic characters in \TeX82. This
means that one can interrupt the insertion process by saying |{}|. An
\emph{xkanjiskip} is inserted just before `hpack' or the line-breaking of a
paragraph; this timing is somewhat similar to that of \LuaTeX's kerning
process. Finally, a \emph{kanjiskip} never appears as a node anywhere;
it appears only implicitly in the calculation of the width of a horizontal box,
in line-breaking, and in the actual output process to a DVI
file. These specifications made p\TeX's behavior very hard to
\LuaTeX-ja inserts glues in all three categories simultaneously inside
the |hpack_filter| and |pre_linebreak_filter| callbacks. The reasons for
this design are to make Japanese characters behave like alphabetic characters in \LuaTeX\
(as described in the first paragraph), and to clarify the specification
of \LuaTeX-ja's process.
321 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: the Spec}
323 \caption{Examples of differences between p\TeX\ and \LuaTeX-ja,}
326 \begin{tabular}{llllllll}
328 &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}\\
329 Input &|あ】{}【〙\/〘| &|い』\/a| &|う)\hbox{}(| &|え]\special{}[|\\\midrule
330 p\TeX &あ】\hbox{}【〙\hbox{}〘&い』\/a &う)\hbox{}( &え]\hbox{}[\\
331 \LuaTeX-ja &あ】{}【〙\/〘 &い』\/a &う)\hbox{}( &え]\special{}[\\
339 \fontsize{40}{40}\selectfont
341 \imagfm{\jstrut 】\inhibitglue}%
342 \imagfm{\jstrut\kern.5\zw}%
343 \imagfm{\jstrut\kern.5\zw}%
344 \imagfm{\jstrut\inhibitglue【}%
345 \imagfm{\jstrut 〙\inhibitglue}%
346 \imagfm{\jstrut\kern.5\zw}%
347 \imagfm{\jstrut\kern.5\zw}%
348 \imagfm{\jstrut\inhibitglue〘}%
350 \caption{Detail of (1) in Table~\ref{tab-jfmglue}.}
Now we take a look inside the insertion process itself, and describe four points.
As noted in the previous subsection, the insertion process in p\TeX\ can be
interrupted by saying |{}| or anything else\footnote{This is
why some tricks like \texttt{ちょ\char`\{\char`\}っと} that
are needed when we use \texttt{min10.tfm} work.}. This leads to
the second row in Table~\ref{tab-jfmglue}, or
Figure~\ref{fig-ptexjfm}. `The process is interrupted' means
that p\TeX\ does not think the letter `】\inhibitglue' is
followed by `\inhibitglue【', hence two half-width glues are
inserted between `】\inhibitglue' and `\inhibitglue【',
where one is from `】\inhibitglue' and another is from
On the other hand, in \LuaTeX-ja, the process is done inside the
|hpack_filter| and |pre_linebreak_filter| callbacks. Hence,
\emph{anything that does not make any node will be
ignored}\ in \LuaTeX-ja, as shown in (1) in
Table~\ref{tab-jfmglue}. \LuaTeX-ja also ignores any node
that does not contribute to the current horizontal
list (\emph{ins\_node}, \emph{adjust\_node},
\emph{mark\_node}, \emph{whatsit\_node} and
\emph{penalty\_node}), as shown in (4).
By the way, around a \emph{glyph\_node} $p$ there may be some nodes
attached to $p$. These are an accent and the kerns for
positioning it, and a kern from italic
correction\footnote{\TeX82 (and \LuaTeX) does not distinguish
between an explicit kern and a kern for italic correction. To
distinguish them, \LuaTeX-ja uses an additional attribute and
redefines \texttt{\char`\\/}.} for $p$. It is natural that
these attachments should be ignored in the process. Hence
\LuaTeX-ja takes this approach, as does the latest version of
p\TeX\ (p3.2). This explains (2) in Table~\ref{tab-jfmglue}.
To summarize, one should put an empty horizontal box |\hbox{}|
wherever one wants to interrupt the insertion process in
\LuaTeX-ja, as in (3) in Table~\ref{tab-jfmglue}.
395 \item[Fonts with the Same Metric]
Recall that \LuaTeX-ja separates `real' fonts and metrics, as described in Subsection~\ref{ssec-sepmet}.
Consider the following input, where all Japanese fonts use the same metric
(in \LuaTeX-ja), and |\gt| selects the \emph{gothic} family as
the current Japanese font family:
If the above input is processed by p\TeX, the insertion process is
interrupted by |\gt|, so the result looks like
408 \mc 明朝)\hbox{}\gt (ゴシック
However, this seems unnatural, since the two Japanese fonts in the
output use the same metric, \emph{i.e.}, the same
typesetting rule. Hence, we decided that Japanese fonts with
the same metric are treated as one font in the insertion
process of \LuaTeX-ja. Thus, the output from the above input
There may be situations where this default behavior is not
suitable. \LuaTeX-ja offers a way to cope with such cases, but
we leave the details to the manual~\cite{man}.
423 \item[Fonts with Different Metrics]
The case where two consecutive Japanese characters use different metrics and/or
different sizes is similar. Consider the following input, where
the \emph{mincho} family and the \emph{gothic} family use
As in the previous paragraph, p\TeX\ yields the following from this input:
435 \mc 漢)\hbox{}\gt (漢)\hbox{}\large (大
We thought that the amount of space between the parentheses in the above output
is too large. So we changed the default behavior of
\LuaTeX-ja so that the amount of glue between two Japanese
characters with different metrics is the average of the glue
from the left character and that from the right
character. For example, Figure~\ref{fig-diffmet} shows the
output from the above input. The width of the glue marked `(1)' is
$(a/2 + a/2)/2 = 0.5a$, and the width of the glue marked `(2)'
is $(a/2 + 1.2a/2)/2 = 0.55a$. This default behavior can be
changed by the \textsf{diffrentmet} parameter of \LuaTeX-ja.
450 \fontsize{40}{40}\selectfont
451 \imagfm{\jstrut\smash{%
452 \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr漢\cr
453 \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$a$}\
454 \hrulefill\vrule height .5ex depth .5ex\cr}}}}%
455 \imagfm{\jstrut )\inhibitglue}%
456 \imagfm{\jstrut\hbox to .5\zw{\hss\normalsize (1)\hss}}%
457 \imagfm{\jstrut\inhibitglue\gt (}%
458 \imagfm{\jstrut\gt 漢}%
459 \imagfm{\jstrut\gt )\inhibitglue}%
460 \imagfm{\jstrut\hbox to .55\zw{\hss\normalsize (2)\hss}}%
461 \imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\inhibitglue (}%
462 \imagfm{\fontsize{48}{48}\selectfont\jstrut\smash{%
463 \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr\gt 大\cr
464 \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$1.2a$}\
465 \hrulefill\vrule height .5ex depth .5ex\cr}}}}
467 \caption{Fonts with different metrics.}
471 \item[\emph{kanjiskip} and \emph{xkanjiskip}]
In p\TeX, the value of \emph{xkanjiskip} is controlled by a skip named
|\xkanjiskip|. A defect of this implementation is that the
value of \emph{xkanjiskip} is not connected with the size of
the current Japanese font. It seems that the |EXTRASPACE|,
|EXTRASTRETCH| and |EXTRASHRINK| parameters in a JFM are
reserved for specifying the default value of
\emph{xkanjiskip} in units of the design size, but p\TeX\
does not use these parameters.
Considering this situation in p\TeX, \LuaTeX-ja can use the value of
\emph{xkanjiskip} that is specified in a metric. If the value of
\emph{xkanjiskip} on the user side (this is the
\textsf{xkanjiskip} parameter in |\ltjsetparameter|) is
|\maxdimen|, then \LuaTeX-ja uses the specification from
the metric currently in use as the actual value of
This description also applies to \emph{kanjiskip}.
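
A minimal sketch of both cases follows; the explicit glue value in the
second line is chosen only for illustration and is not claimed to be a
default of \LuaTeX-ja:
\begin{codewithoutnum}
% take xkanjiskip from the metric currently in use
\ltjsetparameter{xkanjiskip=\maxdimen}
% give kanjiskip explicitly instead of deferring to the metric
\ltjsetparameter{kanjiskip=0pt plus 0.4pt minus 0.4pt}
\end{codewithoutnum}
With the first setting, the amount of \emph{xkanjiskip} follows
whichever metric the current Japanese font uses, and is therefore
connected with the font size.
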
492 \section{Current Status of Development}
At the moment, \LuaTeX-ja can be used under plain \TeX\ and under
\LaTeXe. Generally speaking, one only has to load |luatexja.sty|, with the |\input|
command or with |\usepackage| (in~\LaTeXe), to merely typeset
Japanese characters. We now look at each part in more detail.
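
Concretely, the two ways of loading the package can be sketched as
follows (only one of them is needed, depending on the format):
\begin{codewithoutnum}
% plain TeX
\input luatexja.sty

% LaTeX2e (in the preamble)
\usepackage{luatexja}
\end{codewithoutnum}
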
498 \subsection{`Engine Extension'}
The lowest part of \LuaTeX-ja corresponds to the p\TeX\ extension as
\emph{an engine extension of \TeX}. We, the project members, think that
this part is almost done. The other features of \LuaTeX-ja which we have not
yet described are the following:
\item[Setting the Range of `Japanese characters'] This feature is
inspired by up\TeX. up\TeX\ has an additional primitive named
|\kcatcode| for setting whether a character is treated as an
alphabetic character, \emph{kana}, \emph{kanji},
\emph{Hangul}, or~\emph{another CJK character}, and the
assignment of |\kcatcode| is done per Unicode
block\footnote{There are some exceptions. For example,
U+FF00--FFEF (Halfwidth and Fullwidth Forms) are divided into
three blocks in up\TeX.}.
\LuaTeX-ja uses a slightly different approach. There are many
Unicode blocks in the Basic Multilingual Plane which are
not included in most Japanese fonts, so it would be
inefficient to toggle by Unicode block. Furthermore, the
basic Japanese character set JIS~X~0208 is not just a union of
Unicode blocks; for example, the intersection of JIS~X~0208
and Latin-1 Supplement is shown in Table~\ref{tab-inter}.
Considering these two points, to customize the range of
Japanese characters in \LuaTeX-ja, one has to define
character ranges in the source in advance.
525 \item[Shifting Baseline]
In order to match Japanese fonts with alphabetic fonts,
it is sometimes necessary to shift the baseline of alphabetic
characters. p\TeX\ has a dimension |\ybaselineshift|, which
specifies the amount by which the baseline of alphabetic
characters is shifted down. This is useful for Japanese-based documents, but
not for documents mainly in languages with alphabetic
534 Hence, \LuaTeX-ja extends p\TeX's |\ybaselineshift| to Japanese
535 characters. Namely, \LuaTeX-ja offers two parameters,
536 \textsf{yjabaselineshift} and \textsf{yalbaselineshift}, for the
537 amount of shifting the baseline of Japanese characters and
538 that of alphabetic characters, respectively.
542 \fontsize{40}{40}\selectfont\fboxsep0mm
543 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
544 \hbox to 0.9\linewidth{%
546 \raise-10pt\imagfm{\jstrut 漢}%
547 \raise-10pt\imagfm{\jstrut 字}\hskip.25\zw%
552 \imagfm{\jstrut 字}\hskip.25\zw%
553 \raise-10pt\imagfm{p}%
554 \raise-10pt\imagfm{h}%
559 \caption{First example of shifting baseline.}
565 \fontsize{30}{30}\selectfont\fboxsep0mm
566 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
567 \hbox to 0.9\linewidth{%
570 \imagfm{b}\hskip.25\zw%
572 \imagfm{\jstrut 文}\hskip.33333\zw%
573 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut\inhibitglue (}%
574 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 注}%
575 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 釈}\hskip.1666667\zw%
576 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont c}%
577 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont o}%
578 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}%
579 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}%
580 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont e}%
581 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont n}%
582 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont t}%
583 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut )\inhibitglue}%
591 \caption{Second example of shifting baseline.}
595 An example output is shown in Figure~\ref{fig-bls}. The left half is the
596 output when \textsf{yjabaselineshift} is positive, hence the
597 baseline of Japanese characters is shifted down. On the other
598 hand, the right half is the output when
599 \textsf{yalbaselineshift} is positive, hence the baseline of
alphabetic characters is shifted. Figure~\ref{fig-small}
shows an interesting use of these parameters.
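
As a minimal sketch, the two parameters could be set as follows; the
value 2pt is chosen only for illustration, and we assume that both
parameters take an ordinary dimension:
\begin{codewithoutnum}
% shift the baseline of Japanese characters down by 2pt
\ltjsetparameter{yjabaselineshift=2pt}
% shift the baseline of alphabetic characters by 2pt
\ltjsetparameter{yalbaselineshift=2pt}
\end{codewithoutnum}
In practice one would normally set only one of the two, depending on
which script dominates the document.
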
Note that \LuaTeX-ja does not support vertical typesetting, \emph{tategaki}, for now.
607 \caption{Intersection of JIS~X~0208 and Latin-1 Supplement.}
610 \begin{tabular}{llll}
623 \subsection{Patches for plain \TeX\ and \LaTeXe}
p\TeX\ has a patch for plain \TeX, namely |ptex.tex|, a patch for the \LaTeXe\
macros (this patch and \LaTeXe\ together constitute \emph{p\LaTeXe}), and
|kinsoku.tex|, which includes the default settings of \emph{kinsoku
shori}, the Japanese hyphenation. We ported them to \LuaTeX-ja, except for
the code related to vertical typesetting, since \LuaTeX-ja does not
support vertical typesetting yet. We remark two points related to the
632 \item[Default Range of Japanese Characters]
As described in the previous subsection, the
range of Japanese characters can be customized in \LuaTeX-ja. \LuaTeX-ja predefines 8~character ranges,
as shown in Table~\ref{tab-chrrng}. Most of these ranges are just
unions of Unicode blocks, determined from the Adobe-Japan1-6 character
collection and JIS~X~0208. Among these 8~ranges, the ranges~2, 3, 6, 7,
and~8 are considered ranges of Japanese characters, and the others are
considered ranges of alphabetic characters.
This default setting is suitable for Japanese-based documents; however, it
prevents other packages which use Unicode fonts from working
correctly. For example, |\times| provided by the
|unicode-math| package is the character U+00D7, which belongs
to the range~8, and |\textendash| provided by the |EU2|
encoding used in the \emph{fontspec} package is the
character U+2013, which belongs to the range~3. Hence, these
characters cannot be typeset with the default range setting.
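
One possible workaround is sketched below. It assumes that the sign
convention of |jacharrange| used in Figure~\ref{fig-linebreak} (a
negative number marks the range as non-Japanese) applies to every
predefined range:
\begin{codewithoutnum}
% treat the ranges 3 and 8 as alphabetic characters
\ltjsetparameter{jacharrange={-3,-8}}
\end{codewithoutnum}
With this setting, U+00D7 and U+2013 would be treated as alphabetic
characters, at the cost that the other characters in these two ranges
are no longer treated as Japanese characters either.
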
651 \caption{Predefined ranges in \LuaTeX-ja}
654 \begin{tabular}{@{\bf}rl}
1&(Additional) Latin characters which do not belong to the range~8.\\
2&Greek and Cyrillic letters.\\
3&Punctuation and miscellaneous symbols.\\
4&Unicode blocks which do not intersect with Adobe-Japan1.\\
5&Surrogates and Supplementary Private Use Areas.\\
660 6&Characters used in Japanese typesetting.\\
661 7&Characters possibly used in CJK typesetting, but not in Japanese.\\
662 8&Characters in Table~\ref{tab-inter}.
668 \item[Behavior of\/ {\tt\char92fontfamily\/}]
The control sequence |\fontfamily| in p\LaTeXe\ changes the current alphabetic
font family and/or the current Japanese font family,
depending on the argument. More concretely,
|\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
current alphabetic font family to $\langle\hbox{\it
arg\/}\rangle$ if and only if one of the following
conditions is satisfied:
\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
\emph{some} alphabetic encoding is already defined in the document.
679 \item There exists an alphabetic encoding $\langle\hbox{\it
680 enc\/}\rangle$ already defined in the document such that a font
681 definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
682 arg\/}\rangle$|.fd| exists.
684 The same criterion is used for changing Japanese font family.
For this behavior to work well, a list of all encodings already defined in the
document is needed. Since \LuaTeX-ja is loaded as a package,
\LuaTeX-ja cannot have this list. Hence \LuaTeX-ja adopts a different
approach: |\fontfamily{|$\langle\hbox{\it
arg\/}\rangle$|}| changes the current alphabetic font family
to $\langle\hbox{\it arg\/}\rangle$ if and only if:
693 \item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
694 the current alphabetic encoding $\langle\hbox{\it enc\/}\rangle$.
695 \item A font definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
696 arg\/}\rangle$|.fd| exists.
704 \subsection{Classes for Japanese Documents}
To produce `high-quality' Japanese documents, we need not only
correctly placed Japanese characters, but also class files for
Japanese documents. In p\TeX, there are two major families of classes:
\emph{jclasses}, which is distributed with the official p\LaTeXe\ macros,
and \emph{jsclasses}. At present, \LuaTeX-ja
simply contains their counterparts: \emph{ltjclasses} and
\emph{ltjsclasses}. However, the policy on classes has not yet been
determined, and we hope to have another family of classes which is useful in
commercial printing. In the author's opinion, \emph{ltjclasses}
had better stay as an example of porting class files for \pTeX\ to
717 \subsection{Patches for Packages}
Apart from patches for the \LaTeXe~kernel and classes for Japanese
documents, we need to make patches for several packages. At present,
we have considered the following packages, and have made patches or ports for
the first two packages.
724 \item[The \emph{fontspec} package] The \emph{fontspec} package is built
725 on NFSS2, hence control sequences offered by the
726 \emph{fontspec} package, such as |\setmainfont|, are only
effective for alphabetic fonts if \LuaTeX-ja is
loaded. |luatexja-fontspec.sty| offers their counterparts for
Japanese fonts, with an additional `j' in the names of the control
sequences, such as |\setmainjfont|.
732 \item[The \emph{otf} package]
This package is widely used for characters which are
not in JIS~X~0208, and for using more than one weight in the \emph{mincho}
and \emph{gothic} font families. Therefore \LuaTeX-ja supports the features
of the \emph{otf} package, by loading |luatexja-otf.sty|. Note that
characters produced by |\UTF{xxxx}| and |\CID{xxxx}| are not appended to the
current list as a \emph{glyph\_node}, so they are not affected by
the callbacks of the \emph{luaotfload} package. We have another remark: |\CID| does not work
742 \item[The \emph{listings} package]
It is well known that there is a patch of the \emph{listings} package for
p\LaTeXe, called |jlisting.sty|. Generally speaking, it can also
be used with \LuaTeX-ja. However, it seems that a
Japanese character after a space is not processed
by the \emph{listings} package; this is inconvenient when we
use the \emph{showexpl} package.
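
The first two items can be tried with a preamble such as the following
sketch. The font names (|Latin Modern Roman| and |IPAMincho|) are only
assumptions about the fonts installed on the system and are not prescribed
by \LuaTeX-ja; |\UTF{9DD7}| requests the character U+9DD7, which is not in
JIS~X~0208.

\begin{codewithoutnum}
\usepackage{luatexja-fontspec}   % Japanese counterparts of fontspec commands
\usepackage{luatexja-otf}        % \UTF and \CID
\setmainfont{Latin Modern Roman} % alphabetic font (assumed to be installed)
\setmainjfont{IPAMincho}         % Japanese font (assumed to be installed)
...
森\UTF{9DD7}外                   % U+9DD7 is outside JIS X 0208
\end{codewithoutnum}
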
753 \section{Implementation}
754 \subsection{Handling of Japanese Fonts}
In p\TeX, there are three slots for maintaining current fonts, namely
|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal
direction) and |\tfont| for Japanese fonts (in vertical direction). With
these slots, we can manage the current font for alphabetic characters
and that for Japanese characters separately in p\TeX. However, \LuaTeX\
has only one slot for maintaining the current font, as in \TeX82. This
situation leads to a problem: how can we maintain the `current Japanese
There are three approaches to this problem. One approach is to make a
mapping table from alphabetic fonts to the corresponding Japanese fonts
(here we do not assume that NFSS2 is available). Another approach is
to always use composite fonts made of alphabetic fonts and Japanese
fonts. The third approach is to store the information of the current
Japanese font in an attribute. We adopted the third approach,
since \LuaTeX-ja is strongly influenced by p\TeX\ as we noted in
Subsection~\ref{ssec-pol}.
As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining
Japanese fonts, as in p\TeX. However, since the information of the current
Japanese font is stored in an attribute, control sequences defined by
|\jfont| (\emph{e.g.},~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) do
not represent a font in the sense of \TeX82. In other words, each of
these control sequences is just an assignment to an attribute, and therefore
they cannot be an argument of |\the|, |\fontname|, or |\textfont|.
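
The difference can be illustrated by the following minimal sketch. The
|\jfont| specification is an assumption modelled on the font declarations used
for this paper, and |\baz| is a hypothetical control sequence introduced only
for comparison.

\begin{codewithoutnum}
\font\baz=cmr10                           % a font in the sense of TeX82
\jfont\foo=file:ipam.ttf:jfm=ujis at 10pt % just an assignment to an attribute
\fontname\baz   % allowed: \baz is a font identifier
\baz\foo        % both can be selected; they are maintained independently
% \fontname\foo % not allowed: \foo is not a font identifier
\end{codewithoutnum}
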
781 \subsection{Overview of the Processes}
We now briefly outline the processing done by \LuaTeX-ja.
784 \item[Treatment of Linebreaks after Japanese Characters] This part is
785 described already in Subsection~\ref{ssec-line}. Done in the
786 |process_input_buffer| callback.
\item[Font Replacement] In the |hyphenate| callback, \LuaTeX-ja
examines each \textit{glyph\_node}~$p$ in the list. If the
character represented by $p$ is considered to be a Japanese
character, the font used in $p$ is replaced by the value of
|\ltj@curjfnt|, `the current Japanese font' at~$p$. The
character class of the character is also looked up at this time.
Furthermore, the subtype of $p$ is decreased by 1 to suppress
hyphenation around it by \LuaTeX, since later processes of
\LuaTeX-ja take care of everything concerning Japanese characters.
The following processes are all executed in the |pre_linebreak_filter| and
|hpack_filter| callbacks. These processes are the main routines of \LuaTeX-ja.
\item[Examination of Stack Level] The horizontal list which
is the content of a horizontal box is traversed,
to determine the level of \LuaTeX-ja's internal stack at the end
of the list. This is needed because of the place of the
|hpack_filter| callback in the source of \LuaTeX. We will discuss this in more
detail in Subsection~\ref{ssec-stack}.
810 \item[Insertion of Glues/Kerns for Japanese Typesetting]
This part is already described in Subsection~\ref{ssec-jglue}.
\item[Adjustment of the Position of (Japanese) Characters]
We will discuss this in detail in Subsection~\ref{ssec-width}.
817 The callbacks by the \emph{luaotfload} package, e.g., replacement of
818 glyphs according to font features, are executed just after `Examination
819 of Stack Level' above.
821 \subsection{Stack Management}
As we noted in Subsection~\ref{ssec-csname}, parameters such as
\emph{kanjiskip}, whose values at the end of a horizontal box or of a
paragraph are effective in the whole box or paragraph, cannot be implemented
by internal integers or registers of other types in \TeX. We explain it
838 if (cur_list.mode_field == -hmode) {
839 cur_box = filtered_hpack(cur_list.head_field,
840 cur_list.tail_field, saved_value(1),
841 saved_level(1), grp, saved_level(2));
842 subtype(cur_box) = HLIST_SUBTYPE_HBOX;
845 \caption{An extract of a CWEB-source \texttt{tex/packaging.w} of \LuaTeX}
Figure~\ref{fig-ltsrc} is an extract of a CWEB-source
\texttt{tex/packaging.w} of \LuaTeX\ (SVN revision 4358). This function
is called just when an explicit |\hbox{...}| or |\vbox{...}| ends, and
the function |filtered_hpack()| is where the |hpack_filter| callback and then the
actual `hpack' process are performed. Notice that the |unsave()|
function is called before |filtered_hpack()|. This is the problem:
because of |unsave()|, we can retrieve only the values of registers
\emph{outside} the box, even in the |hpack_filter| callback.
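
The effect can be illustrated by the following sketch, which uses the scratch
register |\count255| merely as a stand-in for a parameter. The comments
restate the behaviour described above; they are not the output of an actual
run.

\begin{codewithoutnum}
\count255=1             % the value outside the box
\hbox{\count255=2 ...}  % the value inside the box is 2, but unsave() has
                        % already restored 1 when hpack_filter is executed
\end{codewithoutnum}
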
To cope with this problem, \LuaTeX-ja has its own stack system, based on
the Lua code in \cite{stack-mail}. Furthermore, \emph{whatsit} nodes whose
\emph{user\_id} is 30112 (\emph{stack\_node}, for short) are
appended to the current horizontal list each time the current stack
level is incremented, and their values are the values of
|\currentgrouplevel| at that time. At the beginning of the |hpack_filter|
callback, the list in question is traversed to determine whether the
stack level at the end of the list and that outside the box coincide.
867 Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current
868 stack level, both inside the |hpack_filter| callback, i.e., outside a
869 horizontal box. Consider a list which represents the content of the box,
\item A \emph{stack\_node} whose value is $x+1$ (since all materials in
the box are included in a group |\hbox{...}|, the value is at
least $x+1$) in the list represents an assignment related to the
stack system at the top level of the list, like
878 \hbox{...(assignment)...}
881 In this case, the current stack level is incremented to $y+1$ after the assignment.
882 \item A \emph{stack\_node} whose value is more than $x+1$ in the list represents
883 an assignment inside another group contained in the box. For example,
884 the following input creates
885 a \emph{stack\_node} whose value is $x+3=(x+1)+2$:
888 \hbox{...{...{...(assignment)}...}...}
892 Thus, we can conclude that the stack level at the end of the list is
893 $y+1$, if and only if there is a \emph{stack\_node} whose value is
894 $x+1$. Otherwise, the stack level is just $y$.
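
In other words, if $S$ denotes the set of values of the
\emph{stack\_node}s in the list (a notation introduced only for this
summary), the stack level at the end of the list is
\[
  \left\{\begin{array}{ll}
    y+1 & \mbox{if $x+1\in S$,}\\
    y   & \mbox{otherwise.}
  \end{array}\right.
\]
For instance, the list of |\hbox{...{...(assignment)...}...}| contains only a
\emph{stack\_node} whose value is $x+2$, so the stack level at the end of the
list remains $y$.
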
896 \subsection{Adjustment of the Position of Japanese Characters}
899 The size of a glyph specified in a metric and that of the real font
900 usually differ. For example, the letter `\inhibitglue【' is half-width
901 in |jfm-ujis.lua| or |jis.tfm|, while this letter is full-width like `【'
902 in most TrueType fonts used in Japanese typesetting, such as
IPA~Mincho. Hence the position of such glyphs needs to be
adjusted. In p\TeX, this adjustment was performed using virtual fonts.
On the other hand, \LuaTeX-ja does the adjustment by encapsulating a glyph
in a horizontal box. There are two main reasons why we adopted this
method: one is that we feared the Lua code needed to coexist with the
callbacks of the |luaotfload| package would become large if we used virtual
fonts, and the
other is to cope with the shifting of the baseline of characters at the
915 \begin{picture}(15,12)(-1,-3)
917 \color{gray10}% real glyph
918 \put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength}
920 \color{black}% real glyph :step1
922 \put(-1,-1.5){\line(0,1){7}\line(0,-1){2.5}}
923 \put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}}
924 \put(-1,5.5){\line(1,0){6}}
925 \put(-1,-4){\line(1,0){6}}
928 \put(0,0){\vector(0,1){9}\line(0,-1){3}\vector(1,0){12}}
929 \put(12,9){\makebox(0,0)[rt]{\strut$M$\,}}
930 \put(12,0){\line(0,1){9}\vector(0,-1){3}}
931 \put(0,9){\line(1,0){12}}
932 \put(0,-3){\line(1,0){12}}
933 \put(0.2,4.5){\makebox(0,0)[l]{\texttt{height}}}
934 \put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}}
935 \put(6,0.2){\makebox(0,0)[b]{\texttt{width}}}
938 \put(3,0){\line(0,1){7}\line(0,-1){2.5}\line(1,0){6}}
939 \put(9,0){\line(0,1){7}\line(0,-1){2.5}}
940 \put(3,7){\line(1,0){6}}
941 \put(3,-2.5){\line(1,0){6}}
943 \savebox{\eqdist}(0,0)[c]{%
945 \put(-0.08,0.2){\line(0,-1){0.4}}%
946 \put(0.08,0.2){\line(0,-1){0.4}}}
947 \put(1.5,0){\usebox{\eqdist}}
948 \put(10.5,0){\usebox{\eqdist}}
951 \put(3,-1.5){\vector(-1,0){4}}
952 \put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}}
953 \put(3,0){\vector(0,-1){1.5}}
954 \put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}}
957 \caption{The position of the `real' glyph.}
961 Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is
962 the imaginary body which is specified in the metric, and a vertical
963 rectangle is the imaginary body of a real glyph. First, the real glyph
964 is aligned with respect to the width of $M$. In the figure, the real
965 glyph is aligned `middle'; this setting is useful for the full-width
966 middle dot `・'. We have other settings, namely, `left' and `right'.
After that, it is shifted according to the values of |left| and |down|,
which are specified in the metric. The final position of the real glyph
is shown by the gray rectangle. If the amount of the baseline shift is
not zero, $M$ (and hence the real glyph) is shifted by that amount.
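
As a worked example, the dimensions can be read off the picture in
Figure~\ref{fig-pos}, in units of |\unitlength|. The symbols $w_M$, $w_g$
and $a$ are introduced here only for illustration, and the formula is our
reading of the figure rather than a specification: $w_M=12$ is the width
of $M$, $w_g=6$ is the width of the real glyph, and $a=0$, $1/2$, $1$ for the
settings `left', `middle' and `right' respectively. With $\mathtt{left}=4$ and
$\mathtt{down}=1.5$ as drawn, the left edge and the baseline of the real glyph
end up at
\[
  a\,(w_M-w_g)-\mathtt{left} = \frac{1}{2}(12-6)-4 = -1,
  \qquad
  -\mathtt{down} = -1.5,
\]
relative to the reference point of $M$, which agrees with the position of the
gray rectangle.
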
We would like to remark briefly on the vertical position of a glyph.
A JFM (or the metric used in \LuaTeX-ja) and the real font used for it
may have different heights or depths. In that case, it may look better if
the real glyph is shifted vertically to match the height-depth ratio
specified in the metric. This situation is carefully studied by
Otobe~\cite{min10}. The policy on this problem has not yet been
determined; however, we would like to offer several solutions in \LuaTeX-ja.
We have discussed our \LuaTeX-ja package, which is strongly influenced
by p\TeX. For now, it can be used experimentally; however, many
refinements are still needed for regular use. The author hopes
that this paper and this project contribute to the typesetting of Japanese,
and possibly other Asian languages, under \LuaTeX.
%%% The style of the bibliography is `amsplain'.
990 \providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
991 \providecommand{\href}[2]{#2}
992 \begin{thebibliography}{99}
995 ASCII MEDIA WORKS,アスキー日本語\TeX\ (p\TeX).\url{http://ascii.asciimw.jp/pb/ptex/}
998 Jin-Hwan~Cho and Haruhiko Okumura, \emph{Typesetting CJK Languages with Omega},
999 \TeX, XML, and Digital Typography, Lecture Notes in Computer Science, vol.~3130,
1000 Springer, 2004, 139--148.
1003 Yannis Haralambous. \emph{The Joy of \LuaTeX}. \url{http://luatex.bluwiki.com/}
1006 Japanese Industrial Standards Committee. \emph{JIS~X~4051: Formatting
1007 rules for Japanese documents}, 1993, 1995, 2004.
1010 北川弘典,$\varepsilon$-p\TeX についてのwiki.
1011 \url{http://sourceforge.jp/projects/eptex/wiki/FrontPage}
1015 \url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378}
1018 \LuaTeX\ development team, \emph{The \LuaTeX\ reference}.
1019 \url{http://www.luatex.org/svn/trunk/manual/luatexref-t.pdf} (snapshot of SVN trunk)
1022 The \LuaTeX-ja project team, \emph{The \LuaTeX-ja package}.
1023 Not completed for now. Available at |doc/man-en.pdf| (in English) or
1024 |doc/man-ja.pdf| (in Japanese)
1025 in the Git repository.
1027 \bibitem{luajp-test}
1029 \url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html}
1031 \bibitem{luajalayout}
1032 前田一貴,luajalayout パッケージ---Lua\LaTeX によ
1034 \url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/}
1037 奥村晴彦,p\LaTeXe 新ドキュメントクラス.
1038 \url{http://oku.edu.mie-u.ac.jp/~okumura/jsclasses/}
1041 Haruhiko Okumura, \emph{p\TeX\ and Japanese Typesetting},
1042 The Asian Journal of \TeX\ \textbf{2}~(2008), 43--51.
1046 \url{http://argent.shinshu-u.ac.jp/~otobe/tex/files/min10.pdf}
1049 齋藤修三郎,Open Type Font用VF.
1050 \url{http://psitau.kitunebi.com/otf.html}
1052 \bibitem{stack-mail}
1053 Jonathan Sauer, \emph{[Dev-luatex] tex.currentgrouplevel}.
1054 \url{http://www.ntg.nl/pipermail/dev-luatex/2008-August/001765.html}
1057 Takuji Tanaka, \emph{up\TeX, up\LaTeX---unicode version of p\TeX, p\LaTeX}.
1058 \url{http://homepage3.nifty.com/ttk/comp/tex/uptex_en.html}
1061 Nobuyuki Tsuchimura, \emph{Development of a Japanese \TeX\ Distribution~`ptetex3'},
1062 Computer Software\ \textbf{24} (2007), no.~4, 40--50, (in Japanese).
1065 W3C Working Group, \emph{Requirements for Japanese Text Layout}.
1066 \url{http://www.w3.org/TR/jlreq/}
1067 \end{thebibliography}