1 %#!lualatex ajt-devel-ltja
4 %%% Packages used in this paper
8 \DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipam.ttf:jfm=ujis}{}
9 \DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=ujis}{}
10 % quick hack: monospaced Japanese font by \ttfamily
11 \DeclareKanjiFamily{JY3}{\ttdefault}{}{}
12 \DeclareFontShape{JY3}{\ttdefault}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=mono}{}
14 %%% for LTXexample environment
15 \usepackage{showexpl,lltjlisting}
16 \lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em}
18 \usepackage{mflogo,booktabs}
19 \definecolor{grayx}{gray}{0.85}
21 %%% Verbatim environment
23 \CustomVerbatimEnvironment{code}{Verbatim}%
24 {numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
25 \CustomVerbatimEnvironment{codewithoutnum}{Verbatim}%
26 {xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
27 \CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}%
28 {xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize}
31 %%% Mandatory article metadata %%%
32 \title{Development of the \LuaTeX-ja package}
33 \author{Hironori Kitagawa {\normalsize 北川 弘典}}
34 \address{The \LuaTeX-ja project team}
35 \email{h\_kitagawa2001@yahoo.co.jp}
37 \keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese}
The \LuaTeX-ja package is a macro package for typesetting Japanese
documents under \LuaTeX. This package offers more typesetting
flexibility than p\TeX, and corrects some unwanted features of p\TeX.
In this paper, we describe the specifications, the current status, and
some internal processing methods of \LuaTeX-ja.
46 \newcommand{\parname}[1]{\textsf{#1}}
47 \newcommand{\jstrut}{\vrule width0pt height\cht depth\cdp}
48 \newcommand{\imagfm}[1]{\ifvmode\leavevmode\fi%
49 \hbox{\fboxsep=0pt\fbox{\setbox0=\hbox{#1}\copy0\kern-\wd0
50 \smash{\vrule width \wd0 height 0.4pt depth0.4pt}}}}
53 %%% Do not forget to start with \maketitle!
56 \section{Introduction}
To typeset Japanese documents with \TeX, ASCII p\TeX~\cite{ptex} has
been widely used in Japan. There are other methods for doing so (for
example, using Omega and OTP~\cite{omega}, or the CJK package), but
these alternatives have not become mainstream. The author thinks
that this is because p\TeX\ enables us to produce high-quality documents
(e.g.,~it supports vertical typesetting), and because p\TeX\ appeared
earlier than the alternatives described above.
However, p\TeX\ has been left behind by extensions of \TeX\
such as \eTeX\ and \pdfTeX, and by the spread of UTF-8 encoding. In
recent years the situation has improved, thanks to the development
of |ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}),
$\varepsilon$-p\TeX~\cite{eptex} by the author,~and up\TeX~\cite{uptex}
by Takuji Tanaka (田中琢爾). However, continuing this approach, namely
developing an engine extension localized for Japanese, is not wise. It
requires a lot of work for \emph{each} engine, and since \LuaTeX\ can
hook into \TeX's internal processing through Lua callbacks, the need for
an engine extension is diminishing.
78 There were several experimental attempts to typeset
79 Japanese documents with \LuaTeX\ before. Here we cite three examples:
81 \item |luaums.sty|~\cite{luaums} developed by the author. This
82 experimental package is for creating a certain Japanese-based presentation
84 \item the \emph{luajalayout} package~\cite{luajalayout}, formerly known as the
85 \emph{jafontspec} package, by Kazuki Maeda (前田一貴). This package is based on
\LaTeXe\ and the \emph{fontspec} package.
87 \item the \emph{luajp-test} package~\cite{luajp-test}, a test package made by
88 Atsuhito Kohda (香田温人), based on articles on the web page~\cite{joylua}.
However, these packages are based on \LaTeXe, and offer little
control over the typesetting rules. It is also inefficient for several
people to develop similar packages separately. Development of the
\LuaTeX-ja package was initially started by the author and Kazuki Maeda, because of
96 \subsection{Development Policy of \LuaTeX-ja}
The first aim of the \LuaTeX-ja project was to implement the features (from
the `primitive' level) of p\TeX\ as macros under \LuaTeX, so \LuaTeX-ja is
much affected by p\TeX. However, as development proceeded, some
technical and conceptual difficulties arose. Hence we changed the aim
of the project as follows:
104 \item\emph{\LuaTeX-ja offers at least the same flexibility of
105 typesetting that p\TeX\ has.}
We think that the ability to produce output conforming to
JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for
typesetting, or to a technical note~\cite{w3c} by W3C is not enough;
if one wants to produce very incoherent output for some reason, it
In this respect, the previous attempts at Japanese typesetting with \LuaTeX\
which we cited above are inadequate.
p\TeX\ offers some flexibility of typesetting, by changing internal
parameters such as |\kanjiskip| or |\prebreakpenalty|, and by using
a custom JFM (Japanese TFM). Therefore we decided to include this
functionality in \LuaTeX-ja.
\item\emph{\LuaTeX-ja is not a mere re-implementation or port of p\TeX;
some (technically and/or conceptually) inconvenient features of
p\TeX\ are modified.}
We describe this point in more detail in the next section.
128 \subsection{Contents of this Paper}
129 Here we describe the contents of the rest of this paper briefly. In
130 Section~2, we describe major differences between p\TeX\ and \LuaTeX-ja.
131 In Section~3, we show the current status of the \LuaTeX-ja package. In
Section~4, we describe some internal routines of \LuaTeX-ja. We hope
that the material in that section will find useful applications.
135 \subsection*{About the Project}
136 This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki
\url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage}. There is
no stable version as of Oct.\ 15, 2011, but the development source can be
obtained from the git repository. Members of the project are as follows
141 (in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato,
142 Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda,
\section{Major Differences from \pTeX}
In this section, we look at several major differences between p\TeX\
and our \LuaTeX-ja. For general information on Japanese typesetting and an
overview of p\TeX, please see Okumura~\cite{ptexjp}.
152 \subsection{Names of Control Sequences}
\label{ssec-csname} Since p\TeX\ is an engine modification of Knuth's
original \TeX82 engine, some primitives added by it take a form that is
very difficult to simulate with a macro. For example, an additional
primitive |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in p\TeX\
sets the amount of penalty inserted before a character whose code is
$\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it
penalty}\rangle$, and the form |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$ can also be used for retrieving the value.
Moreover, there are some parameters whose values at the end of a
horizontal box or of a paragraph take effect throughout the whole box or
paragraph. These parameters were implemented as additional internal
parameters in \pTeX. However, the implementation of these parameters in
\LuaTeX-ja is not so easy; we discuss it in
Subsection~\ref{ssec-stack}.
Because of the two problems discussed above, the assignment and retrieval
of most parameters in \LuaTeX-ja are consolidated into the following
174 \item |\ltjsetparameter{|$\langle\hbox{\it
175 name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local
177 \item |\ltjglobalsetparameter|: for global assignment. These two control
178 sequences obey the value of |\globaldefs| primitive.
179 \item |\ltjgetparameter{|$\langle\hbox{\it
180 name}\rangle$|}[{|$\langle\hbox{\it optional
181 argument}\rangle$|}]|: for retrieval. The returned value is always
185 \subsection{Line-break after a Japanese Character}
Japanese text can be broken across lines almost anywhere, in contrast to
alphabetic text, which can be broken only between words (or by
hyphenation). Hence, p\TeX's input processor is modified so that a
line break after a Japanese character does not produce a space. However,
there is no way to customize the input processor of \LuaTeX, other than
hacking its CWEB source. All a macro package can do is to modify an input line
before \LuaTeX\ begins to process it, inside the |process_input_buffer|
Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this
purpose) is appended to an input line if the line ends with a Japanese
character\footnote{Strictly speaking, it also requires that the catcode
of the end-line character is 5~(\emph{end-of-line}). This condition is
useful under the verbatim environment.}. One might jump to the conclusion
that the treatment of a line break by p\TeX\ and that by \LuaTeX-ja are
exactly the same. However, they differ in that \LuaTeX-ja
decides whether a comment letter will be appended to the line
\emph{before} the line is actually processed by \LuaTeX.
Figure~\ref{fig-linebreak} shows an example of this situation; the
command in the first line marks most Japanese characters as
`non-Japanese characters'. In other words, from that command onward, the
letter `あ' will be treated as an alphabetic character by
\LuaTeX-ja. One would then expect a space between `あ' and `y' in
the output, but the actual output in the figure has none. This is
because `あ' is still considered a Japanese character by \LuaTeX-ja
at the time when \LuaTeX-ja decides whether U+FFFFF will be added to the
220 \ltjsetparameter{jacharrange={-6}}xあ
223 \caption{A notable sample showing the treatment of a line break after a
224 Japanese character.}\label{fig-linebreak}
227 \subsection{Separation between `real' fonts and Metrics}
Traditionally, most Japanese fonts used in typesetting are not
proportional, that is, most glyphs have the same size (in most cases,
square-shaped). Hence, it is not rare that the contents of different
JFMs are essentially the same, and differ only in their names. For example,
|min10.tfm| and |goth10.tfm|, which are JFMs shipped with p\TeX\ for
the seriffed \emph{mincho} family and the sans-serif \emph{gothic} family,
differ only in their |FAMILY| and |FACE|. Moreover, |jis.tfm| and
|jisg.tfm|, which are parts of the \emph{jis} font metric
used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦),
are completely identical as binary files. Considering this situation, we
decided to separate `real' fonts and the metrics used for them in
\LuaTeX-ja. Typical declarations of Japanese fonts in the style of plain
\TeX\ are shown in Figure~\ref{fig-jfdef}. We would like to add several
245 \item A control sequence |\jfont| must be used for Japanese fonts, instead of |\font|.
\item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so
|file:| and |name:| prefixes, and various font features, can be
used as in line~1 of Figure~\ref{fig-jfdef}.
249 \item The |jfm| key specifies the metric for the font. In
250 Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a
251 Lua script named |jfm-ujis.lua|. This metric is the standard
252 metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf}
\item The |psft:| prefix can be used to specify name-only, non-embedded
fonts. When one displays a PDF with these fonts, the actual fonts
used for them depend on the PDF reader.
The specification of a metric for \LuaTeX-ja is similar to that of a JFM
(see \cite{ptexjp}): characters are grouped into several classes, the
size information of characters is specified for each class, and
glue/kern insertions are specified for each pair of classes. Although
the author has not tried it, it may be possible to develop a program that
`converts' a JFM to a metric for \LuaTeX-ja. \LuaTeX-ja offers three
metrics by default: |jfm-ujis.lua|, |jfm-jis.lua| based on the
\emph{jis} font metric, and |jfm-min.lua| based on the old |min10.tfm|.
Note that |-kern| in the features
is important, since kerning information from the real font itself would
clash with the glue/kern information from the metric.
273 \jfont\foo=file:ipam.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt
274 \jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt
276 \caption{Typical declarations of Japanese fonts.}
280 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: the Timing}
As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing
processes are totally different from those of \TeX82. \TeX82's process is
done just when a (sequence of) character(s) is appended to the current
list. Thus we can interrupt this process by writing, for example,
|f{}irm|. However, \LuaTeX's process is \emph{node-based}, that is, the
process is done when a horizontal box or a paragraph ends, so
|f{}irm| and |firm| yield the same output under \LuaTeX.
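The difference can be observed with a small test. The following is only
a sketch in plain \TeX, assuming that the current font has an `fi'
ligature (as Computer Modern Roman does); the box registers are chosen
arbitrarily. It should be run once with a \TeX82-based engine and once
with \LuaTeX, and the two widths reported on the terminal compared:
\begin{codewithoutnum}
% Under TeX82 the empty group interrupts ligaturing, so box 2 is
% set without the `fi' ligature. Under LuaTeX both boxes are built
% from node lists, so they should come out identical.
\setbox0=\hbox{firm}
\setbox2=\hbox{f{}irm}
\message{firm: \the\wd0, f{}irm: \the\wd2}
\bye
\end{codewithoutnum}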
The situation for Japanese characters is more complicated.
Glues (and kerns) which are needed for Japanese
typesetting can be divided into the following three categories:
295 \item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue},
298 \item Default glue between a Japanese character and an alphabetic
299 character (\emph{xkanjiskip}, for short), usually 1/4 of
300 full-width (\emph{shibuaki}) with some stretch and shrink for
301 justifying each line.
\item Default glue between two consecutive Japanese characters
(\emph{kanjiskip}, for short). The main reason for this glue is to
enable breaking lines almost everywhere in Japanese texts. In most
cases, it has zero natural width and some stretch/shrink for
justifying each line.
In p\TeX, these three kinds of glue are treated differently. A JFM glue
is inserted when a (sequence of) Japanese character(s) is appended to the
current list, just as for alphabetic characters in \TeX82. This
means that one can interrupt the insertion process by saying |{}|. An
\emph{xkanjiskip} is inserted just before `hpack' or line-breaking of a
paragraph; this timing is somewhat similar to that of \LuaTeX's kerning
process. Finally, a \emph{kanjiskip} does not appear as a node anywhere;
it appears only implicitly in the calculation of the width of a horizontal
box, in line breaking, and in the actual output to a DVI
file. These specifications made p\TeX's behavior very hard to
\LuaTeX-ja inserts glues in all three categories simultaneously inside the
|hpack_filter| and |pre_linebreak_filter| callbacks. The reasons for
this design are to make Japanese characters behave like alphabetic
characters in \LuaTeX\ (as described in the first paragraph), and to keep
the specification of \LuaTeX-ja's process clear.
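To make this timing concrete, the following sketch registers a function
in both callbacks. It is \emph{not} \LuaTeX-ja's actual code: the
function name |count_glyphs| and the description string are hypothetical,
and we assume that |luatexbase.add_to_callback| (from the
\emph{luatexbase} package or the \LaTeXe\ kernel) is available. The
function changes nothing; it only counts glyph nodes, to show that a
whole node list is visible at once at these two points:
\begin{codewithoutnum}
\directlua{
  local glyph = node.id('glyph')
  % Count the glyph nodes in the list starting at head.
  local function count_glyphs(head)
    local n = 0
    for _ in node.traverse_id(glyph, head) do
      n = n + 1
    end
    texio.write_nl('log', 'glyph nodes in this list: ' .. n)
    % Returning true leaves the node list unchanged.
    return true
  end
  luatexbase.add_to_callback('hpack_filter',
    count_glyphs, 'sketch.count_glyphs')
  luatexbase.add_to_callback('pre_linebreak_filter',
    count_glyphs, 'sketch.count_glyphs')
}
\end{codewithoutnum}
The comments use \TeX's comment character on purpose: inside
|\directlua| the line ends are converted to spaces, so a Lua comment
would swallow the rest of the code.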
326 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: the Spec}
\caption{Examples of differences between p\TeX\ and \LuaTeX-ja.}
331 \begin{tabular}{llllllll}
333 &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}\\
334 Input &|あ】{}【〙\/〘| &|い』\/a| &|う)\hbox{}(| &|え]\special{}[|\\\midrule
335 p\TeX &あ】\hbox{}【〙\hbox{}〘&い』\/a &う)\hbox{}( &え]\hbox{}[\\
336 \LuaTeX-ja &あ】{}【〙\/〘 &い』\/a &う)\hbox{}( &え]\special{}[\\
344 \fontsize{40}{40}\selectfont
346 \imagfm{\jstrut 】\inhibitglue}%
347 \imagfm{\jstrut\kern.5\zw}%
348 \imagfm{\jstrut\kern.5\zw}%
349 \imagfm{\jstrut\inhibitglue【}%
350 \imagfm{\jstrut 〙\inhibitglue}%
351 \imagfm{\jstrut\kern.5\zw}%
352 \imagfm{\jstrut\kern.5\zw}%
353 \imagfm{\jstrut\inhibitglue〘}%
355 \caption{Detail of (1) in Table~\ref{tab-jfmglue}.}
359 Now we will take a look inside the insertion process itself, and describe 4~points.
As noted in the previous subsection, the insertion process in p\TeX\ can
be interrupted by saying |{}| or anything else\footnote{This
is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for
\texttt{min10.tfm} and other `old' JFMs work.}. This leads to
the second row in Table~\ref{tab-jfmglue}, or
Figure~\ref{fig-ptexjfm}. `The process is interrupted' means
that p\TeX\ does not think the letter `】\inhibitglue' is
followed by `\inhibitglue【', hence two half-width glues are
inserted between `】\inhibitglue' and `\inhibitglue【',
where one is from `】\inhibitglue' and the other is from
On the other hand, in \LuaTeX-ja, the process is done inside the
|hpack_filter| and |pre_linebreak_filter| callbacks. Hence,
\emph{anything that does not make any node will be
ignored}\ in \LuaTeX-ja, as shown in (1) in
Table~\ref{tab-jfmglue}. \LuaTeX-ja also ignores any nodes
which do not make any contribution to the current horizontal
list (\emph{ins\_node}, \emph{adjust\_node},
\emph{mark\_node}, \emph{whatsit\_node} and
\emph{penalty\_node}), as shown in (4).
By the way, around a \emph{glyph\_node} $p$ there may be some nodes
attached to $p$. These are an accent and kerns for
positioning it, and a kern from the italic
correction\footnote{\TeX82 (and \LuaTeX) do not distinguish
between an explicit kern and a kern for italic correction. To
distinguish them, an additional subtype for kern is introduced
in p\TeX. On the other hand, \LuaTeX-ja uses an additional attribute and
redefines \texttt{\char`\\/}.} for $p$. It is natural that
these attachments should be ignored inside the process. Hence
\LuaTeX-ja takes this approach, as does the latest version of
p\TeX\ (p3.2). This explains (2) in Table~\ref{tab-jfmglue}.
Summarizing the above, one should put an empty horizontal box |\hbox{}|
where one wants to interrupt the insertion process in
\LuaTeX-ja, as in (3) in Table~\ref{tab-jfmglue}.
402 \item[Fonts with the Same Metric]
Recall that \LuaTeX-ja separates `real' fonts and metrics, as in Subsection~\ref{ssec-sepmet}.
Consider the following input, where all Japanese fonts use the same metric
(in \LuaTeX-ja), and |\gt| selects the \emph{gothic} family as
the current Japanese font family:
If the above input is processed by p\TeX, since the insertion process is
interrupted by |\gt|, the result looks like
415 \mc 明朝)\hbox{}\gt (ゴシック
However, this seems unnatural, since the two Japanese fonts in the
output use the same metric, i.e.,~the same
typesetting rule. Hence, we decided that Japanese fonts with
the same metric are treated as one font in the insertion
process of \LuaTeX-ja. Thus, the output from the above input
in \LuaTeX-ja looks like:
There may be situations where this default behavior is not
suitable. \LuaTeX-ja offers a way to cope with this case, but
we leave it to the manual~\cite{man}.
430 \item[Fonts with Different Metrics]
The case where two consecutive Japanese characters use different metrics and/or
different sizes is similar. Consider the following input where
the \emph{mincho} family and the \emph{gothic} family use
As in the previous paragraph, p\TeX\ yields the following from this input:
442 \mc 漢)\hbox{}\gt (漢)\hbox{}\large (大
We thought that the amount of space between the parentheses in the above
output is too large. So we changed the default behavior of
\LuaTeX-ja so that the amount of glue between two Japanese
characters with different metrics is the average of the glue
from the left character and that from the right
character. For example, Figure~\ref{fig-diffmet} shows the
output from the above input. The width of the glue marked `(1)' is
$(a/2 + a/2)/2 = 0.5a$, and the width of the glue marked `(2)'
is $(a/2 + 1.2a/2)/2 = 0.55a$. This default behavior can be
changed by the \textsf{diffrentmet} parameter of \LuaTeX-ja.
457 \fontsize{40}{40}\selectfont
458 \imagfm{\jstrut\smash{%
459 \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr漢\cr
460 \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$a$}\
461 \hrulefill\vrule height .5ex depth .5ex\cr}}}}%
462 \imagfm{\jstrut )\inhibitglue}%
463 \hbox to .5\zw{\hss\normalsize (1)\hss}%
464 \imagfm{\jstrut\inhibitglue\gt (}%
465 \imagfm{\jstrut\gt 漢}%
466 \imagfm{\jstrut\gt )\inhibitglue}%
467 \hbox to .55\zw{\hss\normalsize (2)\hss}%
468 \imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\inhibitglue (}%
469 \imagfm{\fontsize{48}{48}\selectfont\jstrut\smash{%
470 \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr\gt 大\cr
471 \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$1.2a$}\
472 \hrulefill\vrule height .5ex depth .5ex\cr}}}}
474 \caption{Fonts with different metrics.}
478 \item[\emph{kanjiskip} and \emph{xkanjiskip}]
In p\TeX, the value of \emph{xkanjiskip} is controlled by a skip named
|\xkanjiskip|. A defect of this implementation is that the
value of \emph{xkanjiskip} is not connected with the size of
the current Japanese font. It seems that the |EXTRASPACE|,
|EXTRASTRETCH|, and |EXTRASHRINK| parameters in a JFM are
reserved for specifying the default value of
\emph{xkanjiskip} in units of the design size, but p\TeX\
does not use these parameters.
Considering this situation of p\TeX, \LuaTeX-ja can use the value of
\emph{xkanjiskip} that is specified in a metric. If the value of
\emph{xkanjiskip} on the user side (this is the
\textsf{xkanjiskip} parameter in |\ltjsetparameter|) is
|\maxdimen|, then \LuaTeX-ja uses the specification from
the metric currently in use as the actual value of
This description also applies to \emph{kanjiskip}.
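As an illustration of the two modes just described, the following lines
switch between explicit values and the values from the metric. This is
only a sketch: the concrete glue values are made up for this example, and
only the parameter names and the special role of |\maxdimen| are taken
from the description above.
\begin{codewithoutnum}
% Explicit values given on the user side (illustrative numbers):
\ltjsetparameter{kanjiskip=0pt plus 0.4pt minus 0.4pt}
\ltjsetparameter{xkanjiskip=0.25\zw plus 1pt minus 1pt}
% Defer to the values specified in the current metric instead:
\ltjsetparameter{kanjiskip=\maxdimen}
\ltjsetparameter{xkanjiskip=\maxdimen}
\end{codewithoutnum}
Since the second form depends on the metric, the glue scales with the
size of the current Japanese font, which is the connection that
|\xkanjiskip| in p\TeX\ lacks.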
499 \section{Current Status of Development}
At the moment, \LuaTeX-ja can be used under plain \TeX, and under
\LaTeXe. Generally speaking, one only has to load |luatexja.sty|, with |\input|
or |\usepackage| (in~\LaTeXe), in order merely to typeset
Japanese characters. We now look at each part in more detail.
505 \subsection{`Engine Extension'}
The lowest part of \LuaTeX-ja corresponds to the p\TeX\ extension as
\emph{an engine extension of \TeX}. We, the project members, think that
this part is almost done. Other features of \LuaTeX-ja which we have not
yet described are the following:
\item[Setting the Range of `Japanese characters'] This feature is
inspired by up\TeX. up\TeX\ has an additional primitive named
|\kcatcode| for setting whether a character is treated as an
alphabetic character, \emph{kana}, \emph{kanji},
\emph{Hangul}, or~\emph{other CJK character}, and the
assignment of |\kcatcode| is done per Unicode
block\footnote{There are some exceptions. For example,
U+FF00--FFEF (Halfwidth and Fullwidth Forms) are divided into
three blocks in recent up\TeX.}.
\LuaTeX-ja uses a slightly different approach. Because there are already
many Unicode blocks in the Basic Multilingual Plane which are
not included in most Japanese fonts, it would be
inefficient to toggle by Unicode block. Furthermore, the
basic Japanese character set JIS~X~0208 is not just a union of
Unicode blocks; for example, the intersection of JIS~X~0208
and Latin-1 Supplement is shown in Table~\ref{tab-inter}.
Considering these two points, to customize the range of
Japanese characters in \LuaTeX-ja, one has to define
character ranges in the source in advance.
532 \item[Shifting Baseline]
In order to match Japanese fonts with alphabetic fonts,
shifting the baseline of alphabetic characters is sometimes
needed. p\TeX\ has a dimension |\ybaselineshift|, which
corresponds to the amount by which the baseline of alphabetic
characters is shifted down. This is useful for Japanese-based documents, but
not for documents mainly in languages with alphabetic
541 Hence, \LuaTeX-ja extends p\TeX's |\ybaselineshift| to Japanese
542 characters. Namely, \LuaTeX-ja offers two parameters,
543 \textsf{yjabaselineshift} and \textsf{yalbaselineshift}, for the
544 amount of shifting the baseline of Japanese characters and
545 that of alphabetic characters, respectively.
549 \fontsize{40}{40}\selectfont\fboxsep0mm
550 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
551 \hbox to 0.9\linewidth{%
553 \raise-10pt\imagfm{\jstrut 漢}%
554 \raise-10pt\imagfm{\jstrut 字}\hskip.25\zw%
559 \imagfm{\jstrut 字}\hskip.25\zw%
560 \raise-10pt\imagfm{p}%
561 \raise-10pt\imagfm{h}%
566 \caption{First example of shifting baseline.}
572 \fontsize{30}{30}\selectfont\fboxsep0mm
573 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
574 \hbox to 0.9\linewidth{%
577 \imagfm{b}\hskip.25\zw%
579 \imagfm{\jstrut 文}\hskip.33333\zw%
580 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut\inhibitglue (}%
581 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 注}%
582 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 釈}\hskip.1666667\zw%
583 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont c}%
584 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont o}%
585 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}%
586 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}%
587 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont e}%
588 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont n}%
589 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont t}%
590 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut )\inhibitglue}%
598 \caption{Second example of shifting baseline.}
An example output is shown in Figure~\ref{fig-bls}. The left half is the
output when \textsf{yjabaselineshift} is positive, hence the
baseline of Japanese characters is shifted down. On the other
hand, the right half is the output when
\textsf{yalbaselineshift} is positive, hence the baseline of
alphabetic characters is shifted. Figure~\ref{fig-small}
shows an interesting use of these parameters.
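As a small illustration of these two parameters, the following sketch
sets them with |\ltjsetparameter|. The amounts |2pt| and |1pt| are
arbitrary values chosen for this example, not recommended settings:
\begin{codewithoutnum}
% Shift the baseline of Japanese characters down by 2pt
% (a positive value shifts the baseline down).
\ltjsetparameter{yjabaselineshift=2pt}
% Independently, shift the baseline of alphabetic characters by 1pt.
\ltjsetparameter{yalbaselineshift=1pt}
\end{codewithoutnum}
In a Japanese-based document one would typically set only
\textsf{yalbaselineshift}, which plays the role of p\TeX's
|\ybaselineshift|; \textsf{yjabaselineshift} is meant for documents that
are mainly alphabetic.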
Note that \LuaTeX-ja does not support vertical typesetting, \emph{tategaki}, for now.
614 \caption{Intersection of JIS~X~0208 and Latin-1 Supplement.}
617 \begin{tabular}{llll}
630 \subsection{Patches for plain \TeX\ and \LaTeXe}
p\TeX\ has a patch for plain \TeX, namely |ptex.tex|, a patch for the \LaTeXe\
macros (this patch and \LaTeXe\ together constitute \emph{p\LaTeXe}), and
|kinsoku.tex|, which includes the default settings of \emph{kinsoku
shori}, the Japanese line-breaking rules. We ported them to \LuaTeX-ja, except
the code related to vertical typesetting, since \LuaTeX-ja does not
support vertical typesetting yet. We remark on two points related to the
As described in the previous subsection, \LuaTeX-ja can customize the
range of Japanese characters. \LuaTeX-ja predefines 8~character ranges,
as shown in Table~\ref{tab-chrrng}. Most of these ranges are just
unions of Unicode blocks, and are determined from the Adobe-Japan1-6 character
collection~\cite{aj16} and JIS~X~0208. Among these 8~ranges, the
ranges~2, 3, 6, 7, and~8 are considered ranges of Japanese
characters, and the others are considered ranges of alphabetic
This default setting is suitable for Japanese-based documents, but it
prevents other packages which use Unicode fonts from working
correctly. For example, |\times| provided by the
|unicode-math| package is the character U+00D7, which belongs
to the range~8, and |\textendash| provided by the |EU2|
encoding used in the \emph{fontspec} package is the
character U+2013, which belongs to the range~3. Hence, these
characters cannot be typeset correctly with the default range setting.
659 \caption{Predefined ranges in \LuaTeX-ja}
662 \begin{tabular}{@{\bf}rl}
1&(Additional) Latin characters which do not belong to the range~8.\\
2&Greek and Cyrillic letters.\\
3&Punctuation and miscellaneous symbols.\\
4&Unicode blocks which do not intersect with Adobe-Japan1-6.\\
5&Surrogates and Supplementary Private Use Areas.\\
668 6&Characters used in Japanese typesetting.\\
669 7&Characters possibly used in CJK typesetting, but not in Japanese.\\
670 8&Characters in Table~\ref{tab-inter}.
676 \item[Behavior of\/ {\tt\char92fontfamily\/}]
The control sequence |\fontfamily| in p\LaTeXe\ changes the current alphabetic
font family and/or the current Japanese font family,
depending on the argument. More concretely,
|\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
current alphabetic font family to $\langle\hbox{\it
arg\/}\rangle$, if and only if one of the following
conditions is satisfied:
\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
\emph{some} alphabetic encoding is already defined in the document.
687 \item There exists an alphabetic encoding $\langle\hbox{\it
688 enc\/}\rangle$ already defined in the document such that a font
689 definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
690 arg\/}\rangle$|.fd| exists.
692 The same criterion is used for changing Japanese font family.
For this behavior to work well, a list of all (alphabetic) encodings
already defined in the document is needed. However, since \LuaTeX-ja
is loaded as a package, \LuaTeX-ja cannot have this list.
Hence \LuaTeX-ja adopted a different approach, namely
|\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
current alphabetic font family to $\langle\hbox{\it
arg\/}\rangle$, if and only if:
\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
the current alphabetic encoding $\langle\hbox{\it enc\/}\rangle$ is already defined.
704 \item A font definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
705 arg\/}\rangle$|.fd| exists.
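Returning to the default ranges of Japanese characters discussed earlier
in this subsection: a document that relies on |\times| from
|unicode-math| or on |\textendash| could mark the ranges~3 and~8 as
alphabetic. The following is a sketch which assumes that the
\textsf{jacharrange} parameter accepts a comma-separated list, with a
negative number marking a range as non-Japanese in the same way as
|jacharrange={-6}| in Figure~\ref{fig-linebreak}:
\begin{codewithoutnum}
% Treat the ranges 3 and 8 as ranges of alphabetic characters, so
% that U+2013 and U+00D7 are taken from the alphabetic font.
\ltjsetparameter{jacharrange={-3,-8}}
\end{codewithoutnum}
The price is that Japanese punctuation and symbols in these ranges are
then no longer handled as Japanese characters, so this choice suits
documents that are mainly alphabetic.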
713 \subsection{Classes for Japanese Documents}
To produce `high-quality' Japanese documents, we need not only
correct placement of Japanese characters, but also class files for
Japanese documents. In p\TeX, there are two major families of classes:
\emph{jclasses}, which is distributed with the official p\LaTeXe\ macros,
and \emph{jsclasses}. At present, \LuaTeX-ja
simply contains their counterparts: \emph{ltjclasses} and
\emph{ltjsclasses}. However, the policy on classes has not been determined
yet, and we hope to have another family of classes which is useful in
commercial printing. In the author's opinion, it is better for
\emph{ltjclasses} to remain as an example of porting class files for \pTeX\ to
726 \subsection{Patches for Packages}
Apart from patches for the \LaTeXe~kernel and classes for Japanese
documents, we need to make patches for several packages. At present,
we have considered the following packages, and have made patches or ports for
the first two.
\item[The \emph{fontspec} package] The \emph{fontspec} package is built
on NFSS2, hence control sequences offered by the
\emph{fontspec} package, such as |\setmainfont|, are only
effective for alphabetic fonts if \LuaTeX-ja is loaded. The
optional package \texttt{luatexja-\penalty0fontspec.sty}
offers their counterparts for Japanese fonts, with an additional
`j' in the name of each control sequence, such as
742 \item[The \emph{otf} package]
This package is widely used in p\TeX\ for characters which are
not in JIS~X~0208, and for using more than one weight in the \emph{mincho}
746 in the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty}
747 manually. Note that characters by |\UTF{xxxx}| and
748 |\CID{xxxx}| are not appended to the current list as a
749 \emph{glyph\_node}, so they are not affected by callbacks by
750 the \emph{luaotfload} package. We have another remark; |\CID|
751 does not work with TrueType fonts.
753 \item[The \emph{listings} package]
754 It is well known that there is a patch |jlisting.sty| for the
755 \emph{listings} package for p\LaTeXe. Generally speaking, it
756 can also be used in \LuaTeX-ja. However, it seems that
757 a Japanese character after a space does not receive any
758 processing by the \emph{listings} package; this is inconvenient
759 when we use the \emph{showexpl} package.
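
As an illustration of the \emph{fontspec} and \emph{otf} items above, a
document might load the two ported packages as follows. This is only a
sketch: the class |ltjsarticle| and the font file names
|lmroman10-regular.otf| and |ipam.ttf| are assumptions about the
installation, not requirements of \LuaTeX-ja.

\begin{codewithoutnum}
\documentclass{ltjsarticle}
\usepackage{luatexja-fontspec} % Japanese counterparts of fontspec commands
\usepackage{luatexja-otf}      % \UTF and \CID
\setmainfont{lmroman10-regular.otf} % affects alphabetic fonts only
\setmainjfont{ipam.ttf}             % Japanese font (IPA Mincho, TrueType)
\begin{document}
森\UTF{9DD7}外  % U+9DD7 is not in JIS X 0208
% \CID{7652}    % left out: \CID does not work with TrueType fonts
\end{document}
\end{codewithoutnum}

Here |\setmainfont| and |\setmainjfont| are set independently, which
reflects the separation between alphabetic and Japanese fonts.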
764 \section{Implementation}
765 \subsection{Handling of Japanese Fonts}
766 In p\TeX, there are three slots for maintaining current fonts, namely
767 |\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal
768 direction) and |\tfont| for Japanese fonts (in vertical direction). With
769 these slots, we can manage the current font for alphabetic characters
770 and that for Japanese characters separately in p\TeX. However, \LuaTeX\
771 has only one slot for maintaining the current font, like \TeX82. This
772 situation leads to a problem: how can we maintain the `current Japanese
775 There are three approaches to this problem. One approach is to make a
776 mapping table from alphabetic fonts to corresponding Japanese fonts
777 (here we don't assume that NFSS2 is available). Another approach is
778 to always use composite fonts made of alphabetic fonts and Japanese
779 fonts. The third approach is to store the information on the current
780 Japanese font in an attribute. We adopted the third approach,
781 since \LuaTeX-ja is strongly influenced by p\TeX, as we noted in
782 Subsection~\ref{ssec-pol}.
784 As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining
785 Japanese fonts, as p\TeX\ does. However, since the information on the current
786 Japanese font is stored in an attribute, control sequences defined by
787 |\jfont| (e.g.,~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) do
788 not represent a font in the sense of \TeX82. In other words, each of
789 these control sequences is just an assignment to an attribute; therefore
790 they cannot be an argument of |\the|, |\fontname|, or |\textfont|.
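
For instance, with a definition such as the one below, the control
sequence |\foo| can select a Japanese font but cannot be inspected as a
font identifier. This is a sketch only: the font file |ipam.ttf|, the
metric |ujis| and the size are assumptions made for illustration.

\begin{codewithoutnum}
\jfont\foo=file:ipam.ttf:jfm=ujis at 12pt
{\foo 日本語 abc}  % Japanese characters use \foo; `abc' keeps the
                   % current alphabetic font
% \fontname\foo    % not allowed: \foo is not a font identifier for TeX
\end{codewithoutnum}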
792 \subsection{Overview of the Processes}
793 We now briefly outline the processing performed by \LuaTeX-ja.
795 \item[Treatment of Line-breaks after a Japanese Character] This part is
796 already described in Subsection~\ref{ssec-line}. It is done in the
797 |process_input_buffer| callback.
798 \item[Font Replacement] In the |hyphenate| callback, \LuaTeX-ja
799 examines each \textit{glyph\_node}~$p$ in the list. If the
800 character represented by $p$ is considered to be a Japanese
801 character, the font used in $p$ is replaced by the value of
802 |\ltj@curjfnt|, `the current Japanese font' at~$p$. The
803 character class of the character is also looked up at this time.
805 Furthermore, the subtype of $p$ is decreased by 1 to suppress
806 hyphenation around it by \LuaTeX, since later processes of
807 \LuaTeX-ja take care of everything concerning Japanese characters.
810 The following processes are all executed in the |pre_linebreak_filter| and
811 |hpack_filter| callbacks. These processes are the main routines of \LuaTeX-ja.
814 \item[Examination of Stack Level] The horizontal list which
815 is the content of a horizontal box is traversed,
816 to determine the level of \LuaTeX-ja's internal stack at the end
817 of the list. This is needed because of the place of
818 the |hpack_filter| callback in the source of \LuaTeX. We will discuss this in more
819 detail in Subsection~\ref{ssec-stack}.
821 \item[Insertion of Glues/Kerns for Japanese Typesetting]
822 This part is already described in Subsection~\ref{ssec-jglue}.
824 \item[Adjustment of the Position of (Japanese) Characters]
825 We will discuss this in detail in Subsection~\ref{ssec-width}.
828 The callbacks by the \emph{luaotfload} package, e.g.,~replacement of
829 glyphs according to font features, are executed just after `Examination
830 of Stack Level' above.
832 \subsection{Stack Management}
835 As we noted in Subsection~\ref{ssec-csname}, parameters whose values
836 at the end of a horizontal box or of a paragraph are effective in the
837 whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented
838 by internal integers or registers of other types in \TeX. We explain it
849 if (cur_list.mode_field == -hmode) {
850 cur_box = filtered_hpack(cur_list.head_field,
851 cur_list.tail_field, saved_value(1),
852 saved_level(1), grp, saved_level(2));
853 subtype(cur_box) = HLIST_SUBTYPE_HBOX;
856 \caption{An extract of a CWEB-source \texttt{tex/packaging.w} of \LuaTeX}
860 Figure~\ref{fig-ltsrc} is an extract of a CWEB-source
861 \texttt{tex/packaging.w} of \LuaTeX\ (SVN revision 4358). This function
862 is called just when an explicit |\hbox{...}| or |\vbox{...}| is ended, and
863 the function |filtered_hpack()| is where the |hpack_filter| and then the
864 actual `hpack' process are performed. Notice that the |unsave()|
865 function is called before |filtered_hpack()|. This is the problem;
866 because of |unsave()|, we can retrieve only the values of registers
867 \emph{outside} the box, even in the |hpack_filter| callback.
869 To cope with this problem, \LuaTeX-ja has its own stack system, based on
870 the Lua code in \cite{stack-mail}. Furthermore, \emph{whatsit} nodes whose
871 \emph{user\_id} is 30112 (\emph{stack\_node}, for short) are
872 appended to the current horizontal list each time the current stack
873 level is incremented, and their values are the values of
874 |\currentgrouplevel| at that time. At the beginning of the |hpack_filter|
875 callback, the list in question is traversed to determine whether the
876 stack level at the end of the list and that outside the box coincide.
878 Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current
879 stack level, both inside the |hpack_filter| callback, i.e.,~outside a
880 horizontal box. Consider a list which represents the content of the box,
883 \item A \emph{stack\_node} whose value is $x+1$ (since all materials in
884 the box are included in a group |\hbox{...}|, the value is at
885 least $x+1$) in the list represents an assignment related to the
886 stack system at just the top level of the list, like
889 \hbox{...(assignment)...}
892 In this case, the current stack level is incremented to $y+1$ after the assignment.
893 \item A \emph{stack\_node} whose value is more than $x+1$ in the list represents
894 an assignment inside another group contained in the box. For example,
895 the following input creates
896 a \emph{stack\_node} whose value is $x+3=(x+1)+2$:
899 \hbox{...{...{...(assignment)}...}...}
903 Thus, we can conclude that the stack level at the end of the list is
904 $y+1$, if and only if there is a \emph{stack\_node} whose value is
905 $x+1$. Otherwise, the stack level is just $y$.
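
The conclusion above amounts to a single pass over the list. The
following is only a sketch of that idea, not the actual code of
\LuaTeX-ja: the function name |stack_level_at_end| and its arguments are
hypothetical, and we assume that a \emph{stack\_node} is a user-defined
whatsit carrying a numeric value.

\begin{codewithoutnum}
\usepackage{luacode}
\begin{luacode*}
local id_whatsit = node.id('whatsit')
local sub_user   = node.subtype('user_defined')
local STACK_ID   = 30112  -- user_id of a stack_node

-- head: first node of the list which is the content of the box
-- x: \currentgrouplevel outside the box, y: stack level outside the box
-- (hypothetical helper; returns the stack level at the end of the list)
function stack_level_at_end(head, x, y)
  for p in node.traverse_id(id_whatsit, head) do
    if p.subtype == sub_user and p.user_id == STACK_ID
       and p.value == x + 1 then
      return y + 1  -- an assignment at the top level of the box
    end
  end
  return y          -- no stack_node, or only ones from deeper groups
end
\end{luacode*}
\end{codewithoutnum}

Since groups do not create sublists, a \emph{stack\_node} from a nested
group appears in the same list as one from the top level of the box; only
its value distinguishes the two cases, which is why the comparison with
$x+1$ is needed.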
907 \subsection{Adjustment of the Position of Japanese Characters}
910 The size of a glyph specified in a metric and that of the real font
911 usually differ. For example, the letter `\inhibitglue【' is half-width
912 in |jfm-ujis.lua| or |jis.tfm|, while this letter is full-width like `【'
913 in most TrueType fonts used in Japanese typesetting, such as
914 IPA~Mincho. Hence the position of such glyphs needs to be
915 adjusted. In p\TeX, this adjustment is performed using virtual fonts.
917 On the other hand, \LuaTeX-ja does the adjustment by encapsulating a glyph
918 into a horizontal box. There are two main reasons why we adopted this
919 method: one is that we feared that the Lua code needed to coexist with the callbacks of
920 the \emph{luaotfload} package would be large if we used virtual fonts, and the
921 other is to cope with the shifting of the baseline of characters at the
925 \begin{center}\unitlength=9pt\small
926 \begin{picture}(15,12)(-1,-3)
928 \color{grayx}% real glyph
929 \put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength}
931 \color{black}% real glyph :step1
933 \put(-1,-1.5){\line(0,1){7}\line(0,-1){2.5}}
934 \put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}}
935 \put(-1,5.5){\line(1,0){6}}
936 \put(-1,-4){\line(1,0){6}}
937 \put(-1,0){\makebox(0,0)[r]{\strut$R$\,}}
940 \put(0,0){\vector(0,1){9}\line(0,-1){3}\vector(1,0){12}}
941 \put(12,9){\makebox(0,0)[rt]{\strut$M$\,}}
942 \put(12,0){\line(0,1){9}\vector(0,-1){3}}
943 \put(0,9){\line(1,0){12}}
944 \put(0,-3){\line(1,0){12}}
945 \put(0.2,4.5){\makebox(0,0)[l]{\texttt{height}}}
946 \put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}}
947 \put(6,0.2){\makebox(0,0)[b]{\texttt{width}}}
950 \put(3,0){\line(0,1){7}\line(0,-1){2.5}\line(1,0){6}}
951 \put(9,0){\line(0,1){7}\line(0,-1){2.5}}
952 \put(3,7){\line(1,0){6}}
953 \put(3,-2.5){\line(1,0){6}}
955 \savebox{\eqdist}(0,0)[c]{%
957 \put(-0.08,0.2){\line(0,-1){0.4}}%
958 \put(0.08,0.2){\line(0,-1){0.4}}}
959 \put(1.5,0){\usebox{\eqdist}}
960 \put(10.5,0){\usebox{\eqdist}}
963 \put(3,-1.5){\vector(-1,0){4}}
964 \put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}}
965 \put(3,0){\vector(0,-1){1.5}}
966 \put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}}
969 \caption{The position of the `real' glyph.}
973 Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is
974 the imaginary body which is specified in the metric, and a vertical
975 rectangle is the imaginary body of a real glyph. First, the real glyph
976 is aligned with respect to the width of $M$. In the figure, the real
977 glyph is aligned `middle'; this setting is useful for the full-width
978 middle dot `・'. We have other settings, namely, `left' and `right'.
979 After that, it is shifted according to the values of |left| and |down|,
980 which are specified in the metric. The final position of the real glyph
981 is shown by the gray rectangle~$R$. If the amount of shifting the baseline is
982 not zero, $M$ (and hence the real glyph) is shifted by that amount.
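
Reading the coordinates off the picture in Figure~\ref{fig-pos} gives a
small worked example. The symbols $w_M$ (the width of $M$) and $w_G$,
$h_G$, $d_G$ (the width, height and depth of the real glyph) are
introduced here only for illustration. With `middle' alignment, the left
edge $x_R$, the height $h_R$ and the depth $d_R$ of $R$, measured from
the reference point of $M$, would be
\[
  x_R = \frac{w_M - w_G}{2} - \texttt{left}, \qquad
  h_R = h_G - \texttt{down}, \qquad
  d_R = d_G + \texttt{down}.
\]
In the figure, $w_M=12$, $w_G=6$, $h_G=7$, $d_G=2.5$, $\texttt{left}=4$
and $\texttt{down}=1.5$ in units of the picture, so that
\[
  x_R = \frac{12-6}{2} - 4 = -1, \qquad
  h_R = 7 - 1.5 = 5.5, \qquad
  d_R = 2.5 + 1.5 = 4,
\]
which agrees with the gray rectangle drawn there.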
984 We would like to remark briefly on the vertical position of a glyph.
985 A JFM (or the metric used in \LuaTeX-ja) and the real font used for it
986 may have different height or depth. In that case, it may look better if
987 the real glyph is shifted vertically to match the height-depth ratio
988 specified in the metric. This situation has been carefully studied by
989 Otobe~\cite{min10}. The policy on this problem has not yet been
990 determined, but we hope to offer several solutions in \LuaTeX-ja.
993 We have discussed the \LuaTeX-ja package, which is strongly influenced
994 by p\TeX. For now, it can be used experimentally, but many
995 refinements are still needed for regular use. The author hopes
996 that this paper and this project contribute to the typesetting of Japanese,
997 and possibly other Asian languages, under \LuaTeX.
1000 %%% The style of the bibliography is `amsplain'.
1001 \providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
1002 \providecommand{\href}[2]{#2}
1003 \begin{thebibliography}{99}
1006 Adobe Systems Incorporated, \emph{Adobe-Japan1-6 Character Collection
1007 for CID-Keyed Fonts}, Technical Note~\#5078, 2004.
1008 \url{http://partners.adobe.com/public/developer/en/font/5078.Adobe-Japan1-6.pdf}
1011 ASCII MEDIA WORKS,アスキー日本語\TeX\ (p\TeX). \url{http://ascii.asciimw.jp/pb/ptex/}
1014 Jin-Hwan~Cho and Haruhiko Okumura, \emph{Typesetting CJK Languages with Omega},
1015 \TeX, XML, and Digital Typography, Lecture Notes in Computer Science, vol.~3130,
1016 Springer, 2004, 139--148.
1019 Yannis Haralambous. \emph{The Joy of \LuaTeX}. \url{http://luatex.bluwiki.com/}
1022 Japanese Industrial Standards Committee. \emph{JIS~X~4051: Formatting
1023 rules for Japanese documents}, 1993, 1995, 2004.
1026 北川弘典,$\varepsilon$-p\TeX についてのwiki.
1027 \url{http://sourceforge.jp/projects/eptex/wiki/FrontPage}
1031 \url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378}
1034 \LuaTeX\ development team, \emph{The \LuaTeX\ reference}.
1035 \url{http://www.luatex.org/svn/trunk/manual/luatexref-t.pdf} (snapshot of SVN trunk)
1038 The \LuaTeX-ja project team, \emph{The \LuaTeX-ja package}.
1039 Not yet complete. Available at |doc/man-en.pdf| (in English) or
1040 |doc/man-ja.pdf| (in Japanese)
1041 in the Git repository.
1043 \bibitem{luajp-test}
1045 \url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html}
1047 \bibitem{luajalayout}
1048 前田一貴,luajalayout パッケージ---Lua\LaTeX によ
1050 \url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/}
1053 奥村晴彦,p\LaTeXe 新ドキュメントクラス.
1054 \url{http://oku.edu.mie-u.ac.jp/~okumura/jsclasses/}
1057 Haruhiko Okumura, \emph{p\TeX\ and Japanese Typesetting},
1058 The Asian Journal of \TeX\ \textbf{2}~(2008), 43--51.
1062 \url{http://argent.shinshu-u.ac.jp/~otobe/tex/files/min10.pdf}
1065 齋藤修三郎,Open Type Font用VF.
1066 \url{http://psitau.kitunebi.com/otf.html}
1068 \bibitem{stack-mail}
1069 Jonathan Sauer, \emph{[Dev-luatex] tex.currentgrouplevel}.
1070 \url{http://www.ntg.nl/pipermail/dev-luatex/2008-August/001765.html}
1073 Takuji Tanaka, \emph{up\TeX, up\LaTeX---unicode version of p\TeX, p\LaTeX}.
1074 \url{http://homepage3.nifty.com/ttk/comp/tex/uptex_en.html}
1077 Nobuyuki Tsuchimura, \emph{Development of a Japanese \TeX\ Distribution~`ptetex3'},
1078 Computer Software\ \textbf{24} (2007), no.~4, 40--50, (in Japanese).
1081 W3C Working Group, \emph{Requirements for Japanese Text Layout}.
1082 \url{http://www.w3.org/TR/jlreq/}
1083 \end{thebibliography}