1 %#!lualatex ajt-devel-ltja
4 %%% Packages used in this paper
8 \DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipaexm.ttf:jfm=ujis}{}
9 \DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipaexg.ttf:jfm=ujis}{}
10 % quick hack: monospaced Japanese font by \ttfamily
11 \DeclareKanjiFamily{JY3}{\ttdefault}{}{}
12 \DeclareFontShape{JY3}{\ttdefault}{m}{n}{<-> s*[0.92489] file:ipaexg.ttf:jfm=mono}{}
14 %%% for LTXexample environment
15 \usepackage{showexpl,lltjlisting}
16 \lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em}
19 \usepackage{mflogo,booktabs}
21 %%% Verbatim environment
23 \CustomVerbatimEnvironment{code}{Verbatim}%
24 {numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
25 \CustomVerbatimEnvironment{codewithoutnum}{Verbatim}%
26 {xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
27 \CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}%
28 {xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize}
31 %%% Mandatory article metadata %%%
32 \title{Development of the \LuaTeX-ja package}
33 \author{Hironori Kitagawa {\normalsize 北川 弘典}}
34 \address{The \LuaTeX-ja project team}
35 \email{h\_kitagawa2001@yahoo.co.jp}
37 \keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese}
The \LuaTeX-ja package is a macro package for typesetting Japanese
documents under \LuaTeX. This package offers more typesetting
flexibility than p\TeX\ and corrects some of its unwanted features.
In this paper, we describe the specifications, the current status, and
some of the internal processing of \LuaTeX-ja.
46 \newcommand{\parname}[1]{\textsf{#1}}
47 \newcommand{\jstrut}{\vrule width0pt height\cht depth\cdp}
48 \newcommand{\imagfm}[1]{\ifvmode\leavevmode\fi%
49 \hbox{\fboxsep=0pt\fbox{\setbox0=\hbox{#1}\copy0\kern-\wd0
50 \vrule width \wd0 height 0.4pt depth0.4pt}}}
53 %%% Do not forget to start with \maketitle!
56 \section{Introduction}
To typeset Japanese documents with \TeX, ASCII p\TeX~\cite{ptex} has
been widely used. There are other methods, for example using Omega
and OTP~\cite{omegaj} or the CJK package, but
these alternatives have not become mainstream. On the one hand,
p\TeX\ enables us to produce high-quality documents, but on the other
hand, p\TeX\ has been left behind by extensions of \TeX\ such as \eTeX\
and \pdfTeX, and by the spread of the UTF-8 encoding. In recent years, the
situation has improved thanks to the development of
66 |ptexenc|~\cite{ptexenc} by Nobuyuki~Tsuchimura,
67 $\varepsilon$-p\TeX~\cite{eptex} by the author,~and up\TeX~\cite{uptex}
However, some lag still remains.
73 Before this \LuaTeX-ja package, there were several attempts to typeset
74 Japanese documents under \LuaTeX. Here we cite three examples:
76 \item |luaums.sty|~\cite{luaums} developed by the author. This
77 experimental package is for creating a Japanese-based presentation
\item The |luajalayout| package~\cite{luajalayout}, formerly known as the
  |jafontspec| package, by Kazuki Maeda. This package is based on
  \LaTeXe\ and the |fontspec| package.
\item The |luajp-test| package~\cite{luajp-test}, a test package made by
  Atsuhito Kohda, based on articles on the web page~\cite{joylua}.
87 \subsection{Development Policy of \LuaTeX-ja}
The first aim of the project was to implement the features of p\TeX\ (from the
``primitive'' level) as macros under \LuaTeX, so \LuaTeX-ja is
strongly influenced by p\TeX. However, as the development proceeded, some
technical and conceptual difficulties arose. Hence we changed the aim
95 \item\emph{\LuaTeX-ja offers more flexibility of typesetting than that by
We think that the ability to produce output conforming to
JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for
typesetting, is not enough; if one wants to produce output that
deliberately departs from the standard for some reason, it should be possible.
In this respect, the previous attempts at Japanese typesetting with \LuaTeX\
cited above are inadequate.
p\TeX\ offers some typesetting flexibility, through internal
parameters such as |\kanjiskip| or |\prebreakpenalty| and through
custom JFMs (Japanese TFMs). ...
\item\emph{\LuaTeX-ja is not a mere reimplementation or port of p\TeX;
  some (technically and/or conceptually) inconvenient features of
  p\TeX\ are modified.}
We describe this point in more detail in the next section.
117 \subsection{Contents of this Paper}
Here we briefly describe the contents of the rest of this paper. In
Section~2, we describe the major differences between p\TeX\ and \LuaTeX-ja.
Some of them are due to the specifications of callbacks
in \LuaTeX\ (\emph{i.e.}, technical reasons), and others are changes
that we judged would make the specifications more ``natural''.
In Section~3, we show the current status of the
To implement these features in \LuaTeX-ja, we had to use some tricks in
Lua scripts. In Section~4, we describe several of these tricks and
internal processing methods. We hope that the material in that section
will be useful in other applications.
131 \subsection*{About the Project}
132 This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki
\url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage}. As of
Oct.\ 6, 2011, there is no stable version, but the development source can be
obtained from the git repository.
137 Members of the project are as follows (in random order):
138 Hironori Kitagawa, Kazuki Maeda, Takayuki Yato,
139 Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda, and~Shuzaburo Saito.
142 \section{Major differences with \pTeX}
In this section, we briefly look at the major differences between p\TeX\
and \LuaTeX-ja. For general information on Japanese typesetting and
details about p\TeX, please see Okumura~\cite{ptexjp}.
148 \subsection{Names of Control Sequences}
Since p\TeX\ is an engine modification of Knuth's original \TeX82 engine,
some of the primitives added in it take a form that cannot be simulated by a
macro. For example, the additional primitive
|\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in p\TeX\
sets the amount of penalty inserted before $\langle\hbox{\it
char\_code}\rangle$ to $\langle\hbox{\it penalty}\rangle$, and
|\prebreakpenalty|$\langle\hbox{\it char\_code}\rangle$ can also be used
to retrieve the value.
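
As a concrete illustration, the following is a minimal sketch for p\TeX\
(not for \LuaTeX-ja). The character and the value are chosen by us for
illustration only, and we assume that the retrieval form behaves like an
ordinary internal integer:

\begin{codewithoutnum}
\prebreakpenalty`。=10000      % assignment (the `=' is optional)
\count255=\prebreakpenalty`。  % retrieval into a count register
\end{codewithoutnum}

The character code sits between the control sequence and the value, which
is the shape that an ordinary macro with delimited arguments cannot
reproduce for both assignment and retrieval.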
Moreover, some parameters for Japanese typesetting that were
mere internal integers, dimensions, or~skips in p\TeX\ cannot be
implemented by the same approach in \LuaTeX-ja. These parameters have a
common property: the value at the end of a horizontal box or of a
paragraph is effective in the whole box or paragraph. A good example
is |\kanjiskip|, the amount of skip
inserted between two consecutive Japanese characters by default. The
reason for this is the place of |hpack_filter| in \LuaTeX's
CWEB source code; we discuss it in
Subsection~\ref{ssec-stack}.
Because of the two problems discussed above, the assignment and retrieval
of most parameters in \LuaTeX-ja are consolidated into three control sequences:
174 \item |\ltjsetparameter{|$\langle\hbox{\it
175 name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local
177 \item |\ltjglobalsetparameter|: for global assignment. These two control
178 sequences obey the value of |\globaldefs| primitive.
179 \item |\ltjgetparameter{|$\langle\hbox{\it
180 name}\rangle$|}[{|$\langle\hbox{\it optional
181 argument}\rangle$|}]|: for retrieval. The returned value is always
185 \subsection{Line break after a Japanese Character}
Japanese text can be broken almost anywhere, in contrast to
alphabetic text, which can be broken only between words (or by
hyphenation). Hence, p\TeX's input processor is modified so that a
line break after a Japanese character does not emit a space. However,
there is no way to customize the input processor of \LuaTeX\ other than
hacking its CWEB source. All we can do is to modify an input line before
\LuaTeX\ begins to process it, inside the |process_input_buffer|
Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this
purpose) is appended to an input line if the line ends with a Japanese
character\footnote{Strictly speaking, it is also required that the catcode
of the end-of-line character is 5~(\emph{end-of-line}). This condition is
useful in verbatim environments.}. One might jump to the conclusion
that the treatment of a line break by p\TeX\ and that by \LuaTeX-ja are
exactly the same, but they differ in that \LuaTeX-ja
decides whether to append a comment letter to the line
\emph{before} the line is actually processed by \LuaTeX.
Figure~\ref{fig-linebreak} shows an example; the command in the first
line marks most Japanese characters as ``non-Japanese characters''. In
other words, from this command onward, the letter `あ' is treated
as an alphabetic character by \LuaTeX-ja. One would therefore expect a
space between `あ' and `y' in the output, but the actual output in the
figure has none. This is because `あ' is still considered a Japanese
character when \LuaTeX-ja decides whether U+FFFFF
will be added to the input line~2.
218 \ltjsetparameter{jacharrange={-6}}xあ
221 \caption{A notable sample showing the treatment of a line break after a
222 Japanese character.}\label{fig-linebreak}
225 \subsection{Separation between ``real'' fonts and Metrics}
Traditionally, most Japanese fonts used in typesetting are not proportional;
that is, most glyphs have the same size (in most cases,
square-shaped). Hence, it is not rare that the contents of different
JFMs are exactly the same and differ only in their names. For example,
|min10.tfm| and |goth10.tfm|, which are the JFMs shipped
with p\TeX\ for the serif \emph{mincho} family and the sans-serif
\emph{gothic} family, differ only in their |FAMILY| and |FACE|. Moreover,
|jis.tfm| and |jisg.tfm|, which are parts of the \emph{jis} font
metric used in Haruhiko Okumura's
\emph{jsclasses}~\cite{jsclasses}, are identical as binary files.
Another example: to use many fonts that are not
installed in one's \TeX\ distribution, one of course needs to prepare TFMs
for them. But as long as these are Japanese fonts used with p\TeX, one
only has to copy and rename some JFM (\emph{e.g.},~copy |jis.tfm| to
Considering this situation, we decided to separate ``real'' fonts and
metrics in \LuaTeX-ja, as shown in Figure~\ref{fig-jfdef}:
\item The control sequence |\jfont| must be used for Japanese fonts, instead of |\font|.
\item \LuaTeX-ja automatically loads the |luaotfload| package, so the
      |file:| prefix and font features can be used, as in line~1 of
      Figure~\ref{fig-jfdef}.
251 \item The |jfm| key specifies the metric for the font. In
252 Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a
253 Lua script named |jfm-ujis.lua|. This metric is the standard
254 metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf}
256 \item The |psft:| prefix can be used to specify name-only, non-embedded
We note that |-kern| in the features is important, since otherwise kerning
information from the real font itself will clash with the spacing from the
265 \jfont\foo=file:ipaexm.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt
266 \jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt
268 \caption{Typical declarations of Japanese fonts.}
272 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: the Timing}
As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing
process is totally different from that of \TeX82.
In \TeX82, the process is performed as soon as a (sequence of) character is appended
to the current list. Thus we can interrupt it by writing |f{}irm|
(this gives `f\hbox{}irm' in \TeX82). \LuaTeX's process, however, is
\emph{node-based}; that is, it is performed when a horizontal
box or a paragraph is ended, so |f{}irm| and |firm| yield the same
output under \LuaTeX.
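
The difference can be observed with a small test such as the following
sketch (the choice of boxes is ours, and a font with an `fi' ligature is
assumed). According to the description above, the first two boxes differ
under \TeX82 but coincide under \LuaTeX; the explicit empty box in the
third line is a real node, so it separates `f' and `i' under both engines:

\begin{codewithoutnum}
\hbox{firm}        % `fi' ligature in both engines
\hbox{f{}irm}      % no ligature in TeX82; same as `firm' in LuaTeX
\hbox{f\hbox{}irm} % no ligature in either engine
\end{codewithoutnum}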
The situation for Japanese characters is basically the same, but not
entirely. The glues (and kerns) needed for Japanese
typesetting fall into the following three categories:
288 \item[Glue (or Kern) from the Metric of Japanese Fonts]
289 \item[Default Glue Between a Japanese Character and an Alphabetic Character]
290 Usually 1/4 of full-width with some stretch and shrink for justifying
292 \item[Default Glue Between Two Consecutive Japanese Characters]
The main purpose of this glue is to enable line breaking almost
everywhere in Japanese text. In most cases, its natural
296 some stretch/shrink for justifying each line.
In p\TeX, these three kinds of glue are treated differently. The first
category (\emph{JFM glue}, for short) is inserted when a (sequence of)
Japanese character is appended to the current list, as with alphabetic
characters in \TeX82. This means that one can interrupt the insertion
process by saying |{}|. The second category (\emph{xkanjiskip}, for
short) is inserted just before `hpack' or the line breaking of a paragraph;
this timing is somewhat similar to that of \LuaTeX's kerning
process. The third category (\emph{kanjiskip}, for short) does not
appear as a node anywhere; it appears only implicitly in the calculation of
the width of a horizontal box or in the breaking of lines. These
specifications make p\TeX's behavior very hard to understand.
\LuaTeX-ja inserts the glues of all three categories simultaneously, inside
the |hpack_filter| and |pre_linebreak_filter| callbacks. The reasons for
this specification are to make Japanese characters behave like alphabetic
characters in \LuaTeX\ (as described in the first paragraph), and to make
the specification of \LuaTeX-ja's process clear.
316 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: the Spec}
\caption{Examples of differences between p\TeX\ and \LuaTeX-ja.}
321 \begin{tabular}{llllllll}
323 &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}\\
324 Input &|あ】{}【〙\/〘| &|い』\/a| &|う)\hbox{}(| &|え]\special{}[|\\\midrule
325 p\TeX &あ】\hbox{}【〙\hbox{}〘&い』\/a &う)\hbox{}( &え]\hbox{}[\\
326 \LuaTeX-ja &あ】{}【〙\/〘 &い』\/a &う)\hbox{}( &え]\special{}[\\
334 \fontsize{40}{40}\selectfont
336 \imagfm{\jstrut 】\inhibitglue}%
337 \imagfm{\jstrut\kern.5\zw}%
338 \imagfm{\jstrut\kern.5\zw}%
339 \imagfm{\jstrut\hbox{}\inhibitglue【}%
340 \imagfm{\jstrut 〙\inhibitglue}%
341 \imagfm{\jstrut\kern.5\zw}%
342 \imagfm{\jstrut\kern.5\zw}%
343 \imagfm{\jstrut \hbox{}\inhibitglue〘}%
345 \caption{Detail of (1) in Table~\ref{tab-jfmglue}.}
349 Now we will take a look inside the insertion process itself, and describe three points.
As noted in the previous subsection, the insertion process in p\TeX\ is
interrupted by saying |{}| or anything else. This leads to the
second row in Table~\ref{tab-jfmglue}, or
Figure~\ref{fig-ptexjfm}. ``The process is interrupted''
means that p\TeX\ does not consider the letter `】\inhibitglue'
to be followed by `\inhibitglue【'; hence two half-width glues
are inserted between `】\inhibitglue' and
`\inhibitglue【', one from `】\inhibitglue' and
the other from `\inhibitglue【'.
On the other hand, in \LuaTeX-ja, the process is performed inside the
|hpack_filter| and |pre_linebreak_filter| callbacks. Hence,
\emph{anything that does not make any node is
ignored}\ in \LuaTeX-ja, as shown in (1) in
Table~\ref{tab-jfmglue}. \LuaTeX-ja also ignores any node
that does not make any contribution to the current horizontal
list (\emph{ins\_node}, \emph{adjust\_node},
\emph{mark\_node}, \emph{whatsit\_node} and
\emph{penalty\_node}), as shown in (4).
Around a \emph{glyph\_node} $p$ there may also be some nodes
attached to $p$: an accent and the kerns for
positioning it, and kerns from the italic correction for $p$.
It is natural that these attachments should be ignored in the
process, so \LuaTeX-ja takes this approach, as does the latest
version of p\TeX\ (p3.2). This explains (2) in Table~\ref{tab-jfmglue}.
382 \item[Fonts with the Same Metric]
383 Recall that \LuaTeX-ja separated ``real'' fonts and metrics, as in Subsection~\ref{ssec-sepmet}.
Consider the following input, where all Japanese fonts
use the same metric (in \LuaTeX-ja), and |\gt| selects the \emph{gothic} family:
If the above input is processed by p\TeX, the insertion process is
interrupted by |\gt|, so the result looks like
394 \mc 明朝)\hbox{}\gt (ゴシック
But this is unnatural, since the two Japanese fonts in the output use the
same metric, \emph{i.e.}, the same typesetting rules. Hence, we decided
that Japanese fonts with the same metric are treated as one font in the
insertion process of \LuaTeX-ja. Thus, the output from the above input
There may be situations where this specification is not
suitable. \LuaTeX-ja offers a way to cope with such cases, but
we leave it to the manual~\cite{man} of \LuaTeX-ja.
408 \item[Fonts with Different Metrics]
The case of two Japanese characters with different metrics and/or
different sizes is similar. Consider the following input, where
the \emph{mincho} family and the \emph{gothic} family use
As in the previous point, this input yields output like the following under p\TeX:
420 \mc 漢)\hbox{}\gt (漢)\hbox{}\large (大
We thought that the amount of space between the parentheses in the above
output is too large. So we changed the default behavior of \LuaTeX-ja so that
the amount of glue between two Japanese characters with
different metrics is the average of the glue from the left
character and that from the right character. For example,
Figure~\ref{fig-diffmet} shows the output from the above
input. The width of the glue marked `①' is half-width, and
the width of the glue marked `②' is about 0.55 times the
full-width. This default behavior can be changed by the
|diffrentmet| parameter of \LuaTeX-ja.
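
The two widths can be checked by a short calculation. This is only a
sketch: we assume that each parenthesis contributes a glue of half its own
full-width, and that the larger font is 1.2 times the smaller one (40 and
48 in Figure~\ref{fig-diffmet}). Writing $w$ for the full-width of the
smaller font, and $g_1$, $g_2$ for the glues marked `①' and `②' (our
notation), the averaging rule gives
\[
  g_1=\frac{0.5w+0.5w}{2}=0.5w,\qquad
  g_2=\frac{0.5w+0.5\cdot 1.2w}{2}=0.55w.
\]
For comparison, simply adding both glues, as in the p\TeX\ behavior
described above, would give $0.5w+0.5w=w$ and $0.5w+0.6w=1.1w$.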
435 \fontsize{40}{40}\selectfont
437 \imagfm{\jstrut )\inhibitglue}%
438 \imagfm{\jstrut\hbox to .5\zw{\hss\Large ①\hss}}%
439 \imagfm{\jstrut\hbox{}\inhibitglue\gt (}%
440 \imagfm{\jstrut\gt 漢}%
441 \imagfm{\jstrut\gt )\inhibitglue}%
442 \imagfm{\jstrut\hbox to .55\zw{\hss\Large ②\hss}}%
443 \imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\hbox{}\inhibitglue (}%
444 \imagfm{\fontsize{48}{48}\selectfont\jstrut\gt 漢}%
446 \caption{Fonts with Different Metrics.}
452 \section{Current Status of the Development}
At the moment, \LuaTeX-ja can be used under plain \TeX\ and under
\LaTeXe. Generally speaking, one only has to load |luatexja.sty|, by the |\input|
command or by |\usepackage|~(\LaTeXe), to typeset
Japanese characters. We now look at each part in more detail.
458 \subsection{``Engine Extension''}
The lowest part of \LuaTeX-ja corresponds to the p\TeX\ extension as a
\emph{\TeX\ engine}. We, the project members, think that this part is almost
done. Other features of \LuaTeX-ja which we have not described are the
\item[Setting the range of ``Japanese characters''] This feature is
      inspired by up\TeX. up\TeX\ has an additional primitive named
      |\kcatcode| for setting whether a character is treated as an alphabetic
      character, \emph{kana}, \emph{kanji}, \emph{Hangul},
      or~\emph{other CJK character}, and the assignment of
      |\kcatcode| is done per Unicode block\footnote{There
      are some exceptions. For example, U+FF00--FFEF (Halfwidth and
      Fullwidth Forms) is divided into three blocks in up\TeX.}.
\LuaTeX-ja uses a slightly different approach. Because there are many
Unicode blocks in the Basic Multilingual Plane which are not
included in most Japanese fonts, ... Furthermore, the basic
Japanese character set JIS~X~0208 is not just a union of
Unicode blocks. For example, the intersection of JIS~X~0208
and Latin-1 Supplement is shown in Table~\ref{tab-inter}.
Considering these two points, to customize the range of
Japanese characters in \LuaTeX-ja, one must follow the
\item Assign a range number to character codes. For example, the following
      input assigns the number~10 to the Unicode block ``Halfwidth and
      Fullwidth Forms'' and to ``\char"A7'' (the Section Sign):
488 \ltjdefcharrange{10}{"FF00-"FFEF,"A7}
491 \item Assigning to \textsf{jacharrange} ...
494 \item[Baseline Shifting]
To match Japanese fonts with alphabetic fonts,
it is sometimes necessary to shift the baseline of alphabetic characters.
p\TeX\ has a dimension |\ybaselineshift|, which
corresponds to the amount of shifting of the baseline of alphabetic
501 \LuaTeX-ja extends p\TeX's |\ybaselineshift| to Japanese
502 characters. Namely, \LuaTeX-ja offers two parameters,
503 \emph{yjabaselineshift} and \emph{yalbaselineshift} for the
504 amount of shifting the baseline of Japanese characters and
505 that of alphabetic characters, respectively. The example
506 output is shown in Figure~\ref{fig-bls}. The left half is the
507 output when \emph{yjabaselineshift} is positive, hence the
508 baseline of Japanese characters is shifted down. On the other
509 hand, the right half is the output when
510 \emph{yalbaselineshift} is positive, hence the baseline of
511 alphabetic characters is shifted.
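
Using the |\ltjsetparameter| syntax described earlier, such shifts might be
requested as in the following sketch; the amounts are arbitrary values
chosen by us for illustration:

\begin{codewithoutnum}
% shift the baseline of Japanese characters down by 1pt
\ltjsetparameter{yjabaselineshift=1pt}
% shift the baseline of alphabetic characters down by 1pt instead
\ltjsetparameter{yjabaselineshift=0pt,yalbaselineshift=1pt}
\end{codewithoutnum}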
515 \fontsize{40}{40}\selectfont\fboxsep0mm
516 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
517 \hbox to 0.9\linewidth{%
519 \raise-10pt\imagfm{\jstrut 漢}%
520 \raise-10pt\imagfm{\jstrut 字}\hskip.25\zw%
525 \imagfm{\jstrut 字}\hskip.25\zw%
526 \raise-10pt\imagfm{p}%
527 \raise-10pt\imagfm{h}%
532 \caption{Baseline shifting.}
Note that \LuaTeX-ja does not support vertical typesetting, \emph{tategaki}, for now.
540 \caption{Intersection of JIS~X~0208 and Latin-1 Supplement.}
543 \begin{tabular}{llll}
556 \subsection{Patches for plain \TeX\ and \LaTeXe}
p\TeX\ comes with a patch for plain \TeX, namely |ptex.tex|, a patch for the \LaTeXe\
macros (this patch and \LaTeXe\ together constitute \emph{p\LaTeXe}), and
|kinsoku.tex|, which contains the default settings of \emph{kinsoku
shori}, the Japanese line-breaking rules. We ported them to \LuaTeX-ja, except for
the code related to vertical typesetting. We remark on two points related to the porting:
563 \item[Default Range of Japanese Characters]
As described in the previous subsection, \LuaTeX-ja can customize the
range of Japanese characters. \LuaTeX-ja predefines 8~character ranges,
as shown in Table~\ref{tab-chrrng}. Most of these ranges are just
unions of Unicode blocks, and they were determined from the Adobe-Japan1 character
set and JIS~X~0208. Among these 8~ranges, the ranges~2, 3, 6, 7,
and~8 are considered ranges of Japanese characters, and the others are
considered ranges of alphabetic characters.
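
Whether a range counts as Japanese can be switched with the
\textsf{jacharrange} parameter, as in the first line of
Figure~\ref{fig-linebreak}. For instance, the following sketch (our
example, written by analogy with that figure) marks the characters of
range~8 as alphabetic:

\begin{codewithoutnum}
\ltjsetparameter{jacharrange={-8}}
\end{codewithoutnum}

That only the listed range is affected is our reading of the figure; the
exact semantics should be checked against the manual~\cite{man}.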
This default setting is suitable for Japanese-based documents, but it
can prevent other packages that use Unicode fonts from working
correctly. For example, |\times| provided by the
575 |unicode-math| package is the character U+00D7, which belongs
576 to the range~8, and ...
577 , the |fontspec| package, ...
581 \caption{Predefined Ranges in \LuaTeX-ja}
584 \begin{tabular}{@{\bf}rl}
1&(Additional) Latin characters which do not belong to the range~8.\\
2&Greek and Cyrillic letters.\\
3&Punctuation and miscellaneous symbols.\\
4&Unicode blocks which do not intersect with Adobe-Japan1.\\
5&Surrogates and Supplementary Private Use Areas.\\
6&Characters used in Japanese typesetting.\\
591 7&Characters possibly used in CJK typesetting, but not in Japanese.\\
592 8&Characters in Table~\ref{tab-inter}.
598 \item[The behavior of\/ {\tt\char92fontfamily\/} command]
The |\fontfamily| command in p\LaTeXe\ changes the current alphabetic
font family and/or the current Japanese font family,
depending on the argument. More concretely,
|\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
current alphabetic font family to $\langle\hbox{\it
arg\/}\rangle$ if and only if one of the following
conditions is satisfied:
\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
      \emph{some} alphabetic encoding is already defined in the document.
609 \item There exists an alphabetic encoding $\langle\hbox{\it
610 enc\/}\rangle$ already defined in the document such that a font
611 definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
612 arg\/}\rangle$|.fd| exists.
614 The same criterion is used for changing Japanese font family.
To make this behavior possible, one has to create a list of already
defined alphabetic encodings. Hence it works in p\LaTeXe, ...
However, since \LuaTeX-ja is loaded as a package, it will not
work. Hence \LuaTeX-ja adopted a different approach, namely
620 |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
621 current alphabetic font family to $\langle\hbox{\it
622 arg\/}\rangle$, if and only if:
\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
      the current alphabetic encoding $\langle\hbox{\it enc\/}\rangle$ is already defined.
626 \item A font definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
627 arg\/}\rangle$|.fd| exists.
635 \subsection{Classes for Japanese Documents}
To produce ``high-quality'' Japanese documents, we need not only
correctly placed Japanese characters, but also class files for
Japanese documents. In p\TeX, there are two major families of classes:
\emph{jclasses}, which is distributed with the official p\LaTeXe\ macros,
and \emph{jsclasses}~\cite{jsclasses}, which was developed by Haruhiko
Okumura and is now widely used among Japanese \TeX\ users. At present,
\LuaTeX-ja simply contains their counterparts: \emph{ltjclasses} and
\emph{ltjsclasses}. However, the policy on classes has not been determined
yet, and we hope to have another family of classes that is useful in
commercial printing. In the author's opinion, \emph{ltjclasses} had
better remain as an example of porting class files for \pTeX\ to
650 \section{Implementation}
651 \subsection{Handling of Japanese Fonts}
In p\TeX, there are three slots for maintaining the current fonts, namely
|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal
direction) and |\tfont| for Japanese fonts (in vertical direction). With
these slots, we can select the current font for alphabetic characters
and that for Japanese characters separately in p\TeX. However, \LuaTeX\
has only one slot for maintaining the current font, like \TeX82. This
situation leads to a problem: how can we maintain the ``current Japanese
There are three approaches to this problem. One approach is to make a
mapping table from alphabetic fonts to the corresponding Japanese fonts
(here we do not assume that NFSS2 is available), so that when the current
alphabetic font is changed, the current Japanese font also changes
according to the table. Another approach is to always use
composite fonts made of alphabetic fonts and Japanese fonts. The third
approach is to store the information on the current Japanese font
in an attribute. We adopted the third approach, since \LuaTeX-ja is strongly
influenced by p\TeX, as we noted in Subsection~\ref{ssec-pol}.
As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining
Japanese fonts, like p\TeX. However, since the information on the current
Japanese font is stored in an attribute, control sequences defined by
|\jfont| (\emph{e.g.},~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) do
not represent a font in the sense of the original \TeX. In other words,
these control sequences cannot be an argument of |\the| or |\textfont|;
in fact, they are just assignments to an attribute.
680 \subsection{Overview of the Processes}
We now briefly outline the processing done by \LuaTeX-ja.
\item[Treatment of Line Breaks after Japanese Characters] This part is
      already described in Subsection~\ref{ssec-line}. It is done in the
      |process_input_buffer| callback.
\item[Font Replacement] In the |hyphenate| callback, we examine
      each \textit{glyph\_node}~$p$. If its character is considered
      to be a Japanese character, the font used in $p$ is replaced
      by the value of |\ltj@curjfnt| that is associated
      with~$p$. We also decrease the subtype of $p$ by 1 to
      suppress hyphenation around it by \LuaTeX, since later
      processes of \LuaTeX-ja take care of all things about
The following processes are all executed in the |pre_linebreak_filter| and
|hpack_filter| callbacks. These are the main routines of \LuaTeX-ja:
\item[Examination of Stack Level] We traverse the horizontal list which
      is the content of a horizontal box
      to determine the level of \LuaTeX-ja's internal stack at the end
      of the list. This is needed because of the place of
      |hpack_filter| in the source of \LuaTeX. We discuss this in more
      detail in Subsection~\ref{ssec-stack}.
707 \item[Insertion of Glues/Kerns for Japanese Typesetting]
This part is already described in Subsection~\ref{ssec-jglue}.
\item[Adjustment of the Places of (Japanese) Characters]
Under \LuaTeX-ja, the size of the virtual body of a Japanese character
and its position (\emph{i.e.}, offset) are determined by the
metric, since the optimal width of a character in
typesetting (in most cases, the width specified in the
metric) and the actual width in TrueType/OpenType fonts
often differ. For example, the width of the fullwidth open brace
`\inhibitglue ｛' is considered to be half-width in
typesetting, although this character is full-width in
TrueType fonts like IPA~Mincho.
To adjust the size and place of Japanese characters, \LuaTeX-ja encapsulates a
\textit{glyph\_node} containing a Japanese character
in a horizontal box whose size is specified in the metric.
We discuss this in more detail in Subsection~\ref{ssec-width}.
727 \subsection{Stack Management}
As we noted in Subsection~\ref{ssec-csname}, parameters whose values
at the end of a horizontal box or of a paragraph are effective in the
whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented by internal integers or
registers of other types in \TeX. We explain the reason in this subsection.
743 if (cur_list.mode_field == -hmode) {
744 cur_box = filtered_hpack(cur_list.head_field,
745 cur_list.tail_field, saved_value(1),
746 saved_level(1), grp, saved_level(2));
747 subtype(cur_box) = HLIST_SUBTYPE_HBOX;
750 \caption{An extract of a CWEB-source \texttt{tex/packaging.w} of \LuaTeX}
Figure~\ref{fig-ltsrc} is an excerpt from the CWEB source
\texttt{tex/packaging.w} of \LuaTeX\ (version?). This function is called
just when an explicit |\hbox{...}| or |\vbox{...}| is ended, and the
function |filtered_hpack()| is where the |hpack_filter| callback and then the
`hpack' process are performed. Notice that the |unsave()| function is
called before |filtered_hpack()|. This is the problem: because of
|unsave()|, we can only see the values of registers outside the box, even in
the |hpack_filter| callback.
To cope with this problem, \LuaTeX-ja has its own stack system, based on
the Lua code in \cite{stack-mail}. Furthermore, a \emph{whatsit} node whose
\emph{user\_id} is 30112 (\emph{stack\_node}, for short) is
appended to the current horizontal list each time the current stack
level is incremented, and its value is the value of
|\currentgrouplevel| at that time. At the beginning of the |hpack_filter|
callback, the list in question is traversed to determine whether the
stack level at the end of the list and that outside the box coincide.
772 Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current
773 stack level, both inside the |hpack_filter| callback. Then we have:
\item A \emph{stack\_node} whose value is $x+1$ (since all materials in
      the box are included in a group |\hbox{...}|) in the list
      represents an assignment related to the stack system at the
      top level of the list, like
781 \hbox{...(assignment)...}
784 In this case, the current stack level is incremented to $y+1$ after the assignment.
\item A \emph{stack\_node} whose value is more than $x+1$ in the list represents
an assignment inside another group contained in the box. For example,
the following input creates
a \emph{stack\_node} whose value is $x+3=(x+1)+2$:
791 \hbox{...{...{...(assignment)}...}...}
795 Thus, we can conclude that the stack
796 level at the end of the list is $y+1$, if and only if there is a
797 \emph{whatsit} node whose \emph{user\_id} is 30112 and whose value is
798 $x+1$. Otherwise, the stack level is just $y$.
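
To make the rule concrete, consider the following two boxes. This is only
a sketch: we assume that setting \emph{kanjiskip} via |\ltjsetparameter| is
one of the assignments handled by the stack system, and the value 3pt is
arbitrary.

\begin{codewithoutnum}
% (a) assignment at the top level of the box:
%     stack_node with value x+1, so the level at the end is y+1
\hbox{あ\ltjsetparameter{kanjiskip=3pt}い}
% (b) assignment inside an inner group:
%     stack_node with value x+2, so the level at the end is y
\hbox{あ{\ltjsetparameter{kanjiskip=3pt}い}う}
\end{codewithoutnum}

Under this reading, in case (a) the new value is the one in effect at the
end of the box, while in case (b) the value in effect at the end of the box
is the one from outside the inner group.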
\subsection{Adjustment of the Place of Japanese Characters}
804 \section*{Acknowledgements}
807 %%% The style of the bibiliogrphy is `amsplain'.
808 \providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
809 \providecommand{\href}[2]{#2}
810 \begin{thebibliography}{99}
813 %Donald E.~Knuth, \emph{The \TeX book}, Addison-Wesley, 1986.
816 ASCII MEDIA WORKS, \textbf{アスキー日本語\TeX\ (p\TeX)}\ (in
817 Japanese). \url{http://ascii.asciimw.jp/pb/ptex/}
820 %Victor Eijkhout, \emph{\TeX\ by Topic, A \TeX nician's Reference}, Addison-Wesley, 1992. \url{http://www.cs.utk.edu/~eijkhout/texbytopic-a4.pdf}
823 Hironori Kitagawa, \textbf{LuaTeXで日本語}\ (in
824 Japanese). \url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378}
826 \bibitem{luajalayout}
827 Kazuki Maeda\ (前田一貴), \textbf{luajalayout パッケージ —LuaLaTeX によ
828 る日本語組版—}\ (in Japanese).
829 \url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/}
832 Atsuhito Kohda, \textbf{LuaTeXと日本語}\ (in
833 Japanese). \url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html}
836 Yannis Haralambous. \textbf{The Joy of LuaTeX}. \url{http://luatex.bluwiki.com/}
839 Shuzaburo Saito\ (齋藤修三郎), \textbf{Open Type Font用VF}\ (in Japanese).
840 \url{http://psitau.kitunebi.com/otf.html}
\textbf{The \LuaTeX\ reference}
846 Haruhiko Okumura\ (奥村晴彦), \textbf{pLaTeX2e 新ドキュメントクラス}\
848 Japanese). \url{http://oku.edu.mie-u.ac.jp/~okumura/jsclasses/}
851 Haruhiko Okumura\ (奥村晴彦), \textbf{p\TeX\ and Japanese Typesetting},
852 The Asian Journal of \TeX\ \textbf{2}~(2008), 43--51.
855 Jonathan Sauer, \textbf{[Dev-luatex] tex.currentgrouplevel}.
856 \url{http://www.ntg.nl/pipermail/dev-luatex/2008-August/001765.html}
Yoshiki Otobe\ (乙部厳己), \textbf{min10フォントについて}\ (in Japanese).
860 \url{http://argent.shinshu-u.ac.jp/~otobe/tex/files/min10.pdf}
861 \end{thebibliography}