1 %#!lualatex ajt-devel-ltja
4 %%% Packages used in this paper
8 \DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipaexm.ttf:jfm=ujis}{}
9 \DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipaexg.ttf:jfm=ujis}{}
11 %%% for LTXexample environment
12 \usepackage{showexpl,lltjlisting}
13 \lstset{basicstyle=\ttfamily, width=0.3\textwidth}
18 %%% Verbatim environment
20 \CustomVerbatimEnvironment{code}{Verbatim}%
21 {numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
22 \CustomVerbatimEnvironment{codewithoutnum}{Verbatim}%
23 {xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
24 \CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}%
25 {xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize}
28 %%% Mandatory article metadata %%%
\title{The development of the \LuaTeX-ja package}
30 \author{Hironori Kitagawa}
31 \address{The \LuaTeX-ja project team}
32 \email{h\_kitagawa2001@yahoo.co.jp}
34 \keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese}
36 The \LuaTeX-ja package is a macro package for typesetting Japanese documents under \LuaTeX.
This package offers more flexibility in typesetting than p\TeX, and corrects some unwanted features of p\TeX.
In this paper, we describe the specifications, the current status, and some of the internal processing of \LuaTeX-ja.
41 \newcommand{\parname}[1]{\textsf{#1}}
45 %%% Do not forget to start with \maketitle!
48 \section{Introduction}
50 To typeset Japanese documents with \TeX, ASCII p\TeX~\cite{ptex} has
been widely used. There are other methods---for example, using Omega
and OTP~\cite{omegaj}, or the CJK package---but
these alternatives have not become mainstream. On the one hand,
p\TeX\ enables us to produce high-quality documents, but on the other
hand, p\TeX\ has been left behind by extensions of \TeX\ such as \eTeX\
and \pdfTeX, and by the spread of UTF-8 encoding. In recent years, the
situation has improved, thanks to the development of |ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura,
58 $\varepsilon$-p\TeX~\cite{eptex} by the author,~and
59 up\TeX~\cite{uptex} by Takuji Tanaka.
However, a lag still remains.
64 Before this \LuaTeX-ja package, there were several attempts to typeset Japanese documents under \LuaTeX.
65 Here we cite three examples:
67 \item |luaums.sty|~\cite{luaums} developed by the author. This experimental package is for creating a Japanese-based presentation under \LuaTeX.
\item |luajalayout| package~\cite{luajalayout}, formerly known as the |jafontspec| package, by Kazuki Maeda.
This package is based on \LaTeXe\ and the |fontspec| package.
\item |luajp-test| package~\cite{luajp-test}, a test package made by Atsuhito Kohda, based on articles on the web page~\cite{joylua}.
74 \subsection{Development Policy of \LuaTeX-ja}
The first aim of the project was to implement features of p\TeX\ as macros under \LuaTeX, so \LuaTeX-ja is strongly influenced by p\TeX.
However, as the development proceeded, some technical and conceptual difficulties arose. Hence we changed the aim of the project:
79 \item\emph{\LuaTeX-ja offers more flexibility of typesetting than that by
82 There is JIS~X~4051 Japanese Standard, ...
However, we think that the ability to produce output conforming to JIS~X~4051 is not enough for (Japanese) typesetting;
the \TeX\ engine should handle even the case where one wants to produce very incoherent output for some reason.
In this respect, p\TeX\ has some flexibility, hence \LuaTeX-ja should have more.
For this reason, \LuaTeX-ja has counterparts of the additional primitives of p\TeX.
\item\emph{\LuaTeX-ja isn't a mere re-implementation or port of p\TeX;
90 some (technically and/or conceptually) inconvenient features of
We describe this point in more detail in the next section.
97 \subsection{Contents of this Paper}
Here we briefly describe the contents of the rest of this paper.
In Section~2, we describe the major differences between p\TeX\ and \LuaTeX-ja.
Some of them are due to the specifications of callbacks
in \LuaTeX\ (\emph{i.e.}, technical reasons), and others are changes that we
thought would make the specifications more ``natural''.
In Section~3, we show the current status of the \LuaTeX-ja project.
To implement features in \LuaTeX-ja, we had to use some tricks in Lua scripts.
In Section~4, we describe several of these tricks and internal processing methods.
We hope that the material in this section, especially
Subsection~4.3, will find useful applications.
\section{Major Differences from p\TeX}
In this section, we briefly look at the major differences between p\TeX\ and \LuaTeX-ja.
114 \subsection{Names of Control Sequences}
Since p\TeX\ is a modification of a \TeX\ engine, some primitives added to it take a form that cannot be simulated by a macro.
For example, an additional primitive |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$|=|$\langle\hbox{\it penalty}\rangle$ in p\TeX\ sets the
amount of penalty inserted before $\langle\hbox{\it char\_code}\rangle$
to $\langle\hbox{\it penalty}\rangle$.
Moreover, some parameters for Japanese typesetting, which were
mere internal integers, dimensions, or~skips in p\TeX, cannot be
implemented in the same way in \LuaTeX-ja. These parameters have a
common feature: the values at the end of a horizontal box or of a
paragraph are effective in the whole box or paragraph. A good example
is |\kanjiskip|, the amount of skip
inserted between two consecutive Japanese characters by default. The
reason for this is the place of |hpack_filter| in \LuaTeX's
CWEB source code, which we discuss in
Subsection~\ref{ssec-stack}.
Because of the two~problems discussed above, the assignment and retrieval
of most parameters in \LuaTeX-ja are consolidated into three~control sequences:
135 \item |\ltjsetparameter{|$\langle\hbox{\it
136 name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local
138 \item |\ltjglobalsetparameter|: for global assignment.
139 \item |\ltjgetparameter{|$\langle\hbox{\it
140 name}\rangle$|}[{|$\langle\hbox{\it optional
141 argument}\rangle$|}]|: for retrieval. The returned value is always
145 \subsection{Linebreak after Japanese Character}
Japanese text can be broken almost anywhere, in contrast to
alphabetic text, which can be broken only between words (or by
hyphenation). Hence, p\TeX's input processor is modified so that a
linebreak after a Japanese character doesn't emit a space. However,
there is no way to customize the input processor of \LuaTeX. All we can
do is modify an input line before \LuaTeX\ begins to process
it, using the |process_input_buffer| callback.
Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this
purpose) is appended to an input line if the line ends with a Japanese
character\footnote{Strictly speaking, it also requires that the catcode
of the endline character is 5.}. One might jump to the conclusion that the
treatment of linebreaks by p\TeX\ and that by \LuaTeX-ja are very similar,
but they differ in that \LuaTeX-ja's decision
whether to append a comment letter to the line is made \emph{before}
the line is processed by \LuaTeX.
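The following is a minimal sketch of this idea, not the actual code of \LuaTeX-ja. It assumes a recent Lua\LaTeX\ (the Lua~5.3 |utf8| library and the |luatexbase.add_to_callback| interface of the \LaTeX\ kernel). The function |is_japanese| and its character ranges are hypothetical simplifications, and the check of the catcode of the endline character is omitted.
\begin{codewithoutnum}
\catcode"FFFFF=14 % treat U+FFFFF as a comment character
\directlua{
  % TeX strips these percent comments before Lua sees the chunk.
  % Hypothetical test: kana and CJK unified ideographs only.
  local function is_japanese(c)
    return (c >= 0x3040 and c <= 0x30FF)
        or (c >= 0x4E00 and c <= 0x9FFF)
  end
  local function append_comment(line)
    local pos = utf8.offset(line, -1) % start of the last character
    if pos and is_japanese(utf8.codepoint(line, pos)) then
      return line .. utf8.char(0xFFFFF)
    end
    return nil % nil leaves the input line unchanged
  end
  luatexbase.add_to_callback("process_input_buffer",
    append_comment, "sketch.append_comment")
}
\end{codewithoutnum}
Note that the decision in |append_comment| depends only on the text of the line, which is why it cannot take into account any assignment made earlier on the same line.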
Figure~\ref{fig-linebreak} shows an example; the command on the first
line marks most Japanese characters as ``non-Japanese characters''. In
other words, from this command onward, the letter `あ' is not
processed by \LuaTeX-ja. One would then naturally expect a space
between `あ' and `y' in the output, but the actual output has none.
This is because `あ' is still considered to be a Japanese character by
\LuaTeX-ja when \LuaTeX-ja examines the end of the input line~2.
174 \ltjsetparameter{jacharrange={-6}}xあ
177 \caption{A notable sample showing the treatment of a linebreak after a Japanese character.}\label{fig-linebreak}
\subsection{Separation between ``Real'' Fonts and Metrics}
Traditionally, most Japanese fonts used in typesetting are monospaced,
that is, most glyphs have the same size and are square-shaped. Hence, it is not
rare that the contents of different Japanese TFMs (JFMs, for short) are
exactly the same, and only their names differ. For example, the only
differences between |min10.tfm| and |goth10.tfm|, which are JFMs shipped
with p\TeX\ for \emph{Mincho} and \emph{Gothic} fonts, are their
|FAMILY| and |FACE|. Another example: if one wants to use many
fonts which are not installed in one's \TeX\ distribution, one of course
needs to prepare TFMs for them. But to use Japanese
fonts with p\TeX, one only has to copy and rename some JFM (\emph{e.g.,}~copy |jis.tfm| to |hoge.tfm|).
Considering this situation, we decided to separate ``real'' fonts and
metrics in \LuaTeX-ja, as shown in Figure~\ref{fig-jfdef}:
\item The control sequence |\jfont| must be used for Japanese fonts, instead of |\font|.
\item \LuaTeX-ja automatically loads the |luaotfload| package, so the
|file:| prefix and font features can be used, as in line~1 of
Figure~\ref{fig-jfdef}.
199 \item The |jfm| key specifies the metric for the font. In
200 Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a Lua script named
\item The |psft:| prefix can be used to specify name-only, non-embedded
We note that |-kern| in the features is important, since otherwise kerning information from the real font itself would clash with the spacing from the metric.
209 \jfont\foo=file:ipaexm.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt
210 \jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt
212 \caption{Typical declarations of Japanese fonts}
216 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: The Timing}
As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing
process is totally different from that of Knuth's original \TeX82.
\TeX82's process is done just when a (sequence of) character node is
appended to the current list, hence we can cancel this process by writing
|f{}irm| (this gives `f\hbox{}irm' in \TeX82). However, \LuaTeX's process is
\emph{node-based}, that is, the process is done when a horizontal
box or a paragraph is ended, so |f{}irm| and |firm| yield the same
output under \LuaTeX.
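A small test file makes this difference visible. It is only a sketch: whether an `fi' ligature is formed at all depends on the font and on the font loader in use, which is an assumption here.
\begin{codewithoutnum}
% Compile once with a TeX82-based engine and once with LuaTeX.
\hbox{firm}        % a ligature is formed, if the font provides one
\hbox{f{}irm}      % TeX82: no ligature; LuaTeX: same as the first box
\hbox{f\hbox{}irm} % an explicit empty box separates the two glyphs
\end{codewithoutnum}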
The situation for Japanese characters is basically the same, but not entirely.
The glues (and kerns) needed for Japanese typesetting can be divided into the following three categories:
229 \item[Glues (and Kerns) from the Metric of Japanese Fonts]
230 \item[Default Skip Between a Japanese Character and an Alphabetic Character]
231 Usually 1/4 of fullwidth with some stretch and shrink.
232 \item[Default Skip Between Two Consecutive Japanese Characters]
In p\TeX, these three kinds of glues are treated differently. The first
category (\emph{JFM glue}, for short) is inserted when a Japanese
character node is appended to the current list, in the same way as for alphabetic
characters in \TeX82. The second category (\emph{xkanjiskip}, for
short) is inserted just before `hpack' or line-breaking of a paragraph;
this timing is somewhat similar to that of \LuaTeX's kerning
process. The third category (\emph{kanjiskip}, for short) does not
appear as a node anywhere; it appears only implicitly in the calculation of
the width of a horizontal box or in line-breaking. These specifications make
p\TeX's behavior very hard to understand.
\LuaTeX-ja inserts the glues of all three categories
simultaneously, inside the |hpack_filter| and |pre_linebreak_filter|
callbacks. The reasons for this specification are to make Japanese characters behave like
alphabetic characters in \LuaTeX\ (as described in the first paragraph),
and to clarify the specification of \LuaTeX-ja's process.
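As an illustration of this design, the sketch below registers one function in both callbacks and inserts a small stretchable glue between two adjacent Japanese glyphs, in the manner of the third category only. It is not the code of \LuaTeX-ja: |is_japanese|, the glue value, and the callback description strings are assumptions, and a recent Lua\LaTeX\ (with |node.setglue| and |luatexbase.add_to_callback|) is assumed. Glyphs separated by other nodes are ignored here.
\begin{codewithoutnum}
\directlua{
  local glyph_id = node.id("glyph")
  % Hypothetical test: kana and CJK unified ideographs only.
  local function is_japanese(c)
    return (c >= 0x3040 and c <= 0x30FF)
        or (c >= 0x4E00 and c <= 0x9FFF)
  end
  local function insert_skips(head)
    local n = head
    while n do
      local nxt = n.next
      if n.id == glyph_id and nxt and nxt.id == glyph_id
         and is_japanese(n.char) and is_japanese(nxt.char) then
        local g = node.new("glue")
        node.setglue(g, 0, tex.sp("0.4pt"), 0) % 0pt plus 0.4pt
        head = node.insert_after(head, n, g)
      end
      n = nxt
    end
    return head
  end
  % Both callbacks pass the head of the list as the first argument.
  luatexbase.add_to_callback("pre_linebreak_filter",
    insert_skips, "sketch.insert_skips.par")
  luatexbase.add_to_callback("hpack_filter",
    insert_skips, "sketch.insert_skips.hbox")
}
\end{codewithoutnum}
Since the whole list is available when these callbacks run, all kinds of glue can be decided in a single pass over the nodes.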
252 \subsection{Insertion of Kerns and/or Glues for Japanese Typesetting: The Behavior}
256 \section{Current Status of the Development}
259 \section{Implementation}
260 \subsection{Handling of Japanese Fonts}
In p\TeX, there are three slots for maintaining current fonts, namely
|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal
direction) and |\tfont| for Japanese fonts (in vertical direction). With these slots, we can select
the current font for alphabetic characters and that for Japanese characters separately in p\TeX.
However, \LuaTeX\ has only one slot for maintaining the current font, like \TeX82.
This situation leads to a problem: how can we maintain the ``current Japanese font''?
There are three approaches to this problem. One approach is to make a
mapping table from alphabetic fonts to corresponding Japanese fonts
(here we don't assume that NFSS2 is available), so that when the current
alphabetic font is changed, the current Japanese font also changes according
to the table. Another approach is to always use composite fonts
made of alphabetic fonts and Japanese fonts. The third approach is to store the
information on the current Japanese font in an attribute. We
adopted the third approach, since \LuaTeX-ja is strongly influenced by p\TeX\
as we noted in Subsection~\ref{ssec-pol}.
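The third approach can be sketched as follows. This is an illustration under stated assumptions, not the implementation of \LuaTeX-ja: the attribute |\sketchcurjfnt|, the font |\sketchjfont|, and the function |is_japanese| are hypothetical names, and |\newattribute| and |luatexbase.registernumber| are assumed to be available from a recent \LaTeX\ kernel.
\begin{codewithoutnum}
\newattribute\sketchcurjfnt      % hypothetical attribute register
\font\sketchjfont=file:ipaexm.ttf at 10pt
\sketchcurjfnt=\fontid\sketchjfont\relax % store the font number
\directlua{
  local attr = luatexbase.registernumber("sketchcurjfnt")
  local glyph_id = node.id("glyph")
  % Hypothetical test: kana and CJK unified ideographs only.
  local function is_japanese(c)
    return (c >= 0x3040 and c <= 0x30FF)
        or (c >= 0x4E00 and c <= 0x9FFF)
  end
  local function replace_font(head, tail)
    for n in node.traverse_id(glyph_id, head) do
      local jfont = node.has_attribute(n, attr) % nil when unset
      if jfont and is_japanese(n.char) then
        n.font = jfont
      end
    end
    lang.hyphenate(head, tail) % keep the default hyphenation pass
  end
  luatexbase.add_to_callback("hyphenate",
    replace_font, "sketch.replace_font")
}
\end{codewithoutnum}
Because an attribute obeys \TeX's grouping, a ``current Japanese font'' stored this way is restored at the end of a group, just like the current alphabetic font.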
281 \subsection{Overview of the Processes}
We now briefly outline the processing done by \LuaTeX-ja.
\item[Treatment of Linebreaks after Japanese Characters] We already
described this in Subsection~\ref{ssec-line}. It is registered in the
|process_input_buffer| callback.
\item[Font Replacement] In the |hyphenate| callback, we examine
each \textit{glyph\_node}~$p$. If its character is considered
to be a Japanese character, the font used in $p$ is replaced
by the value of |\ltj@curjfnt| that is associated
with~$p$. We also subtract 1 from the subtype of $p$ to
293 processes of \LuaTeX-ja take care of all things about
The following processes are all executed in the |pre_linebreak_filter| and |hpack_filter| callbacks. These are the main routines of \LuaTeX-ja:
\item[Examination of Stack Level] We traverse the horizontal list which is the content of a horizontal box
to determine the level of \LuaTeX-ja's internal stack at the end
of the list. This is needed because of the place of
|hpack_filter| in the source of \LuaTeX. We discuss this in more detail in Subsection~\ref{ssec-stack}.
\item[Insertion of Glues/Kerns for Japanese Typesetting]
This part is already described in Subsection~\ref{ssec-jglue}.
\item[Adjustment of Positions of (Japanese) Characters]
Under \LuaTeX-ja, the size of the virtual body of a Japanese character
and its position (\emph{i.e.}, offset) are determined by the
metric, since the optimal width of a character in
typesetting---in most cases, this is the width specified in the
metric---and the actual width in TrueType/OpenType fonts
often differ. For example, the fullwidth opening brace
`\inhibitglue ｛' is considered to be half-width in
typesetting, although this character is full-width in
TrueType fonts like IPA~Mincho.
To adjust the sizes and positions of Japanese characters, \LuaTeX-ja encapsulates a
\textit{glyph\_node} containing a Japanese character
in a horizontal box whose size is specified in the metric.
324 \subsection{Stack Management}
326 \LuaTeX-ja has a lot of parameters for Japanese typesetting.
328 \subsection*{About the Project}
329 \subsection*{Acknowledgements}
%%% The style of the bibliography is `amsplain'.
333 \providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
334 \providecommand{\href}[2]{#2}
335 \begin{thebibliography}{9}
338 %Donald E.~Knuth, \emph{The \TeX book}, Addison-Wesley, 1986.
341 ASCII MEDIA WORKS, \textbf{アスキー日本語\TeX\ (p\TeX)}\ (in Japanese). \url{http://ascii.asciimw.jp/pb/ptex/}
344 %Victor Eijkhout, \emph{\TeX\ by Topic, A \TeX nician's Reference}, Addison-Wesley, 1992. \url{http://www.cs.utk.edu/~eijkhout/texbytopic-a4.pdf}
347 Hironori Kitagawa, \textbf{LuaTeXで日本語}\ (in Japanese). \url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378}
349 \bibitem{luajalayout}
350 Kazuki Maeda\ (前田一貴), \textbf{luajalayout パッケージ —LuaLaTeX による日本語組版—}\ (in Japanese).
351 \url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/}
354 Atsuhito Kohda, \textbf{LuaTeXと日本語}\ (in Japanese). \url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html}
357 Yannis Haralambous. \textbf{The Joy of LuaTeX}. \url{http://luatex.bluwiki.com/}
\textbf{The \LuaTeX\ reference}
362 \end{thebibliography}