1 %#!lualatex ajt-devel-ltja
\r
4 %%% Packages used in this paper
\r
%%% Font setting for \LuaTeX; this is an extract from ajt.cls
\r
9 \RequirePackage{fontspec,xunicode}
\r
10 \RequirePackage{luatextra}
\r
11 \setmainfont[Mapping=tex-text]{Palatino LT Std}
\r
12 \setsansfont[Mapping=tex-text]{Optima LT Std}
\r
14 \RequirePackage{fontspec,luatextra}
\r
15 \setmainfont[Mapping=tex-text]{TeX Gyre Pagella} % \simeq Palatino
\r
19 \usepackage{luatexja,luatexja-fontspec}
\r
20 \ltjsetparameter{jacharrange={-3,-8}}
\r
21 \DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipam.ttf:jfm=ujis}{}
\r
22 \DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=ujis}{}
\r
23 % quick hack: monospaced Japanese font by \ttfamily
\r
24 \DeclareKanjiFamily{JY3}{\ttdefault}{}{}
\r
25 \DeclareFontShape{JY3}{\ttdefault}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=mono}{}
\r
27 %%% LTXexample environment
\r
28 \usepackage{showexpl,lltjlisting}
\r
29 \lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em}
\r
31 %%% Verbatim environment
\r
32 \usepackage{fancyvrb}
\r
33 \CustomVerbatimEnvironment{code}{Verbatim}%
\r
34 {numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
\r
35 \CustomVerbatimEnvironment{codewithoutnum}{Verbatim}%
\r
36 {xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small}
\r
37 \CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}%
\r
38 {xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize}
\r
39 \DefineShortVerb{\|}
\r
42 \usepackage{mflogo,booktabs}
\r
43 \definecolor{grayx}{gray}{0.85}
\r
50 %%% Mandatory article metadata %%%
\r
51 \title{Development of \LuaTeX-ja package}
\r
52 \author[北川 弘典]{Hironori Kitagawa}
\r
53 \address{\LuaTeX-ja project team}
\r
54 \email{h\_kitagawa2001@yahoo.co.jp}
\r
56 \keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese}
\r
The \LuaTeX-ja package is a macro package for typesetting Japanese
documents under \LuaTeX. The package offers more flexibility in
typesetting than \pTeX, which is a widely used Japanese extension of \TeX,
and corrects some unwanted features of \pTeX.
In this paper, we describe the specifications, the current status, and some
internal processing methods of \LuaTeX-ja.
\r
66 \newcommand{\parname}[1]{\textsf{#1}}
\r
67 \newcommand{\jstrut}{\vrule width0pt height\cht depth\cdp}
\r
68 \newcommand{\imagfm}[1]{\ifvmode\leavevmode\fi%
\r
69 \hbox{\fboxsep=0pt\fbox{\setbox0=\hbox{#1}\copy0\kern-\wd0
\r
70 \smash{\vrule width \wd0 height 0.4pt depth0.4pt}}}}
\r
73 %%% Do not forget to start with \maketitle!
\r
76 \section{Introduction}
\r
77 \subsection{History}
\r
To typeset Japanese documents with \TeX, ASCII \pTeX~\cite{ptex} has
been widely used in Japan. There are other methods, for example using
Omega and OTP~\cite{omega} or the CJK package, but
these alternatives have not become mainstream. The author thinks
that this is because \pTeX\ enables us to produce high-quality documents
(e.g.,~it supports vertical typesetting), and because \pTeX\ appeared
earlier than the alternatives described above.

However, \pTeX\ has been left behind by extensions of \TeX\ such
as \eTeX\ and \pdfTeX, and by the spread of the UTF-8 encoding. In recent
years, the situation has improved, thanks to the development of
|ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}),
$\varepsilon$-\pTeX~\cite{eptex} by the author,~and u\pTeX~\cite{uptex}
by Takuji Tanaka (田中琢爾). However, continuing this approach, namely
developing an engine extension localized for Japanese, is not wise, because
it requires a lot of work for \emph{each} engine. In addition, if we
use \LuaTeX, an engine extension becomes less necessary,
because \LuaTeX\ has the ability to hook into \TeX's internal processes by using
\r
98 Before our \LuaTeX-ja project, there were several experimental attempts to typeset
\r
99 Japanese documents with \LuaTeX. Here we cite three examples:
\r
101 \item |luaums.sty|~\cite{luaums} developed by the author. This
\r
102 experimental package is for creating a certain Japanese-based presentation
\r
104 \item the \emph{luajalayout} package~\cite{luajalayout}, formerly known as the
\r
105 \emph{jafontspec} package, by Kazuki Maeda (前田一貴). This package is based on
\r
  \LaTeXe\ and the \emph{fontspec} package.
\r
107 \item the \emph{luajp-test} package~\cite{luajp-test}, a test package made by
\r
108 Atsuhito Kohda (香田温人), based on articles on the web page~\cite{joylua}.
\r
However, these packages are based on \LaTeXe, and offer little
control over the typesetting rules. It is also inefficient for more
than one person to develop similar packages separately. Development of the
\LuaTeX-ja package was initially started by the author and Kazuki Maeda, because of
\r
116 \subsection{Development policy of \LuaTeX-ja}
\r
The first aim of the \LuaTeX-ja project was to implement the features of
\pTeX\ (from the `primitive' level) as macros under \LuaTeX; therefore \LuaTeX-ja is
strongly influenced by \pTeX. However, as development proceeded, some
technical and conceptual difficulties arose. Hence we changed the aims
of the project as follows:
\r
\item\emph{\LuaTeX-ja offers at least the same flexibility of
  typesetting that \pTeX\ has.}
\r
We are not satisfied with merely being able to produce (PDF) output that conforms to
JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for
typesetting, or to a technical note~\cite{w3c} by W3C;
if one wants to produce output that deviates greatly from these rules for
some reason, it should be possible.
In this respect, the previous attempts at Japanese typesetting with \LuaTeX\
cited in the previous subsection are inadequate.

\pTeX\ has some flexibility of typesetting, by changing internal
parameters such as |\kanjiskip| or |\prebreakpenalty|, and by using
a custom JFM (Japanese TFM). Therefore we decided to include this
functionality in \LuaTeX-ja.
\r
\item\emph{\LuaTeX-ja is not a mere re-implementation or port of \pTeX;
  some (technically and/or conceptually) inconvenient features of
  \pTeX\ are modified.}

We describe this point in more detail in the next section.
\r
148 \subsection{Overview of the processes}
\r
We outline the stages of \LuaTeX-ja's processing, in order.
\r
153 \item In the |process_input_buffer| callback: treatment of line-break
\r
154 after a Japanese character (in Subsection~\ref{ssec-line}).
\r
156 \item In the |hyphenate| callback: font replacement.
\r
\LuaTeX-ja examines each \textit{glyph\_node}~$p$ in the horizontal list. If
the character represented by $p$ is considered a Japanese
character, the font used at $p$ is replaced by the value of
|\ltj@curjfnt|, an attribute for `the current Japanese font'
\r
Furthermore, the subtype of $p$ is decreased by 1 to suppress
hyphenation around~$p$ by \LuaTeX, because later processes of
\LuaTeX-ja take care of everything concerning Japanese characters.
\r
168 \item In |pre_linebreak_filter| and |hpack_filter| callbacks:
\r
171 \item \LuaTeX-ja has its own stack system, and the current horizontal
\r
172 list is traversed in this stage to determine what the level of
\r
173 \LuaTeX-ja's internal stack at the end of the list is. We will
\r
174 discuss it in Subsection~\ref{ssec-stack}.
\r
176 \item In this stage, \LuaTeX-ja inserts glues/kerns for Japanese
\r
177 typesetting in the list. This is the core routine of \LuaTeX-ja.
\r
178 We will discuss it in Subsections
\r
179 \ref{ssec-jglue}~and~\ref{ssec-jspec} .
\r
\item To match a metric with a real font, the position of (Japanese)
  glyphs is sometimes adjusted.
  We will discuss it in Subsection~\ref{ssec-width}.
\r
\item In the |mlist_to_hlist| callback: treatment of Japanese characters
  in math formulas. This stage is similar to the adjustment of the
  position of glyphs (see above), so we omit the description of this stage
\r
In this paper, an \emph{alphabetic character} means a non-Japanese
character. Similarly, we use the term \emph{alphabetic font} for the
counterpart of a Japanese font.
\r
195 \subsection{Contents of this paper}
\r
Here we describe the contents of the rest of this paper briefly. In
Section~\ref{sec:differences_with_ptex}, we describe major differences
between \pTeX\ and \LuaTeX-ja. The next section,
Section~\ref{sec:distinction_of_characters}, concentrates on the
problem of how to distinguish between Japanese characters and alphabetic
characters. In Section~\ref{sec:current_status}, we show the current
development status of the package. Finally, in
Section~\ref{sec:implementation}, we describe some internal routines of
\r
\subsection{General information about the project}
\r
207 This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki
\r
\url{http://sourceforge.jp/projects/luatex-ja/wiki/}. As of October 22, 2011,
there is no stable version; however, a set of developer sources can be
obtained from the git repository. The members of the project team are as follows
\r
212 (in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato,
\r
213 Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda,
\r
214 and~Shuzaburo Saito.
\r
\section{Major differences from \pTeX}
\r
218 \label{sec:differences_with_ptex}
\r
In this section, we explain several major differences between \pTeX\
and our \LuaTeX-ja. For general information on Japanese typesetting and an
overview of \pTeX, please see Okumura~\cite{ptexjp}.
\r
224 \subsection{Names of control sequences}
\r
\label{ssec-csname} Because \pTeX\ is an engine modification of Knuth's
original \TeX82 engine, some of the additional primitives take a form that is
very difficult to simulate with a macro. For example, an additional
primitive |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in \pTeX\
sets the amount of penalty inserted before a character whose code is
$\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it
penalty}\rangle$, and the form |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$ can also be used for retrieving the value.

Moreover, for some internal parameters of \pTeX, the value at the end of a
horizontal box or a paragraph is the one that applies to the whole box or
paragraph. However, implementing such parameters in
\LuaTeX-ja is not easy; we will discuss this in Subsection~\ref{ssec-stack}.

Because of the two problems discussed above, the assignment and retrieval
of most parameters in \LuaTeX-ja are consolidated into the following
three control sequences:
\r
244 \item |\ltjsetparameter{|$\langle\hbox{\it
\r
245 name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local
\r
\item |\ltjglobalsetparameter|: for global assignment. Note that these two control
  sequences obey the value of the |\globaldefs| primitive.
\r
249 \item |\ltjgetparameter{|$\langle\hbox{\it
\r
250 name}\rangle$|}[{|$\langle\hbox{\it optional
\r
251 argument}\rangle$|}]|: for retrieval. The returned value is always
\r
255 \subsection{Line-break after a Japanese character}
\r
Lines of Japanese text can be broken almost anywhere, in contrast with
alphabetic text, where lines can be broken only between words (or by
hyphenation). Hence, \pTeX's input processor is modified so that a
line-break after a Japanese character does not produce a space. However,
there is no way to customize the input processor of \LuaTeX, other than
to hack its CWEB source. All a macro package can do is to modify an input line
before \LuaTeX\ begins to process it, inside the |process_input_buffer|
\r
Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this
purpose) is appended to an input line if the line ends with a Japanese
character.\footnote{Strictly speaking, it is also required that the catcode
of the end-of-line character is 5~(\emph{end-of-line}). This condition is
useful inside verbatim environments.} One might jump to the conclusion
that \pTeX\ and \LuaTeX-ja treat a line-break in exactly the same way;
however, they differ in that \LuaTeX-ja decides
whether to append a comment letter to the line
\emph{before} the line is actually processed by \LuaTeX.
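
The basic rule can be observed with a small test input. The following is
only an illustrative sketch: it assumes the default range setting, under
which `あ' and `い' are Japanese characters, and the usual catcode of the
end-of-line character.
\begin{codewithoutnum}
あ
い

a
b
\end{codewithoutnum}
The first line ends with a Japanese character, so U+FFFFF is appended to it
and no space is expected between `あ' and `い'. The line consisting of `a' ends with an
alphabetic character, so the usual interword space is expected between `a'
and `b'.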
\r
Figure~\ref{fig-linebreak} shows an example of this situation; the
command on the first line marks most Japanese characters as
`non-Japanese characters'. In other words, from that command onward, the
letter `あ' will be treated as an alphabetic character by
\LuaTeX-ja. One would therefore expect a space between `あ' and `y' in
the output, but the actual output in the figure has none. This is
because `あ' is still considered a Japanese character by \LuaTeX-ja
at the time when \LuaTeX-ja decides whether U+FFFFF will be added to the
\r
289 \font\x=IPAMincho \x
\r
290 \ltjsetparameter{jacharrange={-6}}xあ
\r
y%\qquad xあy\qquad x あ y%%% [editorial note: without something like this, it is hard to tell whether a space is present]
\r
293 \caption{A notable sample showing the treatment of a line-break after a
\r
294 Japanese character.}\label{fig-linebreak}
\r
297 \subsection{Separation between `real' fonts and metrics}
\r
298 \label{ssec-sepmet}
\r
Traditionally, most Japanese fonts used in typesetting are not
proportional, that is, most glyphs have the same size (in most cases,
square-shaped). Hence, it is not rare that the contents of different
JFMs are essentially the same, and differ only in their names. For example,
|min10.tfm| and |goth10.tfm|, which are JFMs shipped with \pTeX\ for the
seriffed \emph{mincho} family and the sans-seriffed \emph{gothic} family,
differ only in their |FAMILY| and |FACE|. Moreover, |jis.tfm| and
|jisg.tfm|, which are included in the \emph{jis} font metric
used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦),
are identical as binary files. Considering this situation, we
decided to separate `real' fonts and the metrics used for them in
\LuaTeX-ja. Typical declarations of Japanese fonts in the style of plain
\TeX\ are shown in Figure~\ref{fig-jfdef}. We would like to add several
\r
315 \item A control sequence |\jfont| must be used for Japanese fonts, instead of |\font|.
\r
\item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so the
  \hbox{\tt file:} and \hbox{\tt name:} prefixes and various font features can be
  used, as in the first line of Figure~\ref{fig-jfdef}.
\r
319 \item The |jfm| key specifies the metric for the font. In
\r
320 Figure~\ref{fig-jfdef}, |\foo| and |\bar| will use a metric stored in a
\r
321 Lua script named |jfm-ujis.lua|. This metric is the standard
\r
322 metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf}
\r
323 package~\cite{otf} (hence almost all characters are square-shaped).
\r
\item The \hbox{\tt psft:} prefix can be used to specify name-only, non-embedded
  fonts. When a PDF with such fonts is displayed, the actual fonts
  used for them depend on the PDF reader.
\r
The specification of a metric for \LuaTeX-ja is similar to that of a JFM
(see \cite{ptexjp}); characters are grouped into several classes, the
size information of characters is specified for each class, and
glue/kern insertions are specified for each pair of classes. Although
the author has not tried, it may be possible to develop a program that
`converts' a JFM to a metric for \LuaTeX-ja. \LuaTeX-ja offers three
metrics by default: |jfm-ujis.lua|, |jfm-jis.lua| based on the
\emph{jis} font metric, and |jfm-min.lua| based on the old |min10.tfm|.
\r
337 Note that |-kern| in features
\r
338 is important, because kerning information from a real font itself will
\r
339 clash with glue/kern information from the metric.
\r
343 \jfont\foo=file:ipam.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt
\r
344 \jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt
\r
346 \caption{Typical declarations of Japanese fonts.}
\r
350 \subsection{Insertion of glues/kerns for Japanese typesetting: timing}
\r
As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing
processes are totally different from those of \TeX82. \TeX82's process is
done just when a (sequence of) character(s) is appended to the current
list. Thus we can interrupt this process by writing
|f{}irm|. However, \LuaTeX's process is \emph{node-based}, that is, the
process is done when a horizontal box or a paragraph is finished, so
|f{}irm| and |firm| yield the same output under \LuaTeX.
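
A quick way to observe this difference is to typeset both forms side by
side. The snippet below is a minimal sketch and assumes that the current
font has an `fi' ligature:
\begin{codewithoutnum}
\leavevmode\hbox{firm}\quad\hbox{f{}irm}
\end{codewithoutnum}
Under \TeX82 the second box is expected to show `f' and `i' as separate
glyphs, whereas under \LuaTeX\ both boxes should look identical.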
\r
361 The situation for Japanese characters is more complicated.
\r
362 Glues (and kerns) which are needed for Japanese
\r
363 typesetting are divided into the following three categories:
\r
365 \item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue},
\r
\item Default glue between a Japanese character and an alphabetic
  character (\emph{xkanjiskip}, for short), usually 1/4 of the
  full width (\emph{shibuaki}) with some stretch and shrink for
  justifying each line.
\item Default glue between two consecutive Japanese characters
  (\emph{kanjiskip}, for short). The main reason for this glue is to
  enable breaking lines almost everywhere in Japanese texts. In most
  cases, its natural width is zero, with some stretch and shrink for
  justifying each line.
\r
In \pTeX, these three kinds of glue are treated differently. A JFM glue
is inserted when a (sequence of) Japanese character(s) is appended to the
current list, as in the case of alphabetic characters in \TeX82. This
means that one can interrupt the insertion process by saying |{}|. An
\emph{xkanjiskip} is inserted just before `hpack' or line-breaking of a
paragraph; this timing is somewhat similar to that of \LuaTeX's kerning
process. Finally, a \emph{kanjiskip} does not appear as a node anywhere;
it appears only implicitly in the calculation of the width of a horizontal box,
in line breaking, and in the actual output to a DVI
file. These specifications have made \pTeX's behavior very hard to
\r
\LuaTeX-ja inserts glues in all three categories simultaneously inside the
|hpack_filter| and |pre_linebreak_filter| callbacks. The reasons for
this design are to make Japanese characters behave like alphabetic characters in \LuaTeX\
(as described in the first paragraph of this subsection), and to clarify
the specification of \LuaTeX-ja's process.
\r
396 \subsection{Insertion of glues/kerns for Japanese typesetting: specification}
\r
400 \caption{Examples of differences between \pTeX\ and \LuaTeX-ja.}
\r
401 \label{tab-jfmglue}
\r
403 \begin{tabular}{llllllll}
\r
405 &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}\\
\r
406 Input &|あ】{}【〕\/〔| &|い』\/a| &|う)\hbox{}(| &|え]\special{}[|\\\midrule
\r
407 \pTeX &あ】\hbox{}【〕\hbox{}〔&い』\/a &う)\hbox{}( &え]\hbox{}[\\
\r
408 \LuaTeX-ja &あ】{}【〕\/〔 &い』\/a &う)\hbox{}( &え]\special{}[\\
\r
416 \fontsize{40}{40}\selectfont
\r
417 \imagfm{\jstrut あ}%
\r
418 \imagfm{\jstrut 】\inhibitglue}%
\r
419 \imagfm{\jstrut\kern.5\zw}%
\r
420 \imagfm{\jstrut\kern.5\zw}%
\r
421 \imagfm{\jstrut\inhibitglue【}%
\r
422 \imagfm{\jstrut 〕\inhibitglue}%
\r
423 \imagfm{\jstrut\kern.5\zw}%
\r
424 \imagfm{\jstrut\kern.5\zw}%
\r
425 \imagfm{\jstrut\inhibitglue〔}%
\r
427 \caption{Detail of the output of \pTeX\ in the input~(1) in Table~\ref{tab-jfmglue}.}
\r
428 \label{fig-ptexjfm}
\r
431 Now we will take a look at the insertion process itself through four points.
\r
433 \begin{description}
\r
434 \item[Ignored nodes]
\r
As noted in the previous subsection, the insertion process in \pTeX\ can
be interrupted by saying |{}| or anything else.\footnote{This
is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for
\texttt{min10.tfm} and other `old' JFMs work.} This leads to the
second row of Table~\ref{tab-jfmglue}, and to
Figure~\ref{fig-ptexjfm}. Here `the process is interrupted'
means that \pTeX\ does not consider the letter `】\inhibitglue'
to be followed by `\inhibitglue【'; hence two half-width glues
are inserted between `】\inhibitglue' and `\inhibitglue【',
where the left one is from `】\inhibitglue' and the right one
is from `\inhibitglue【'.

On the other hand, in \LuaTeX-ja, the process is done inside the
|hpack_filter| and |pre_linebreak_filter| callbacks. Hence,
\emph{anything that does not make any node is
ignored}\ in \LuaTeX-ja, as shown in (1) in
Table~\ref{tab-jfmglue}. \LuaTeX-ja also ignores any nodes
which do not make any contribution to the current horizontal
list (\emph{ins\_node}, \emph{adjust\_node},
\emph{mark\_node}, \emph{whatsit\_node} and
\emph{penalty\_node}), as shown in (4).
\r
By the way, around a \emph{glyph\_node} $p$ there may be some nodes
attached to~$p$. These are an accent and the kerns for
moving it to the right place, and a kern from the italic
correction\footnote{\TeX82 (and \LuaTeX) does not distinguish
between an explicit kern and a kern for italic correction. To
distinguish them, an additional subtype for a kern is introduced
in \pTeX. On the other hand, \LuaTeX-ja uses an additional attribute and
redefines \texttt{\char`\\/} to set this attribute.} for $p$. It is natural that
these attachments should be ignored inside the process. Hence
\LuaTeX-ja takes this approach, as does the latest version of
\pTeX\ (version~p3.2). This explains (2) in Table~\ref{tab-jfmglue}.

To summarize, one should put an empty horizontal box |\hbox{}|
where one wants to interrupt the insertion process in
\LuaTeX-ja, as in (3) in Table~\ref{tab-jfmglue}.
\r
474 \item[Fonts with the same metric]
\r
Recall that \LuaTeX-ja separates `real' fonts and metrics, as in Subsection~\ref{ssec-sepmet}.
Consider the following input, where all Japanese fonts use the same metric
(in \LuaTeX-ja), and |\gt| selects the \emph{gothic} family as
the current Japanese font family:
\r
If the above input is processed by \pTeX, the insertion process is
interrupted by |\gt|, so the result looks like
\r
487 \mc 明朝)\hbox{}\gt (ゴシック
\r
However, this seems unnatural, since the two Japanese fonts in the
output use the same metric, i.e.,~the same
typesetting rule. Hence, we decided that Japanese fonts with
the same metric are treated as one font in the insertion
process of \LuaTeX-ja. Thus, the output from the above input
in \LuaTeX-ja looks like:
\r
There may be situations where this default behavior is not
suitable. \LuaTeX-ja offers a way to handle such situations, but
we leave it to the manual~\cite{man}.
\r
502 \item[Fonts with different metrics]
\r
The case where two adjacent Japanese characters use different metrics
and/or different sizes is similar. Consider the following
input, where the \emph{mincho} family and the \emph{gothic}
family use different metrics:
\r
As in the previous paragraph, \pTeX\ yields the following output from this input:
\r
514 \mc 漢)\hbox{}\gt (漢)\hbox{}\large (大
\r
We considered the amount of space between parentheses in the above output
to be too large. Hence we have changed the default behavior of
\LuaTeX-ja, so that the amount of glue between two Japanese
characters with different metrics is the \emph{average} of the glue
from the left character and that from the right
character. For example, Figure~\ref{fig-diffmet} shows the
output from the above input. The width of the glue marked `(1)' is
$(a/2 + a/2)/2 = 0.5a$, and the width of the glue marked `(2)'
is $(a/2 + 1.2a/2)/2 = 0.55a$. This default behavior can be
changed by the \textsf{differentmet} parameter of \LuaTeX-ja.
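
As a worked instance of these formulas (the numerical value is only an
assumed illustration, not taken from the figure), let $a = 10\,\mathrm{pt}$.
The averaged glues in \LuaTeX-ja then have the widths
\[
  \frac{1}{2}\Bigl(\frac{a}{2}+\frac{a}{2}\Bigr) = 0.5a = 5\,\mathrm{pt},
  \qquad
  \frac{1}{2}\Bigl(\frac{a}{2}+\frac{1.2a}{2}\Bigr) = 0.55a = 5.5\,\mathrm{pt},
\]
whereas \pTeX, which places both glues side by side because the insertion
process is interrupted, would leave
\[
  \frac{a}{2}+\frac{a}{2} = a = 10\,\mathrm{pt},
  \qquad
  \frac{a}{2}+\frac{1.2a}{2} = 1.1a = 11\,\mathrm{pt}
\]
at the same two positions.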
\r
529 \fontsize{40}{40}\selectfont
\r
530 \imagfm{\jstrut\smash{%
\r
531 \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr漢\cr
\r
532 \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$a$}\
\r
533 \hrulefill\vrule height .5ex depth .5ex\cr}}}}%
\r
534 \imagfm{\jstrut )\inhibitglue}%
\r
535 \hbox to .5\zw{\hss\normalsize (1)\hss}%
\r
536 \imagfm{\jstrut\inhibitglue\gt (}%
\r
537 \imagfm{\jstrut\gt 漢}%
\r
538 \imagfm{\jstrut\gt )\inhibitglue}%
\r
539 \hbox to .55\zw{\hss\normalsize (2)\hss}%
\r
540 \imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\inhibitglue (}%
\r
541 \imagfm{\fontsize{48}{48}\selectfont\jstrut\smash{%
\r
542 \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr\gt 大\cr
\r
543 \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$1.2a$}\
\r
544 \hrulefill\vrule height .5ex depth .5ex\cr}}}}
\r
546 \caption{Fonts with different metrics.}
\r
547 \label{fig-diffmet}
\r
550 \item[\emph{kanjiskip} and \emph{xkanjiskip}]
\r
In \pTeX, the value of \emph{xkanjiskip} is controlled by a skip named
|\xkanjiskip|. A well-known defect of this implementation is
that the value of \emph{xkanjiskip} is not connected with the
size of the current Japanese font. It seems that the |EXTRASPACE|,
|EXTRASTRETCH|, and |EXTRASHRINK| parameters in a JFM are
reserved for specifying the default value of
\emph{xkanjiskip} in units of the design size, but \pTeX\
did not actually use these parameters.

In view of this situation in \pTeX, \LuaTeX-ja can use the value of
\emph{xkanjiskip} specified in a metric. If the value of
\emph{xkanjiskip} on the user side (this is the value of the
\textsf{xkanjiskip} parameter of |\ltjsetparameter|) is
|\maxdimen|, then \LuaTeX-ja uses the specification from
the metric currently in use as the actual value of
\emph{xkanjiskip}. The same applies to \emph{kanjiskip}.
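
For illustration, the two ways of setting \emph{xkanjiskip} might look as
follows. This is a sketch rather than a recommended setting: the glue
values are arbitrary, and |\zw| (the full-width unit used in the figures
of this paper) is assumed to be available.
\begin{codewithoutnum}
% explicit value on the user side (arbitrary example values)
\ltjsetparameter{xkanjiskip={.25\zw plus 1pt minus 1pt}}
% defer to the value specified in the metric currently in use
\ltjsetparameter{xkanjiskip=\maxdimen}
\end{codewithoutnum}
With the second form, the glue is expected to follow the metric, and
therefore the size of the current Japanese font.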
\r
569 \section{Distinction of characters}
\r
\label{sec:distinction_of_characters} Since \LuaTeX\ can handle Unicode
characters natively, a major problem is how to distinguish
Japanese characters from alphabetic characters. For example, the
multiplication sign (U+00D7) exists both in ISO-8859-1 (hence in Latin-1
Supplement in Unicode) and in the basic Japanese character set
JIS~X~0208. It is not desirable that this character is always treated as
an alphabetic character, because this symbol is often used in the sense
of `negative' in Japan.
\r
579 \subsection{Character ranges}
\r
Before we describe the approach taken in \LuaTeX-ja, we review the
approach taken by u\pTeX. u\pTeX\ extends the |\kcatcode| primitive of
\pTeX, and uses it to set whether a character is treated as one of
alphabetic characters~(15), \emph{kanji}~(16), \emph{kana}~(17),
\emph{Hangul}~(17), or~\emph{other CJK characters}~(18).
\r
585 The assignment to |\kcatcode| can be done by a Unicode
\r
586 block.\footnote{There are some exceptions. For example, U+FF00--FFEF
\r
587 (Halfwidth and Fullwidth Forms) are divided into three blocks in recent
\r
\LuaTeX-ja adopts a different approach. There are many Unicode blocks
in the Basic Multilingual Plane which are not included in
Japanese fonts, so it is inconvenient to work block by block.
Furthermore, JIS~X~0208 is not just a union of Unicode
blocks; for example, the intersection of JIS~X~0208 and
Latin-1 Supplement is shown in
Table~\ref{tab-inter}. Considering these two points, to
customize the range of Japanese characters in \LuaTeX-ja, one
has to define ranges of character codes in the source in advance.
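
As a hedged sketch of what such a definition might look like: the command
name |\ltjdefcharrange| and its syntax are taken from the \LuaTeX-ja manual
rather than from this paper, so they should be checked against the
installed version; the range number 100 and the code points are arbitrary
choices made only for this example.
\begin{codewithoutnum}
% define character range 100 (hypothetical number and code points)
\ltjdefcharrange{100}{"10000-"1FFFF}
% treat range 100 as Japanese characters and range 2 as alphabetic ones
\ltjsetparameter{jacharrange={100, -2}}
\end{codewithoutnum}
A negative number in \textsf{jacharrange} marks the range as alphabetic,
as in the setting used for this paper.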
\r
602 \caption{Intersection of JIS~X~0208 and Latin-1 Supplement.}
\r
605 \begin{tabular}{llll}
\r
606 \ltjjachar"A7 (U+00A7),&
\r
607 \ltjjachar"A8 (U+00A8),&
\r
608 \ltjjachar"B0 (U+00B0),&
\r
609 \ltjjachar"B1 (U+00B1),\\
\r
610 \ltjjachar"B4 (U+00B4),&
\r
611 \ltjjachar"B6 (U+00B6),&
\r
612 \ltjjachar"D7 (U+00D7),&
\r
613 \ltjjachar"F7 (U+00F7)
\r
We note that \LuaTeX-ja offers two additional control sequences,
|\ltjjachar| and |\ltjalchar|. They are similar to the |\char|
primitive; however, |\ltjjachar| always yields a Japanese character, provided that
the argument is greater than or equal to 128, and |\ltjalchar| always
yields an alphabetic character, regardless of the argument.
\r
625 \subsection{Default setting of ranges}
\r
The patches of \LuaTeX-ja for plain \TeX\ and \LaTeXe\ predefine eight character
ranges, as shown in Table~\ref{tab-chrrng}. Most of these ranges are
just unions of Unicode blocks, and are determined from the Adobe-Japan1-6
character collection~\cite{aj16} and JIS~X~0208. Among these eight ranges,
ranges~2, 3, 6, 7, and~8 are considered ranges of Japanese
characters, and the others are considered ranges of alphabetic
characters.\footnote{Note that ranges 3~and~8 are considered ranges of
alphabetic characters in this paper.} We remark on ranges 2~and~8:
\r
634 \begin{description}
\r
JIS~X~0208 includes Greek and Cyrillic letters; however, these
letters cannot, of course, be used for typesetting Greek or Russian.
Hence it is reasonable that Greek and
Cyrillic letters form a separate character range.
\r
640 \item[The range~8]
\r
If one wants to use 8-bit TFMs, such as those in the T1 or TS1 encodings, one should
mark range~8 as a range of alphabetic characters by
\r
644 |\ltjsetparameter{jacharrange={-8}}|
\r
646 This is because some 8-bit TFMs have a glyph in this range; for example,
\r
647 the character `\OE' is located at |"D7| in the T1 encoding. %"
\r
652 \caption{Predefined ranges in \LuaTeX-ja.}
\r
655 \begin{tabular}{@{\bf}rl}
\r
1&(Additional) Latin characters which do not belong to range~8.\\
2&Greek and Cyrillic letters.\\
3&Punctuation and miscellaneous symbols.\\
4&Unicode blocks which do not intersect with Adobe-Japan1-6.\\
5&Surrogates and Supplementary Private Use Areas.\\
\r
661 6&Characters used in Japanese typesetting.\\
\r
662 7&Characters possibly used in CJK typesetting, but not in Japanese.\\
\r
663 8&Characters in Table~\ref{tab-inter}.
\r
668 \subsection{Control sequences producing Unicode characters}
\r
669 \label{ssec-unichar}
\r
671 The \emph{fontspec} package\footnote{Preciously saying, it is the
\r
672 \emph{xunicode} package, originally a package for \XeTeX and
\r
673 automatically loaded by the \emph{fontspec} package.} offers various
\r
674 control sequences that produce Unicode characters. However, these
\r
675 control sequences as it stands cannot work correctly with the default
\r
676 range setting of \LuaTeX-ja. For example, |\textquotedblleft| is just
\r
677 an abbreviation of |\char"201C\relax|, and the character U+201C (LEFT %"
\r
678 DOUBLE QUOTATION MARK) is treated as an Japanese character, because it
\r
679 belongs to the range~3. This problem is resolved by using |\ltjalchar|
\r
680 instead of the |\char| primitive. It is included in an optional package
\r
681 named \texttt{luatexja-\penalty0fontspec.sty}. Figure~\ref{fig-unitxt}
\r
682 shows several ways to typeset a character, both as a Japanese character
\r
683 and as an alphabetic character.
\r
687 ×, \char`×, % depend on range setting
\r
688 \ltjalchar`×, % alphabetic char
\r
689 \ltjjachar`×, % Japanese char
\r
690 \texttimes % alph. char (by fontspec)
\r
692 \caption{Control sequences producing a Unicode character.}
\r
696 The situation looks similar in math formulas, but in fact it differs.
\r
697 Each control sequence that represents an ordinary symbol defined by the
\r
698 \emph{unicode-math} package is just a synonym for a character. For example,
\r
699 the meaning of |\otimes| is just the character U+2297 (CIRCLED TIMES),
\r
700 which is included in the range~3. However, it is difficult to define a
\r
701 control sequence like |\ltjalUmathchar| as a counterpart of
\r
702 |\Umathchar|, since an input like `|\sum^\ltjalUmathchar ...|' has to be
\r
705 However, we could not develop a satisfactory solution to this problem in
\r
706 time for this paper. We are just testing a
\r
709 \item \LuaTeX-ja has a list of character codes which will always be treated as
\r
710 alphabetic characters in math mode. Considering 8-bit TFMs for
\r
711 math symbols, this list includes natural numbers between |"80| and
\r
713 \item Redefine internal commands defined in the \emph{unicode-math}
\r
715 codes of characters which are mentioned in the \emph{unicode-math}
\r
716 package will be included in the list.
\r
720 We would like to extend the treatment described in this subsection to 8-bit
\r
721 font encodings, but we leave this to future development as well.
\r
723 \section{Current status of development}
\r
724 \label{sec:current_status}
\r
725 At the moment, \LuaTeX-ja can be used under plain \TeX, and under
\r
726 \LaTeXe. Generally speaking, one only has to load |luatexja.sty|, with
\r
727 the |\input| command or with |\usepackage| (in~\LaTeXe), to
\r
728 typeset Japanese characters. We now look at the details, part by part.
\r
730 \subsection{`Engine extension'}
\r
731 The lowest part of \LuaTeX-ja corresponds to the \pTeX\ extension as
\r
732 \emph{an engine extension of \TeX}\@. We, the project members, think that
\r
733 this part is almost done. There is one more feature of \LuaTeX-ja which
\r
734 we are going to explain:
\r
736 \begin{description}
\r
737 \item[Shifting baseline]
\r
738 To harmonize Japanese fonts with alphabetic fonts, it is
\r
739 sometimes necessary to shift the baseline of alphabetic
\r
740 characters. \pTeX\ has a dimension |\ybaselineshift|, which
\r
741 specifies how far the baseline of alphabetic characters is
\r
742 shifted down. This is useful for Japanese-based documents, but
\r
743 not for documents mainly in languages with alphabetic
\r
746 Hence, \LuaTeX-ja extends \pTeX's |\ybaselineshift| to Japanese
\r
747 characters. Namely, \LuaTeX-ja offers two parameters,
\r
748 \textsf{yjabaselineshift} and \textsf{yalbaselineshift}, for the
\r
749 amount of shifting the baseline of Japanese characters and
\r
750 that of alphabetic characters, respectively.
\r
753 \fontsize{40}{40}\selectfont\fboxsep0mm
\r
754 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
\r
755 \hbox to 0.9\linewidth{%
\r
757 \raise-10pt\imagfm{\jstrut 漢}%
\r
758 \raise-10pt\imagfm{\jstrut 字}\hskip.25\zw%
\r
762 \imagfm{\jstrut 漢}%
\r
763 \imagfm{\jstrut 字}\hskip.25\zw%
\r
764 \raise-10pt\imagfm{p}%
\r
765 \raise-10pt\imagfm{h}%
\r
770 \caption{First example of shifting baseline.}
\r
776 \fontsize{30}{30}\selectfont\fboxsep0mm
\r
777 \vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth
\r
778 \hbox to 0.9\linewidth{%
\r
781 \imagfm{b}\hskip.25\zw%
\r
782 \imagfm{\jstrut 本}%
\r
783 \imagfm{\jstrut 文}\hskip.33333\zw%
\r
784 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut\inhibitglue (}%
\r
785 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 注}%
\r
786 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 釈}\hskip.1666667\zw%
\r
787 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont c}%
\r
788 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont o}%
\r
789 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}%
\r
790 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}%
\r
791 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont e}%
\r
792 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont n}%
\r
793 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont t}%
\r
794 \raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut )\inhibitglue}%
\r
796 \imagfm{\jstrut 本}%
\r
797 \imagfm{\jstrut 文}%
\r
802 \caption{Second example of shifting baseline.}
\r
806 An example output is shown in Figure~\ref{fig-bls}. The left half is the
\r
807 output when \textsf{yjabaselineshift} is positive, hence the
\r
808 baseline of Japanese characters is shifted down. On the other
\r
809 hand, the right half is the output when
\r
810 \textsf{yalbaselineshift} is positive, hence the baseline of
\r
811 alphabetic characters is shifted down. Figure~\ref{fig-small}
\r
812 shows an interesting use of these parameters.
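
As a rough sketch of how these parameters might be set (the amounts
below are arbitrary illustrative values, not recommendations), one could
write the following. We assume here that both parameters take an
ordinary dimension through the key--value syntax of |\ltjsetparameter|,
and that a negative value shifts the baseline upward:
\begin{codewithoutnum}
% lower the baseline of alphabetic characters by 1pt
\ltjsetparameter{yalbaselineshift=1pt}
% raise the baseline of Japanese characters by 0.5pt
\ltjsetparameter{yjabaselineshift=-0.5pt}
\end{codewithoutnum}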
\r
815 Note that \LuaTeX-ja does not yet support vertical typesetting (\emph{tategaki}).
\r
817 \subsection{Patches for plain \TeX\ and \LaTeXe}
\r
818 \pTeX\ comes with a patch for plain \TeX, namely |ptex.tex|, a patch for the \LaTeXe\
\r
819 macros (this patch and \LaTeXe\ together constitute \emph{p\LaTeXe}), and
\r
820 |kinsoku.tex|, which contains the default settings for \emph{kinsoku
\r
821 shori}, the Japanese line-breaking rules. We ported them to \LuaTeX-ja, except for
\r
822 the code related to vertical typesetting, because \LuaTeX-ja does not
\r
823 support vertical typesetting yet. We remark on one point related to the
\r
825 \begin{description}
\r
827 \item[Behavior of\/ {\tt\char92fontfamily\/}]
\r
828 The control sequence |\fontfamily| in p\LaTeXe\ changes the current alphabetic
\r
829 font family and/or the current Japanese font family,
\r
830 depending on the argument. More concretely,
\r
831 |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
\r
832 current alphabetic font family to $\langle\hbox{\it
\r
833 arg\/}\rangle$, if and only if one of the following
\r
834 conditions is satisfied:
\r
836 \item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
\r
837 \emph{some} alphabetic encoding is already defined in the document.
\r
838 \item There exists an alphabetic encoding $\langle\hbox{\it
\r
839 enc\/}\rangle$ already defined in the document such that a font
\r
840 definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
\r
841 arg\/}\rangle$|.fd| (all lowercase) exists.
\r
843 The same criterion is used for changing the Japanese font family.
\r
845 For this behavior to work, a list of all (alphabetic) encodings already
\r
846 defined in the document is required. However, since \LuaTeX-ja
\r
847 is loaded as a package, it cannot obtain this list.
\r
848 Hence \LuaTeX-ja adopted a different approach, namely
\r
849 |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
\r
850 current alphabetic font family to $\langle\hbox{\it
\r
851 arg\/}\rangle$, if and only if:
\r
853 \item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$
\r
854 in the current alphabetic encoding $\langle\hbox{\it
\r
855 enc\/}\rangle$ is already defined in the document.
\r
856 \item A font definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it
\r
857 arg\/}\rangle$|.fd| (all lowercase) exists.
\r
865 \subsection{Classes for Japanese documents}
\r
866 To produce `high-quality' Japanese documents, it is not enough that
\r
867 Japanese characters are placed correctly; class files for
\r
868 Japanese documents are also needed. Two major families of classes are widely used in Japan:
\r
869 \emph{jclasses}, which is distributed with the official p\LaTeXe\ macros,
\r
870 and \emph{jsclasses}. At present, \LuaTeX-ja
\r
871 simply contains their counterparts: \emph{ltjclasses} and
\r
872 \emph{ltjsclasses}. However, the policy on classes has not yet been
\r
873 determined, and we hope to have another family of classes which is useful for
\r
874 commercial printing. In the author's opinion, \emph{ltjclasses}
\r
875 should remain as an example of porting class files for \pTeX\ to
\r
878 \subsection{Patches for packages}
\r
879 Apart from patches for the \LaTeXe~kernel and classes for Japanese
\r
880 documents, we need to make patches for several packages. At present,
\r
881 we have considered the following packages, and have made patches or ports for
\r
882 the first two.
\r
884 \begin{description}
\r
885 \item[The \emph{fontspec} package] The \emph{fontspec} package is built
\r
886 on NFSS2, hence control sequences offered by the
\r
887 \emph{fontspec} package, such as |\setmainfont|, are only
\r
888 effective for alphabetic fonts if \LuaTeX-ja is loaded.
\r
889 \texttt{luatexja-\penalty0fontspec.sty} (not automatically
\r
890 loaded) offers counterparts of these for Japanese fonts, with
\r
891 an additional `j' in the names of the control sequences, such as
\r
892 |\setmainjfont|. As described in
\r
893 Subsection~\ref{ssec-unichar}, it also includes a patch for
\r
894 control sequences producing Unicode characters.
\r
896 \item[The \emph{otf} package]
\r
897 This package is widely used in \pTeX\ for typesetting characters which are
\r
898 not in JIS~X~0208, and for using more than one weight in \emph{mincho}
\r
899 and \emph{gothic} font families. Therefore \LuaTeX-ja supports features
\r
900 in the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty}
\r
901 manually. Note that characters by |\UTF{}| and
\r
902 |\CID{}| are not appended to the current list as a
\r
903 \emph{glyph\_node}, in order to keep them away from callbacks by the
\r
904 \emph{luaotfload} package. We have another remark; |\CID|
\r
905 does not work with TrueType fonts, since |\CID| uses the
\r
906 conversion table between CID and the glyph order of the
\r
907 current Japanese font.
\r
909 \item[The \emph{listings} package]
\r
910 Users of \pTeX\ know the patch |jlisting.sty| for
\r
911 the \emph{listings} package, which allows Japanese characters in
\r
912 the |lstlisting| environment. Generally speaking, it can also
\r
913 be used in \LuaTeX-ja. However, it seems that a
\r
914 Japanese character after a space is not processed
\r
915 by the \emph{listings} package; this is inconvenient when we
\r
916 use the \emph{showexpl} package.
\r
918 There is another way to use characters above 256 with the
\r
919 \emph{listings} package (described in \cite{apl}). However,
\r
920 this method is not suitable for Japanese, since the number of
\r
921 Japanese characters is very large. We hope that the
\r
922 \emph{listings} package will be able to handle all characters above
\r
923 256 without any patch, in the future.
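
To make the first two items concrete, a document using both patches
might contain lines like the following sketch. The font names are
placeholders for whatever fonts are installed (the Japanese one is
assumed not to be a TrueType font, because of the restriction on |\CID|
noted above), and the arguments of |\UTF| and |\CID| are arbitrary
examples:
\begin{codewithoutnum}
\usepackage{luatexja,luatexja-fontspec,luatexja-otf}
\setmainfont{TeX Gyre Pagella}      % alphabetic font (fontspec)
\setmainjfont{KozMinPr6N-Regular}   % Japanese font; placeholder name
% in the document body:
森\UTF{9DD7}外   % a character given by its Unicode code point
\CID{7652}飾区   % a glyph given by its Adobe-Japan1 CID
\end{codewithoutnum}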
\r
930 \section{Implementation}
\r
931 \label{sec:implementation}
\r
932 \subsection{Handling of Japanese fonts}
\r
933 In \pTeX, there are three slots for maintaining current fonts, namely
\r
934 |\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal
\r
935 direction) and |\tfont| for Japanese fonts (in vertical direction). With
\r
936 these slots, we can manage the current font for alphabetic characters
\r
937 and that for Japanese characters separately in \pTeX. However, \LuaTeX\
\r
938 has only one slot for maintaining the current font, like \TeX82. This
\r
939 situation leads to a problem: how can we maintain the `current Japanese
\r
942 There are three approaches to this problem. One approach is to make a
\r
943 mapping table from alphabetic fonts to corresponding Japanese fonts
\r
944 (here we don't assume that NFSS2 is available). Another approach is
\r
945 that we always use composite fonts with alphabetic fonts and Japanese
\r
946 fonts. The third approach is that the information of the current
\r
947 Japanese font is stored in an attribute. We adopted the third approach,
\r
948 since \LuaTeX-ja is strongly influenced by \pTeX, as we noted in
\r
949 Subsection~\ref{ssec-pol}.
\r
951 As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining
\r
952 Japanese fonts, as \pTeX\ does. However, because the information on the current
\r
953 Japanese font is stored in an attribute, control sequences defined by
\r
954 |\jfont| (e.g.,~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) do
\r
955 not represent a font in the sense of \TeX82. In other words, each of
\r
956 these control sequences is just an assignment to an attribute; therefore
\r
957 they cannot be an argument of |\the|, |\fontname|, or |\textfont|.
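
The following minimal sketch illustrates the consequence. It assumes
that the font file |ipam.ttf| and the metric |jfm-ujis.lua| are
available; the name |\foo| is only illustrative:
\begin{codewithoutnum}
\jfont\foo=file:ipam.ttf:jfm=ujis at 10pt
\foo 日本語      % fine: \foo assigns the attribute for the Japanese font
% \fontname\foo  % not allowed: \foo is not a font in the TeX82 sense
% \the\foo       % not allowed, for the same reason
\end{codewithoutnum}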
\r
960 Callbacks by the \emph{luaotfload} package, e.g.,~replacement of glyphs
\r
961 according to OpenType font features, are performed just after `Examination of
\r
962 stack level' (see Subsections
\r
963 \ref{ssec-over}~and~\ref{ssec-stack}). Also note that calculation of
\r
964 character classes for each Japanese character is done \emph{after}
\r
965 these callbacks for now.
\r
967 \subsection{Stack management}
\r
970 As we noted in Subsection~\ref{ssec-csname}, parameters whose values
\r
971 at the end of a horizontal box or of a paragraph are valid throughout the
\r
972 whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented
\r
973 by internal integers or registers of other types in \TeX. We explain why
\r
974 in this subsection.
\r
978 void package(int c)
\r
984 if (cur_list.mode_field == -hmode) {
\r
985 cur_box = filtered_hpack(cur_list.head_field,
\r
986 cur_list.tail_field, saved_value(1),
\r
987 saved_level(1), grp, saved_level(2));
\r
988 subtype(cur_box) = HLIST_SUBTYPE_HBOX;
\r
991 \caption{An extract of a CWEB-source \texttt{tex/packaging.w} of \LuaTeX.}
\r
995 Figure~\ref{fig-ltsrc} is an extract of a CWEB-source
\r
996 \texttt{tex/packaging.w} of \LuaTeX\ (SVN revision 4358). This function
\r
997 is called just when an explicit |\hbox{...}| or |\vbox{...}| ends, and
\r
998 the function |filtered_hpack()| is where the |hpack_filter| callback and then the
\r
999 actual `hpack' process are performed. Notice that the |unsave()|
\r
1000 function is called before |filtered_hpack()|. This is the problem;
\r
1001 because of |unsave()|, we can retrieve only the values of registers
\r
1002 \emph{outside} the box, even in the |hpack_filter| callback.
\r
1004 To cope with this problem, \LuaTeX-ja has its own stack system, based on
\r
1005 Lua codes in \cite{stack-mail}. Furthermore, \emph{whatsit} nodes whose
\r
1006 \emph{user\_id} is 30112 (\emph{stack\_node}, for short) will be
\r
1007 appended to the current horizontal list each time the current stack
\r
1008 level is incremented, and their values are the values of
\r
1009 |\currentgrouplevel| at that time. In the beginning of the |hpack_filter|
\r
1010 callback, the list in question is traversed to determine whether the
\r
1011 stack level at the end of the list and that outside the box coincide.
\r
1013 Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current
\r
1014 stack level, both inside the |hpack_filter| callback, i.e.,~outside a
\r
1015 horizontal box. Consider a list which represents the content of the box,
\r
1018 \item A \emph{stack\_node} whose value is $x+1$ (because all materials
\r
1019 in the box are included in a group |\hbox{...}|, the value of
\r
1020 |\currentgrouplevel| inside the box is at least $x+1$) in the list
\r
1021 corresponds to an assignment related to the stack system at the
\r
1022 top level of the list, like
\r
1025 \hbox{...(assignment)...}
\r
1028 In this case, the current stack level is incremented to $y+1$ after the assignment.
\r
1029 \item A \emph{stack\_node} whose value is more than $x+1$ in the list corresponds
\r
1030 to an assignment inside another group contained in the box. For example,
\r
1031 the following input creates
\r
1032 a \emph{stack\_node} whose value is $x+3=(x+1)+2$:
\r
1035 \hbox{...{...{...(assignment)}...}...}
\r
1039 Thus, we can conclude that the stack level at the end of the list is
\r
1040 $y+1$, if and only if there is a \emph{stack\_node} whose value is
\r
1041 $x+1$. Otherwise, the stack level is just $y$.
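
In symbols, writing $S$ for the set of values of the
\emph{stack\_node}s found in the list (the name $S$ is introduced here
only for illustration), this rule reads
\[
  \mbox{stack level at the end of the list} =
  \left\{\begin{array}{ll}
    y+1 & \mbox{if $x+1\in S$,}\\
    y   & \mbox{otherwise.}
  \end{array}\right.
\]
For example, with $x=0$ and $y=5$, a list containing
\emph{stack\_node}s with values 1 and 3 ends at stack level 6, while a
list whose only \emph{stack\_node} has value 3 ends at stack level 5.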
\r
1043 \subsection{Adjustment of the position of Japanese characters}
\r
1044 \label{ssec-width}
\r
1046 The size of a glyph specified in a metric and that of a real font
\r
1047 usually differ. For example, the letter `\inhibitglue【' is half-width
\r
1048 in |jfm-ujis.lua| or |jis.tfm|, while this letter is full-width like `【'
\r
1049 in most TrueType fonts used in Japanese typesetting, such as
\r
1050 IPA~Mincho. Hence the adjustment of position of such glyphs is
\r
1051 needed. In the context of \pTeX, this process was performed using virtual fonts.
\r
1053 On the other hand, \LuaTeX-ja does the adjustment by encapsulating a glyph
\r
1054 in a horizontal box. There are two main reasons why we adopted this
\r
1055 method: one is that we feared that the Lua code for coexisting with callbacks by
\r
1056 the |luaotfload| package would be large if we used virtual fonts, and the
\r
1057 other is to cope with shifting of the baseline of characters at the
\r
1061 \begin{center}\unitlength=9pt\small
\r
1062 \begin{picture}(15,12)(-1,-3)
\r
1064 \color{grayx}% real glyph
\r
1065 \put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength}
\r
1067 \color{black}% real glyph :step1
\r
1069 \put(-1,-1.5){\line(0,1){7}\line(0,-1){2.5}}
\r
1070 \put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}}
\r
1071 \put(-1,5.5){\line(1,0){6}}
\r
1072 \put(-1,-4){\line(1,0){6}}
\r
1073 \put(-1,0){\makebox(0,0)[r]{\strut$R$\,}}
\r
1076 \put(0,0){\vector(0,1){9}\line(0,-1){3}\vector(1,0){12}}
\r
1077 \put(12,9){\makebox(0,0)[rt]{\strut$M$\,}}
\r
1078 \put(12,0){\line(0,1){9}\vector(0,-1){3}}
\r
1079 \put(0,9){\line(1,0){12}}
\r
1080 \put(0,-3){\line(1,0){12}}
\r
1081 \put(0.2,4.5){\makebox(0,0)[l]{\texttt{height}}}
\r
1082 \put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}}
\r
1083 \put(6,0.2){\makebox(0,0)[b]{\texttt{width}}}
\r
1086 \put(3,0){\line(0,1){7}\line(0,-1){2.5}\line(1,0){6}}
\r
1087 \put(9,0){\line(0,1){7}\line(0,-1){2.5}}
\r
1088 \put(3,7){\line(1,0){6}}
\r
1089 \put(3,-2.5){\line(1,0){6}}
\r
1090 \newsavebox{\eqdist}
\r
1091 \savebox{\eqdist}(0,0)[c]{%
\r
1093 \put(-0.08,0.2){\line(0,-1){0.4}}%
\r
1094 \put(0.08,0.2){\line(0,-1){0.4}}}
\r
1095 \put(1.5,0){\usebox{\eqdist}}
\r
1096 \put(10.5,0){\usebox{\eqdist}}
\r
1099 \put(3,-1.5){\vector(-1,0){4}}
\r
1100 \put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}}
\r
1101 \put(3,0){\vector(0,-1){1.5}}
\r
1102 \put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}}
\r
1105 \caption{The position of the `real' glyph.}
\r
1109 Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is
\r
1110 the imaginary body specified in the metric, and a vertical
\r
1111 rectangle is the imaginary body of a real glyph. First, the real glyph
\r
1112 is aligned with respect to the width of $M$. In the figure, the real
\r
1113 glyph is aligned `middle'; this setting is useful for the full-width
\r
1114 middle dot `・'. We have other settings, `left' and `right'.
\r
1115 Furthermore, it is shifted according to the values of |left| and |down|,
\r
1116 which are specified in the metric, for fine adjustment.
\r
1117 The final position of the real glyph
\r
1118 is shown by the gray rectangle~$R$. If the amount of shifting the baseline is
\r
1119 not zero, $M$ (and hence the real glyph) is shifted by that amount.
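
The horizontal placement just described can be summarized in a formula.
The symbols $w_M$ and $w_R$ (the widths of $M$ and of the real glyph) and
$x_R$ (the horizontal position of the left edge of $R$, measured from the
left edge of $M$) are introduced here only for illustration, and the
formula is one possible reading of Figure~\ref{fig-pos} rather than a
specification of the implementation:
\[
  x_R = a - \mbox{\texttt{left}},\qquad
  a = \left\{\begin{array}{ll}
    0 & \mbox{for `left' alignment,}\\
    (w_M-w_R)/2 & \mbox{for `middle' alignment,}\\
    w_M-w_R & \mbox{for `right' alignment.}
  \end{array}\right.
\]
With the proportions drawn in the figure ($w_M=12$, $w_R=6$ and
$\mbox{\texttt{left}}=4$, in units of |\unitlength|), this gives
$x_R=(12-6)/2-4=-1$, so $R$ protrudes by one unit to the left of $M$.
In addition, the glyph is lowered by the amount \texttt{down}.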
\r
1121 We would like to remark briefly on the vertical position of a real
\r
1122 glyph. A JFM (or a metric used in \LuaTeX-ja) and a real font used for
\r
1123 it may have different height or depth. In that case, it may look better
\r
1124 if the real glyph is shifted vertically to match the height-depth ratio
\r
1125 specified in the metric. However, no vertical adjustment other than the
\r
1126 adjustment by the |down| value is performed in the present
\r
1127 implementation of \LuaTeX-ja. This situation is carefully studied by
\r
1128 Otobe~\cite{min10}. The policy on this problem has not yet been
\r
1129 determined, but we would like to offer several solutions in future
\r
1134 \label{ssec-jfmnote}
\r
1135 \begin{description}
\r
1136 \item[Proportional typesetting]
\r
1137 Some fonts are proportional, that is, each glyph in those fonts has
\r
1138 its own width. An example of a proportional font is
\r
1139 IPA~P~Mincho. Using these fonts in \pTeX\ is very
\r
1140 hard, since one needs to make a dedicated JFM for a real font.
\r
1142 \LuaTeX-ja supports these proportional fonts; setting the |width| of
\r
1143 a character class in a metric to |"prop"| makes the width of
\r
1144 each character in this class equal to that of the glyph in the real font.
\r
1145 If no JFM glue is needed, one simply has to use |jfm-prop.lua|. The
\r
1146 following is an example:
\r
1147 \begin{LTXexample}
\r
1148 \jfont\pr=file:ipamp.ttf:jfm=prop at 3.25mm
\r
1152 \item[Scaling by metrics]
\r
1153 Because of virtual fonts, even if one specifies to use |min10.tfm| or
\r
1154 |jis.tfm| at 10\,pt in \pTeX, the actual size of the real fonts used in
\r
1155 dviwares for these JFMs is 9.62216\,pt. Hence, for
\r
1156 example, if one wants to use 3.25\,mm Japanese
\r
1157 fonts and 10\,pt alphabetic fonts in \pTeX,
\r
1158 one needs to scale a Japanese font by
\r
1160 \frac{3.25\,\mathrm{mm}}{10\,\mathrm{pt}\cdot 0.962216}\simeq 0.961
\r
1162 in declarations of Japanese fonts.
\r
1164 \LuaTeX-ja does not support such scaling of glyphs by metrics, so one has
\r
1165 to adjust the size argument for |\jfont| manually. Continuing
\r
1166 the previous example, for using 3.25\,mm Japanese
\r
1167 fonts and 10\,pt alphabetic fonts in \LuaTeX-ja,
\r
1168 one needs to scale a Japanese font by
\r
1169 3.25\,mm${}/{}$10\,pt${}\simeq{}$0.92471.
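
In practice this amounts to giving the desired size directly in the font
declaration. The following is a sketch that reuses the font file and
metric names appearing elsewhere in this paper; the control sequence
name |\jathree| is hypothetical:
\begin{codewithoutnum}
% Japanese font at 3.25mm, to accompany a 10pt alphabetic font;
% no correction factor for the metric is applied
\jfont\jathree=file:ipam.ttf:jfm=ujis at 3.25mm
\end{codewithoutnum}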
\r
1172 \section{Conclusion}
\r
1173 We have discussed our \LuaTeX-ja package, which is strongly influenced
\r
1174 by \pTeX. For now, it is suitable for experimental use; however, many
\r
1175 refinements are still needed for regular use. The author hopes
\r
1176 that this paper and the \LuaTeX-ja project will contribute to the typesetting of Japanese,
\r
1177 and possibly other Asian languages, under \LuaTeX.
\r
1179 \section*{Acknowledgements}
\r
1180 The author would like to thank Ken Nakano and Hideaki Togashi for their
\r
1181 development and management of ASCII \pTeX. The author is very grateful to Haruhiko
\r
1182 Okumura for his leadership in the Japanese \TeX\ community. The author
\r
1183 is also very grateful to the members of the \LuaTeX-ja project team for their
\r
1184 valuable cooperation in development.
\r
1186 %%% The style of the bibliography is `amsplain'.
\r
1187 \providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
\r
1188 \providecommand{\href}[2]{#2}
\r
1189 \begin{thebibliography}{99}
\r
1192 Adobe Systems Incorporated, \emph{Adobe-Japan1-6 Character Collection
\r
1193 for CID-Keyed Fonts}, Technical Note~\#5078, 2004.
\r
1194 \newblock\url{http://partners.adobe.com/public/developer/en/font/5078.Adobe-Japan1-6.pdf}
\r
1197 ASCII MEDIA WORKS, アスキー日本語\TeX\ (\pTeX). \newblock\url{http://ascii.asciimw.jp/pb/ptex/}
\r
1200 John Baker, \emph{Typesetting UTF8 APL code with the \LaTeX\ lstlisting package}.
\r
1201 \newblock\url{http://bakerjd99.wordpress.com/2011/08/15/}
\r
1204 Jin-Hwan~Cho and Haruhiko Okumura, \emph{Typesetting CJK Languages with Omega},
\r
1205 \TeX, XML, and Digital Typography, Lecture Notes in Computer Science, vol.~3130,
\r
1206 Springer, 2004, 139--148.
\r
1209 Yannis Haralambous. \emph{The Joy of \LuaTeX}. \newblock\url{http://luatex.bluwiki.com/}
\r
1211 \bibitem{jisx4051}
\r
1212 Japanese Industrial Standards Committee. \emph{JIS~X~4051: Formatting
\r
1213 rules for Japanese documents}, 1993, 1995, 2004.
\r
1216 北川弘典, $\varepsilon$-\pTeX についてのwiki.
\r
1217 \newblock\url{http://sourceforge.jp/projects/eptex/wiki/FrontPage}
\r
1220 北川弘典, \LuaTeX で日本語.
\r
1221 \newblock\url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378}
\r
1223 \bibitem{luatexref}
\r
1224 \LuaTeX\ development team, \emph{The \LuaTeX\ reference}.
\r
1225 \newblock\url{http://www.luatex.org/svn/trunk/manual/luatexref-t.pdf} (snapshot of SVN trunk)
\r
1228 \LuaTeX-ja project team, \emph{The \LuaTeX-ja package}. \newblock
\r
1229 Not yet completed. \newblock Available at |doc/man-en.pdf| (in English) or
\r
1230 |doc/man-ja.pdf| (in Japanese)
\r
1231 in the Git repository.
\r
1233 \bibitem{luajp-test}
\r
1234 香田温人, \LuaTeX と日本語.
\r
1235 \newblock\url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html}
\r
1237 \bibitem{luajalayout}
\r
1238 前田一貴, luajalayout パッケージ---Lua\LaTeX によ
\r
1240 \newblock\url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/}
\r
1242 \bibitem{jsclasses}
\r
1243 奥村晴彦, p\LaTeXe 新ドキュメントクラス.
\r
1244 \newblock\url{http://oku.edu.mie-u.ac.jp/~okumura/jsclasses/}
\r
1247 Haruhiko Okumura, \emph{\pTeX\ and Japanese Typesetting},
\r
1248 The Asian Journal of \TeX\ \textbf{2}~(2008), 43--51.
\r
1251 乙部厳己, min10フォントについて.
\r
1252 \newblock\url{http://argent.shinshu-u.ac.jp/~otobe/tex/files/min10.pdf}
\r
1255 齋藤修三郎, Open Type Font用VF.
\r
1256 \newblock\url{http://psitau.kitunebi.com/otf.html}
\r
1258 \bibitem{stack-mail}
\r
1259 Jonathan Sauer, \emph{[Dev-luatex] tex.currentgrouplevel}.
\r
1260 \newblock\url{http://www.ntg.nl/pipermail/dev-luatex/2008-August/001765.html}
\r
1263 Takuji Tanaka, \emph{u\pTeX, up\LaTeX---unicode version of \pTeX, p\LaTeX}.
\r
1264 \newblock\url{http://homepage3.nifty.com/ttk/comp/tex/uptex_en.html}
\r
1267 Nobuyuki Tsuchimura and Yusuke Kuroki, \emph{Development of Japanese \TeX\ Environment},
\r
1268 The Asian Journal of \TeX\ \textbf{2}~(2008), 53--62.
\r
1271 W3C Working Group, \emph{Requirements for Japanese Text Layout}.
\r
1272 \newblock\url{http://www.w3.org/TR/jlreq/}
\r
1273 \end{thebibliography}
\r