\lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em}
\usepackage{mflogo,booktabs}
-\definecolor{gray10}{gray}{0.9}
+\definecolor{grayx}{gray}{0.85}
%%% Verbatim environment
\usepackage{fancyvrb}
\section{Introduction}
\subsection{History}
To typeset Japanese documents with \TeX, ASCII p\TeX~\cite{ptex} has
-been widely used in Japan. There are other methods---for example, using Omega
-and OTP~\cite{omega}, or with the CJK package---to do so, however,
-these alternative methods did not become a majority. On the one hand,
-p\TeX\ enables us to produce high-quality documents.
-
-On the other hand, p\TeX\ is left behind from the extensions of \TeX\
+been widely used in Japan. There are other methods to do so, for
+example, using Omega and OTP~\cite{omega}, or the CJK package; however,
+these alternative methods have not become mainstream. The author thinks
+that this is because p\TeX\ enables us to produce high-quality documents
+(e.g.,~it supports vertical typesetting), and because p\TeX\ appeared
+earlier than the alternatives described above.
+
+However, p\TeX\ has not kept up with extensions of \TeX\
such as \eTeX\ and \pdfTeX, nor with the spread of UTF-8 encoding. In
recent years, the situation has improved thanks to the development
of |ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}),
$\varepsilon$-p\TeX~\cite{eptex} by the author, and up\TeX~\cite{uptex}
by Takuji Tanaka (田中琢爾). However, continuing this approach, namely, developing
an engine extension localized for Japanese, is not wise. This approach
-needs lots of work for \emph{each} engine, and \LuaTeX\ has an ability
-to hook \TeX's internal process by using Lua callbacks.
+needs a lot of work for \emph{each} engine; moreover, since \LuaTeX\ can
+hook into \TeX's internal processes through Lua callbacks, the need for
+an engine extension is diminishing.
There were several experimental attempts to typeset
p\TeX\ allows some flexibility in typesetting, by changing internal
parameters such as |\kanjiskip| or |\prebreakpenalty|, and by using
- custom JFM (Japanese TFM).
+ custom JFM (Japanese TFM). Therefore we decided to include this
+ functionality in \LuaTeX-ja.
\item\emph{\LuaTeX-ja isn't mere re-implementation or porting of p\TeX;
some (technically and/or conceptually) inconvenient features of
This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki
is located at
\url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage}. There is
-no stable version at Oct.\ 11, 2011, however the development source can be
+no stable version as of Oct.\ 15, 2011; however, the development source can be
obtained from the git repository. Members of the project are as follows
(in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato,
Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda,
\section{Major differences with \pTeX}
In this section, we look at several major differences between p\TeX\
and our \LuaTeX-ja. For general information on Japanese typesetting and the
-overview of p\TeX, please see Okumara~\cite{ptexjp}.
+overview of p\TeX, please see Okumura~\cite{ptexjp}.
\subsection{Names of Control Sequences}
char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in p\TeX\
sets the amount of penalty inserted before a character whose code is
$\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it
-penalty}\rangle$, and |\prebreakpenalty|$\langle\hbox{\it
+penalty}\rangle$, and this form |\prebreakpenalty|$\langle\hbox{\it
char\_code}\rangle$ can also be used for retrieving the value.
Moreover, there are some parameters whose values at the end of a
a string.
\end{itemize}
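+For comparison, setting and then retrieving the penalty before `」' might
+look as follows in the two systems. The \LuaTeX-ja lines are a sketch
+assuming the |\ltjsetparameter|/|\ltjgetparameter| syntax of current
+\LuaTeX-ja releases, and the value 10000 is only an example.
+\begin{Verbatim}
+% pTeX: one primitive both sets and returns the value
+\prebreakpenalty`」=10000
+\showthe\prebreakpenalty`」
+% LuaTeX-ja (assumed syntax): set the value, then typeset it
+\ltjsetparameter{prebreakpenalty={`」,10000}}
+\ltjgetparameter{prebreakpenalty}{`」}
+\end{Verbatim}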
-\subsection{Line Break after a Japanese Character}
+\subsection{Line-break after a Japanese Character}
\label{ssec-line}
Japanese texts can break lines almost everywhere, in contrast with
alphabetic texts, which can break lines only between words (or by
hyphenation). Hence, p\TeX's input processor is modified so that a
-line break after a Japanese character doesn't emit a space. However,
+line-break after a Japanese character doesn't emit a space. However,
there is no way to customize the input processor of \LuaTeX, other than
to hack its CWEB source. All a macro package can do is to modify an input line before
\LuaTeX\ begins to process it, inside the |process_input_buffer|
Figure~\ref{fig-linebreak} shows an example of this situation; the
command on the first line marks most Japanese characters as
-non-Japanese characters. In other words, from that command onward, the
+`non-Japanese characters'. In other words, from that command onward, the
letter `あ' will be treated as an alphabetic character by
-\LuaTeX-ja. Then, it is natural to occur a space between `あ' and `y' in
+\LuaTeX-ja. Then, one would naturally expect a space between `あ' and `y' in
the output, but the actual output in the figure has none. This is
-because `あ' is considered to be a Japanese character by \LuaTeX-ja,
+because `あ' is considered a Japanese character by \LuaTeX-ja,
when \LuaTeX-ja decides whether U+FFFFF should be appended to
input line~2.
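+The basic behavior can be seen in a minimal input such as the
+following, assuming \LuaTeX-ja with its default character ranges. Since
+the first line ends with a Japanese character, lines~1 and~2 should be
+joined without an interword space, whereas lines~4 and~5 are joined with
+the usual space, as in ordinary \TeX.
+\begin{Verbatim}
+これは日本語の
+文章です。
+
+This is an English
+sentence.
+\end{Verbatim}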
Traditionally, most Japanese fonts used in typesetting are not
proportional, that is, most glyphs have the same size (in most cases,
square-shaped). Hence, it is not rare that the contents of different
-JFMs are totally same, and only differ in their names. For example,
+JFMs are essentially the same, and only differ in their names. For example,
|min10.tfm| and |goth10.tfm|, which are JFMs shipped with p\TeX\ for
seriffed \emph{mincho} family and sans-seriffed \emph{gothic} family,
differ only in their |FAMILY| and |FACE|. Moreover, |jis.tfm| and
are identical as binary files. Considering this situation, we
decided to separate `real' fonts and metrics used for them in
\LuaTeX-ja. Typical declarations of Japanese fonts in the style of plain
-\TeX\ are shown in Figure~\ref{fig-jfdef}.
+\TeX\ are shown in Figure~\ref{fig-jfdef}. We would like to add several
+remarks:
\begin{itemize}
\item The control sequence |\jfont| must be used for Japanese fonts instead of |\font|.
\item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so
fonts. When one displays a PDF with these fonts, the actual fonts
used for them depend on the PDF reader.
\end{itemize}
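+A declaration of this kind might look like the following sketch. The
+|psft:| prefix (assumed here, following the current \LuaTeX-ja manual)
+requests a font that is not embedded; the control sequence names
+|\tenmin| and |\tengt|, the font names \texttt{Ryumin-Light} and
+\texttt{GothicBBB-Medium}, the metric name \texttt{ujis}, and the size
+are hypothetical choices for illustration, not taken from
+Figure~\ref{fig-jfdef}.
+\begin{Verbatim}
+% hypothetical names; both fonts share the metric "ujis"
+\jfont\tenmin={psft:Ryumin-Light:jfm=ujis} at 10pt
+\jfont\tengt={psft:GothicBBB-Medium:jfm=ujis} at 10pt
+\tenmin 明朝体の文字 \tengt ゴシック体の文字
+\end{Verbatim}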
-The specification of a metric used in \LuaTeX-ja is similar to that of a
-JFM (see \cite{ptexjp}); characters are grouped into several classes,
-the size information of characters are specified for each class, and
+The specification of a metric for \LuaTeX-ja is similar to that of a JFM
+(see \cite{ptexjp}); characters are grouped into several classes, the
+size information of characters is specified for each class, and
glue/kern insertions are specified for each pair of classes. Although
the author has not tried it, it may be possible to develop a program that
`converts' a JFM to a metric for \LuaTeX-ja. \LuaTeX-ja offers three
As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing
processes are totally different from those of \TeX82. \TeX82's process is
-done just when a (sequence of) character is appended to current
+done just when a (sequence of) character is appended to the current
list. Thus we can interrupt this process by writing
|f{}irm|. However, \LuaTeX's process is \emph{node-based}, that is, the
process will be done when a horizontal box or a paragraph is ended, so
justifying each line.
\end{itemize}
In p\TeX, these three kinds of glues are treated differently. A JFM glue
-is inserted when a (sequence of) Japanese character is appended to
+is inserted when a (sequence of) Japanese character is appended to the
current list, as in the case of alphabetic characters in \TeX82. This
means that one can interrupt the insertion process by saying |{}|. A
\emph{xkanjiskip} is inserted just before `hpack' or line-breaking of a
\begin{description}
\item[Ignored Nodes]
-As noted in the previous subsection, the insertion process in p\TeX\ can be
- interrupted by saying |{}| or anything else\footnote{This is
- why some tricks like \texttt{ちょ\char`\{\char`\}っと} that
- are needed when we use \texttt{min10.tfm} work.}. This leads
+As noted in the previous subsection, the insertion process in p\TeX\ can
+ be interrupted by saying |{}| or anything else\footnote{This
+ is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for
+ \texttt{min10.tfm} and other `old' JFMs work.}. This leads to
the second row in Table~\ref{tab-jfmglue}, or
Figure~\ref{fig-ptexjfm}. `The process is interrupted' means
that p\TeX\ does not think the letter `】\inhibitglue' is
\emph{mark\_node}, \emph{whatsit\_node} and
\emph{penalty\_node}---, as shown in (4).
-By the way, around a \emph{glyph\_node} $p$ there may be some nodes
+
+By the way, around a \emph{glyph\_node} $p$ there may be some nodes
attached to $p$. These are an accent and kerns for
- positioning it, and a kern from italic
+ positioning it, and a kern from the italic
correction\footnote{\TeX82 (and \LuaTeX) does not distinguish
between explicit kern and a kern for italic correction. To
- distinguish them, \LuaTeX-ja uses an additional attribute and
+ distinguish them, an additional subtype for kern is introduced
+ in p\TeX. On the other hand, \LuaTeX-ja uses an additional attribute and
redefines \texttt{\char`\\/}.} for $p$. It is natural that
- these attachments should be ignored in the process. Hence
+ these attachments should be ignored inside the process. Hence
\LuaTeX-ja takes this approach, as does the latest version of
p\TeX\ (p3.2). This explains (2) in the figure.
\mc 明朝)\hbox{}\gt (ゴシック
\end{quote}
However, this seems unnatural, since two Japanese fonts in the
- output uses the same metric, \emph{i.e.}, the same
+ output use the same metric, i.e.,~the same
typesetting rule. Hence, we decided that Japanese fonts with
the same metric are treated as one font in the insertion
process of \LuaTeX-ja. Thus, the output from the above input
- in \LuaTeX-ja is:
+ in \LuaTeX-ja looks like:
\begin{quote}
\mc 明朝)\gt (ゴシック
\end{quote}
\small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$a$}\
\hrulefill\vrule height .5ex depth .5ex\cr}}}}%
\imagfm{\jstrut )\inhibitglue}%
-\imagfm{\jstrut\hbox to .5\zw{\hss\normalsize (1)\hss}}%
+\hbox to .5\zw{\hss\normalsize (1)\hss}%
\imagfm{\jstrut\inhibitglue\gt (}%
\imagfm{\jstrut\gt 漢}%
\imagfm{\jstrut\gt )\inhibitglue}%
-\imagfm{\jstrut\hbox to .55\zw{\hss\normalsize (2)\hss}}%
+\hbox to .55\zw{\hss\normalsize (2)\hss}%
\imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\inhibitglue (}%
\imagfm{\fontsize{48}{48}\selectfont\jstrut\smash{%
\vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr\gt 大\cr
Considering this situation of p\TeX, \LuaTeX-ja can use the value of
\emph{xkanjiskip} that is specified in a metric. If the value of
- \emph{xkanjiskip} on the user side (this is the
+ \emph{xkanjiskip} on the user's side (this is the
\textsf{xkanjiskip} parameter in |\ltjsetparameter|) is
|\maxdimen|, then \LuaTeX-ja uses the specification from
the currently used metric as the actual value of
At the moment, \LuaTeX-ja can be used under plain \TeX\ and under
\LaTeXe. Generally speaking, one only has to load |luatexja.sty| with the |\input|
command (or |\usepackage| in~\LaTeXe) if one merely wants to typeset
-Japanese character. We look more detail by parts.
+Japanese characters. We now look at each part in more detail.
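+For instance, a minimal document might look like one of the following
+sketches, the first for plain \TeX\ (processed with |luatex|) and the
+second for \LaTeXe\ (processed with |lualatex|); only the loading line
+is specific to \LuaTeX-ja.
+\begin{Verbatim}
+% plain TeX
+\input luatexja.sty
+日本語の文章です。
+\bye
+
+% LaTeX2e
+\documentclass{article}
+\usepackage{luatexja}
+\begin{document}
+日本語の文章です。
+\end{document}
+\end{Verbatim}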
\subsection{`Engine Extension'}
The lowest part of \LuaTeX-ja corresponds to the p\TeX\ extension as
assignment of |\kcatcode| can be done by a Unicode
block\footnote{There are some exceptions. For example,
U+FF00--FFEF (Halfwidth and Fullwidth Forms) are divided into
- three blocks in up\TeX.}.
+ three blocks in recent up\TeX.}.
\LuaTeX-ja uses a slightly different approach. Because there are many
Unicode blocks already in the Basic Multilingual Plane which are
range of Japanese characters. \LuaTeX-ja predefines 8~character ranges,
as shown in Table~\ref{tab-chrrng}. Most of these ranges are just
unions of Unicode blocks, and are determined from the Adobe-Japan1-6 character
-correction, and JIS~X~0208. And, among these 8~ranges, the ranges~2, 3, 6, 7,
-and~8 are considered ranges of Japanese characters, and others are
-considered ranges of alphabetic characters.
+collection~\cite{aj16}, and JIS~X~0208. Among these 8~ranges,
+ ranges~2, 3, 6, 7, and~8 are treated as ranges of Japanese
+ characters, and the others as ranges of alphabetic
+ characters.
This default setting is suitable for Japanese-based documents; however, it
has the side effect that other packages which use Unicode fonts do not work
to the range~8, and |\textendash| provided by the |EU2|
encoding used in the \emph{fontspec} package is the
character U+2013, which belongs to range~3. Hence, these
- charatcer cannot be typeset with the default range setting.
+ characters cannot be typeset correctly with the default range setting.
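+As a sketch, assuming the \textsf{jacharrange} parameter of current
+\LuaTeX-ja releases (a list of range numbers in which a positive number
+marks a range as Japanese and a negative one as alphabetic, and in which
+unlisted ranges keep their current setting), the default described
+above and a workaround for |\textendash| could be written as follows.
+Note that declaring range~3 alphabetic affects every character in that
+range, not only U+2013.
+\begin{Verbatim}
+% the default: ranges 2, 3, 6, 7, 8 Japanese; 1, 4, 5 alphabetic
+\ltjsetparameter{jacharrange={-1,+2,+3,-4,-5,+6,+7,+8}}
+% treat range 3 (which contains U+2013) as alphabetic
+\ltjsetparameter{jacharrange={-3}}
+\end{Verbatim}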
\begin{table}
\caption{Predefined ranges in \LuaTeX-ja}
1&(Additional) Latin characters which do not belong to range~8.\\
2&Greek and Cyrillic letters.\\
3&Punctuation and miscellaneous symbols.\\
-4&Unicode blocks which does not intersect with Adobe-Japan1.\\
+4&Unicode blocks which do not intersect with Adobe-Japan1-6.\\
5&Surrogates and Supplementary Private Use Areas.\\
6&Characters used in Japanese typesetting.\\
7&Characters possibly used in CJK typesetting, but not in Japanese.\\
\end{itemize}
The same criterion is used for changing the Japanese font family.
-To work this behavior well, a list of all encodings defined already in the
- document is needed. Since \LuaTeX-ja is loaded as a package,
- \LuaTeX-ja cannot have this list. Hence \LuaTeX-ja adopted different
- approach, namely |\fontfamily{|$\langle\hbox{\it
- arg\/}\rangle$|}| changes the current alphabetic font family
- to $\langle\hbox{\it arg\/}\rangle$, if and only if:
+To make this behavior work well, a list of all (alphabetic) encodings
+ already defined in the document is needed. However, since \LuaTeX-ja
+ is loaded as a package, \LuaTeX-ja cannot obtain this list.
+ Hence \LuaTeX-ja adopted a different approach, namely
+ |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the
+ current alphabetic font family to $\langle\hbox{\it
+ arg\/}\rangle$, if and only if:
\begin{itemize}
\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in
the current alphabetic encoding $\langle\hbox{\it enc\/}\rangle$.
\item[The \emph{fontspec} package] The \emph{fontspec} package is built
on NFSS2; hence, control sequences offered by the
\emph{fontspec} package, such as |\setmainfont|, are only
- effective for alphabetic fonts if \LuaTeX-ja is
- loaded. |luatexja-fontspec.sty| offers these counterparts for
- Japanese fonts, with additional `j' in the name of control
- sequences, such as |\setmainjfont|.
+ effective for alphabetic fonts if \LuaTeX-ja is loaded. The
+ optional package \texttt{luatexja-\penalty0fontspec.sty}
+ offers counterparts of these for Japanese fonts, with an additional
+ `j' in the names of the control sequences, such as
+ |\setmainjfont|.
\item[The \emph{otf} package]
-This package is widely used for characters which is
+This package is widely used in p\TeX\ for characters which are
not in JIS~X~0208, and for using more than one weight in \emph{mincho}
and \emph{gothic} font families. Therefore \LuaTeX-ja supports features
-in the \emph{otf} package, by loading |luatexja-otf.sty|. Note that
-characters by |\UTF{xxxx}| and |\CID{xxxx}| are not appended to the
-current list as a \emph{glyph\_node}, so they are not affected by
-callbacks by the \emph{luaotfload} package. We have another remark; |\CID| does not work
-with TrueType fonts.
+in the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty}
+ manually. Note that characters given by |\UTF{xxxx}| and
+ |\CID{xxxx}| are not appended to the current list as
+ \emph{glyph\_node}s, so they are not affected by the callbacks of
+ the \emph{luaotfload} package. One more remark: |\CID|
+ does not work with TrueType fonts.
\item[The \emph{listings} package]
-It is well-known that there is a patch of the \emph{listings} package for
- p\LaTeXe,\ called |jlisting.sty|. Generally speaking, it also
- can be used in \LuaTeX-ja. However, it seems to be that a
- Japanese character after a space does not recieve any process
- of the \emph{listings} package; this is inconvinient when we
- use the \emph{showexpl} package.
+It is well known that there is a patch, |jlisting.sty|, to the
+ \emph{listings} package for p\LaTeXe. Generally speaking, it
+ can also be used with \LuaTeX-ja. However, it seems that a
+ Japanese character after a space is not processed by the
+ \emph{listings} package at all; this is inconvenient
+ when we use the \emph{showexpl} package.
\end{description}
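+A preamble combining the two optional packages might look like the
+following sketch. The font name \texttt{IPAMincho} is only an
+assumption (any installed Japanese OpenType or TrueType font would do),
+and U+9AD9 is used as an example of a character outside JIS~X~0208.
+\begin{Verbatim}
+\documentclass{article}
+\usepackage{luatexja-fontspec}
+\usepackage{luatexja-otf}
+% assumption: a font named IPAMincho is installed
+\setmainjfont{IPAMincho}
+\begin{document}
+% \UTF takes a hexadecimal Unicode code point
+\UTF{9AD9}
+\end{document}
+\end{Verbatim}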
As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining
Japanese fonts, as p\TeX\ does. However, since the information on the current
Japanese font is stored into an attribute, control sequences defined by
-|\jfont| (\emph{e.g.},~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) is
+|\jfont| (e.g.,~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) do
not represent fonts in the sense of \TeX82. In other words, each of
these control sequences is just an assignment to an attribute, therefore
they cannot be an argument of |\the|, |\fontname|, or |\textfont|.
\subsection{Overview of the Processes}
Now we briefly outline the processes of \LuaTeX-ja.
\begin{description}
-\item[Treatment of Linebreaks after Japanese Characters] This part is
+\item[Treatment of Line-breaks after a Japanese Character] This part is
already described in Subsection~\ref{ssec-line}. This is done in the
|process_input_buffer| callback.
\item[Font Replacement] In the |hyphenate| callback, \LuaTeX-ja looks
Furthermore, the subtype of $p$ is decreased by~1 to suppress
hyphenation around it by \LuaTeX, since later processes of
- \LuaTeX-ja take care of all things about Japanese charaters.
+ \LuaTeX-ja take care of all things about Japanese characters.
\end{description}
%
The following processes are all executed in |pre_linebreak_filter| and
is the content of a horizontal box is traversed,
to determine the level of \LuaTeX-ja's internal stack at the end
of the list. This is needed because of the place of
- |hpack_filter| callback in the source of \LuaTeX. We will discuss more
+ the |hpack_filter| callback in the source of \LuaTeX. We will discuss this in more
detail in Subsection~\ref{ssec-stack}.
\item[Insertion of Glues/Kerns for Japanese Typesetting]
We discuss this in detail in Subsection~\ref{ssec-width}.
\end{description}
-The callbacks by the \emph{luaotfload} package, e.g., replacement of
+The callbacks by the \emph{luaotfload} package, e.g.,~replacement of
glyphs according to font features, are executed just after `Examination
of Stack Level' above.
Figure~\ref{fig-ltsrc} is an extract from the CWEB source
\texttt{tex/packaging.w} of \LuaTeX\ (SVN revision 4358). This function
-is called just when explicit |\hbox{...}| or |\vbox{...}| is ended, and
+is called just when an explicit |\hbox{...}| or |\vbox{...}| is ended, and
the function |filtered_hpack()| is where the |hpack_filter| and then the
actual `hpack' process are performed. Notice that the |unsave()|
function is called before |filtered_hpack()|. This is the problem;
\emph{user\_id} is 30112 (\emph{stack\_node}, for short) will be
appended to the current horizontal list each time the current stack
level is incremented, and their values are the values of
-|\currentgrouplevel| at that time. In the beginning of |hpack_filter|
+|\currentgrouplevel| at that time. In the beginning of the |hpack_filter|
callback, the list in question is traversed to determine whether the
stack level at the end of the list and that outside the box coincide.
Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current
-stack level, both inside the |hpack_filter| callback, i.e., outside a
+stack level, both inside the |hpack_filter| callback, i.e.,~outside a
horizontal box. Consider a list which represents the content of the box,
then we have:
\begin{itemize}
\begin{center}\unitlength=9pt\small
\begin{picture}(15,12)(-1,-3)
-\color{gray10}% real glyph
+\color{grayx}% real glyph
\put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength}
\color{black}% real glyph :step1
\put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}}
\put(-1,5.5){\line(1,0){6}}
\put(-1,-4){\line(1,0){6}}
+\put(-1,0){\makebox(0,0)[r]{\strut$R$\,}}
\thicklines
\put(0,0){\vector(0,1){9}\line(0,-1){3}\vector(1,0){12}}
middle dot `・'. We have other settings, namely, `left' and `right'.
After that, it is shifted according to the values of |left| and |down|,
which are specified in the metric. The final position of the real glyph
-is shown by the gray rectangle. If the amount of shifting baseline is
+is shown by the gray rectangle~$R$. If the baseline shift is
not zero, $M$ (and hence the real glyph) is shifted by that amount.
We would like to remark briefly about the vertical position of a glyph.
and possibly other Asian languages, under \LuaTeX.
-
%%% The style of the bibliography is `amsplain'.
\providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace}
\providecommand{\href}[2]{#2}
\begin{thebibliography}{99}
+\bibitem{aj16}
+Adobe Systems Incorporated, \emph{Adobe-Japan1-6 Character Collection
+ for CID-Keyed Fonts}, Technical Note~\#5078, 2004.
+\url{http://partners.adobe.com/public/developer/en/font/5078.Adobe-Japan1-6.pdf}
+
\bibitem{ptex}
ASCII MEDIA WORKS, \emph{アスキー日本語\TeX\ (p\TeX)},
\url{http://ascii.asciimw.jp/pb/ptex/}