From 7506143c0be2805a6550db6746be040219056303 Mon Sep 17 00:00:00 2001 From: KUROKI Yusuke Date: Tue, 22 Nov 2011 19:49:08 +0900 Subject: [PATCH] Obeyed ajt.cls. --- doc/ajt-devel-ltja.tex | 2470 ++++++++++++++++++++++++------------------------ 1 file changed, 1235 insertions(+), 1235 deletions(-) diff --git a/doc/ajt-devel-ltja.tex b/doc/ajt-devel-ltja.tex index 5a81a5f..cf1159d 100644 --- a/doc/ajt-devel-ltja.tex +++ b/doc/ajt-devel-ltja.tex @@ -1,1235 +1,1235 @@ -%#!lualatex ajt-devel-ltja -\documentclass{ajt} - -%%% Packages used in this paper - -%%% Font setting for \LuaTeX; this is extract from ajt.cls -\makeatletter - \if@print - \RequirePackage{fontspec,xunicode} - \RequirePackage{luatextra} - \setmainfont[Mapping=tex-text]{Palatino LT Std} - \setsansfont[Mapping=tex-text]{Optima LT Std} - \else - \RequirePackage{fontspec,luatextra} - \setmainfont[Mapping=tex-text]{TeX Gyre Pagella} % \simeq Palatino - \fi - -%%% LuaTeX-ja -\usepackage{luatexja,luatexja-fontspec} -\ltjsetparameter{jacharrange={-3,-8}} -\DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipam.ttf:jfm=ujis}{} -\DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=ujis}{} -% quick hack: monospaced Japanese font by \ttfamily -\DeclareKanjiFamily{JY3}{\ttdefault}{}{} -\DeclareFontShape{JY3}{\ttdefault}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=mono}{} - - -%%% LTXexample environment -\usepackage{showexpl,lltjlisting} -\lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em} - -%%% Verbatim environment -\usepackage{fancyvrb} -\CustomVerbatimEnvironment{code}{Verbatim}% -{numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small} -\CustomVerbatimEnvironment{codewithoutnum}{Verbatim}% -{xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small} -\CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}% -{xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize} -\DefineShortVerb{\|} - -%%% Others -\usepackage{mflogo,booktabs} 
-\definecolor{grayx}{gray}{0.85} -\hyphenation{ - kanjiskip - xkanjiskip -} - -%%% Mandatory article metadata %%% -\title{Development of \LuaTeX-ja package} -\author{Hironori Kitagawa {\normalsize 北川 弘典}} -\address{\LuaTeX-ja project team} -\email{h\_kitagawa2001@yahoo.co.jp} - -\keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese} -\abstract{% -\LuaTeX-ja package is a macro package for typesetting Japanese -documents under \LuaTeX. The package has more flexibility of -typesetting than \pTeX, which is widely used Japanese extension of \TeX, -and has corrected some unwanted features of \pTeX. -In this paper, we describe specifications, the current status and some -internal processing methods of \LuaTeX-ja. -} - -\newcommand{\parname}[1]{\textsf{#1}} -\newcommand{\jstrut}{\vrule width0pt height\cht depth\cdp} -\newcommand{\imagfm}[1]{\ifvmode\leavevmode\fi% - \hbox{\fboxsep=0pt\fbox{\setbox0=\hbox{#1}\copy0\kern-\wd0 - \smash{\vrule width \wd0 height 0.4pt depth0.4pt}}}} -\begin{document} - -%%% Do not forget to start with \maketitle! -\maketitle - -\section{Introduction} -\subsection{History} -To typeset Japanese documents with \TeX, ASCII \pTeX~\cite{ptex} has -been widely used in Japan. There are other methods---for example, using -Omega and OTP~\cite{omega}, or with the CJK package---to do so, however, -these alternative methods did not become majority. The author thinks -that this is because \pTeX\ enables us to produce high-quality documents -(e.g.,~supporting vertical typesetting), and the appearance of \pTeX\ is -earlier than that of alternatives described above. - -However, \pTeX\ has been left behind from the extensions of \TeX\ such -as \eTeX\ and \pdfTeX, and the diffusion of UTF-8 encoding. In recent -years, the situation has become better, by development of -|ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}), -$\varepsilon$-\pTeX~\cite{eptex} by the author,~and u\pTeX~\cite{uptex} -by Takuji Tanaka (田中琢爾). 
However, continuing this approach, namely, -to develop an engine extension localized for Japanese, is not wise. This -approach needs lots of work for \emph{each} engine. In addition, if we -use \LuaTeX, the necessity of an engine extension is getting smaller -because \LuaTeX\ has an ability to hook \TeX's internal process by using -Lua callbacks. - - -There were several experimental attempts to typeset -Japanese documents with \LuaTeX\ before. Here we cite three examples: -\begin{itemize} -\item |luaums.sty|~\cite{luaums} developed by the author. This - experimental package is for creating a certain Japanese-based presentation - with \LuaTeX. -\item the \emph{luajalayout} package~\cite{luajalayout}, formerly known as the - \emph{jafontspec} package, by Kazuki Maeda (前田一貴). This package is based on - \LaTeXe\ and \emph{fontspec} package. -\item the \emph{luajp-test} package~\cite{luajp-test}, a test package made by - Atsuhito Kohda (香田温人), based on articles on the web page~\cite{joylua}. -\end{itemize} -However, these packages are based on \LaTeXe, and do not have much -ability to control the typesetting rule. And it is inefficient that more -than one people separately develop similar packages. Development of the -\LuaTeX-ja package is started initially by the author and Kazuki Maeda, because of -these situations. - -\subsection{Development policy of \LuaTeX-ja} -\label{ssec-pol} -The first aim of \LuaTeX-ja project was to implement features (from the -`primitive' level) of \pTeX\ as macros under \LuaTeX, therefore \LuaTeX-ja is -much affected by \pTeX. However, as development proceeded, some -technical/conceptual difficulties arose. 
Hence we changed the aim -of the project as follows: -\begin{itemize} -\item\emph{\LuaTeX-ja offers at least the same flexibility of - typesetting that p\TeX\ has.} - - We are not satisfied with the ability of producing outputs conformed to - JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for - typesetting, or to a technical note~\cite{w3c} by W3C; - if one wants to produce very incoherent outputs for some reason, it - should be possible. -In this point, previous attempts of Japanese typesetting with \LuaTeX\ - which we cited in the previous subsection are inadequate. - -\pTeX\ has some flexibility of typesetting, by changing internal - parameters such as |\kanjiskip| or |\prebreakpenalty|, and by using - custom JFM (Japanese TFM). Therefore we decided to include these - functionality to \LuaTeX-ja. - -\item\emph{\LuaTeX-ja isn't mere re-implementation or porting of \pTeX; - some (technically and/or conceptually) inconvenient features of - \pTeX\ are modified.} - - We describe this point in more detail at the next section. -\end{itemize} - - -\subsection{Overview of the processes} -\label{ssec-over} -We describe an outline of \LuaTeX-ja's process in order. - -\begin{itemize} -\item In the |process_input_buffer| callback: treatment of breaking - lines after a Japanese character (in Subsection~\ref{ssec-line}). - -\item In the |hyphenate| callback: font replacement. - -\LuaTeX-ja looks into for each \textit{glyph\_node}~$p$ in the horizontal list. If - the character represented by $p$ is considered as a Japanese - character, the font used at $p$ is replaced by the value of - |\ltj@curjfnt|, an attribute for `the current Japanese font' - at~$p$. - -Furthermore, the subtype of $p$ is subtracted by 1 to suppress - hyphenation around $p$ by \LuaTeX, because later processes of - \LuaTeX-ja take care of all things about Japanese characters. 
- -\item In |pre_linebreak_filter| and |hpack_filter| callbacks: - -\begin{enumerate} -\item \LuaTeX-ja has its own stack system, and the current horizontal - list is traversed in this stage to determine what the level of - \LuaTeX-ja's internal stack at the end of the list is. We will - discuss it in Subsection~\ref{ssec-stack}. - -\item In this stage, \LuaTeX-ja inserts glues/kerns for Japanese - typesetting in the list. This is the core routine of \LuaTeX-ja. - We will discuss it in Subsections - \ref{ssec-jglue}~and~\ref{ssec-jspec}. - -\item To make a match between a metric and a real font, sometimes - adjustment of the position of (Japanese) glyphs is performed. - We will discuss it in Subsection~\ref{ssec-width}. -\end{enumerate} -\item In the |mlist_to_hlist| callback: treatment of Japanese characters - in math formulas. This stage is similar to adjustment of the - position of glyphs (see above), so we omit its description - from this paper. -\end{itemize} - -In this paper, an \emph{alphabetic character} means a non-Japanese -character. Similarly, we use the term \emph{alphabetic font} as the -counterpart of a Japanese font. - -\subsection{Contents of this paper} -Here we describe the contents of the rest of this paper briefly. In -Section~\ref{sec:differences_with_ptex}, we describe major differences -between \pTeX\ and \LuaTeX-ja. The next section, -Section~\ref{sec:distinction_of_characters}, concentrates on the -problem of how we distinguish between Japanese characters and alphabetic -characters. In Section~\ref{sec:current_status}, we show the current -development status of the package. Finally, in -Section~\ref{sec:implementation}, we describe some internal routines of -\LuaTeX-ja. - -\subsection{General information of the project} -This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki -is located at -\url{http://sourceforge.jp/projects/luatex-ja/wiki/}.
There is -no stable version on October 22, 2011, however a set of developer sources can be -obtained from the git repository. Members of the project team are as follows -(in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato, -Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda, -and~Shuzaburo Saito. - - -\section{Major differences with \pTeX} -\label{sec:differences_with_ptex} -In this section, we explain several major differences between \pTeX\ -and our \LuaTeX-ja. For general information of Japanese typesetting and the -overview of \pTeX, please see Okumura~\cite{ptexjp}. - - -\subsection{Names of control sequences} -\label{ssec-csname} Because \pTeX\ is an engine modification of Knuth's -original \TeX82 engine, some of the additional primitives take a form that is -very difficult to be simulated by a macro. For example, an additional -primitive |\prebreakpenalty|$\langle\hbox{\it -char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in \pTeX\ -sets the amount of penalty inserted before a character whose code is -$\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it -penalty}\rangle$, and this form |\prebreakpenalty|$\langle\hbox{\it -char\_code}\rangle$ can be also used for retrieving the value. - -Moreover, there are some internal parameters of \pTeX\ which values of them at the end of a -horizontal box or that of a paragraph are valid in whole box or -paragraph. However, the implementation of these parameters in -\LuaTeX-ja is not so easy; we will discuss it in Subsection~\ref{ssec-stack}. - -From above two problems discussed above, the assignment and retrieval -of most parameters in \LuaTeX-ja are summarized into the following -three control sequences: -\begin{itemize} -\item |\ltjsetparameter{|$\langle\hbox{\it - name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local - assignment. -\item |\ltjglobalsetparameter|: for global assignment. 
Note that these two control - sequences obey the value of |\globaldefs| primitive. -\item |\ltjgetparameter{|$\langle\hbox{\it - name}\rangle$|}[{|$\langle\hbox{\it optional - argument}\rangle$|}]|: for retrieval. The returned value is always - a string. -\end{itemize} - -\subsection{Line-break after a Japanese character} -\label{ssec-line} - -Japanese texts can break lines almost everywhere, in contrast with -alphabetic texts can break lines only between words (or use -hyphenation). Hence, \pTeX's input processor is modified so that a -line-break after a Japanese character doesn't emit a space. However, -there is no way to customize the input processor of \LuaTeX, other than -to hack its CWEB-source. All a macro package can do is to modify an input line before -when \LuaTeX\ begin to process it, inside the |process_input_buffer| -callback. - -Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this -purpose) will be appended to an input line, if this line ends with a Japanese -character.\footnote{Strictly speaking, it also requires that the catcode -of the end-line character is 5~(\emph{end-of-line}). This condition is -useful under the verbatim environment.} One might jump to a conclusion -that the treatment of a line-break by \pTeX\ and that of \LuaTeX-ja are -totally same, however they are different in the respect that \LuaTeX-ja's -judgement whether a comment letter will be appended the line is done -\emph{before} the line is actually processed by \LuaTeX. - -Figure~\ref{fig-linebreak} shows an example of this situation; the -command at the first line marks most of Japanese characters as -`non-Japanese characters'. In other words, from that command onward, the -letter `あ' will be treated as an alphabetic character by -\LuaTeX-ja. Then, it is natural to have a space between `あ' and `y' in -the output, where the actual output in the figure does not so. 
This is -because `あ' is considered a Japanese character by \LuaTeX-ja, -when \LuaTeX-ja does the decision whether U+FFFFF will be added to the -input line~2. - -\begin{figure} -\begin{LTXexample} -\font\x=IPAMincho \x -\ltjsetparameter{jacharrange={-6}}xあ -y -\end{LTXexample} -\caption{A notable sample showing the treatment of a line-break after a -Japanese character.}\label{fig-linebreak} -\end{figure} - -\subsection{Separation between `real' fonts and metrics} -\label{ssec-sepmet} - -Traditionally, most Japanese fonts used in typesetting are not -proportional, that is, most glyphs have same size (in most cases, -square-shaped). Hence, it is not rare that the contents of different -JFMs are essentially same, and only differ in their names. For example, -|min10.tfm| and |goth10.tfm|, which are JFMs shipped with \pTeX\ for -seriffed \emph{mincho} family and sans-seriffed \emph{gothic} family, -differ their |FAMILY| and |FACE| only. Moreover, |jis.tfm| and -|jisg.tfm|, which is included in the \emph{jis} font metric, which is -used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦), -are totally same as binary files. Considering this situation, we -decided to separate `real' fonts and metrics used for them in -\LuaTeX-ja. Typical declarations of Japanese fonts in the style of plain -\TeX\ are shown in Figure~\ref{fig-jfdef}. We would like to add several -remarks: -\begin{itemize} -\item A control sequence |\jfont| must be used for Japanese fonts, instead of |\font|. -\item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so - \hbox{\tt file:} and \hbox{\tt name:} prefixes, and various font features can be - used as the first line in Figure~\ref{fig-jfdef}. -\item The |jfm| key specifies the metric for the font. In - Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a - Lua script named |jfm-ujis.lua|. This metric is the standard - metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf} - package~\cite{otf}. 
-\item The \hbox{psft:} prefix can be used to specify name-only, non-embedded - fonts. When one displays a pdf with these fonts, actual fonts which - will be used for them depend on a pdf reader. -\end{itemize} -The specification of a metric for \LuaTeX-ja is similar to that of a JFM -(see \cite{ptexjp}); characters are grouped into several classes, the -size information of characters are specified for each class, and -glue/kern insertions are specified for each pair of classes. Although -the author have not tried, it may be possible to develop a program that -`converts' a JFM to a metric for \LuaTeX-ja. \LuaTeX-ja offers three -metrics by default; |jfm-ujis.lua|, |jfm-jis.lua| based on the -\emph{jis} font metric, and |jfm-min.lua| based on old |min10.tfm|. - - Note that |-kern| in features -is important, because kerning information from a real font itself will -clash with glue/kern information from the metric. - -\begin{figure} -\begin{verbatim} -\jfont\foo=file:ipam.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt -\jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt -\end{verbatim} -\caption{Typical declarations of Japanese fonts.} -\label{fig-jfdef} -\end{figure} - -\subsection{Insertion of glues/kerns for Japanese typesetting: timing} -\label{ssec-jglue} - -As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing -processes are totally different from those of \TeX82. \TeX82's process is -done just when a (sequence of) character is appended to the current -list. Thus we can interrupt this process by writing as -|f{}irm|. However, \LuaTeX's process is \emph{node-based}, that is, the -process will be done when a horizontal box or a paragraph is ended, so -|f{}irm| and |firm| yield same outputs under \LuaTeX. - -The situation for Japanese characters is more complicated. 
-Glues (and kerns) which are needed for Japanese -typesetting are divided into the following three categories: -\begin{itemize} -\item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue}, - for short). - -\item Default glue between a Japanese character and an alphabetic - character (\emph{xkanjiskip}, for short), usually 1/4 of - full-width (\emph{shibuaki}) with some stretch and shrink for - justifying each line. -\item Default glue between two consecutive Japanese characters - (\emph{kanjiskip}, for short). The main purpose of this glue is to - enable breaking lines almost everywhere in Japanese texts. In most - cases, its natural width is zero, with some stretch/shrink for - justifying each line. -\end{itemize} -In \pTeX, these three kinds of glues are treated differently. A JFM glue -is inserted when a (sequence of) Japanese character is appended to the -current list, as in the case of alphabetic characters in \TeX82. This -means that one can interrupt the insertion process by saying |{}|. An -\emph{xkanjiskip} is inserted just before `hpack' or line-breaking of a -paragraph; this timing is somewhat similar to that of \LuaTeX's kerning -process. Finally, a \emph{kanjiskip} does not appear as a node anywhere; -it appears only implicitly in the calculation of the width of a horizontal box, -in line breaking, and in the actual output process to a DVI -file. These specifications have made \pTeX's behavior very hard to -understand. - -\LuaTeX-ja inserts glues in all three categories simultaneously inside -|hpack_filter| and |pre_linebreak_filter| callbacks. The reasons for -this specification are to make Japanese characters behave like alphabetic characters in \LuaTeX\ -(as described in the first paragraph in this subsection), and to clarify -the specification for \LuaTeX-ja's process.
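As a minimal sketch of the behavior described in this subsection, the following input can be compiled with Lua\LaTeX. The document class, the sample text and the glue values are illustrative assumptions and are not package defaults stated in this paper; the point is only that |f{}irm| and |firm| go through the same node-based process, and that \emph{kanjiskip} and \emph{xkanjiskip} are set through |\ltjsetparameter| before the callbacks insert the glue.
\begin{verbatim}
% Illustrative sketch only; assumes LuaLaTeX with luatexja.sty installed.
\documentclass{article}
\usepackage{luatexja}
\begin{document}
% Node-based ligaturing: per the text, these two paragraphs look the same.
firm \par
f{}irm \par
% Glue values below are assumptions chosen for illustration.
\ltjsetparameter{kanjiskip=0pt plus 0.4pt minus 0.4pt}
\ltjsetparameter{xkanjiskip=0.25\zw plus 1pt minus 1pt}
% JFM glue, kanjiskip and xkanjiskip are all inserted when this
% paragraph ends (pre_linebreak_filter), not while it is being read.
日本語とEnglishの混植。
\end{document}
\end{verbatim}
To interrupt the insertion process at a chosen place in such a document, an empty |\hbox{}| is the tool described in the next subsection; an empty group |{}| produces no node and therefore has no effect.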
- -\subsection{Insertion of glues/kerns for Japanese typesetting: specification} -\label{ssec-jspec} - -\begin{table} -\caption{Examples of differences between \pTeX\ and \LuaTeX-ja.} -\label{tab-jfmglue} -\begin{center} -\begin{tabular}{llllllll} -\toprule -&\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}\\ -Input &|あ】{}【〕\/〔| &|い』\/a| &|う)\hbox{}(| &|え]\special{}[|\\\midrule -\pTeX &あ】\hbox{}【〕\hbox{}〔&い』\/a &う)\hbox{}( &え]\hbox{}[\\ -\LuaTeX-ja &あ】{}【〕\/〔 &い』\/a &う)\hbox{}( &え]\special{}[\\ -\bottomrule -\end{tabular} -\end{center} -\end{table} - -\begin{figure} -\begin{center} -\fontsize{40}{40}\selectfont -\imagfm{\jstrut あ}% -\imagfm{\jstrut 】\inhibitglue}% -\imagfm{\jstrut\kern.5\zw}% -\imagfm{\jstrut\kern.5\zw}% -\imagfm{\jstrut\inhibitglue【}% -\imagfm{\jstrut 〕\inhibitglue}% -\imagfm{\jstrut\kern.5\zw}% -\imagfm{\jstrut\kern.5\zw}% -\imagfm{\jstrut\inhibitglue〔}% -\end{center} -\caption{Detail of the output of \pTeX\ in the input~(1) in Table~\ref{tab-jfmglue}.} -\label{fig-ptexjfm} -\end{figure} - -Now we will take a look at the insertion process itself through four points. - -\begin{description} -\item[Ignored Nodes] -As noted in the previous subsection, the insertion process in \pTeX\ can - be interrupted by saying |{}| or anything else.\footnote{This - is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for - \texttt{min10.tfm} and other `old' JFMs work.} This leads the - second row in Table~\ref{tab-jfmglue}, or - Figure~\ref{fig-ptexjfm}. Here `the process is interrupted' - means that \pTeX\ does not think the letter `】\inhibitglue' - is followed by `\inhibitglue【', hence two half-width glues - are inserted between `】\inhibitglue' and `\inhibitglue【', - where the left one is from `】\inhibitglue' and the right one - is from `\inhibitglue【'. - - On the other hand, in \LuaTeX-ja, the process is done inside - |hpack_filter| and |pre_linebreak_filter| callbacks. 
Hence, - \emph{anything that does not make any node will be - ignored}\ in \LuaTeX-ja, as shown in (1) in - Table~\ref{tab-jfmglue}. \LuaTeX-ja also ignores any nodes - which do not make any contribution to the current horizontal - list (\emph{ins\_node}, \emph{adjust\_node}, - \emph{mark\_node}, \emph{whatsit\_node} and - \emph{penalty\_node}), as shown in (4). - - -By the way, around a \emph{glyph\_node} $p$ there may be some nodes - attached to~$p$. These are an accent and kerns for - moving it to the right place, and a kern from the italic - correction\footnote{\TeX82 (and \LuaTeX) does not distinguish - between an explicit kern and a kern for italic correction. To - distinguish them, an additional subtype for a kern is introduced - in \pTeX. On the other hand, \LuaTeX-ja uses an additional attribute and - redefines \texttt{\char`\\/} to set this attribute.} for $p$. It is natural that - these attachments should be ignored inside the process. Hence - \LuaTeX-ja takes this approach, as does the latest version of - \pTeX\ (version~p3.2). This explains (2) in Table~\ref{tab-jfmglue}. - -To summarize, one should put an empty horizontal box |\hbox{}| - where one wants to interrupt the insertion process in - \LuaTeX-ja, as in (3) in Table~\ref{tab-jfmglue}. - -\item[Fonts with the Same Metric] -Recall that \LuaTeX-ja separates `real' fonts and metrics, as in Subsection~\ref{ssec-sepmet}. -Consider the following input, where all Japanese fonts use the same metric - (in \LuaTeX-ja), and |\gt| selects the \emph{gothic} family for - the current Japanese font family: -\begin{quote} -\begin{verbatim} -明朝)\gt (ゴシック -\end{verbatim} -\end{quote} -If the above input is processed by \pTeX, because the insertion process is - interrupted by |\gt|, the result looks like -\begin{quote} -\mc 明朝)\hbox{}\gt (ゴシック -\end{quote} -However, this seems to be unnatural, since two Japanese fonts in the - output use the same metric, i.e.,~the same - typesetting rule.
Hence, we decided that Japanese fonts with - the same metric are treated as one font in the insertion - process of \LuaTeX-ja. Thus, the output from the above input - in \LuaTeX-ja looks like: -\begin{quote} -\mc 明朝)\gt (ゴシック -\end{quote} -One might have the situation that this default behavior is not - suitable. \LuaTeX-ja offers a way to handle this situation, but - we leave it to the manual~\cite{man}. - -\item[Fonts with Different Metrics] -The case where two consecutive Japanese characters use different metrics and/or - different size is similar. Consider the following input where - the \emph{mincho} family and the \emph{gothic} family use - different metrics: -\begin{quote} -\begin{verbatim} -漢)\gt (漢)\large (大 -\end{verbatim} -\end{quote} -As the previous paragraph, this input yields the following, by \pTeX: -\begin{quote} -\mc 漢)\hbox{}\gt (漢)\hbox{}\large (大 -\end{quote} -We had thought that amounts of spaces between parentheses in above output - are too much. Hence we have changed the default behavior of - \LuaTeX-ja, so that the amount of a glue between two Japanese - characters with different metrics is the \emph{average} of a glue - from the left character and that from the right - character. For example, Figure~\ref{fig-diffmet} shows the - output from above input. The width of glue indicated `(1)' is - $(a/2 + a/2)/2 = 0.5a$, and the width of glue indicated `(2)' - is $(a/2 + 1.2a/2)/2 = 0.55a$. This default behavior can be - changed by \textsf{diffrentmet} parameter of \LuaTeX-ja. 
- -\begin{figure} -\begin{center} -\fontsize{40}{40}\selectfont -\imagfm{\jstrut\smash{% - \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr漢\cr - \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$a$}\ - \hrulefill\vrule height .5ex depth .5ex\cr}}}}% -\imagfm{\jstrut )\inhibitglue}% -\hbox to .5\zw{\hss\normalsize (1)\hss}% -\imagfm{\jstrut\inhibitglue\gt (}% -\imagfm{\jstrut\gt 漢}% -\imagfm{\jstrut\gt )\inhibitglue}% -\hbox to .55\zw{\hss\normalsize (2)\hss}% -\imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\inhibitglue (}% -\imagfm{\fontsize{48}{48}\selectfont\jstrut\smash{% - \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr\gt 大\cr - \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$1.2a$}\ - \hrulefill\vrule height .5ex depth .5ex\cr}}}} -\end{center} -\caption{Fonts with different metrics.} -\label{fig-diffmet} -\end{figure} - -\item[\emph{kanjiskip} and \emph{xkanjiskip}] -In \pTeX, the value of \emph{xkanjiskip} is controlled by a skip named - |\xkanjiskip|. A well-known defect of this implementation is - that the value of \emph{xkanjiskip} is not connected with the - size of the current Japanese font. It seems that the |EXTRASPACE|, - |EXTRASTRETCH|, |EXTRASHRINK| parameters in a JFM are - reserved for specifying the default value of - \emph{xkanjiskip} in a unit of the design size, but \pTeX\ - did not actually use these parameters. - -Considering this situation of p\TeX, \LuaTeX-ja can use the value of - \emph{xkanjiskip} that is specified in a metric. If the value of - \emph{xkanjiskip} on the user side (this is the value of the - \textsf{xkanjiskip} parameter of |\ltjsetparameter|) is - |\maxdimen|, then \LuaTeX-ja uses the specification from - the metric currently in use as the actual value of - \emph{xkanjiskip}. This description also applies to \emph{kanjiskip}.
-\end{description} - -\section{Distinction of characters} -\label{sec:distinction_of_characters} Since \LuaTeX\ can handle Unicode -characters natively, how to distinguish Japanese characters from -alphabetic characters is a major problem. For example, the -multiplication sign (U+00D7) exists both in ISO-8859-1 (hence in Latin-1 -Supplement in Unicode) and in the basic Japanese character set -JIS~X~0208. It is not desirable that this character is always treated as -an alphabetic character, because this symbol is often used in the sense -of `negative' in Japan. - -\subsection{Character ranges} -Before we describe the approach taken in \LuaTeX-ja, we review the -approach taken by u\pTeX. u\pTeX\ extends the |\kcatcode| primitive in -\pTeX, to use this primitive for setting how a character is treated -among alphabetic characters~(15), \emph{kanji}~(16), \emph{kana}~(17), -\emph{Hangul}~(17), or~\emph{other CJK characters}~(18). -The assignment to |\kcatcode| can be done by a Unicode -block.\footnote{There are some exceptions. For example, U+FF00--FFEF -(Halfwidth and Fullwidth Forms) are divided into three blocks in recent -u\pTeX.} - -\LuaTeX-ja adopted a different approach. There are many Unicode blocks - in the Basic Multilingual Plane which are not included in - Japanese fonts, therefore it is inconvenient if we process by a Unicode - block. Furthermore, JIS~X~0208 is not just a union of Unicode - blocks; for example, the intersection of JIS~X~0208 and - Latin-1 Supplement is shown in - Table~\ref{tab-inter}. Considering these two points, to - customize the range of Japanese characters in \LuaTeX-ja, one - has to define ranges of character codes in the source in advance.
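The following fragment is a sketch of how such ranges might be switched in a document. It reuses the |jacharrange| setting from this paper's own preamble and assumes the predefined range numbers listed in Table~\ref{tab-chrrng}; the document class and the sample text are illustrative only.
\begin{verbatim}
% Illustrative sketch only; assumes LuaLaTeX with luatexja.sty installed.
\documentclass{article}
\usepackage{luatexja}
% A negative number marks a predefined range as alphabetic:
% range 3 (punctuation and symbols) and range 8 (the characters
% shared by JIS X 0208 and Latin-1 Supplement, such as U+00D7).
\ltjsetparameter{jacharrange={-3,-8}}
\begin{document}
5 × 3 と “引用” % the sign and the quotes are now alphabetic characters
\end{document}
\end{verbatim}
Ranges that are not mentioned in the list keep their previous status, so only the ranges that need to change have to be named.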
- - -\begin{table} -\caption{Intersection of JIS~X~0208 and Latin-1 Supplement.} -\label{tab-inter} -\begin{center} -\begin{tabular}{llll} -\ltjjachar"A7 (U+00A7),& -\ltjjachar"A8 (U+00A8),& -\ltjjachar"B0 (U+00B0),& -\ltjjachar"B1 (U+00B1),\\ -\ltjjachar"B4 (U+00B4),& -\ltjjachar"B6 (U+00B6),& -\ltjjachar"D7 (U+00D7),& -\ltjjachar"F7 (U+00F7) -\end{tabular} -\end{center} -\end{table} - - -We note that \LuaTeX-ja offers two additional control sequences, - |\ltjjachar| and |\ltjalchar|. They are similar to |\char| - primitive, however |\ltjjachar| always yields a Japanese character, provided that - the argument is more than or equal to 128, and |\ltjalchar| always - yields an alphabetic character, regardless of the argument. - -\subsection{Default setting of ranges} -Patches for plain \TeX\ and \LaTeXe\ of \LuaTeX-ja predefine 8~character -ranges, as shown in Table~\ref{tab-chrrng}. Almost of these ranges are -just the union of Unicode blocks, and determined from the Adobe-Japan1-6 -character collection~\cite{aj16}, and JIS~X~0208. Among these 8~ranges, -the ranges~2, 3, 6, 7, and~8 are considered ranges of Japanese -characters, and others are considered ranges of alphabetic -characters.\footnote{Note that ranges 3~and~8 are considered ranges of -alphabetic characters in this paper.} We remark on ranges 2~and~8: -\begin{description} -\item[The range~2] -JIS~X~0208 includes Greek letters and Cyrillic letters, however, these - letters cannot be used for typesetting Greek or Russian, of - course. Hence it is reasonable that Greek letters and - Cyrillic consist another character range. -\item[The range~8] -If one want to use 8-bit TFMs, such as T1 or TS1 encodings, he should - mark this range~8 as a range of alphabetic characters by -\begin{quote} -|\ltjsetparameter{jacharrange={-8}}| -\end{quote} -This is because some 8-bit TFMs have a glyph in this range; for example, - the character `\OE' is located at |"D7| in the T1 encoding. 
%" -\end{description} - - -\begin{table} -\caption{Predefined ranges in \LuaTeX-ja.} -\label{tab-chrrng} -\begin{center} -\begin{tabular}{@{\bf}rl} -1&(Additional) Latin characters which do not belong to the range~8.\\ -2&Greek and Cyrillic letters.\\ -3&Punctuations and miscellaneous symbols.\\ -4&Unicode blocks which do not intersect with Adobe-Japan1-6.\\ -5&Surrogates and supplementary private use areas.\\ -6&Characters used in Japanese typesetting.\\ -7&Characters possibly used in CJK typesetting, but not in Japanese.\\ -8&Characters in Table~\ref{tab-inter}. -\end{tabular} -\end{center} -\end{table} - -\subsection{Control sequences producing Unicode characters} -\label{ssec-unichar} - -The \emph{fontspec} package\footnote{Precisely speaking, it is the -\emph{xunicode} package, originally a package for \XeTeX\ and -automatically loaded by the \emph{fontspec} package.} offers various -control sequences that produce Unicode characters. However, these -control sequences as they stand cannot work correctly with the default -range setting of \LuaTeX-ja. For example, |\textquotedblleft| is just -an abbreviation of |\char"201C\relax|, and the character U+201C (LEFT %" -DOUBLE QUOTATION MARK) is treated as a Japanese character, because it -belongs to the range~3. This problem is resolved by using |\ltjalchar| -instead of the |\char| primitive. This fix is included in an optional package -named \texttt{luatexja-\penalty0fontspec.sty}. Figure~\ref{fig-unitxt} -shows several ways to typeset a character, both as a Japanese character -and as an alphabetic character. - -\begin{figure} -\begin{LTXexample} -×, \char`×, % depend on range setting -\ltjalchar`×, % alphabetic char -\ltjjachar`×, % Japanese char -\texttimes % alph. char (by fontspec) -\end{LTXexample} -\caption{Control sequences producing a Unicode character.} -\label{fig-unitxt} -\end{figure} - -The situation looks similar in math formulas, but in fact it differs.
-Each control sequence that represents an ordinary symbol defined by the -\emph{unicode-math} package is just a synonym of a character. For example, -the meaning of |\otimes| is just the character U+2297 (CIRCLED TIMES), -which is included in the range~3. However, it is difficult to define a -control sequence like |\ltjalUmathchar| as a counterpart of -|\Umathchar|, since an input like `|\sum^\ltjalUmathchar ...|' has to be -permitted. - -However, we couldn't develop a satisfactory solution to this problem in -time for this paper. We are currently testing the -following solution: -\begin{itemize} -\item \LuaTeX-ja has a list of character codes which will always be treated as - alphabetic characters in math mode. Considering 8-bit TFMs for - math symbols, this list includes natural numbers between |"80| and - |"FF| by default. -\item Redefine internal commands defined in the \emph{unicode-math} - package so that -codes of characters which are mentioned in the \emph{unicode-math} - package will be included in the list. -\end{itemize} - - -We would like to extend the treatments described in this subsection to 8-bit -font encodings, but we leave it to further development too. - -\section{Current status of development} -\label{sec:current_status} -At the moment, \LuaTeX-ja can be used under plain \TeX, and under -\LaTeXe. Generally speaking, one only has to load |luatexja.sty|, by the -|\input| command or |\usepackage| (in~\LaTeXe), to -typeset Japanese characters. We look at each part in more detail. - -\subsection{`Engine extension'} -The lowest part of \LuaTeX-ja corresponds to the \pTeX\ extension as -\emph{an engine extension of \TeX}. We, the project members, think that -this part is almost done.
There is one more feature of \LuaTeX-ja which -we are going to explain: - -\begin{description} -\item[Shifting Baseline] -In order to make a match between Japanese fonts and alphabetic fonts, - sometimes shifting the baseline of alphabetic characters may - be needed. \pTeX\ has a dimension |\ybaselineshift|, which - corresponds to the amount of shifting down the baseline of alphabetic - characters. This is useful for Japanese-based documents, but - not for documents mainly in languages with alphabetic - characters. - -Hence, \LuaTeX-ja extends \pTeX's |\ybaselineshift| to Japanese - characters. Namely, \LuaTeX-ja offers two parameters, - \textsf{yjabaselineshift} and \textsf{yalbaselineshift}, for the - amount of shifting the baseline of Japanese characters and - that of alphabetic characters, respectively. -\begin{figure} -\begin{center} -\fontsize{40}{40}\selectfont\fboxsep0mm -\vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth -\hbox to 0.9\linewidth{% -\hfil -\raise-10pt\imagfm{\jstrut 漢}% -\raise-10pt\imagfm{\jstrut 字}\hskip.25\zw% -\imagfm{p}% -\imagfm{h}% -\hfil\hfil -\imagfm{\jstrut 漢}% -\imagfm{\jstrut 字}\hskip.25\zw% -\raise-10pt\imagfm{p}% -\raise-10pt\imagfm{h}% -\hfil -} -\end{center} - -\caption{First example of shifting baseline.} -\label{fig-bls} -\end{figure} - -\begin{figure} -\begin{center} -\fontsize{30}{30}\selectfont\fboxsep0mm -\vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth -\hbox to 0.9\linewidth{% -\hfil -\imagfm{a}% -\imagfm{b}\hskip.25\zw% -\imagfm{\jstrut 本}% -\imagfm{\jstrut 文}\hskip.33333\zw% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut\inhibitglue (}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 注}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 釈}\hskip.1666667\zw% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont c}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont o}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}% 
-\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont e}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont n}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont t}% -\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut )\inhibitglue}% -\hskip.33333\zw% -\imagfm{\jstrut 本}% -\imagfm{\jstrut 文}% -\hfil -} -\end{center} - -\caption{Second example of shifting baseline.} -\label{fig-small} -\end{figure} - -An example output is shown in Figure~\ref{fig-bls}. The left half is the - output when \textsf{yjabaselineshift} is positive, hence the - baseline of Japanese characters is shifted down. On the other - hand, the right half is the output when - \textsf{yalbaselineshift} is positive, hence the baseline of - alphabetic characters is shifted down. Figure~\ref{fig-small} - shows an interesting use of these parameters. - -\end{description} -Note that \LuaTeX-ja doesn't support vertical typesetting, \emph{tategaki}, for now. - -\subsection{Patches for plain \TeX\ and \LaTeXe} -\pTeX\ has a patch for plain \TeX, namely |ptex.tex|, a patch for the \LaTeXe\ -macros (this patch and \LaTeXe\ together constitute \emph{p\LaTeXe}), and -|kinsoku.tex| which includes the default setting of \emph{kinsoku -shori}, the Japanese hyphenation. We ported them to \LuaTeX-ja, except -the codes related to vertical typesetting, because \LuaTeX-ja doesn't -support vertical typesetting yet. We remark on one point related to the -porting: -\begin{description} - -\item[Behavior of\/ {\tt\char92fontfamily\/}] -The control sequence |\fontfamily| in p\LaTeXe\ changes the current alphabetic - font family and/or the current Japanese font family, - depending on the argument.
More concretely, - |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the - current alphabetic font family to $\langle\hbox{\it - arg\/}\rangle$, if and only if one of the following - conditions are satisfied: -\begin{itemize} -\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in - \emph{some} alphabetic encoding is already defined in the document. -\item There exists an alphabetic encoding $\langle\hbox{\it - enc\/}\rangle$ already defined in the document such that a font - definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it - arg\/}\rangle$|.fd| (all lowercase) exists. -\end{itemize} -The same criterion is used for changing Japanese font family. - -To work this behavior well, a list of all (alphabetic) encodings defined - already in the document is needed. However, since \LuaTeX-ja - is loaded as a package, \LuaTeX-ja cannot have this list. - Hence \LuaTeX-ja adopted a different approach, namely - |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the - current alphabetic font family to $\langle\hbox{\it - arg\/}\rangle$, if and only if: -\begin{itemize} -\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ - in the current alphabetic encoding $\langle\hbox{\it - enc\/}\rangle$ is already defined in the document. -\item A font definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it - arg\/}\rangle$|.fd| (all lowercase) exists. -\end{itemize} - - -\end{description} - - - -\subsection{Classes for Japanese documents} -To produce `high-quality' Japanese documents, we need not only that -Japanese characters are correctly placed, but also class files for -Japanese documents. Two major families of classes are widely used in Japan: -\emph{jclasses} which is distributed with the official p\LaTeXe\ macros, -and \emph{jsclasses}. At the present, \LuaTeX-ja -simply contains their counterparts: \emph{ltjclasses} and -\emph{ltjsclasses}. 
However, the policy on classes is not determined -now, and we hope to have another family of classes which are useful for -commercial printing. In the author's opinion, \emph{ltjclasses} is -better to stay as an example of porting of class files for \pTeX\ to -\LuaTeX-ja. - -\subsection{Patches for packages} -Apart from patches for the \LaTeXe~kernel and classes for Japanese -documents, we need to make patches for several packages. At the present, -we considered the following packages, and made patches or porting for -the former two packages. - -\begin{description} -\item[The \emph{fontspec} package] The \emph{fontspec} package is built - on NFSS2, hence control sequences offered by the - \emph{fontspec} package, such as |\setmainfont|, are only - effective for alphabetic fonts if \LuaTeX-ja is loaded. - \texttt{luatexja-\penalty0fontspec.sty} (not automatically - loaded) offers these counterparts for Japanese fonts, with - additional `j' in the name of control sequences, such as - |\setmainjfont|. As described in - Subsection~\ref{ssec-unichar}, it also includes a patch for - control sequences producing Unicode characters. - -\item[The \emph{otf} package] -This package is widely used in \pTeX\ for typesetting characters which is -not in JIS~X~0208, and for using more than one weight in \emph{mincho} -and \emph{gothic} font families. Therefore \LuaTeX-ja supports features -in the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty} - manually. Note that characters by |\UTF{xxxx}| and - |\CID{xxxx}| are not appended to the current list as a - \emph{glyph\_node}, to avoid from callbacks by the - \emph{luaotfload} package. We have another remark; |\CID| - does not work with TrueType fonts, since |\CID| use the - conversion table between CID and the glyph order of the - current Japanese font. 
- -\item[The \emph{listings} package] -It is known to users of \pTeX\ that there is a patch |jlisting.sty| for - the \emph{listings} package, to use Japanese characters in - the |lstlisting| environment. Generally speaking, it can also - be used in \LuaTeX-ja. However, it seems that a - Japanese character after a space does not receive any processing - by the \emph{listings} package; this is inconvenient when we - use the \emph{showexpl} package. - -There is another way to use characters above 256 with the - \emph{listings} package (described in~\cite{apl}). However, - this method is not suitable for Japanese, since the number of - Japanese characters is very large. We hope that the - \emph{listings} package will be able to handle all characters above - 256 without any patch, in the future. - - -\end{description} - - - -\section{Implementation} -\label{sec:implementation} -\subsection{Handling of Japanese fonts} -In \pTeX, there are three slots for maintaining current fonts, namely -|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal -direction) and |\tfont| for Japanese fonts (in vertical direction). With -these slots, we can manage the current font for alphabetic characters -and that for Japanese characters separately in \pTeX. However, \LuaTeX\ -has only one slot for maintaining the current font, as in \TeX82. This -situation leads to a problem: how can we maintain the `current Japanese -font'? - -There are three approaches to this problem. One approach is to make a -mapping table from alphabetic fonts to corresponding Japanese fonts -(here we don't assume that NFSS2 is available). Another approach is -that we always use composite fonts with alphabetic fonts and Japanese -fonts. The third approach is that the information of the current -Japanese font is stored in an attribute. We adopted the third approach, -since \LuaTeX-ja is much affected by \pTeX\ as we noted in -Subsection~\ref{ssec-pol}.
- -As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining -Japanese fonts, as \pTeX. However, because the information of the current -Japanese font is stored into an attribute, control sequences defined by -|\jfont| (e.g.,~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) is -not representing a font by the means of \TeX82. In other words, each of -these control sequences is just an assignment to an attribute, therefore -they cannot be an argument of |\the|, |\fontname|, nor |\textfont|. - - -Callbacks by the \emph{luaotfload} package, e.g.,~replacement of glyphs -according to OpenType font features, are performed just after `Examination of -stack level' (see Subsections -\ref{ssec-over}~and~\ref{ssec-stack}). Also note that calculation of -character classes for each Japanese character is done \emph{after} the -these callbacks for now. - -\subsection{Stack management} -\label{ssec-stack} - -As we noted in Subsection~\ref{ssec-csname}, parameters that the values -at the end of a horizontal box or that of a paragraph are valid in -whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented -by internal integers or registers of other types in \TeX. We explain it -in this subsection. - -\begin{figure} -\begin{lstlisting} -void package(int c) -{ - ... - d = box_max_depth; - unsave(); - save_ptr -= 4; - if (cur_list.mode_field == -hmode) { - cur_box = filtered_hpack(cur_list.head_field, - cur_list.tail_field, saved_value(1), - saved_level(1), grp, saved_level(2)); - subtype(cur_box) = HLIST_SUBTYPE_HBOX; - } else { -\end{lstlisting} -\caption{An extract of a CWEB-source \texttt{tex/packaging.w} of \LuaTeX.} -\label{fig-ltsrc} -\end{figure} - -Figure~\ref{fig-ltsrc} is an extract of a CWEB-source -\texttt{tex/packaging.w} of \LuaTeX\ (SVN revision 4358). This function -is called just when an explicit |\hbox{...}| or |\vbox{...}| is ended, and -the function |filtered_hpack()| is where the |hpack_filter| and then the -actual `hpack' process are performed. 
Notice that the |unsave()| -function is called before |filtered_hpack()|. This is the problem; -because of |unsave()|, we can retrieve only the values of registers -\emph{outside} the box, even in the |hpack_filter| callback. - -To cope with this problem, \LuaTeX-ja has its own stack system, based on -the Lua code in \cite{stack-mail}. Furthermore, \emph{whatsit} nodes whose -\emph{user\_id} is 30112 (\emph{stack\_node}, for short) will be -appended to the current horizontal list each time the current stack -level is incremented, and their values are the values of -|\currentgrouplevel| at that time. At the beginning of the |hpack_filter| -callback, the list in question is traversed to determine whether the -stack level at the end of the list and that outside the box coincide. - -Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current -stack level, both inside the |hpack_filter| callback, i.e.,~outside a -horizontal box. Consider a list which represents the content of the box, -then we have: -\begin{itemize} -\item A \emph{stack\_node} whose value is $x+1$ (because all material - in the box is included in a group |\hbox{...}|, the value of - |\currentgrouplevel| inside the box is at least $x+1$) in the list - corresponds to an assignment related to the stack system at the - top level of the list, like -\begin{quote} -\begin{verbatim} -\hbox{...(assignment)...} -\end{verbatim} -\end{quote} -In this case, the current stack level is incremented to $y+1$ after the assignment. -\item A \emph{stack\_node} whose value is more than $x+1$ in the list corresponds -to an assignment inside another group contained in the box.
For example, - the following input creates -a \emph{stack\_node} whose value is $x+3=(x+1)+2$: -\begin{quote} -\begin{verbatim} -\hbox{...{...{...(assignment)}...}...} -\end{verbatim} -\end{quote} -\end{itemize} -Thus, we can conclude that the stack level at the end of the list is -$y+1$, if and only if there is a \emph{stack\_node} whose value is -$x+1$. Otherwise, the stack level is just $y$. - -\subsection{Adjustment of the position of Japanese characters} -\label{ssec-width} - -The size of a glyph specified in a metric and that of a real font -usually differ. For example, the letter `\inhibitglue【' is half-width -in |jfm-ujis.lua| or |jis.tfm|, while this letter is full-width like `【' -in most TrueType fonts used in Japanese typesetting, such as -IPA~Mincho. Hence the adjustment of the position of such glyphs is -needed. In the context of \pTeX, this process was performed using virtual fonts. - -On the other hand, \LuaTeX-ja does the adjustment by encapsulating a glyph -in a horizontal box. There are two main reasons why we adopted this -method; one is that we feared that the Lua code needed to coexist with callbacks by -the |luaotfload| package would become large if we used virtual fonts, and the -other is to cope with shifting of the baseline of characters at the -same time.
- -\begin{figure} -\begin{center}\unitlength=9pt\small -\begin{picture}(15,12)(-1,-3) - -\color{grayx}% real glyph -\put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength} - -\color{black}% real glyph :step1 -\thicklines -\put(-1,-1.5){\line(0,1){7}\line(0,-1){2.5}} -\put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}} -\put(-1,5.5){\line(1,0){6}} -\put(-1,-4){\line(1,0){6}} -\put(-1,0){\makebox(0,0)[r]{\strut$R$\,}} - -\thicklines -\put(0,0){\vector(0,1){9}\line(0,-1){3}\vector(1,0){12}} -\put(12,9){\makebox(0,0)[rt]{\strut$M$\,}} -\put(12,0){\line(0,1){9}\vector(0,-1){3}} -\put(0,9){\line(1,0){12}} -\put(0,-3){\line(1,0){12}} -\put(0.2,4.5){\makebox(0,0)[l]{\texttt{height}}} -\put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}} -\put(6,0.2){\makebox(0,0)[b]{\texttt{width}}} - -\thicklines -\put(3,0){\line(0,1){7}\line(0,-1){2.5}\line(1,0){6}} -\put(9,0){\line(0,1){7}\line(0,-1){2.5}} -\put(3,7){\line(1,0){6}} -\put(3,-2.5){\line(1,0){6}} -\newsavebox{\eqdist} -\savebox{\eqdist}(0,0)[c]{% - \thinlines - \put(-0.08,0.2){\line(0,-1){0.4}}% - \put(0.08,0.2){\line(0,-1){0.4}}} -\put(1.5,0){\usebox{\eqdist}} -\put(10.5,0){\usebox{\eqdist}} - -\thicklines -\put(3,-1.5){\vector(-1,0){4}} -\put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}} -\put(3,0){\vector(0,-1){1.5}} -\put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}} -\end{picture} -\end{center} -\caption{The position of the `real' glyph.} -\label{fig-pos} -\end{figure} - -Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is -the imaginary body specified in the metric, and a vertical -rectangle is the imaginary body of a real glyph. First, the real glyph -is aligned with respect to the width of $M$. In the figure, the real -glyph is aligned `middle'; this setting is useful for the full-width -middle dot `・'. We have other settings, `left' and `right'. -After that, it is shifted according to the value of |left| and |down|, -which are specified in the metric, too. 
The final position of the real glyph -is shown by the gray rectangle~$R$. If the amount of shifting the baseline is -not zero, $M$ (and hence the real glyph) is shifted by that amount. - -We would like to remark briefly on the vertical position of a real -glyph. A JFM (or a metric used in \LuaTeX-ja) and a real font used for -it may have different height or depth. In that case, it may look better -if the real glyph is shifted vertically to match the height-depth ratio -specified in the metric, although no vertical adjustment other than the -adjustment by the |down| value is performed in the present -implementation of \LuaTeX-ja. This situation is carefully studied by -Otobe~\cite{min10}. The policy on this problem has not been determined -yet; however, we would like to offer several solutions in future -development. - -\section{Conclusion} -We have discussed our \LuaTeX-ja package, which is much affected -by \pTeX. For now, it can be used experimentally; however, many -refinements are needed for regular use. The author hopes -that this paper and the \LuaTeX-ja project contribute to the typesetting of Japanese, -and possibly other Asian languages, under \LuaTeX. - -\section*{Acknowledgements} -The author would like to thank Ken Nakano and Hideaki Togashi for their -development of ASCII \pTeX. The author is very grateful to Haruhiko -Okumura for his leadership in the Japanese \TeX\ community. The author -is also very grateful to members of the \LuaTeX-ja project team for their -valuable cooperation in development. - -%%% The style of the bibliography is `amsplain'. -\providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace} -\providecommand{\href}[2]{#2} -\begin{thebibliography}{99} - -\bibitem{aj16} -Adobe Systems Incorporated, \emph{Adobe-Japan1-6 Character Collection - for CID-Keyed Fonts}, Technical Note~\#5078, 2004.
-\url{http://partners.adobe.com/public/developer/en/font/5078.Adobe-Japan1-6.pdf} - -\bibitem{ptex} -ASCII MEDIA WORKS,アスキー日本語\TeX\ (\pTeX).\url{http://ascii.asciimw.jp/pb/ptex/} - -\bibitem{apl} -John Baker, \emph{Typesetting UTF8 APL code with the \LaTeX\ lstlisting package}. -\url{http://bakerjd99.wordpress.com/2011/08/15/} - -\bibitem{omega} -Jin-Hwan~Cho and Haruhiko Okumura, \emph{Typesetting CJK Languages with Omega}, -\TeX, XML, and Digital Typography, Lecture Notes in Computer Science, vol.~3130, -Springer, 2004, 139--148. - -\bibitem{joylua} -Yannis Haralambous. \emph{The Joy of \LuaTeX}. \url{http://luatex.bluwiki.com/} - -\bibitem{jisx4051} -Japanese Industrial Standards Committee. \emph{JIS~X~4051: Formatting - rules for Japanese documents}, 1993, 1995, 2004. - -\bibitem{eptex} -北川弘典,$\varepsilon$-\pTeX についてのwiki. -\url{http://sourceforge.jp/projects/eptex/wiki/FrontPage} - -\bibitem{luaums} -北川弘典,\LuaTeX で日本語. -\url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378} - -\bibitem{luatexref} -\LuaTeX\ development team, \emph{The \LuaTeX\ reference}. -\url{http://www.luatex.org/svn/trunk/manual/luatexref-t.pdf} (snapshot of SVN trunk) - -\bibitem{man} -\LuaTeX-ja project team, \emph{The \LuaTeX-ja package}. -Not completed for now. Available at |doc/man-en.pdf| (in English) or - |doc/man-ja.pdf| (in Japanese) -in the Git repository. - -\bibitem{luajp-test} -香田温人,\LuaTeX と日本語. -\url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html} - -\bibitem{luajalayout} -前田一貴,luajalayout パッケージ---Lua\LaTeX によ - る日本語組版---. -\url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/} - -\bibitem{jsclasses} -奥村晴彦,p\LaTeXe 新ドキュメントクラス. -\url{http://oku.edu.mie-u.ac.jp/~okumura/jsclasses/} - -\bibitem{ptexjp} -Haruhiko Okumura, \emph{\pTeX\ and Japanese Typesetting}, - The Asian Journal of \TeX\ \textbf{2}~(2008), 43--51. - -\bibitem{min10} -乙部厳己,min10フォントについて. 
-\url{http://argent.shinshu-u.ac.jp/~otobe/tex/files/min10.pdf} - -\bibitem{otf} -齋藤修三郎,Open Type Font用VF. -\url{http://psitau.kitunebi.com/otf.html} - -\bibitem{stack-mail} -Jonathan Sauer, \emph{[Dev-luatex] tex.currentgrouplevel}. -\url{http://www.ntg.nl/pipermail/dev-luatex/2008-August/001765.html} - -\bibitem{uptex} -Takuji Tanaka, \emph{u\pTeX, up\LaTeX---unicode version of \pTeX, p\LaTeX}. -\url{http://homepage3.nifty.com/ttk/comp/tex/uptex_en.html} - -\bibitem{ptexenc} -Nobuyuki Tsuchimura, \emph{Development of a Japanese \TeX\ Distribution~`ptetex3'}, -Computer Software\ \textbf{24} (2007), no.~4, 40--50, (in Japanese). - -\bibitem{w3c} -W3C Working Group, \emph{Requirements for Japanese Text Layout}. -\url{http://www.w3.org/TR/jlreq/} -\end{thebibliography} - -\end{document} +%#!lualatex ajt-devel-ltja +\documentclass{ajt} + +%%% Packages used in this paper + +%%% Font setting for \LuaTeX; this is extract from ajt.cls +\makeatletter + \if@print + \RequirePackage{fontspec,xunicode} + \RequirePackage{luatextra} + \setmainfont[Mapping=tex-text]{Palatino LT Std} + \setsansfont[Mapping=tex-text]{Optima LT Std} + \else + \RequirePackage{fontspec,luatextra} + \setmainfont[Mapping=tex-text]{TeX Gyre Pagella} % \simeq Palatino + \fi + +%%% LuaTeX-ja +\usepackage{luatexja,luatexja-fontspec} +\ltjsetparameter{jacharrange={-3,-8}} +\DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.92489] file:ipam.ttf:jfm=ujis}{} +\DeclareFontShape{JY3}{gt}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=ujis}{} +% quick hack: monospaced Japanese font by \ttfamily +\DeclareKanjiFamily{JY3}{\ttdefault}{}{} +\DeclareFontShape{JY3}{\ttdefault}{m}{n}{<-> s*[0.92489] file:ipag.ttf:jfm=mono}{} + + +%%% LTXexample environment +\usepackage{showexpl,lltjlisting} +\lstset{basicstyle=\ttfamily\small, width=0.3\textwidth, basewidth=.5em} + +%%% Verbatim environment +\usepackage{fancyvrb} +\CustomVerbatimEnvironment{code}{Verbatim}% +{numbers=left,xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small} 
+\CustomVerbatimEnvironment{codewithoutnum}{Verbatim}% +{xleftmargin=1.5em,baselinestretch=1.069,fontsize=\small} +\CustomVerbatimEnvironment{codewithoutnumsmall}{Verbatim}% +{xleftmargin=1.5em,baselinestretch=1.0,fontsize=\footnotesize} +\DefineShortVerb{\|} + +%%% Others +\usepackage{mflogo,booktabs} +\definecolor{grayx}{gray}{0.85} +\hyphenation{ + kanjiskip + xkanjiskip +} + +%%% Mandatory article metadata %%% +\title{Development of \LuaTeX-ja package} +\author[北川 弘典]{Hironori Kitagawa} +\address{\LuaTeX-ja project team} +\email{h\_kitagawa2001@yahoo.co.jp} + +\keywords{\TeX, p\TeX, \LuaTeX, \LuaTeX-ja, Japanese} +\abstract{% +\LuaTeX-ja package is a macro package for typesetting Japanese +documents under \LuaTeX. The package has more flexibility of +typesetting than \pTeX, which is widely used Japanese extension of \TeX, +and has corrected some unwanted features of \pTeX. +In this paper, we describe specifications, the current status and some +internal processing methods of \LuaTeX-ja. +} + +\newcommand{\parname}[1]{\textsf{#1}} +\newcommand{\jstrut}{\vrule width0pt height\cht depth\cdp} +\newcommand{\imagfm}[1]{\ifvmode\leavevmode\fi% + \hbox{\fboxsep=0pt\fbox{\setbox0=\hbox{#1}\copy0\kern-\wd0 + \smash{\vrule width \wd0 height 0.4pt depth0.4pt}}}} +\begin{document} + +%%% Do not forget to start with \maketitle! +\maketitle + +\section{Introduction} +\subsection{History} +To typeset Japanese documents with \TeX, ASCII \pTeX~\cite{ptex} has +been widely used in Japan. There are other methods---for example, using +Omega and OTP~\cite{omega}, or with the CJK package---to do so, however, +these alternative methods did not become majority. The author thinks +that this is because \pTeX\ enables us to produce high-quality documents +(e.g.,~supporting vertical typesetting), and the appearance of \pTeX\ is +earlier than that of alternatives described above. 
+ +However, \pTeX\ has been left behind from the extensions of \TeX\ such +as \eTeX\ and \pdfTeX, and the diffusion of UTF-8 encoding. In recent +years, the situation has become better, by development of +|ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}), +$\varepsilon$-\pTeX~\cite{eptex} by the author,~and u\pTeX~\cite{uptex} +by Takuji Tanaka (田中琢爾). However, continuing this approach, namely, +to develop an engine extension localized for Japanese, is not wise. This +approach needs lots of work for \emph{each} engine. In addition, if we +use \LuaTeX, the necessity of an engine extension is getting smaller +because \LuaTeX\ has an ability to hook \TeX's internal process by using +Lua callbacks. + + +There were several experimental attempts to typeset +Japanese documents with \LuaTeX\ before. Here we cite three examples: +\begin{itemize} +\item |luaums.sty|~\cite{luaums} developed by the author. This + experimental package is for creating a certain Japanese-based presentation + with \LuaTeX. +\item the \emph{luajalayout} package~\cite{luajalayout}, formerly known as the + \emph{jafontspec} package, by Kazuki Maeda (前田一貴). This package is based on + \LaTeXe\ and \emph{fontspec} package. +\item the \emph{luajp-test} package~\cite{luajp-test}, a test package made by + Atsuhito Kohda (香田温人), based on articles on the web page~\cite{joylua}. +\end{itemize} +However, these packages are based on \LaTeXe, and do not have much +ability to control the typesetting rule. And it is inefficient that more +than one people separately develop similar packages. Development of the +\LuaTeX-ja package is started initially by the author and Kazuki Maeda, because of +these situations. + +\subsection{Development policy of \LuaTeX-ja} +\label{ssec-pol} +The first aim of \LuaTeX-ja project was to implement features (from the +`primitive' level) of \pTeX\ as macros under \LuaTeX, therefore \LuaTeX-ja is +much affected by \pTeX. 
However, as development proceeded, some +technical/conceptual difficulties arose. Hence we changed the aim +of the project as follows: +\begin{itemize} +\item\emph{\LuaTeX-ja offers at least the same flexibility of + typesetting that p\TeX\ has.} + + We are not satisfied with the ability of producing outputs conformed to + JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for + typesetting, or to a technical note~\cite{w3c} by W3C; + if one wants to produce very incoherent outputs for some reason, it + should be possible. +In this point, previous attempts of Japanese typesetting with \LuaTeX\ + which we cited in the previous subsection are inadequate. + +\pTeX\ has some flexibility of typesetting, by changing internal + parameters such as |\kanjiskip| or |\prebreakpenalty|, and by using + custom JFM (Japanese TFM). Therefore we decided to include these + functionality to \LuaTeX-ja. + +\item\emph{\LuaTeX-ja isn't mere re-implementation or porting of \pTeX; + some (technically and/or conceptually) inconvenient features of + \pTeX\ are modified.} + + We describe this point in more detail at the next section. +\end{itemize} + + +\subsection{Overview of the processes} +\label{ssec-over} +We describe an outline of \LuaTeX-ja's process in order. + +\begin{itemize} +\item In the |process_input_buffer| callback: treatment of breaking + lines after a Japanese character (in Subsection~\ref{ssec-line}). + +\item In the |hyphenate| callback: font replacement. + +\LuaTeX-ja looks into for each \textit{glyph\_node}~$p$ in the horizontal list. If + the character represented by $p$ is considered as a Japanese + character, the font used at $p$ is replaced by the value of + |\ltj@curjfnt|, an attribute for `the current Japanese font' + at~$p$. + +Furthermore, the subtype of $p$ is subtracted by 1 to suppress + hyphenation around $p$ by \LuaTeX, because later processes of + \LuaTeX-ja take care of all things about Japanese characters. 
+ +\item In |pre_linebreak_filter| and |hpack_filter| callbacks: + +\begin{enumerate} +\item \LuaTeX-ja has its own stack system, and the current horizontal + list is traversed in this stage to determine what the level of + \LuaTeX-ja's internal stack at the end of the list is. We will + discuss it in Subsection~\ref{ssec-stack}. + +\item In this stage, \LuaTeX-ja inserts glues/kerns for Japanese + typesetting in the list. This is the core routine of \LuaTeX-ja. + We will discuss it in Subsections + \ref{ssec-jglue}~and~\ref{ssec-jspec}. + +\item To make a match between a metric and a real font, the position + of (Japanese) glyphs is sometimes adjusted. + We will discuss it in Subsection~\ref{ssec-width}. +\end{enumerate} +\item In the |mlist_to_hlist| callback: treatment of Japanese characters + in math formulas. This stage is similar to adjustment of the + position of glyphs (see above), so we omit its description + from this paper. +\end{itemize} + +In this paper, an \emph{alphabetic character} means a non-Japanese +character. Similarly, we use the term \emph{alphabetic font} as the +counterpart of a Japanese font. + +\subsection{Contents of this paper} +Here we describe the contents of the rest of this paper briefly. In +Section~\ref{sec:differences_with_ptex}, we describe major differences +between \pTeX\ and \LuaTeX-ja. The next section, +Section~\ref{sec:distinction_of_characters}, concentrates on the +problem of how we distinguish between Japanese characters and alphabetic +characters. In Section~\ref{sec:current_status}, we show the current +development status of the package. Finally, in +Section~\ref{sec:implementation}, we describe some internal routines of +\LuaTeX-ja. + +\subsection{General information of the project} +This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki +is located at +\url{http://sourceforge.jp/projects/luatex-ja/wiki/}.
There is +no stable version on October 22, 2011, however a set of developer sources can be +obtained from the git repository. Members of the project team are as follows +(in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato, +Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda, +and~Shuzaburo Saito. + + +\section{Major differences with \pTeX} +\label{sec:differences_with_ptex} +In this section, we explain several major differences between \pTeX\ +and our \LuaTeX-ja. For general information of Japanese typesetting and the +overview of \pTeX, please see Okumura~\cite{ptexjp}. + + +\subsection{Names of control sequences} +\label{ssec-csname} Because \pTeX\ is an engine modification of Knuth's +original \TeX82 engine, some of the additional primitives take a form that is +very difficult to be simulated by a macro. For example, an additional +primitive |\prebreakpenalty|$\langle\hbox{\it +char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in \pTeX\ +sets the amount of penalty inserted before a character whose code is +$\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it +penalty}\rangle$, and this form |\prebreakpenalty|$\langle\hbox{\it +char\_code}\rangle$ can be also used for retrieving the value. + +Moreover, there are some internal parameters of \pTeX\ which values of them at the end of a +horizontal box or that of a paragraph are valid in whole box or +paragraph. However, the implementation of these parameters in +\LuaTeX-ja is not so easy; we will discuss it in Subsection~\ref{ssec-stack}. + +From above two problems discussed above, the assignment and retrieval +of most parameters in \LuaTeX-ja are summarized into the following +three control sequences: +\begin{itemize} +\item |\ltjsetparameter{|$\langle\hbox{\it + name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local + assignment. +\item |\ltjglobalsetparameter|: for global assignment. 
Note that these two control + sequences obey the value of the |\globaldefs| primitive. +\item |\ltjgetparameter{|$\langle\hbox{\it + name}\rangle$|}[{|$\langle\hbox{\it optional + argument}\rangle$|}]|: for retrieval. The returned value is always + a string. +\end{itemize} + +\subsection{Line-break after a Japanese character} +\label{ssec-line} + +Japanese texts can break lines almost everywhere, in contrast to +alphabetic texts, which can break lines only between words (or by +hyphenation). Hence, \pTeX's input processor is modified so that a +line-break after a Japanese character doesn't emit a space. However, +there is no way to customize the input processor of \LuaTeX, other than +to hack its CWEB-source. All a macro package can do is to modify an input line +before \LuaTeX\ begins to process it, inside the |process_input_buffer| +callback. + +Hence, in \LuaTeX-ja, a comment letter (we reserve U+FFFFF for this +purpose) will be appended to an input line, if this line ends with a Japanese +character.\footnote{Strictly speaking, it also requires that the catcode +of the end-line character is 5~(\emph{end-of-line}). This condition is +useful under the verbatim environment.} One might jump to the conclusion +that the treatment of a line-break by \pTeX\ and that of \LuaTeX-ja are +exactly the same; however, they differ in that \LuaTeX-ja's +judgement whether a comment letter will be appended to the line is made +\emph{before} the line is actually processed by \LuaTeX. + +Figure~\ref{fig-linebreak} shows an example of this situation; the +command at the first line marks most Japanese characters as +`non-Japanese characters'. In other words, from that command onward, the +letter `あ' will be treated as an alphabetic character by +\LuaTeX-ja. Then, it is natural to expect a space between `あ' and `y' in +the output, whereas the actual output in the figure has none.
This is +because `あ' is still considered a Japanese character by \LuaTeX-ja +at the time \LuaTeX-ja decides whether U+FFFFF will be added to +input line~2. + +\begin{figure} +\begin{LTXexample} +\font\x=IPAMincho \x +\ltjsetparameter{jacharrange={-6}}xあ +y +\end{LTXexample} +\caption{A notable sample showing the treatment of a line-break after a +Japanese character.}\label{fig-linebreak} +\end{figure} + +\subsection{Separation between `real' fonts and metrics} +\label{ssec-sepmet} + +Traditionally, most Japanese fonts used in typesetting are not +proportional, that is, most glyphs have the same size (in most cases, +square-shaped). Hence, it is not rare that the contents of different +JFMs are essentially the same, and differ only in their names. For example, +|min10.tfm| and |goth10.tfm|, which are JFMs shipped with \pTeX\ for the +seriffed \emph{mincho} family and the sans-seriffed \emph{gothic} family, +differ in their |FAMILY| and |FACE| only. Moreover, |jis.tfm| and +|jisg.tfm|, which are included in the \emph{jis} font metric +used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦), +are exactly the same as binary files. Considering this situation, we +decided to separate `real' fonts and the metrics used for them in +\LuaTeX-ja. Typical declarations of Japanese fonts in the style of plain +\TeX\ are shown in Figure~\ref{fig-jfdef}. We would like to add several +remarks: +\begin{itemize} +\item The control sequence |\jfont| must be used for Japanese fonts, instead of |\font|. +\item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so + the \hbox{\tt file:} and \hbox{\tt name:} prefixes, and various font features, can be + used as in the first line in Figure~\ref{fig-jfdef}. +\item The |jfm| key specifies the metric for the font. In + Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a + Lua script named |jfm-ujis.lua|. This metric is the standard + metric in \LuaTeX-ja, and is based on the JFMs used in the \emph{otf} + package~\cite{otf}.
+\item The \hbox{\tt psft:} prefix can be used to specify name-only, non-embedded + fonts. When one displays a PDF with these fonts, the actual fonts which + will be used for them depend on the PDF reader. +\end{itemize} +The specification of a metric for \LuaTeX-ja is similar to that of a JFM +(see \cite{ptexjp}); characters are grouped into several classes, the +size information of characters is specified for each class, and +glue/kern insertions are specified for each pair of classes. Although +the author has not tried, it may be possible to develop a program that +`converts' a JFM to a metric for \LuaTeX-ja. \LuaTeX-ja offers three +metrics by default: |jfm-ujis.lua|, |jfm-jis.lua| based on the +\emph{jis} font metric, and |jfm-min.lua| based on the old |min10.tfm|. + + Note that |-kern| in the features +is important, because kerning information from the real font itself would +clash with glue/kern information from the metric. + +\begin{figure} +\begin{verbatim} +\jfont\foo=file:ipam.ttf:jfm=ujis;script=latn;-kern;+jp04 at 12pt +\jfont\bar=psft:Ryumin-Light:jfm=ujis at 10pt +\end{verbatim} +\caption{Typical declarations of Japanese fonts.} +\label{fig-jfdef} +\end{figure} + +\subsection{Insertion of glues/kerns for Japanese typesetting: timing} +\label{ssec-jglue} + +As described in \cite{luatexref}, \LuaTeX's kerning and ligaturing +processes are totally different from those of \TeX82. \TeX82's process is +done just when a (sequence of) character is appended to the current +list. Thus we can interrupt this process by writing +|f{}irm|. However, \LuaTeX's process is \emph{node-based}, that is, the +process is done when a horizontal box or a paragraph is ended, so +|f{}irm| and |firm| yield the same output under \LuaTeX. + +The situation for Japanese characters is more complicated.
+Glues (and kerns) which are needed for Japanese +typesetting are divided into the following three categories: +\begin{itemize} +\item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue}, + for short). + +\item Default glue between a Japanese character and an alphabetic + character (\emph{xkanjiskip}, for short), usually 1/4 of + full-width (\emph{shibuaki}) with some stretch and shrink for + justifying each line. +\item Default glue between two consecutive Japanese characters + (\emph{kanjiskip}, for short). The main purpose of this glue is to + enable breaking lines almost everywhere in Japanese texts. In most + cases, its natural width is zero, and it has some stretch/shrink for + justifying each line. +\end{itemize} +In \pTeX, these three kinds of glues are treated differently. A JFM glue +is inserted when a (sequence of) Japanese character is appended to the +current list, as in the case of alphabetic characters in \TeX82. This +means that one can interrupt the insertion process by saying |{}|. An +\emph{xkanjiskip} is inserted just before `hpack' or line-breaking of a +paragraph; this timing is somewhat similar to that of \LuaTeX's kerning +process. Finally, a \emph{kanjiskip} does not appear as a node anywhere; +it appears only implicitly in the calculation of the width of a horizontal box, +in line breaking, and in the actual output process to a DVI +file. These specifications have made \pTeX's behavior very hard to +understand. + +\LuaTeX-ja inserts glues in all three categories simultaneously inside the +|hpack_filter| and |pre_linebreak_filter| callbacks. The reasons for +this specification are to behave like alphabetic characters in \LuaTeX\ +(as described in the first paragraph in this subsection), and to clarify +the specification of \LuaTeX-ja's process.
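+
+The fragment below is a minimal sketch of how this timing looks from the
+user side. The amounts given for \emph{kanjiskip} and \emph{xkanjiskip}
+are illustrative assumptions chosen to match the description above (zero
+natural width, and a quarter of full-width, respectively); they are not
+meant as the package defaults.
+\begin{quote}
+\begin{verbatim}
+% illustrative amounts only (assumption)
+\ltjsetparameter{kanjiskip=0pt plus 0.4pt minus 0.4pt,
+                 xkanjiskip=0.25\zw plus 1pt minus 1pt}
+% all three kinds of glue are inserted when the box is packed,
+% so an empty group makes no node and does not interrupt the process
+\hbox{日本語{}の文字とabc}
+\end{verbatim}
+\end{quote}
+Since the insertion happens at `hpack' time, the empty group in this
+sketch should have no more effect than it has in |f{}irm| under \LuaTeX.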
+ +\subsection{Insertion of glues/kerns for Japanese typesetting: specification} +\label{ssec-jspec} + +\begin{table} +\caption{Examples of differences between \pTeX\ and \LuaTeX-ja.} +\label{tab-jfmglue} +\begin{center} +\begin{tabular}{llllllll} +\toprule +&\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}\\ +Input &|あ】{}【〕\/〔| &|い』\/a| &|う)\hbox{}(| &|え]\special{}[|\\\midrule +\pTeX &あ】\hbox{}【〕\hbox{}〔&い』\/a &う)\hbox{}( &え]\hbox{}[\\ +\LuaTeX-ja &あ】{}【〕\/〔 &い』\/a &う)\hbox{}( &え]\special{}[\\ +\bottomrule +\end{tabular} +\end{center} +\end{table} + +\begin{figure} +\begin{center} +\fontsize{40}{40}\selectfont +\imagfm{\jstrut あ}% +\imagfm{\jstrut 】\inhibitglue}% +\imagfm{\jstrut\kern.5\zw}% +\imagfm{\jstrut\kern.5\zw}% +\imagfm{\jstrut\inhibitglue【}% +\imagfm{\jstrut 〕\inhibitglue}% +\imagfm{\jstrut\kern.5\zw}% +\imagfm{\jstrut\kern.5\zw}% +\imagfm{\jstrut\inhibitglue〔}% +\end{center} +\caption{Detail of the output of \pTeX\ in the input~(1) in Table~\ref{tab-jfmglue}.} +\label{fig-ptexjfm} +\end{figure} + +Now we will take a look at the insertion process itself through four points. + +\begin{description} +\item[Ignored Nodes] +As noted in the previous subsection, the insertion process in \pTeX\ can + be interrupted by saying |{}| or anything else.\footnote{This + is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for + \texttt{min10.tfm} and other `old' JFMs work.} This leads the + second row in Table~\ref{tab-jfmglue}, or + Figure~\ref{fig-ptexjfm}. Here `the process is interrupted' + means that \pTeX\ does not think the letter `】\inhibitglue' + is followed by `\inhibitglue【', hence two half-width glues + are inserted between `】\inhibitglue' and `\inhibitglue【', + where the left one is from `】\inhibitglue' and the right one + is from `\inhibitglue【'. + + On the other hand, in \LuaTeX-ja, the process is done inside + |hpack_filter| and |pre_linebreak_filter| callbacks. 
Hence, + \emph{anything that does not make any node will be + ignored}\ in \LuaTeX-ja, as shown in (1) in + Table~\ref{tab-jfmglue}. \LuaTeX-ja also ignores any nodes + which do not make any contribution to the current horizontal + list (\emph{ins\_node}, \emph{adjust\_node}, + \emph{mark\_node}, \emph{whatsit\_node} and + \emph{penalty\_node}), as shown in (4). + + +By the way, around a \emph{glyph\_node} $p$ there may be some nodes + attached to~$p$. These are an accent and kerns for + moving it to the right place, and a kern from the italic + correction\footnote{\TeX82 (and \LuaTeX) does not distinguish + between an explicit kern and a kern for italic correction. To + distinguish them, an additional subtype for a kern is introduced + in \pTeX. On the other hand, \LuaTeX-ja uses an additional attribute and + redefines \texttt{\char`\\/} to set this attribute.} for $p$. It is natural that + these attachments should be ignored inside the process. Hence + \LuaTeX-ja takes this approach, as does the latest version of + \pTeX\ (version~p3.2). This explains (2) in Table~\ref{tab-jfmglue}. + +Summarizing the above, one should put an empty horizontal box |\hbox{}| + where one wants to interrupt the insertion process in + \LuaTeX-ja, as in (3) in Table~\ref{tab-jfmglue}. + +\item[Fonts with the Same Metric] +Recall that \LuaTeX-ja separates `real' fonts and metrics, as in Subsection~\ref{ssec-sepmet}. +Consider the following input, where all Japanese fonts use the same metric + (in \LuaTeX-ja), and |\gt| selects the \emph{gothic} family as + the current Japanese font family: +\begin{quote} +\begin{verbatim} +明朝)\gt (ゴシック +\end{verbatim} +\end{quote} +If the above input is processed by \pTeX, because the insertion process is + interrupted by |\gt|, the result looks like +\begin{quote} +\mc 明朝)\hbox{}\gt (ゴシック +\end{quote} +However, this seems unnatural, since the two Japanese fonts in the + output use the same metric, i.e.,~the same + typesetting rule.
Hence, we decided that Japanese fonts with + the same metric are treated as one font in the insertion + process of \LuaTeX-ja. Thus, the output from the above input + in \LuaTeX-ja looks like: +\begin{quote} +\mc 明朝)\gt (ゴシック +\end{quote} +There may be situations where this default behavior is not + suitable. \LuaTeX-ja offers a way to handle such situations, but + we leave it to the manual~\cite{man}. + +\item[Fonts with Different Metrics] +The case where two consecutive Japanese characters use different metrics and/or + different sizes is similar. Consider the following input where + the \emph{mincho} family and the \emph{gothic} family use + different metrics: +\begin{quote} +\begin{verbatim} +漢)\gt (漢)\large (大 +\end{verbatim} +\end{quote} +As in the previous item, this input yields the following in \pTeX: +\begin{quote} +\mc 漢)\hbox{}\gt (漢)\hbox{}\large (大 +\end{quote} +We thought that the amount of space between the parentheses in the above output + is too much. Hence we have changed the default behavior of + \LuaTeX-ja, so that the amount of glue between two Japanese + characters with different metrics is the \emph{average} of the glue + from the left character and that from the right + character. For example, Figure~\ref{fig-diffmet} shows the + output from the above input. The width of the glue indicated by `(1)' is + $(a/2 + a/2)/2 = 0.5a$, and the width of the glue indicated by `(2)' + is $(a/2 + 1.2a/2)/2 = 0.55a$. This default behavior can be + changed by the \textsf{diffrentmet} parameter of \LuaTeX-ja.
+ +\begin{figure} +\begin{center} +\fontsize{40}{40}\selectfont +\imagfm{\jstrut\smash{% + \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr漢\cr + \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$a$}\ + \hrulefill\vrule height .5ex depth .5ex\cr}}}}% +\imagfm{\jstrut )\inhibitglue}% +\hbox to .5\zw{\hss\normalsize (1)\hss}% +\imagfm{\jstrut\inhibitglue\gt (}% +\imagfm{\jstrut\gt 漢}% +\imagfm{\jstrut\gt )\inhibitglue}% +\hbox to .55\zw{\hss\normalsize (2)\hss}% +\imagfm{\fontsize{48}{48}\selectfont\jstrut\gt\inhibitglue (}% +\imagfm{\fontsize{48}{48}\selectfont\jstrut\smash{% + \vtop{\lineskiplimit=\maxdimen\lineskip2pt\halign{#\cr\gt 大\cr + \small\vrule height .5ex depth .5ex\hrulefill\ \lower.5ex\hbox{$1.2a$}\ + \hrulefill\vrule height .5ex depth .5ex\cr}}}} +\end{center} +\caption{Fonts with different metrics.} +\label{fig-diffmet} +\end{figure} + +\item[\emph{kanjiskip} and \emph{xkanjiskip}] +In \pTeX, the value of \emph{xkanjiskip} is controlled by a skip named + |\xkanjiskip|. A well-known defect of this implementation is + that the value of \emph{xkanjiskip} is not connected with the + size of the current Japanese font. It seems that the |EXTRASPACE|, + |EXTRASTRETCH|, and |EXTRASHRINK| parameters in a JFM are + reserved for specifying the default value of + \emph{xkanjiskip} in units of the design size, but \pTeX\ + did not actually use these parameters. + +Considering this situation of \pTeX, \LuaTeX-ja can use the value of + \emph{xkanjiskip} that is specified in a metric. If the value of + \emph{xkanjiskip} on the user side (this is the value of the + \textsf{xkanjiskip} parameter of |\ltjsetparameter|) is + |\maxdimen|, then \LuaTeX-ja uses the specification from + the currently used metric as the actual value of + \emph{xkanjiskip}. This description also applies to \emph{kanjiskip}.
+\end{description} + +\section{Distinction of characters} +\label{sec:distinction_of_characters} Since \LuaTeX\ can handle Unicode +characters natively, a major problem is how we distinguish +Japanese characters from alphabetic characters. For example, the +multiplication sign (U+00D7) exists both in ISO-8859-1 (hence in Latin-1 +Supplement in Unicode) and in the basic Japanese character set +JIS~X~0208. It is not desirable that this character is always treated as +an alphabetic character, because this symbol is often used in the sense +of `negative' in Japan. + +\subsection{Character ranges} +Before we describe the approach taken in \LuaTeX-ja, we review the +approach taken by u\pTeX. u\pTeX\ extends the |\kcatcode| primitive of +\pTeX, and uses this primitive for setting how a character is treated: +as an alphabetic character~(15), \emph{kanji}~(16), \emph{kana}~(17), +\emph{Hangul}~(19), or~\emph{other CJK character}~(18). +The assignment to |\kcatcode| can be done by a Unicode +block.\footnote{There are some exceptions. For example, U+FF00--FFEF +(Halfwidth and Fullwidth Forms) are divided into three blocks in recent +u\pTeX.} + +\LuaTeX-ja adopted a different approach. There are many Unicode blocks + in the Basic Multilingual Plane which are not included in + Japanese fonts, therefore it is inconvenient to work by Unicode + block. Furthermore, JIS~X~0208 is not just a union of Unicode + blocks; for example, the intersection of JIS~X~0208 and + Latin-1 Supplement is shown in + Table~\ref{tab-inter}. Considering these two points, to + customize the range of Japanese characters in \LuaTeX-ja, one + has to define ranges of character codes in the source in advance.
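+
+A minimal sketch of such a definition is given below. It assumes a
+range-definition command of the form |\ltjdefcharrange| as described in
+the manual~\cite{man}; the range number~9 and the chosen block (Arrows,
+U+2190--21FF) are arbitrary illustrations, so the exact syntax should
+be checked against the manual before use.
+\begin{quote}
+\begin{verbatim}
+% assumption: \ltjdefcharrange{<number>}{<character codes>}
+\ltjdefcharrange{9}{"2190-"21FF}    % define range 9
+\ltjsetparameter{jacharrange={+9}}  % range 9: Japanese characters
+\ltjsetparameter{jacharrange={-9}}  % range 9: alphabetic characters
+\end{verbatim}
+\end{quote}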
+ + +\begin{table} +\caption{Intersection of JIS~X~0208 and Latin-1 Supplement.} +\label{tab-inter} +\begin{center} +\begin{tabular}{llll} +\ltjjachar"A7 (U+00A7),& +\ltjjachar"A8 (U+00A8),& +\ltjjachar"B0 (U+00B0),& +\ltjjachar"B1 (U+00B1),\\ +\ltjjachar"B4 (U+00B4),& +\ltjjachar"B6 (U+00B6),& +\ltjjachar"D7 (U+00D7),& +\ltjjachar"F7 (U+00F7) +\end{tabular} +\end{center} +\end{table} + + +We note that \LuaTeX-ja offers two additional control sequences, + |\ltjjachar| and |\ltjalchar|. They are similar to the |\char| + primitive; however, |\ltjjachar| always yields a Japanese character, provided that + the argument is greater than or equal to 128, and |\ltjalchar| always + yields an alphabetic character, regardless of the argument. + +\subsection{Default setting of ranges} +Patches for plain \TeX\ and \LaTeXe\ of \LuaTeX-ja predefine 8~character +ranges, as shown in Table~\ref{tab-chrrng}. Most of these ranges are +just unions of Unicode blocks, and are determined from the Adobe-Japan1-6 +character collection~\cite{aj16} and JIS~X~0208. Among these 8~ranges, +the ranges~2, 3, 6, 7, and~8 are considered ranges of Japanese +characters, and the others are considered ranges of alphabetic +characters.\footnote{Note that ranges 3~and~8 are considered ranges of +alphabetic characters in this paper.} We remark on ranges 2~and~8: +\begin{description} +\item[The range~2] +JIS~X~0208 includes Greek letters and Cyrillic letters; however, these + letters cannot be used for typesetting Greek or Russian, of + course. Hence it is reasonable that Greek letters and + Cyrillic letters form a separate character range. +\item[The range~8] +If one wants to use 8-bit TFMs, such as T1 or TS1 encodings, one should + mark this range~8 as a range of alphabetic characters by +\begin{quote} +|\ltjsetparameter{jacharrange={-8}}| +\end{quote} +This is because some 8-bit TFMs have a glyph in this range; for example, + the character `\OE' is located at |"D7| in the T1 encoding.
%" +\end{description} + + +\begin{table} +\caption{Predefined ranges in \LuaTeX-ja.} +\label{tab-chrrng} +\begin{center} +\begin{tabular}{@{\bf}rl} +1&(Additional) Latin characters which do not belong to the range~8.\\ +2&Greek and Cyrillic letters.\\ +3&Punctuation and miscellaneous symbols.\\ +4&Unicode blocks which do not intersect with Adobe-Japan1-6.\\ +5&Surrogates and Supplementary Private Use Areas.\\ +6&Characters used in Japanese typesetting.\\ +7&Characters possibly used in CJK typesetting, but not in Japanese.\\ +8&Characters in Table~\ref{tab-inter}. +\end{tabular} +\end{center} +\end{table} + +\subsection{Control sequences producing Unicode characters} +\label{ssec-unichar} + +The \emph{fontspec} package\footnote{Precisely speaking, it is the +\emph{xunicode} package, originally a package for \XeTeX\ and +automatically loaded by the \emph{fontspec} package.} offers various +control sequences that produce Unicode characters. However, these +control sequences as they stand cannot work correctly with the default +range setting of \LuaTeX-ja. For example, |\textquotedblleft| is just +an abbreviation of |\char"201C\relax|, and the character U+201C (LEFT %" +DOUBLE QUOTATION MARK) is treated as a Japanese character, because it +belongs to the range~3. This problem is resolved by using |\ltjalchar| +instead of the |\char| primitive. This fix is included in an optional package +named \texttt{luatexja-\penalty0fontspec.sty}. Figure~\ref{fig-unitxt} +shows several ways to typeset a character, both as a Japanese character +and as an alphabetic character. + +\begin{figure} +\begin{LTXexample} +×, \char`×, % depend on range setting +\ltjalchar`×, % alphabetic char +\ltjjachar`×, % Japanese char +\texttimes % alph. char (by fontspec) +\end{LTXexample} +\caption{Control sequences producing a Unicode character.} +\label{fig-unitxt} +\end{figure} + +The situation looks similar in math formulas, but in fact it differs.
+Each control sequence that represents an ordinary symbol defined by the +\emph{unicode-math} package is just a synonym of a character. For example, +the meaning of |\otimes| is just the character U+2297 (CIRCLED TIMES), +which is included in the range~3. However, it is difficult to define a +control sequence like |\ltjalUmathchar| as a counterpart of +|\Umathchar|, since an input like `|\sum^\ltjalUmathchar ...|' has to be +permitted. + +We could not develop a satisfactory solution to this problem in +time for this paper. We are currently testing the +following solution: +\begin{itemize} +\item \LuaTeX-ja has a list of character codes which will always be treated as + alphabetic characters in math mode. Considering 8-bit TFMs for + math symbols, this list includes natural numbers between |"80| and + |"FF| by default. +\item Redefine internal commands defined in the \emph{unicode-math} + package so that +codes of characters which are mentioned in the \emph{unicode-math} + package will be included in the list. +\end{itemize} + + +We would like to extend the treatments described in this subsection to 8-bit +font encodings, but we leave this to further development too. + +\section{Current status of development} +\label{sec:current_status} +At the moment, \LuaTeX-ja can be used under plain \TeX, and under +\LaTeXe. Generally speaking, one only has to load |luatexja.sty|, by the +|\input| command or by |\usepackage| (in~\LaTeXe), in order merely to +typeset Japanese characters. We look at each part in more detail below. + +\subsection{`Engine extension'} +The lowest part of \LuaTeX-ja corresponds to the \pTeX\ extension as +\emph{an engine extension of \TeX}. We, the project members, think that +this part is almost done.
There is one more feature of \LuaTeX-ja which +we are going to explain: + +\begin{description} +\item[Shifting Baseline] +In order to make a match between Japanese fonts and alphabetic fonts, + sometimes shifting the baseline of alphabetic characters may + be needed. \pTeX\ has a dimension |\ybaselineshift|, which + corresponds to the amount of shifting down the baseline of alphabetic + characters. This is useful for Japanese-based documents, but + not for documents mainly in languages with alphabetic + characters. + +Hence, \LuaTeX-ja extends \pTeX's |\ybaselineshift| to Japanese + characters. Namely, \LuaTeX-ja offers two parameters, + \textsf{yjabaselineshift} and \textsf{yalbaselineshift}, for the + amount of shifting the baseline of Japanese characters and + that of alphabetic characters, respectively. +\begin{figure} +\begin{center} +\fontsize{40}{40}\selectfont\fboxsep0mm +\vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth +\hbox to 0.9\linewidth{% +\hfil +\raise-10pt\imagfm{\jstrut 漢}% +\raise-10pt\imagfm{\jstrut 字}\hskip.25\zw% +\imagfm{p}% +\imagfm{h}% +\hfil\hfil +\imagfm{\jstrut 漢}% +\imagfm{\jstrut 字}\hskip.25\zw% +\raise-10pt\imagfm{p}% +\raise-10pt\imagfm{h}% +\hfil +} +\end{center} + +\caption{First example of shifting baseline.} +\label{fig-bls} +\end{figure} + +\begin{figure} +\begin{center} +\fontsize{30}{30}\selectfont\fboxsep0mm +\vrule width 0.9\textwidth height0.4pt depth0.4pt\kern-0.9\textwidth +\hbox to 0.9\linewidth{% +\hfil +\imagfm{a}% +\imagfm{b}\hskip.25\zw% +\imagfm{\jstrut 本}% +\imagfm{\jstrut 文}\hskip.33333\zw% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut\inhibitglue (}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 注}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut 釈}\hskip.1666667\zw% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont c}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont o}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}% 
+\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont m}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont e}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont n}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont t}% +\raise3.514582pt\imagfm{\fontsize{20}{20}\selectfont\jstrut )\inhibitglue}% +\hskip.33333\zw% +\imagfm{\jstrut 本}% +\imagfm{\jstrut 文}% +\hfil +} +\end{center} + +\caption{Second example of shifting baseline.} +\label{fig-small} +\end{figure} + +An example output is shown in Figure~\ref{fig-bls}. The left half is the + output when \textsf{yjabaselineshift} is positive, hence the + baseline of Japanese characters is shifted down. On the other + hand, the right half is the output when + \textsf{yalbaselineshift} is positive, hence the baseline of + alphabetic characters is shifted down. Figure~\ref{fig-small} + shows an interesting use of these parameters. + +\end{description} +Note that \LuaTeX-ja does not support vertical typesetting, \emph{tategaki}, for now. + +\subsection{Patches for plain \TeX\ and \LaTeXe} +\pTeX\ has a patch for plain \TeX, namely |ptex.tex|, a patch for the \LaTeXe\ +macros (this patch and \LaTeXe\ together constitute \emph{p\LaTeXe}), and +|kinsoku.tex| which includes the default setting of \emph{kinsoku +shori}, the Japanese hyphenation. We ported them to \LuaTeX-ja, except +the code related to vertical typesetting, because \LuaTeX-ja does not +support vertical typesetting yet. We remark on one point related to the +porting: +\begin{description} + +\item[Behavior of\/ {\tt\char92fontfamily\/}] +The control sequence |\fontfamily| in p\LaTeXe\ changes the current alphabetic + font family and/or the current Japanese font family, + depending on the argument.
More concretely, + |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the + current alphabetic font family to $\langle\hbox{\it + arg\/}\rangle$, if and only if one of the following + conditions is satisfied: +\begin{itemize} +\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ in + \emph{some} alphabetic encoding is already defined in the document. +\item There exists an alphabetic encoding $\langle\hbox{\it + enc\/}\rangle$ already defined in the document such that a font + definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it + arg\/}\rangle$|.fd| (all lowercase) exists. +\end{itemize} +The same criterion is used for changing the Japanese font family. + +For this behavior to work well, a list of all (alphabetic) encodings + already defined in the document is needed. However, since \LuaTeX-ja + is loaded as a package, \LuaTeX-ja cannot have this list. + Hence \LuaTeX-ja adopted a different approach, namely + |\fontfamily{|$\langle\hbox{\it arg\/}\rangle$|}| changes the + current alphabetic font family to $\langle\hbox{\it + arg\/}\rangle$, if and only if: +\begin{itemize} +\item An alphabetic font family named $\langle\hbox{\it arg\/}\rangle$ + in the current alphabetic encoding $\langle\hbox{\it + enc\/}\rangle$ is already defined in the document. +\item A font definition file $\langle\hbox{\it enc\/}\rangle\langle\hbox{\it + arg\/}\rangle$|.fd| (all lowercase) exists. +\end{itemize} + + +\end{description} + + + +\subsection{Classes for Japanese documents} +To produce `high-quality' Japanese documents, we need not only +correct placement of Japanese characters, but also class files for +Japanese documents. Two major families of classes are widely used in Japan: +\emph{jclasses}, which is distributed with the official p\LaTeXe\ macros, +and \emph{jsclasses}. At present, \LuaTeX-ja +simply contains their counterparts: \emph{ltjclasses} and +\emph{ltjsclasses}.
However, the policy on classes has not been determined +yet, and we hope to have another family of classes which is useful for +commercial printing. In the author's opinion, \emph{ltjclasses} +had better stay as an example of porting class files for \pTeX\ to +\LuaTeX-ja. + +\subsection{Patches for packages} +Apart from patches for the \LaTeXe~kernel and classes for Japanese +documents, we need to make patches for several packages. At present, +we have considered the following packages, and made patches or ports for +the first two packages. + +\begin{description} +\item[The \emph{fontspec} package] The \emph{fontspec} package is built + on NFSS2, hence control sequences offered by the + \emph{fontspec} package, such as |\setmainfont|, are only + effective for alphabetic fonts if \LuaTeX-ja is loaded. + \texttt{luatexja-\penalty0fontspec.sty} (not automatically + loaded) offers their counterparts for Japanese fonts, with an + additional `j' in the name of control sequences, such as + |\setmainjfont|. As described in + Subsection~\ref{ssec-unichar}, it also includes a patch for + control sequences producing Unicode characters. + +\item[The \emph{otf} package] +This package is widely used in \pTeX\ for typesetting characters which are +not in JIS~X~0208, and for using more than one weight in the \emph{mincho} +and \emph{gothic} font families. Therefore \LuaTeX-ja supports features +of the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty} + manually. Note that characters produced by |\UTF{xxxx}| and + |\CID{xxxx}| are not appended to the current list as a + \emph{glyph\_node}, in order to avoid callbacks by the + \emph{luaotfload} package. We have another remark: |\CID| + does not work with TrueType fonts, since |\CID| uses the + conversion table between CID and the glyph order of the + current Japanese font.
+ +\item[The \emph{listings} package] +It is known to users of \pTeX\ that there is a patch |jlisting.sty| for + the \emph{listings} package, to use Japanese characters in + the |lstlisting| environment. Generally speaking, it can also + be used in \LuaTeX-ja. However, it seems that a + Japanese character after a space does not receive any processing + by the \emph{listings} package; this is inconvenient when we + use the \emph{showexpl} package. + +There is another way to use characters above 256 with the + \emph{listings} package (described in~\cite{apl}). However, + this method is not suitable for Japanese, since the number of + Japanese characters is very large. We hope that the + \emph{listings} package will be able to handle all characters above + 256 without any patch in the future. + + +\end{description} + + + +\section{Implementation} +\label{sec:implementation} +\subsection{Handling of Japanese fonts} +In \pTeX, there are three slots for maintaining current fonts, namely +|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal +direction) and |\tfont| for Japanese fonts (in vertical direction). With +these slots, we can manage the current font for alphabetic characters +and that for Japanese characters separately in \pTeX. However, \LuaTeX\ +has only one slot for maintaining the current font, as does \TeX82. This +situation leads to a problem: how can we maintain the `current Japanese +font'? + +There are three approaches to this problem. One approach is to make a +mapping table from alphabetic fonts to corresponding Japanese fonts +(here we do not assume that NFSS2 is available). Another approach is +to always use composite fonts made of alphabetic fonts and Japanese +fonts. The third approach is to store the information of the current +Japanese font in an attribute. We adopted the third approach, +since \LuaTeX-ja is much affected by \pTeX\ as we noted in +Subsection~\ref{ssec-pol}.
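+
+The idea behind the third approach can be pictured with the following
+sketch. It is not the code of \LuaTeX-ja: the names |\curjfnt|,
+|\jfontA| and |\jfontB| and the register number~100 are hypothetical,
+and |\attributedef| is the \LuaTeX\ primitive for naming an attribute
+register (some formats provide it only under a prefixed name).
+\begin{quote}
+\begin{verbatim}
+% hypothetical sketch, not LuaTeX-ja's actual code
+\attributedef\curjfnt=100  % name attribute register 100
+\def\jfontA{\curjfnt=1 }   % `select' Japanese font no. 1
+\def\jfontB{\curjfnt=2 }   % `select' Japanese font no. 2
+\jfontA あ{\jfontB い}う
+% the node for い carries the value 2;
+% those for あ and う carry the value 1
+\end{verbatim}
+\end{quote}
+Because attribute assignments obey grouping and every node records the
+attribute values in force when it is created, a callback could later
+read this value from each node to find the intended Japanese font.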
+ +As in Figure~\ref{fig-jfdef}, \LuaTeX-ja uses |\jfont| for defining +Japanese fonts, as does \pTeX. However, because the information of the current +Japanese font is stored in an attribute, control sequences defined by +|\jfont| (e.g.,~|\foo| and |\bar| in Figure~\ref{fig-jfdef}) do +not represent a font in the sense of \TeX82. In other words, each of +these control sequences is just an assignment to an attribute, therefore +they cannot be an argument of |\the|, |\fontname|, or |\textfont|. + + +Callbacks by the \emph{luaotfload} package, e.g.,~replacement of glyphs +according to OpenType font features, are performed just after `Examination of +stack level' (see Subsections +\ref{ssec-over}~and~\ref{ssec-stack}). Also note that the calculation of +character classes for each Japanese character is done \emph{after} +these callbacks for now. + +\subsection{Stack management} +\label{ssec-stack} + +As we noted in Subsection~\ref{ssec-csname}, parameters whose values +at the end of a horizontal box or of a paragraph are valid throughout the +whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented +by internal integers or registers of other types in \TeX. We explain why +in this subsection. + +\begin{figure} +\begin{lstlisting} +void package(int c) +{ + ... + d = box_max_depth; + unsave(); + save_ptr -= 4; + if (cur_list.mode_field == -hmode) { + cur_box = filtered_hpack(cur_list.head_field, + cur_list.tail_field, saved_value(1), + saved_level(1), grp, saved_level(2)); + subtype(cur_box) = HLIST_SUBTYPE_HBOX; + } else { +\end{lstlisting} +\caption{An extract of a CWEB-source \texttt{tex/packaging.w} of \LuaTeX.} +\label{fig-ltsrc} +\end{figure} + +Figure~\ref{fig-ltsrc} is an extract of the CWEB source +\texttt{tex/packaging.w} of \LuaTeX\ (SVN revision 4358). This function +is called just when an explicit |\hbox{...}| or |\vbox{...}| is ended, and +the function |filtered_hpack()| is where the |hpack_filter| callback and then the +actual `hpack' process are performed.
Notice that the |unsave()| +function is called before |filtered_hpack()|. This is the problem; +because of |unsave()|, we can retrieve only the values of registers +\emph{outside} the box, even in the |hpack_filter| callback. + +To cope with this problem, \LuaTeX-ja has its own stack system, based on the +Lua code in \cite{stack-mail}. Furthermore, \emph{whatsit} nodes whose +\emph{user\_id} is 30112 (\emph{stack\_node}, for short) will be +appended to the current horizontal list each time the current stack +level is incremented, and their values are the values of +|\currentgrouplevel| at that time. At the beginning of the |hpack_filter| +callback, the list in question is traversed to determine whether the +stack level at the end of the list and that outside the box coincide. + +Let $x$ be the value of |\currentgrouplevel|, and $y$ be the current +stack level, both inside the |hpack_filter| callback, i.e.,~outside a +horizontal box. Consider the list which represents the content of the box. +Then we have: +\begin{itemize} +\item A \emph{stack\_node} whose value is $x+1$ (because all material + in the box is included in a group |\hbox{...}|, the value of + |\currentgrouplevel| inside the box is at least $x+1$) in the list + corresponds to an assignment related to the stack system at the + top level of the list, like +\begin{quote} +\begin{verbatim} +\hbox{...(assignment)...} +\end{verbatim} +\end{quote} +In this case, the current stack level is incremented to $y+1$ after the assignment. +\item A \emph{stack\_node} whose value is more than $x+1$ in the list corresponds +to an assignment inside another group contained in the box.
For example, +  the following input creates +a \emph{stack\_node} whose value is $x+3=(x+1)+2$: +\begin{quote} +\begin{verbatim} +\hbox{...{...{...(assignment)}...}...} +\end{verbatim} +\end{quote} +\end{itemize} +Thus, we can conclude that the stack level at the end of the list is +$y+1$ if and only if there is a \emph{stack\_node} whose value is +$x+1$. Otherwise, the stack level is just $y$. + +\subsection{Adjustment of the position of Japanese characters} +\label{ssec-width} + +The size of a glyph specified in a metric and that in a real font +usually differ. For example, the letter `\inhibitglue【' is half-width +in |jfm-ujis.lua| or |jis.tfm|, while this letter is full-width like `【' +in most TrueType fonts used in Japanese typesetting, such as +IPA~Mincho. Hence the position of such glyphs needs to be +adjusted. In the context of \pTeX, this process was performed using virtual fonts. + +On the other hand, \LuaTeX-ja does the adjustment by encapsulating a glyph +in a horizontal box. There are two main reasons why we adopted this +method: one is that we feared that the Lua code needed to coexist with the callbacks of +the |luaotfload| package would become large if we used virtual fonts, and the +other is to cope with shifting of the baseline of characters at the +same time.
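+
+The adjustment, which is detailed together with Figure~\ref{fig-pos}, amounts
+to a small calculation. The following is only a sketch: the symbols $w_M$,
+$w_R$, $\ell$, $d$, and $h$ are notation introduced here for illustration and
+do not appear in the package. Let $w_M$ be the width specified in the metric,
+$w_R$ the width of the real glyph, and $\ell$ and $d$ the |left| and |down|
+values of the metric. Assuming that `middle' alignment simply centers the real
+glyph within the imaginary body~$M$, the horizontal offset $h$ of the left edge
+of the real glyph, measured from the left edge of $M$, is
+\[
+  h = \frac{w_M - w_R}{2} - \ell ,
+\]
+and the glyph is lowered by $d$. Under the same reading, `left' and `right'
+alignment would presumably give $h = -\ell$ and $h = w_M - w_R - \ell$,
+respectively. With the dimensions drawn in Figure~\ref{fig-pos}
+($w_M = 12$, $w_R = 6$, $\ell = 4$, and $d = 1.5$, in multiples of
+|\unitlength|), we get $h = 3 - 4 = -1$, which is the left edge of the gray
+rectangle in the figure.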
+ +\begin{figure} +\begin{center}\unitlength=9pt\small +\begin{picture}(15,12)(-1,-3) + +\color{grayx}% real glyph +\put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength} + +\color{black}% real glyph :step1 +\thicklines +\put(-1,-1.5){\line(0,1){7}\line(0,-1){2.5}} +\put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}} +\put(-1,5.5){\line(1,0){6}} +\put(-1,-4){\line(1,0){6}} +\put(-1,0){\makebox(0,0)[r]{\strut$R$\,}} + +\thicklines +\put(0,0){\vector(0,1){9}\line(0,-1){3}\vector(1,0){12}} +\put(12,9){\makebox(0,0)[rt]{\strut$M$\,}} +\put(12,0){\line(0,1){9}\vector(0,-1){3}} +\put(0,9){\line(1,0){12}} +\put(0,-3){\line(1,0){12}} +\put(0.2,4.5){\makebox(0,0)[l]{\texttt{height}}} +\put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}} +\put(6,0.2){\makebox(0,0)[b]{\texttt{width}}} + +\thicklines +\put(3,0){\line(0,1){7}\line(0,-1){2.5}\line(1,0){6}} +\put(9,0){\line(0,1){7}\line(0,-1){2.5}} +\put(3,7){\line(1,0){6}} +\put(3,-2.5){\line(1,0){6}} +\newsavebox{\eqdist} +\savebox{\eqdist}(0,0)[c]{% + \thinlines + \put(-0.08,0.2){\line(0,-1){0.4}}% + \put(0.08,0.2){\line(0,-1){0.4}}} +\put(1.5,0){\usebox{\eqdist}} +\put(10.5,0){\usebox{\eqdist}} + +\thicklines +\put(3,-1.5){\vector(-1,0){4}} +\put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}} +\put(3,0){\vector(0,-1){1.5}} +\put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}} +\end{picture} +\end{center} +\caption{The position of the `real' glyph.} +\label{fig-pos} +\end{figure} + +Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is +the imaginary body specified in the metric, and a vertical +rectangle is the imaginary body of a real glyph. First, the real glyph +is aligned with respect to the width of $M$. In the figure, the real +glyph is aligned `middle'; this setting is useful for the full-width +middle dot `・'. We have other settings, `left' and `right'. +After that, it is shifted according to the value of |left| and |down|, +which are specified in the metric, too. 
The final position of the real glyph +is shown by the gray rectangle~$R$. If the baseline shift is +not zero, $M$ (and hence the real glyph) is shifted by that amount. + +We would like to remark briefly on the vertical position of a real +glyph. A JFM (or a metric used in \LuaTeX-ja) and a real font used for +it may have different heights or depths. In that case, it may look better +if the real glyph is shifted vertically to match the height-depth ratio +specified in the metric. However, no vertical adjustment other than the +adjustment by the |down| value is performed in the present +implementation of \LuaTeX-ja. This situation has been carefully studied by +Otobe~\cite{min10}. The policy on this problem has not been decided +yet; however, we would like to offer several solutions in future +development. + +\section{Conclusion} +We have discussed our \LuaTeX-ja package, which is much influenced +by \pTeX. For now, it can be used experimentally; however, many +refinements are needed for regular use. The author hopes +that this paper and the \LuaTeX-ja project contribute to the typesetting of Japanese, +and possibly other Asian languages, under \LuaTeX. + +\section*{Acknowledgements} +The author would like to thank Ken Nakano and Hideaki Togashi for their +development of ASCII \pTeX. The author is very grateful to Haruhiko +Okumura for his leadership in the Japanese \TeX\ community. The author +is also very grateful to the members of the \LuaTeX-ja project team for their +valuable cooperation in development. + +%%% The style of the bibliography is `amsplain'. +\providecommand{\bysame}{\leavevmode\hbox to3em{\hrulefill}\thinspace} +\providecommand{\href}[2]{#2} +\begin{thebibliography}{99} + +\bibitem{aj16} +Adobe Systems Incorporated, \emph{Adobe-Japan1-6 Character Collection +  for CID-Keyed Fonts}, Technical Note~\#5078, 2004.
+\url{http://partners.adobe.com/public/developer/en/font/5078.Adobe-Japan1-6.pdf} + +\bibitem{ptex} +ASCII MEDIA WORKS,アスキー日本語\TeX\ (\pTeX).\url{http://ascii.asciimw.jp/pb/ptex/} + +\bibitem{apl} +John Baker, \emph{Typesetting UTF8 APL code with the \LaTeX\ lstlisting package}. +\url{http://bakerjd99.wordpress.com/2011/08/15/} + +\bibitem{omega} +Jin-Hwan~Cho and Haruhiko Okumura, \emph{Typesetting CJK Languages with Omega}, +\TeX, XML, and Digital Typography, Lecture Notes in Computer Science, vol.~3130, +Springer, 2004, 139--148. + +\bibitem{joylua} +Yannis Haralambous. \emph{The Joy of \LuaTeX}. \url{http://luatex.bluwiki.com/} + +\bibitem{jisx4051} +Japanese Industrial Standards Committee. \emph{JIS~X~4051: Formatting + rules for Japanese documents}, 1993, 1995, 2004. + +\bibitem{eptex} +北川弘典,$\varepsilon$-\pTeX についてのwiki. +\url{http://sourceforge.jp/projects/eptex/wiki/FrontPage} + +\bibitem{luaums} +北川弘典,\LuaTeX で日本語. +\url{http://oku.edu.mie-u.ac.jp/tex/mod/forum/discuss.php?d=378} + +\bibitem{luatexref} +\LuaTeX\ development team, \emph{The \LuaTeX\ reference}. +\url{http://www.luatex.org/svn/trunk/manual/luatexref-t.pdf} (snapshot of SVN trunk) + +\bibitem{man} +\LuaTeX-ja project team, \emph{The \LuaTeX-ja package}. +Not completed for now. Available at |doc/man-en.pdf| (in English) or + |doc/man-ja.pdf| (in Japanese) +in the Git repository. + +\bibitem{luajp-test} +香田温人,\LuaTeX と日本語. +\url{http://www1.pm.tokushima-u.ac.jp/~kohda/tex/luatex-old.html} + +\bibitem{luajalayout} +前田一貴,luajalayout パッケージ---Lua\LaTeX によ + る日本語組版---. +\url{http://www-is.amp.i.kyoto-u.ac.jp/lab/kmaeda/lualatex/luajalayout/} + +\bibitem{jsclasses} +奥村晴彦,p\LaTeXe 新ドキュメントクラス. +\url{http://oku.edu.mie-u.ac.jp/~okumura/jsclasses/} + +\bibitem{ptexjp} +Haruhiko Okumura, \emph{\pTeX\ and Japanese Typesetting}, + The Asian Journal of \TeX\ \textbf{2}~(2008), 43--51. + +\bibitem{min10} +乙部厳己,min10フォントについて. 
+\url{http://argent.shinshu-u.ac.jp/~otobe/tex/files/min10.pdf} + +\bibitem{otf} +齋藤修三郎,Open Type Font用VF. +\url{http://psitau.kitunebi.com/otf.html} + +\bibitem{stack-mail} +Jonathan Sauer, \emph{[Dev-luatex] tex.currentgrouplevel}. +\url{http://www.ntg.nl/pipermail/dev-luatex/2008-August/001765.html} + +\bibitem{uptex} +Takuji Tanaka, \emph{u\pTeX, up\LaTeX---unicode version of \pTeX, p\LaTeX}. +\url{http://homepage3.nifty.com/ttk/comp/tex/uptex_en.html} + +\bibitem{ptexenc} +Nobuyuki Tsuchimura, \emph{Development of a Japanese \TeX\ Distribution~`ptetex3'}, +Computer Software\ \textbf{24} (2007), no.~4, 40--50, (in Japanese). + +\bibitem{w3c} +W3C Working Group, \emph{Requirements for Japanese Text Layout}. +\url{http://www.w3.org/TR/jlreq/} +\end{thebibliography} + +\end{document} -- 2.11.0