%#! lualatex -shell-escape manual.ins %<*en> \documentclass[a4paper,titlepage]{article} \usepackage[margin=20mm,footskip=5mm]{geometry} % %<*ja> \documentclass[a4paper,titlepage]{bxjsarticle} \setpagelayout*{margin=20mm,footskip=5mm} \def\headfont{\normalfont\bfseries} % \def\headfont{\sffamily\gtfamily} is needed in ordinal documents % This document cannot typeset in ltjsclasses (conflict with showexpl?) % \usepackage{amsmath,amssymb,xcolor,pict2e,multienum,amsthm,float} \usepackage{booktabs,listings,lltjlisting,showexpl,multicol} \usepackage{luatexja-otf} \usepackage[unicode=false]{hyperref} \usepackage[all]{xy} \SelectTips{cm}{} \DeclareRobustCommand\eTeX{\ensuremath{\varepsilon}-\kern-.125em\TeX} \DeclareRobustCommand\LuaTeX{Lua\TeX} \DeclareRobustCommand\pdfTeX{pdf\TeX} \DeclareRobustCommand\pTeX{p\kern-.05em\TeX} \DeclareRobustCommand\upTeX{p\kern-.05em\TeX} \DeclareRobustCommand\pLaTeX{p\kern-.05em\LaTeX} \DeclareRobustCommand\pLaTeXe{p\kern-.05em\LaTeXe} \DeclareRobustCommand\epTeX{\ensuremath{\varepsilon}-\kern-.125em\pTeX} \theoremstyle{definition} \newtheorem{defn}{Definition} \newenvironment{cslist}{% \leftskip2em\parindent=0pt\def\makelabel##1{{\tt\char92##1}} \def\{{\char`\{}\def\}{\char`\}} \let\origitem=\item \def\item[##1]{\par\smallskip\par\hskip-\leftskip\makelabel{##1}\par} }{} \makeatletter \long\def\@makecaption#1#2{% \vskip\abovecaptionskip \sbox\@tempboxa{{\small #1. #2}}% \ifdim \wd\@tempboxa >\hsize {\small #1. 
#2}\par \else \global \@minipagefalse \hb@xt@\hsize{\hfil\box\@tempboxa\hfil}% \fi \vskip\belowcaptionskip} \makeatother %<*en> \title{The \LuaTeX-ja package} \author{The \LuaTeX-ja project team} % %<*ja> \title{\LuaTeX-jaパッケージ} \author{\LuaTeX-jaプロジェクトチーム} % \lstset{ basicstyle=\ttfamily\small, pos=o, breaklines=true, numbers=none, rframe={}, basewidth=0.5em } \parskip=\smallskipamount \protected\def\Param#1{\textsf{#1}} % parameter name \protected\def\Pkg#1{\underline{\smash{\texttt{#1}}}} % packages/classes \begin{document} \catcode`\<=13 \def<#1>{{\normalfont\rm\itshape$\langle$#1$\rangle$}} \maketitle \tableofcontents \bigskip %<*en> {\Large\bf This documentation is far from complete. It may have many grammatical (and contextual) errors.} % %<*ja> \textbf{\large 本ドキュメントはまだまだ未完成です. また,英語版と日本語版をdocstripプログラムを用いることで一緒に生成している都合上, 見出しが英語のままになっています.} % \clearpage \part{User's manual} \section{Introduction} %<*en> The \LuaTeX-ja package is a macro package for typesetting high-quality Japanese documents when using \LuaTeX. % %<*ja> \LuaTeX-jaパッケージは,次世代標準\TeX である\LuaTeX の上で,\pTeX と同等 /それ以上の品質の日本語組版を実現させようとするマクロパッケージである. % \subsection{Backgrounds} %<*en> Traditionally, ASCII \pTeX, an extension of \TeX, and its derivatives are used to typeset Japanese documents in \TeX. \pTeX\ is an engine extension of \TeX: so it can produce high-quality Japanese documents without using very complicated macros. But this point is a mixed blessing: \pTeX\ is left behind from other extensions of \TeX, especially \eTeX\ and pdf\TeX, and from changes about Japanese processing in computers (\textit{e.g.}, the UTF-8 encoding). Recently extensions of \pTeX, namely \upTeX\ (Unicode-implementation of \pTeX) and \epTeX\ (merging of \pTeX\ and \eTeX\ extension), have developed to fill those gaps to some extent, but gaps still exist. However, the appearance of \LuaTeX\ changed the whole situation. With using Lua `callbacks', users can customize the internal processing of \LuaTeX. 
So there is no need to modify the sources of an engine to support Japanese
typesetting: we only have to write Lua scripts for the appropriate callbacks.
%
%<*ja>
従来,「\TeX を用いて日本語組版を行う」といったとき,エンジンとしては
ASCII \pTeX やそれの拡張物が用いられることが一般的であった.\pTeX は\TeX のエンジン拡張であり,(少々仕様上不便な点はあるものの)商業印刷の分野に
も用いられるほどの高品質な日本語組版を可能としている.だが,それは弱点に
もなってしまった:\pTeX という(組版的に)満足なものがあったため,海外で
行われている数々の\TeX の拡張──例えば\eTeX や\pdfTeX ──や,TrueType, OpenType, Unicodeといった計算機で日本語を扱う際の状況の変化に追従すること
を怠ってしまったのだ.

ここ数年,若干状況は改善されてきた.現在手に入る大半の\pTeX バイナリでは
外部UTF-8入力が利用可能となり,さらにUnicode化を推進し,\pTeX の内部処理
までUnicode化した\upTeX も開発されている.また,\pTeX に\eTeX 拡張をマー
ジした\epTeX も登場し,\TeX\ Live\ 2011では\pLaTeX が\epTeX の上で動作す
るようになった.だが,\pdfTeX 拡張(pdf直接出力やmicro-typesetting)を
\pTeX に対応させようという動きはなく,海外とのgapは未だにあるのが現状であ
る.

しかし,\LuaTeX の登場で,状況は大きく変わることになった.Luaコードで
`callback'を書くことにより,\LuaTeX の内部処理に割り込みをかけることが可
能となった.これは,エンジン拡張という真似をしなくても,Luaコードとそれに
関する\TeX マクロを書けば,エンジン拡張とほぼ同程度のことができるようになっ
たということを意味する.\LuaTeX-jaは,このアプローチによってLuaコード・
\TeX マクロによって日本語組版を\LuaTeX の上で実現させようという目的で開発
が始まったパッケージである.
%

\subsection{Major Changes from \pTeX}
%<*en>
The \LuaTeX-ja package is strongly influenced by the \pTeX\ engine. The initial
target of development was to implement the features of \pTeX. However,
\emph{\LuaTeX-ja is not just a port of \pTeX; unnatural
specifications/behaviors of \pTeX\ were not adopted}.
%
%<*ja>
\LuaTeX-jaは,\pTeX に多大な影響を受けている.初期の開発目標は,\pTeX の機
能をLuaコードにより実装することであった.しかし,開発が進むにつれ,\pTeX
の完全な移植は不可能であり,また\pTeX における実装がいささか不可解になっ
ているような状況も発見された.そのため,\textbf{\LuaTeX-ja は,もはや
\pTeX の完全な移植は目標とはしない.\pTeX における不自然な仕様・挙動があ
れば,そこは積極的に改める.}
%

The following are the major changes from \pTeX:
\begin{itemize}
\item A Japanese font is a tuple of a `real' font, a Japanese font metric
(\textbf{JFM}, for short), and an optional string called `variation'.
\item In \pTeX, a line break after a Japanese character is ignored (and doesn't
yield a space), since line breaks (in source files) are permitted almost
everywhere in Japanese texts.
However, \LuaTeX-ja does not fully support this feature, because of a
limitation in the specification of \LuaTeX.
\item The insertion process of glues/kerns between two Japanese characters and
between a Japanese character and other characters (we refer to these
glues/kerns as \textbf{JAglue}) is rewritten from scratch.
\begin{itemize}
\item As \LuaTeX's internal character handling is `node-based'
(\textit{e.g.}, \verb+of{}fice+ doesn't prevent ligatures), the insertion
process of \textbf{JAglue} is now `node-based'.
\item Furthermore, nodes between two characters which have no effect on
line breaking (\textit{e.g.}, \verb+\special+ node) are ignored in the insertion
process.
\item In the process, two Japanese fonts which differ only in their `real'
fonts are treated as identical.
\end{itemize}
\item At present, vertical typesetting (\emph{tategaki}) is not supported in
\LuaTeX-ja.
\end{itemize}
For detailed information, see Part~\ref{part-imp}.

\subsection{Notations}
In this document, the following terms and notations are used:
\begin{itemize}
\item Characters are divided into two types:
\begin{itemize}
\item \textbf{JAchar}: Japanese characters such as Hiragana, Katakana, Kanji
and other punctuation marks for Japanese.
\item \textbf{ALchar}: all other characters, such as alphabetic characters.
\end{itemize}
We say `alphabetic fonts' for fonts used in \textbf{ALchar}, and `Japanese
fonts' for fonts used in \textbf{JAchar}.
\item A word in a sans-serif font (like \Param{prebreakpenalty}) means an
internal parameter for Japanese typesetting, and it is used as a key in the
\verb+\ltjsetparameter+ command.
\item A word in typewriter font with underline (like \Pkg{fontspec}) means a
package or a class of \LaTeX.
\item The word `primitive' is used not only for primitives in \LuaTeX, but also
for control sequences that are defined in the core module of \LuaTeX-ja.
\item In this document, natural numbers start from~0.
\end{itemize}

\subsection{About the project}
\paragraph{Project Wiki} The project wiki is under construction.
\begin{itemize}
\item \url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage%28en%29} (English)
\item \url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage} (Japanese)
\end{itemize}
This project is hosted by SourceForge.JP.

\paragraph{Members}\ 
%<*en>
\begin{multienumerate}
\def\labelenumi{$\bullet$}
\mitemxxx{Hironori KITAGAWA}{Kazuki MAEDA}{Takayuki YATO}
\mitemxxx{Yusuke KUROKI}{Noriyuki ABE}{Munehiro YAMAMOTO}
\mitemx{Tomoaki HONDA}
\end{multienumerate}
%
%<*ja>
\begin{multienumerate}
\def\labelenumi{$\bullet$}
\mitemxxx{Hironori KITAGAWA}{Kazuki MAEDA}{Takayuki YATO}
\mitemxxx{Yusuke KUROKI}{Noriyuki ABE}{Munehiro YAMAMOTO}
\mitemx{Tomoaki HONDA}
\end{multienumerate}
%
% \paragraph{Acknowledgments} -- insert here if needed

\clearpage
\section{Getting Started}
\subsection{Installation}
To install the \LuaTeX-ja\ package, you will need:
\begin{itemize}
\item \LuaTeX\ (version 0.65.0-beta or later) and its supporting packages.\\
If you are using \TeX~Live~2011 or current W32\TeX, you don't have to worry.
\item The source archive of \LuaTeX-ja, of course{\tt:)}
\end{itemize}

The installation procedure is as follows:
\begin{enumerate}
\item Download the source archive.

At present, \LuaTeX-ja has no official release, so you have to retrieve the
archive from the repository. You can retrieve the Git repository via
\begin{verbatim}
$ git clone git://git.sourceforge.jp/gitroot/luatex-ja/luatexja.git
\end{verbatim}
or download the archive of HEAD in the \texttt{master} branch from
\begin{flushleft}
\url{http://git.sourceforge.jp/view?p=luatex-ja/luatexja.git;a=snapshot;h=HEAD;sf=tgz}.
\end{flushleft}
Note that the latest development may not be in the \texttt{master} branch.
\item Extract the archive. You will see {\tt src/} and several other
sub-directories.
\item Copy all the contents of {\tt src/} into one of your \texttt{TEXMF} trees.
\item If {\tt mktexlsr} is needed to update the filename database, run it.
\end{enumerate}

\subsection{Cautions}
\begin{itemize}
\item The encoding of your source file must be UTF-8. Other encodings, such
as EUC-JP or Shift-JIS, are not supported.
\item \LuaTeX-ja may conflict with other packages.

For example, the default setting of \textbf{JAchar} in the present version does
not coexist with the \Pkg{unicode-math} package. Putting the following line in
the preamble makes mathematical symbols typeset correctly, but as a side effect
several Japanese characters will be treated as \textbf{ALchar}s:
\begin{verbatim}
\ltjsetparameter{jacharrange={-3, -8}}
\end{verbatim}
\end{itemize}

\subsection{Using in plain \TeX}\label{ssec-plain}
To use \LuaTeX-ja in plain \TeX, simply put the following at the beginning of
the document:
\begin{verbatim}
\input luatexja.sty
\end{verbatim}
This makes the minimal settings (like {\tt ptex.tex}) for typesetting Japanese
documents:
\begin{itemize}
\item The following 6~Japanese fonts are preloaded:
\begin{center}
\begin{tabular}{ccccc}
\toprule
\textbf{classification}&\textbf{font name}&\bf `10\,pt'&\bf`7\,pt'&\bf`5\,pt'\\\midrule
\emph{mincho}&Ryumin-Light &\verb+\tenmin+&\verb+\sevenmin+&\verb+\fivemin+\\
\emph{gothic}&GothicBBB-Medium&\verb+\tengt+ &\verb+\sevengt+ &\verb+\fivegt+\\
\bottomrule
\end{tabular}
\end{center}
\begin{itemize}
\item The `Q' is a unit used in Japanese phototypesetting, and
$1\,\textrm{Q}=0.25\,\textrm{mm}$. This length is stored in a dimension
\verb+\jQ+.
\item It is common practice that the fonts `Ryumin-Light' and
`GothicBBB-Medium' are not embedded into PDF files, and that PDF readers
substitute them with some external Japanese fonts (\textit{e.g.}, Kozuka Mincho
is used for Ryumin-Light in Adobe Reader). We adopt this practice in the
default setting.
\item A character in an alphabetic font is generally smaller than one in a
Japanese font at the same size.
So the actual size of these Japanese fonts is in fact specified to be smaller
than that of alphabetic fonts, namely scaled by 0.962216.
\end{itemize}
\item The amount of glue that is inserted between a \textbf{JAchar} and an
\textbf{ALchar} (the parameter \Param{xkanjiskip}) is set to
\[
(0.25\cdot 13.5\,\textrm{Q})^{+1\,\text{pt}}_{-1\,\text{pt}}
= \frac{27}{32}\,\mathrm{mm}^{+1\,\text{pt}}_{-1\,\text{pt}}.
\]
\end{itemize}

\subsection{Using in \LaTeX}\label{ssec-ltx}
\paragraph{\LaTeXe}
Usage in \LaTeXe\ is basically the same. To set up the minimal environment for
Japanese, you only have to load {\tt luatexja.sty}:
\begin{verbatim}
\usepackage{luatexja}
\end{verbatim}
This also makes the minimal settings (counterparts in \pLaTeX\ are
{\tt plfonts.dtx} and {\tt pldefs.ltx}):
\begin{itemize}
\item {\tt JY3} is the font encoding for Japanese fonts (in horizontal
direction).\\
When vertical typesetting is supported by \LuaTeX-ja in the future, {\tt JT3}
will be used for vertical fonts.
\item Two font families {\tt mc} and {\tt gt} are defined:
\begin{center}
\begin{tabular}{ccccc}
\toprule
\textbf{classification}&\textbf{family}&\verb+\mdseries+&\verb+\bfseries+&\textbf{scale}\\\midrule
\emph{mincho}&\tt mc&Ryumin-Light &GothicBBB-Medium&0.962216\\
\emph{gothic}&\tt gt&GothicBBB-Medium&GothicBBB-Medium&0.962216\\
\bottomrule
\end{tabular}
\end{center}
Note that the bold series of both families is the same as the medium series of
the \emph{gothic} family. This is a convention in \pLaTeX.
\item Japanese characters in math mode are typeset by the font family {\tt mc}.
\end{itemize}

However, the above settings are not sufficient for Japanese-based documents. To
typeset Japanese-based documents, you had better use class files other than
{\tt article.cls}, {\tt book.cls}, and so on. At present, we have the
counterparts of \Pkg{jclasses} (standard classes in \pLaTeX) and
\Pkg{jsclasses} (classes by Haruhiko Okumura), namely, \Pkg{ltjclasses} and
\Pkg{ltjsclasses}.
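
As a minimal sketch of this advice, a Japanese-based document could start as
follows. The class name \texttt{ltjsarticle} is assumed here to be the
\Pkg{ltjsclasses} counterpart of \texttt{jsarticle}; replace it with the class
you actually need, and process the file with \texttt{lualatex}.
\begin{verbatim}
% A sketch only, not a recommended template.
% ltjsarticle is assumed to be provided by the ltjsclasses bundle.
\documentclass{ltjsarticle}
\usepackage{luatexja} % harmless if the class has already loaded it
\begin{document}
これは日本語の文書です。
\end{document}
\end{verbatim}
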
\paragraph{{\tt\char92 CID}, {\tt\char92 UTF} and macros in OTF package}
Under \pTeX, the \Pkg{otf} package (developed by Shuzaburo Saito) is used for
typesetting characters which are in Adobe-Japan1-6 but not in JIS~X~0208.
Since this package is widely used, \LuaTeX-ja supports some of the functions in
the \Pkg{otf} package.
\begin{LTXexample}
森\UTF{9DD7}外と内田百\UTF{9592}とが\UTF{9AD9}島屋に行く。
\CID{7652}飾区の\CID{13706}野家,
葛飾区の吉野家
\end{LTXexample}
% lltjlisting.sty needs fixing?: ↑ a line break occurs right after `森'.

\subsection{Changing Fonts}\label{ssub-chgfnt}
\paragraph{Remark: Japanese Characters in Math Mode}
Since \pTeX\ supports Japanese characters in math mode, there are sources like
the following:
\begin{LTXexample}
$f_{高温}$~($f_{\text{high temperature}}$).
\[ y=(x-1)^2+2\quad{}よって\quad y>0 \]
$5\in{}素:=\{\,p\in\mathbb N:\text{$p$ is a prime}\,\}$.
\end{LTXexample}
We (the project members of \LuaTeX-ja) think that using Japanese characters in
math mode is acceptable if and only if they are used as identifiers. From this
point of view,
\begin{itemize}
\item Lines 1~and~2 above are not correct, since `高温' above is used as a
textual label, and `よって' is used as a conjunction.
\item However, line~3 is correct, since `素' is used as an identifier.
\end{itemize}
Hence, in our opinion, the above input should be corrected as:
\begin{LTXexample}
$f_{\text{高温}}$~%
($f_{\text{high temperature}}$).
\[ y=(x-1)^2+2\quad \mathrel{\text{よって}}\quad y>0 \]
$5\in{}素:=\{\,p\in\mathbb N:\text{$p$ is a prime}\,\}$.
\end{LTXexample}
%BUG?: without \{\} the character `素' is not printed. `よって' in the example above is not printed either.
We also believe that using Japanese characters as identifiers is rare, hence we
don't describe how to change Japanese fonts in math mode in this chapter. For
the method, please see Part~\ref{part-ref}.

\paragraph{plain \TeX}
To change Japanese fonts in plain \TeX, you must use the primitive
\verb+\jfont+. For details, please see Part~\ref{part-ref}.
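
As a rough illustration only, a plain \TeX\ source might select a Japanese font
as in the following sketch. The control sequence name \verb+\mygt+ and the size
are arbitrary choices made for this example; the \texttt{psft:} prefix and the
\texttt{jfm} key are the ones explained in Part~\ref{part-ref}.
\begin{verbatim}
\input luatexja.sty
% \mygt is a hypothetical name chosen for this sketch.
% jfm=ujis selects the standard JFM shipped with LuaTeX-ja.
\jfont\mygt=psft:GothicBBB-Medium:jfm=ujis at 12pt
\mygt ゴシック体の例
\bye
\end{verbatim}
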
\paragraph{NFSS2}
For \LaTeXe, \LuaTeX-ja simply adopts the font selection system of \pLaTeXe\
(in {\tt plfonts.dtx}).
\begin{itemize}
\item Two control sequences \verb+\mcdefault+ and \verb+\gtdefault+ are used to
specify the default font families for \emph{mincho} and \emph{gothic},
respectively. Default values: \texttt{mc} for \verb+\mcdefault+ and
\texttt{gt} for \verb+\gtdefault+.
\item Commands \verb+\fontfamily+, \verb+\fontseries+, \verb+\fontshape+ and
\verb+\selectfont+ can be used to change attributes of Japanese fonts.
\begin{center}
\begin{tabular}{cccccc}
\toprule
&\textbf{encoding}&\textbf{family}&\textbf{series}&\textbf{shape}&\textbf{selection}\\\midrule
alphabetic fonts
&\verb+\romanencoding+&\verb+\romanfamily+&\verb+\romanseries+&\verb+\romanshape+
&\verb+\useroman+\\
Japanese fonts
&\verb+\kanjiencoding+&\verb+\kanjifamily+&\verb+\kanjiseries+&\verb+\kanjishape+
&\verb+\usekanji+\\
both&---&---&\verb+\fontseries+&\verb+\fontshape+&---\\
auto select&\verb+\fontencoding+&\verb+\fontfamily+&---&---&\verb+\usefont+\\
\bottomrule
\end{tabular}
\end{center}
%<*ja>
ここで,\verb+\fontencoding{}+は,引数により和文側か欧文側かの
どちらかが切り替わる.例えば,次の入力で最初の\verb+\fontencoding+
の呼び出しは和文フォントのエンコーディングを\texttt{JT3}に変更し,
2回目の呼びだしでは欧文フォント側を\texttt{T1}へと変更する.
\begin{verbatim}
\fontencoding{JY3}\fontencoding{T1}
\end{verbatim}
\verb+\fontfamily+も引数により和文側,欧文側,\textbf{あるいは両方}のフォ
ントファミリが切り替わる.
詳細はSubsection~\ref{ssub-nfsspat}を参照すること.
%
\item To define a Japanese font family, use \verb+\DeclareKanjiFamily+ instead
of \verb+\DeclareFontFamily+. However, in the present implementation, using
\verb+\DeclareFontFamily+ doesn't cause any problem.
\end{itemize}

\paragraph{fontspec}
To use \LuaTeX-ja together with the \Pkg{fontspec} package, you need to load
the \Pkg{luatexja-fontspec} package in the preamble. This additional package
automatically loads the \Pkg{luatexja} and \Pkg{fontspec} packages, if needed.
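
A preamble using this package might look like the following sketch. The font
names are placeholders assumed for illustration (substitute any fonts installed
on your system), and \verb+\setmainjfont+ is the Japanese counterpart of
\verb+\setmainfont+ provided by \Pkg{luatexja-fontspec}.
\begin{verbatim}
% A sketch only; both font names are assumptions, not defaults.
\usepackage{luatexja-fontspec}
\setmainfont{TeX Gyre Termes}  % alphabetic font (assumed to be installed)
\setmainjfont{IPAexMincho}     % Japanese font (assumed to be installed)
\end{verbatim}
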
In the \Pkg{luatexja-fontspec} package, the following 7~commands are defined as
counterparts of the original commands in the \Pkg{fontspec} package:
\begin{center}
\begin{tabular}{ccccc}
\toprule
Japanese fonts
&\verb+\jfontspec+&\verb+\setmainjfont+&\verb+\setsansjfont+&\verb+\newjfontfamily+\\
alphabetic fonts
&\verb+\fontspec+&\verb+\setmainfont+&\verb+\setsansfont+&\verb+\newfontfamily+\\
\midrule
Japanese fonts
&\verb+\newjfontface+&\verb+\defaultjfontfeatures+&\verb+\addjfontfeatures+\\
alphabetic fonts
&\verb+\newfontface+&\verb+\defaultfontfeatures+&\verb+\addfontfeatures+\\
\bottomrule
\end{tabular}
\end{center}
% TODO: usage example

Note that there is no command named \verb+\setmonojfont+, since in most
Japanese fonts nearly all Japanese glyphs have the same width. Also note that
the kerning feature is turned off by default in these 7~commands, since this
feature and \textbf{JAglue} will clash (see \ref{para-kern}).

\section{Changing Parameters}
There are many parameters in \LuaTeX-ja. Due to the behavior of \LuaTeX, most
of them are stored not in internal registers of \TeX, but in \LuaTeX-ja's own
storage system. Hence, to assign or acquire those parameters, you have to use
the commands \verb+\ltjsetparameter+ and \verb+\ltjgetparameter+.

\subsection{Editing the range of \textbf{JAchar}s}
To edit the range of \textbf{JAchar}s, you first have to assign a non-zero
natural number which is less than 217 to the character range. This can be done
by using the \verb+\ltjdefcharrange+ primitive. For example, the next line
assigns all characters in the Supplementary Multilingual Plane and the character
`漢' to the range number~100.
\begin{lstlisting}
\ltjdefcharrange{100}{"10000-"1FFFF,`漢}
\end{lstlisting}
This assignment of numbers to ranges is always global, so you should not do
this in the middle of a document.

If a character already belongs to some non-zero numbered range, this is
overwritten by the new setting.
For example, the whole SMP belongs to the range~4 in the default setting of
\LuaTeX-ja, and if you specify the above line, then the SMP will belong to the
range~100 and be removed from the range~4.

After assigning numbers to ranges, the \Param{jacharrange} parameter can be
used to customize which character ranges will be treated as ranges of
\textbf{JAchar}s, as in the following line (this is exactly the default setting
of \LuaTeX-ja):
\begin{verbatim}
\ltjsetparameter{jacharrange={-1, +2, +3, -4, -5, +6, +7, +8}}
\end{verbatim}
The argument to the \Param{jacharrange} parameter is a list of integers. A
negative integer $-n$ in the list means that `the character range~$n$ is ...'.

\paragraph{Default Setting}
\LuaTeX-ja predefines eight character ranges for convenience. They are
determined from the following data:
\begin{itemize}
\item Blocks in Unicode~6.0.
\item The \texttt{Adobe-Japan1-UCS2} mapping between CIDs of Adobe-Japan1-6 and
Unicode.
\item The \texttt{PXbase} bundle for \upTeX\ by Takayuki Yato.
\end{itemize}

Now we describe these eight ranges. The letter `J' or `A' after the number
shows whether characters in the range are treated as \textbf{JAchar}s or not by
default. These settings are similar to \texttt{prefercjk} ...
\begin{description}
\item[Range~8${}^{\text{J}}$] Symbols in the intersection of the upper half of
ISO~8859-1 (Latin-1 Supplement) and JIS~X~0208 (a basic character set for
Japanese). This character range consists of the following characters:
\begin{multicols}{2}
\begin{itemize}
\def\ch#1#2{\item \char"#1\ ({\tt U+00#1}, #2)}%"
\ch{A7}{Section Sign}
\ch{A8}{Umlaut or diaeresis}
\ch{B0}{Degree sign}
\ch{B1}{Plus-minus sign}
\ch{B4}{Spacing acute}
\ch{B6}{Paragraph sign}
\ch{D7}{Multiplication sign}
\ch{F7}{Division Sign}
\end{itemize}
\end{multicols}
\item[Range~1${}^{\text{A}}$] Latin characters, some of which are included in
Adobe-Japan1-6.
This range consist of the following Unicode ranges, \emph{except characters in the range~8 above}: \begin{multicols}{2} \begin{itemize} \item {\tt U+0080}--{\tt U+00FF}: Latin-1 Supplement \item {\tt U+0100}--{\tt U+017F}: Latin Extended-A \item {\tt U+0180}--{\tt U+024F}: Latin Extended-B \item {\tt U+0250}--{\tt U+02AF}: IPA Extensions \item {\tt U+02B0}--{\tt U+02FF}: Spacing Modifier Letters \item {\tt U+0300}--{\tt U+036F}: Combining Diacritical Marks \item {\tt U+1E00}--{\tt U+1EFF}: Latin Extended Additional \par\ \end{itemize} \end{multicols} \item[Range~2${}^{\text{J}}$] Greek and Cyrillic letters. JIS~X~0208 (hence most of Japanese fonts) has some of these characters. \begin{multicols}{2} \begin{itemize} \item {\tt U+0370}--{\tt U+03FF}: Greek and Coptic \item {\tt U+0400}--{\tt U+04FF}: Cyrillic \item {\tt U+1F00}--{\tt U+1FFF}: Greek Extended \\\ \end{itemize} \end{multicols} \item[Range~3${}^{\text{J}}$] Punctuations and Miscellaneous symbols. The block list is indicated in Table~\ref{table-rng3}. \begin{table}[!tb] \caption{Unicode blocks in predefined character range~3.}\label{table-rng3} \catcode`\"=13\def"#1#2#3#4{{\tt U+#1#2#3#4}}%" \begin{center}\small \begin{tabular}{llll} "2000--"206F&General Punctuation& "2070--"209F&Superscripts and Subscripts\\ "20A0--"20CF&Currency Symbols& "20D0--"20FF&Comb.\ Diacritical Marks for Symbols\\ "2100--"214F&Letterlike Symbols& "2150--"218F&Number Forms\\ "2190--"21FF&Arrows& "2200--"22FF&Mathematical Operators\\ "2300--"23FF&Miscellaneous Technical& "2400--"243F&Control Pictures\\ "2500--"257F&Box Drawing& "2580--"259F&Block Elements\\ "25A0--"25FF&Geometric Shapes& "2600--"26FF&Miscellaneous Symbols\\ "2700--"27BF&Dingbats& "2900--"297F&Supplemental Arrows-B\\ "2980--"29FF&Misc.\ Mathematical Symbols-B& "2B00--"2BFF&Miscellaneous Symbols and Arrows\\ "E000--"F8FF&Private Use Area& \end{tabular} \end{center} \end{table} \item[Range~4${}^{\text{A}}$] Characters usually not in Japanese fonts. 
This range consists of almost all Unicode blocks which are not in other predefined ranges. Hence, instead of showing the block list, we put the definition of this range itself: \begin{lstlisting} \ltjdefcharrange{4}{% "500-"10FF, "1200-"1DFF, "2440-"245F, "27C0-"28FF, "2A00-"2AFF, "2C00-"2E7F, "4DC0-"4DFF, "A4D0-"A82F, "A840-"ABFF, "FB50-"FE0F, "FE20-"FE2F, "FE70-"FEFF, "FB00-"FB4F, "10000-"1FFFF} % non-Japanese \end{lstlisting} \item[Range~5${}^{\text{A}}$] Surrogates and Supplementary Private Use Areas. \item[Range~6${}^{\text{J}}$] Characters used in Japanese. The block list is indicated in Table~\ref{table-rng6}. \begin{table}[!tb] \caption{Unicode blocks in predefined character range~6.}\label{table-rng6} \catcode`\"=13\def"#1#2#3#4{{\tt U+#1#2#3#4}}%" \begin{center}\small \begin{tabular}{llll} "2460--"24FF&Enclosed Alphanumerics& "2E80--"2EFF&CJK Radicals Supplement\\ "3000--"303F&CJK Symbols and Punctuation& "3040--"309F&Hiragana\\ "30A0--"30FF&Katakana& "3190--"319F&Kanbun\\ "31F0--"31FF&Katakana Phonetic Extensions& "3200--"32FF&Enclosed CJK Letters and Months\\ "3300--"33FF&CJK Compatibility& "3400--"4DBF&CJK Unified Ideographs Extension A\\ "4E00--"9FFF&CJK Unified Ideographs& "F900--"FAFF&CJK Compatibility Ideographs\\ "FE10--"FE1F&Vertical Forms& "FE30--"FE4F&CJK Compatibility Forms\\ "FE50--"FE6F&Small Form Variants& "{20}000--"{2F}FFF&(Supplementary Ideographic Plane) \end{tabular} \end{center} \end{table} \item[Range~7${}^{\text{J}}$] Characters used in CJK languages, but not included in Adobe-Japan1-6. The block list is indicated in Table~\ref{table-rng7}. 
\begin{table}[!tb]
\caption{Unicode blocks in predefined character range~7.}\label{table-rng7}
\catcode`\"=13\def"#1#2#3#4{{\tt U+#1#2#3#4}}%"
\begin{center}\small
\begin{tabular}{llll}
"1100--"11FF&Hangul Jamo&
"2F00--"2FDF&Kangxi Radicals\\
"2FF0--"2FFF&Ideographic Description Characters&
"3100--"312F&Bopomofo\\
"3130--"318F&Hangul Compatibility Jamo&
"31A0--"31BF&Bopomofo Extended\\
"31C0--"31EF&CJK Strokes&
"A000--"A48F&Yi Syllables\\
"A490--"A4CF&Yi Radicals&
"A830--"A83F&Common Indic Number Forms\\
"AC00--"D7AF&Hangul Syllables&
"D7B0--"D7FF&Hangul Jamo Extended-B
\end{tabular}
\end{center}
\end{table}
\end{description}

\subsection{\Param{kanjiskip} and \Param{xkanjiskip}}\label{subs-kskip}
\textbf{JAglue} is divided into the following three categories:
\begin{itemize}
\item Glues/kerns specified in JFM. If \verb+\inhibitglue+ is issued around a
Japanese character, this glue will not be inserted at that place.
\item The default glue which is inserted between two \textbf{JAchar}s
(\Param{kanjiskip}).
\item The default glue which is inserted between a \textbf{JAchar} and an
\textbf{ALchar} (\Param{xkanjiskip}).
\end{itemize}
The value (a skip) of \Param{kanjiskip} or \Param{xkanjiskip} can be changed as
follows.
\begin{lstlisting}
\ltjsetparameter{kanjiskip={0pt plus 0.4pt minus 0.4pt},
                 xkanjiskip={0.25\zw plus 1pt minus 1pt}}
\end{lstlisting}

A JFM may contain the data of the `ideal width of \Param{kanjiskip}' and/or the
`ideal width of \Param{xkanjiskip}'. To use these data from the JFM, set the
value of \Param{kanjiskip} or \Param{xkanjiskip} to \verb+\maxdimen+.

\subsection{Insertion Setting of \Param{xkanjiskip}}
It is not desirable that \Param{xkanjiskip} is inserted at every boundary
between \textbf{JAchar}s and \textbf{ALchar}s. For example, \Param{xkanjiskip}
should not be inserted after an opening parenthesis (\textit{e.g.}, compare
`(あ' and `(\hskip\ltjgetparameter{xkanjiskip}あ').
\LuaTeX-ja can control whether \Param{xkanjiskip} can be inserted before/after
a character, by changing the \Param{jaxspmode} parameter for \textbf{JAchar}s
and the \Param{alxspmode} parameter for \textbf{ALchar}s, respectively.
\begin{LTXexample}
\ltjsetparameter{jaxspmode={`あ,preonly}, alxspmode={`\!,postonly}}
pあq い!う
\end{LTXexample}

The second argument {\tt preonly} means `the insertion of \Param{xkanjiskip} is
allowed before this character, but not after'. The other possible values are
{\tt postonly}, {\tt allow} and {\tt inhibit}. For compatibility with \pTeX,
natural numbers between 0~and~3 are also allowed as the second
argument\footnote{But we don't recommend this, since numbers 1~and~2 have
opposite meanings in \Param{jaxspmode} and \Param{alxspmode}.}.

To enable or disable all insertions of \Param{kanjiskip} and
\Param{xkanjiskip}, set the \Param{autospacing} and \Param{autoxspacing}
parameters, respectively, to {\tt true} or {\tt false}.

\subsection{Shifting Baseline}
To match a Japanese font with an alphabetic font, it is sometimes necessary to
shift the baseline of one of the pair. In \pTeX, this is achieved by setting
\verb+\ybaselineshift+ to a non-zero length (the baseline of alphabetic fonts
is shifted downward). However, for documents whose main language is not
Japanese, it is better to shift the baseline of Japanese fonts, but not that of
alphabetic fonts. Because of this, \LuaTeX-ja can independently set the
shifting amount of the baseline of alphabetic fonts (\Param{yalbaselineshift}
parameter) and that of Japanese fonts (\Param{yjabaselineshift} parameter).
\begin{LTXexample}
\vrule width 150pt height 0.4pt depth 0pt\hskip-120pt
\ltjsetparameter{yjabaselineshift=0pt, yalbaselineshift=0pt}abcあいう
\ltjsetparameter{yjabaselineshift=5pt, yalbaselineshift=2pt}abcあいう
\end{LTXexample}
Here the horizontal line in the above output is the baseline of the line.

There is an interesting side effect: characters of different sizes can be
vertically centered in a line, by setting the two parameters appropriately. The
following is an example (beware that the values are not well tuned):
\begin{LTXexample}
xyz漢字
{\scriptsize
  \ltjsetparameter{yjabaselineshift=-1pt,
    yalbaselineshift=-1pt}
  XYZひらがな
}abcかな
\end{LTXexample}

\subsection{Cropmark}
A cropmark is a mark indicating the four corners and the horizontal/vertical
center of the paper. In Japanese, a cropmark is called a `tombo(w)'. \pLaTeX\
and \LuaTeX-ja support `tombow' in their kernels. The following steps are
needed to typeset cropmarks:
\begin{enumerate}
\item First, define the banner which will be printed at the upper left of the
paper. This is done by assigning a token list to \verb+\@bannertoken+.

For example, the following sets the banner to
`{\tt filename (2012-01-01 17:01)}':
\begin{verbatim}
\makeatletter
\hour\time \divide\hour by 60 \@tempcnta\hour \multiply\@tempcnta 60\relax
\minute\time \advance\minute-\@tempcnta
\@bannertoken{%
   \jobname\space(\number\year-\two@digits\month-\two@digits\day
   \space\two@digits\hour:\two@digits\minute)}%
\end{verbatim}
\item ...
\end{enumerate}

\part{Reference}\label{part-ref}
\section{Font Metric and Japanese Font}
\subsection{\texttt{\char92jfont} primitive}
To load a font as a Japanese font, you must use the \verb+\jfont+ primitive
instead of~\verb+\font+, while \verb+\jfont+ admits the same syntax used
in~\verb+\font+. \LuaTeX-ja automatically loads the \Pkg{luaotfload} package,
so TrueType/OpenType fonts with features can be used for Japanese fonts:
\begin{LTXexample}
\jfont\tradgt={file:ipaexg.ttf:script=latn;%
  +trad;-kern;jfm=ujis} at 14pt
\tradgt{}当/体/医/区
\end{LTXexample}
Note that a control sequence defined using \verb+\jfont+ (\verb+\tradgt+ in the
example above) is not a \textit{font\_def} token, hence input like
\verb+\fontname\tradgt+ causes an error. We denote control sequences which are
defined by \verb+\jfont+ by $\langle$\textit{jfont\_cs}$\rangle$.
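
To illustrate that \verb+\jfont+ accepts the same syntax as \verb+\font+, the
following sketch loads one alphabetic font and one Japanese font side by side.
The control sequence names and the font files are assumptions made for this
example only; both files are assumed to be installed.
\begin{verbatim}
% Hypothetical names; only the jfm key is specific to \jfont.
\font\myrm={file:lmroman10-regular.otf} at 10pt
\jfont\mymc={file:ipaexm.ttf:jfm=ujis} at 10pt
\myrm\mymc abc あいう
\end{verbatim}
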
\paragraph{Prefix \texttt{psft}}
Besides the \texttt{file:}\ and \texttt{name:}\ prefixes, \texttt{psft:}\ can be
used as a prefix in the \verb+\jfont+ (and~\verb+\font+) primitive. Using this
prefix, you can specify a `name-only' Japanese font which will not be embedded
in the PDF. A typical use of this prefix is to specify the `standard' Japanese
fonts, namely, `Ryumin-Light' and `GothicBBB-Medium'. For kerning and other
information, that of Kozuka Mincho Pr6N Regular (this is a font by Adobe Inc.,
and included in the Japanese Font Packs for Adobe Reader) will be used.

\paragraph{JFM}
As noted in the Introduction, a JFM has measurements of characters and
glues/kerns that are automatically inserted for Japanese typesetting. The
structure of a JFM will be described in the next subsection. When calling the
\verb+\jfont+ primitive, you must specify which JFM will be used for this font
by the following keys:
\begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
\item[jfm={\normalfont\itshape$\langle$name$\rangle$}] Specify the name of the
JFM. A file named
\texttt{jfm-{\normalfont\itshape$\langle$name$\rangle$}.lua} will be searched
for and/or loaded.

The following JFMs are shipped with \LuaTeX-ja:
\begin{description}
\item[\tt jfm-ujis.lua] A standard JFM in \LuaTeX-ja. This JFM is based on
\verb+upnmlminr-h.tfm+, a metric for the UTF/OTF package that is used in
\upTeX. When you use the \Pkg{luatexja-otf} package, please use this JFM.
\item[\tt jfm-jis.lua] A counterpart for \verb+jis.tfm+, the `JIS font metric'
which is widely used in \pTeX. A major difference between
\texttt{jfm-ujis.lua} and this \texttt{jfm-jis.lua} is that most characters
under \texttt{jfm-ujis.lua} are square-shaped, while those under
\texttt{jfm-jis.lua} are horizontal rectangles.
\item[\tt jfm-min.lua] A counterpart for \verb+min10.tfm+, which is one of the
default Japanese font metrics shipped with \pTeX. There are notable differences
between this JFM and the other 2~JFMs, as shown in Table~\ref{tab-difjfm}.
\end{description} \item[jfmvar=] Sometimes there is a need that \end{list} \begin{table}[t] \caption{Differences between JFMs shipped with \LuaTeX-ja} \label{tab-difjfm} \begin{center} \def\r#1{{\jfont\g=psft:Ryumin-Light:jfm=#1 at 14.43324pt \g \setbox0=\vtop{\hsize=7\zw\noindent ◆◆◆◆◆◆◆ ある日モモちゃんがお使いで迷子になって泣きました.}\copy0 \vrule height 0pt depth \dp0}} \def\s#1{{\jfont\g=psft:Ryumin-Light:jfm=#1 at 14.43324pt \g \setbox0=\vtop{\hsize=7\zw\noindent ちょっと!何}\copy0}} \def\t#1{{\jfont\g=psft:Ryumin-Light:jfm=#1 at 19.24432pt \g \setbox0=\hbox{漢}% \vrule width 0.4pt height\ht0 depth\dp0\kern-.2pt\copy0 \kern-\wd0\vrule width\wd0height .2pt depth .2pt \kern-\wd0\raise\ht0\hbox{\vrule width\wd0height .2pt depth .2pt}% \kern-\wd0\lower\dp0\hbox{\vrule width\wd0height .2pt depth .2pt}% \kern-.2pt\vrule width 0.4pt height\ht0 depth \dp0}} \begin{tabular}{rccc} \toprule &\tt jfm-ujis.lua&\tt jfm-jis.lua&\tt jfm-min.lua\\ \midrule Example~1&\r{ujis}&\r{jis}&\r{min}\\ Example~2&\s{ujis}&\s{jis}&\s{min}\\ Bounding Box&\t{ujis}&\t{jis}&\t{min}\\ \bottomrule \end{tabular} \end{center} \end{table} \paragraph{Note: kern feature}\label{para-kern} Some fonts have information for inter-glyph spacing. However, this information is not well-compatible with \LuaTeX-ja. More concretely, the kerning space from this information is inserted \emph{before} the insertion process of \textbf{JAglue}, and this causes incorrect spacing between two characters when both a glue/kern from the data in the font and one from the JFM are present. \begin{itemize} \item You should specify {\tt -kern} in the {\tt\char92jfont} primitive when you want to use other font features, such as {\tt script=...}\,. \item If you want to use Japanese fonts in proportional width, and use information from this font, use \texttt{jfm-prop.lua} for its JFM, and ... TODO: kanjiskip? \end{itemize} \subsection{Structure of JFM file} A JFM file is a Lua script which has only one function call: \begin{verbatim} luatexja.jfont.define_jfm { ...
} \end{verbatim} The real data are stored in the table indicated above by \verb+{ ... }+. So, the rest of this subsection is devoted to describing the structure of this table. Note that all lengths in a JFM file are floating-point numbers in design-size unit. \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}} \item[dir=] (required) The direction of JFM. At present, only \texttt{'yoko'} is supported. \item[zw=] (required) The length of the `full-width'. \item[zh=] (required) \item[kanjiskip=\{, , \}] (optional) This field specifies the `ideal' amount of \Param{kanjiskip}. As noted in Subsection~\ref{subs-kskip}, if the parameter \Param{kanjiskip} is \verb+\maxdimen+, the value specified in this field is actually used (if this field is not specified in JFM, it is regarded as 0\,pt). Note that and fields are in design-size unit too. \item[xkanjiskip=\{, , \}] (optional) Like the \Param{kanjiskip} field, this field specifies the `ideal' amount of \Param{xkanjiskip}. \end{list} Besides the above fields, a JFM file has several sub-tables whose indices are natural numbers. The table indexed by~$i\in\omega$ stores information of `character class'~$i$. At least, the character class~0 is always present, so each JFM file must have a sub-table whose index is \texttt{[0]}. Each sub-table (its numerical index is denoted by $i$) has the following fields: \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}} \item[chars=\{, ...\}] (required except character class~0) This field is a list of characters which are in this character class~$i$. This field is not required if $i=0$, since all \textbf{JAchar}s which are not in any character class other than 0 are in the character class~0 (hence, the character class~0 contains most of \textbf{JAchar}s). In the list, a character can be specified by its code number, or by the character itself (as a string of length~1). Moreover, there are `imaginary characters' which can be specified in the list.
We will describe these later. \item[width=, height=, depth=, italic=]\ (required) Specify the width, height, depth and the amount of italic correction of characters in character class~$i$. All characters in character class~$i$ are regarded as having the width, height and depth given by these fields. But there is one exception: if \texttt{'prop'} is specified in the \texttt{width} field, the width of a character becomes that of its `real' glyph. \item[left=, down=, align=]\ These fields are for adjusting the position of the `real' glyph. Legal values of the \texttt{align} field are \texttt{'left'}, \texttt{'middle'} and \texttt{'right'}. If one of these 3~fields is omitted, \texttt{left} and \texttt{down} are treated as~0, and the \texttt{align} field is treated as \texttt{'left'}. The effects of these 3~fields are indicated in Figure~\ref{fig-pos}. In most cases, the \texttt{left} and \texttt{down} fields are~0, while it is not uncommon that the \texttt{align} field is \texttt{'middle'} or \texttt{'right'}. For example, setting the \texttt{align} field to \texttt{'right'} is practically needed when the current character class is the class for `opening delimiters'.
\begin{figure}[!tb] \begin{minipage}{0.4\textwidth}% \begin{center}\unitlength=10pt\small \begin{picture}(15,12)(-1,-4) \color{black!10!white}% real glyph :step1 \put(0,0){\vrule width 12\unitlength height 8\unitlength depth 3\unitlength} \color{red!20!white}% real glyph :step1 \put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength} \color{red}% real glyph \thicklines \put(-1,-1.5){\vector(0,1){7}\vector(0,-1){2.5}\vector(1,0){6}} \put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}} \put(-1,5.5){\line(1,0){6}} \put(-1,-4){\line(1,0){6}} \color{green!20!white}% real glyph :step1 \put(3,0){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength} \color{black}% real glyph :step1 \thicklines \put(0,0){\vector(0,1){8}\line(0,-1){3}\vector(1,0){12}} \put(12,0){\line(0,1){8}\vector(0,-1){3}} \put(0,8){\line(1,0){12}} \put(0,-3){\line(1,0){12}} \put(0.2,4){\makebox(0,0)[l]{\texttt{height}}} \put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}} \put(6,0.2){\makebox(0,0)[b]{\texttt{width}}} \color{green!50!black}% real glyph :step1 \thicklines \put(3,0){\vector(0,1){7}\vector(0,-1){2.5}\vector(1,0){6}} \put(9,0){\line(0,1){7}\line(0,-1){2.5}} \put(3,7){\line(1,0){6}} \put(3,-2.5){\line(1,0){6}} \newsavebox{\eqdist} \savebox{\eqdist}(0,0)[b]{% \thinlines \put(-0.08,0.2){\line(0,-1){0.4}}% \put(0.08,0.2){\line(0,-1){0.4}}} \put(1.5,0){\usebox{\eqdist}} \put(10.5,0){\usebox{\eqdist}} \color{blue}% shifted \thicklines \put(3,-1.5){\vector(-1,0){4}} \put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}} \put(3,0){\vector(0,-1){1.5}} \put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}} \end{picture} \end{center} \end{minipage}% \begin{minipage}{0.6\textwidth}% Consider a node containing Japanese character whose value of the \texttt{align} field is \texttt{'middle'}. \begin{itemize} \item The black rectangle is a frame of the node. Its width, height and depth are specified by JFM. 
\item Since the \texttt{align} field is \texttt{'middle'}, the `real' glyph is centered horizontally (the green rectangle). \item Furthermore, the glyph is shifted according to values of fields \texttt{left} and \texttt{down}. The ultimate position of the real glyph is indicated by the red rectangle. \end{itemize} \end{minipage} \caption{The position of the `real' glyph.} \label{fig-pos} \end{figure} \item[kern={\{[$j$]=, ...\}}] \item[glue={\{[$j$]=\{, , \}, ...\}}] \end{list} %<*en> \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}} \item['lineend'] An ending of a line. \item['diffmet'] Used at a boundary between two \textbf{JAchar}s whose JFM or size is different. \item['boxbdd'] The beginning/ending of a horizontal box, and the beginning of a non-indented paragraph. \item['parbdd'] The beginning of an (indented) paragraph. \item['jcharbdd'] A boundary between \textbf{JAchar} and anything else (such as \textbf{ALchar}, kern, glue, ...). \item[$-1$] The left/right boundary of an inline math formula. \end{list} % %<*ja> 上で説明した通り,\texttt{chars}フィールド中にはいくつかの「特殊文字」も 指定可能である.これらは,大半が\pTeX のJFMグルーの挿入処理ではみな「文字 クラス0の文字」として扱われていた文字であり,その結果として\pTeX より細か い組版調整ができるようになっている.以下のその一覧を述べる: \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}} \item['lineend'] 行の終端を表す. \item['diffmet'] \item['boxbdd'] hboxの先頭と末尾,及びインデントされていない (\verb+\noindent+で開始された)段落の先頭を表す. \item['parbdd'] 通常の(\verb+\noindent+で開始されていない)段落の先頭. \item['jcharbdd'] 和文文字と「その他のもの」(欧文文字,glue,kern等)との境界. \item[$-1$] 行中数式と地の文との境界. \end{list} \paragraph{\pTeX 用和文フォントメトリックの移植} 以下に,\pTeX 用和文フォントメトリックを\LuaTeX-ja用に移植する場合の注意点を挙げておく. \begin{itemize} \item 実際に出力される和文フォントのサイズがdesign sizeとなる. このため,例えば$1\,\textrm{zw}$がdesign sizeの0.962216倍であるJISフォン トメトリック等を移植する場合は, \begin{itemize} \item JFM中の全ての数値を$1/0.962216$倍しておく. \item \TeX ソース中で使用するところで,サイズ指定を0.962216倍にする.
\LaTeX でのフォント宣言なら,例えば次のように: \begin{verbatim} \DeclareFontShape{JY3}{mc}{m}{n}{<-> s*[0.962216] psft:Ryumin-Light:jfm=jis}{} \end{verbatim} \end{itemize} \item 上に述べた特殊文字は,\texttt{'boxbdd'}を除き文字クラスを全部0とする (JFM中に単に書かなければよい). \item \texttt{'boxbdd'}については,それのみで一つの文字クラスを形成し,その 文字クラスに関してはglue/kernの設定はしない. これは,\pTeX では, hboxの先頭・末尾とインデントされていない(\verb+\noindent+で開始さ れた)段落の先頭にはJFMグルーは入らないという仕様を実現させるためである. \item \pTeX の組版を再現させようというのが目的であれば以上の注意を守れば十分である. ところで,\pTeX では通常の段落の先頭にJFMグルーが残るという仕様があるので, 段落先頭の開き括弧は全角二分下がりになる.全角下がりを実現させるに は,段落の最初に手動で\verb+\inhibitglue+を追加するか,あるいは \verb+\everypar+のhackを行い,それを自動化させるしかなかった. 一方,\LuaTeX-jaでは,\texttt{'parbdd'}によって,それがJFM側で調整できるよ うになった.例えば,\LuaTeX-ja同梱のJFMのように,\texttt{'boxbdd'}と同じ文字クラスに \texttt{'parbdd'}を入れれば全角下がりとなる. \begin{LTXexample} \jfont\g=psft:Ryumin-Light:jfm=test \g \parindent1\zw\noindent{}◆◆◆◆◆ \par{}「◆◆←二分下がり \par{}【◆◆←全角下がり \par{}〔◆◆←全角二分下がり \end{LTXexample} \end{itemize} % \subsection{Math Font Family} \TeX\ handles fonts in math formulas by 16~font families\footnote{Omega, Aleph, \LuaTeX~and $\varepsilon$-\kern-.125em(u)\pTeX can handle 256~families, but an external package is needed to support this in plain \TeX\ and \LaTeX.}, and each family has three fonts: \verb+\textfont+, \verb+\scriptfont+ and \verb+\scriptscriptfont+. \LuaTeX-ja's handling of Japanese fonts in math formulas is similar; Table~\ref{tab-math} shows counterparts to \TeX's primitives for math font families. There is no relation between the value of \verb+\fam+ and that of \verb+\jfam+; with appropriate settings, you can set both \verb+\fam+ and \verb+\jfam+ to~the same value.
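As a rough sketch of how the pieces fit together, the following assigns three sizes of one Japanese font to the Japanese math family~1 and then selects that family inside a formula. The control sequences \verb+\mathmc+, \verb+\mathmcs+ and \verb+\mathmcss+, the family number~1 and the sizes are assumptions chosen for this illustration; the key names are those listed in Table~\ref{tab-math}, and it is assumed here that \verb+\jfam+ accepts an assignment in the same way as \verb+\fam+.
\begin{verbatim}
% three sizes of one Japanese font (names are hypothetical)
\jfont\mathmc  ={psft:Ryumin-Light:jfm=ujis} at 10pt
\jfont\mathmcs ={psft:Ryumin-Light:jfm=ujis} at 7pt
\jfont\mathmcss={psft:Ryumin-Light:jfm=ujis} at 5pt
% register them as Japanese math family 1
\ltjsetparameter{jatextfont={1,\mathmc},
  jascriptfont={1,\mathmcs},
  jascriptscriptfont={1,\mathmcss}}
% select the family inside a formula
$\jfam=1 面積=a^{縦}$
\end{verbatim}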
\begin{table}[!tb] \caption{Primitives for Japanese math fonts.} \label{tab-math} \begin{center}\def\{{\char`\{}\def\}{\char`\}} \begin{tabular}{lll} \toprule &Japanese fonts&alphabetic fonts\\ \midrule font family&\verb+\jfam+${}\in [0,256)$&\verb+\fam+\\ text size&\tt\Param{jatextfont}\,=\{,\}&\tt\verb+\textfont+=\\ script size&\tt\Param{jascriptfont}\,=\{,\}&\tt\verb+\scriptfont+=\\ scriptscript size&\tt\Param{jascriptscriptfont}\,=\{,\}&\tt\verb+\scriptscriptfont+=\\ \bottomrule \end{tabular} \end{center} \end{table} \subsection{Callbacks} Like \LuaTeX\ itself, \LuaTeX-ja also has callbacks. These callbacks can be accessed via the \verb+luatexbase.add_to_callback+ function and so on, like other callbacks. \begin{list}{}% {\def\makelabel#1{\bfseries#1}} \item[\texttt{luatexja.load\_jfm} callback] With this callback you can overwrite JFMs. \begin{verbatim} function ( jfm_info, jfm_name) return
new_jfm_info end \end{verbatim} The argument \verb+jfm_info+ contains a table similar to the table in a JFM file, except that this argument has a \texttt{chars} field which contains character codes whose character class is not~0. An example of this callback is the \texttt{ltjarticle} class, which forcefully assigns character class~0 to \texttt{'parbdd'} in the JFM \texttt{jfm-min.lua}. This callback doesn't replace any code of \LuaTeX-ja. \item[\texttt{luatexja.define\_font} callback] This callback and the next callback form a pair, and with them you can assign letters which don't have fixed codepoints in Unicode to non-zero character classes. This \texttt{luatexja.define\_font} callback is called just when a new Japanese font is loaded. \begin{verbatim} function (
jfont_info, font_number) return
new_jfont_info end \end{verbatim} You may assume that \verb+jfont_info+ has the following fields: \begin{description} \item[\tt jfm] The index number of JFM. \item[\tt size] Font size in scaled points (${}=2^{-16}\,\textrm{pt}$). \item[\tt var] The value specified in \texttt{jfmvar=...} at a call of \verb+\jfont+. \end{description} The returned table \verb+new_jfont_info+ should also include these three fields. The \verb+font_number+ is a font number. A good example of this and the next callbacks is the \Pkg{luatexja-otf} package, supporting \verb+"AJ1-xxx"+ form for Adobe-Japan1 CID characters in a JFM. This callback doesn't replace any code of \LuaTeX-ja. \item[\texttt{luatexja.find\_char\_class} callback] This callback is called just when \LuaTeX-ja is ready to determine which character class a character \verb+chr_code+ belongs to. A function used in this callback should be in the following form: \begin{lstlisting}[numbers=left] function ( char_class,
jfont_info, chr_code) if char_class~=0 then return char_class else .... return ( new_char_class or 0) end end \end{lstlisting} The argument \verb+char_class+ is the result of \LuaTeX-ja's default routine or previous function calls in this callback, hence this argument may not be 0. Moreover, the returned \verb+new_char_class+ should be the same as \verb+char_class+ when \verb+char_class+ is not~0, otherwise you will overwrite the \LuaTeX-ja's default routine. This callback doesn't replace any code of \LuaTeX-ja. \end{list} \section{Parameters} \subsection{{\tt\char92 ltjsetparameter} primitive} As noted before, \verb+\ltjsetparameter+ and \verb+\ltjgetparameter+ are primitives for accessing most parameters of \LuaTeX-ja. One of the main reasons that \LuaTeX-ja didn't adopt a syntax similar to that of \pTeX\ (\textit{e.g.},~\verb+\prebreakpenalty`)=10000+) is the position of the \verb+hpack_filter+ callback in the source of \LuaTeX, see Section~\ref{sec-para}. \verb+\ltjsetparameter+ and \verb+\ltjglobalsetparameter+ are primitives for assigning parameters. These take one argument which is a \texttt{=} list. Allowed keys are described in the next subsection. The difference between \verb+\ltjsetparameter+ and \verb+\ltjglobalsetparameter+ is only the scope of assignment; \verb+\ltjsetparameter+ does a local assignment and \verb+\ltjglobalsetparameter+ does a global one. They also obey the value of \verb+\globaldefs+, like other assignments. \verb+\ltjgetparameter+ is the primitive for acquiring parameters. It always takes a parameter name as its first argument, and also takes an additional argument---a character code, for example---in some cases. \begin{LTXexample} \ltjgetparameter{differentjfm}, \ltjgetparameter{autospacing}, \ltjgetparameter{prebreakpenalty}{`)}. \end{LTXexample} \emph{The return value of\/ {\normalfont\tt\char92ltjgetparameter} is always a string}.
This is output by \texttt{tex.write()}, so any character other than space~`{\tt\char32}'~(U+0020) has the category code 12~(other), while the space has 10~(space). \subsection{List of Parameters} The following is the list of parameters which can be specified by the \verb+\ltjsetparameter+ command. [\verb+\cs+] indicates the counterpart in \pTeX, and symbols beside each parameter have the following meaning: \begin{itemize} \item No mark: values at the end of the paragraph or the hbox are adopted in the whole paragraph/hbox. \item `$\ast$' : local parameters, which can change everywhere inside a paragraph/hbox. \item `$\dagger$': assignments are always global. \end{itemize} \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}} \item[\Param{jcharwidowpenalty}\,=] [\verb+\jcharwidowpenalty+] Penalty value for suppressing orphans. This penalty is inserted just after the last \textbf{JAchar} which is not regarded as a (Japanese) punctuation mark. \item[\Param{kcatcode}\,=\{,\}]\ An additional attribute of each character whose character code is . In the present version, the lowermost bit of indicates whether the character is considered as a punctuation mark (see the description of \Param{jcharwidowpenalty} above). \item[\Param{prebreakpenalty}\,=\{,\}] [\verb+\prebreakpenalty+]\ %<*ja> 文字コードの\textbf{JAchar}が行頭にくることを抑止するために, この文字の前に挿入/追加されるペナルティの量を指定する. 例えば閉じ括弧「〗」は絶対に行頭にきてはならないので,標準で読み込まれる \texttt{luatexja-kinsoku.tex}において \begin{verbatim} \ltjsetparameter{prebreakpenalty={`〙,10000}} \end{verbatim} と,最大値の10000が指定されている.他にも,小書きのカナなど,絶対禁止とい うわけではないができれば行頭にはきて欲しくない場合に,0と 10000の間の値を指定するのも有用であろう. \begin{verbatim} \ltjsetparameter{prebreakpenalty={`ゕ,150}} \end{verbatim} % \item[\Param{postbreakpenalty}\,=\{,\}] [\verb+\postbreakpenalty+] %<*ja> 文字コードの\textbf{JAchar}が行末にくることを抑止するために, この文字の後に挿入/追加されるペナルティの量を指定する. \pTeX では,\verb+\prebreakpenalty+, \verb+\postbreakpenalty+において, \begin{itemize} \item 一つの文字に対して,pre, postどちらか一つしか指定することができなかっ た(後から指定した方で上書きされる).
\item pre, post合わせて256文字分の情報を格納することしかできなかった. \end{itemize} という制限があったが,\LuaTeX-ja ではこれらの制限は解消されている. % \item[\Param{jatextfont}\,=\{,\}] [\verb+\textfont+ in \TeX] \item[\Param{jascriptfont}\,=\{,\}] [\verb+\scriptfont+ in \TeX] \item[\Param{jascriptscriptfont}\,=\{,\}] [\verb+\scriptscriptfont+ in \TeX] \item[\Param{yjabaselineshift}\,=$^\ast$]\ \item[\Param{yalbaselineshift}\,=$^\ast$] [\verb+\ybaselineshift+] \item[\Param{jaxspmode}\,=\{,\}] [\verb+\inhibitxspcode+] Setting whether inserting \Param{xkanjiskip} is allowed before/after a \textbf{JAchar} whose character code is . The following values are allowed for : \begin{description} \item[0, \texttt{inhibit}] Insertion of \Param{xkanjiskip} is inhibited both before and after the character. \item[2, \texttt{preonly}] Insertion of \Param{xkanjiskip} is allowed before the character, but not after. \item[1, \texttt{postonly}] Insertion of \Param{xkanjiskip} is allowed after the character, but not before. \item[3, \texttt{allow}] Insertion of \Param{xkanjiskip} is allowed both before and after the character. This is the default value. \end{description} \item[\Param{alxspmode}\,=\{,\}] [\verb+\xspcode+] Setting whether inserting \Param{xkanjiskip} is allowed before/after an \textbf{ALchar} whose character code is . The following values are allowed for : \begin{description} \item[0, \texttt{inhibit}] Insertion of \Param{xkanjiskip} is inhibited both before and after the character. \item[1, \texttt{preonly}] Insertion of \Param{xkanjiskip} is allowed before the character, but not after. \item[2, \texttt{postonly}] Insertion of \Param{xkanjiskip} is allowed after the character, but not before. \item[3, \texttt{allow}] Insertion of \Param{xkanjiskip} is allowed both before and after the character. This is the default value. \end{description} Note that parameters \Param{jaxspmode} and \Param{alxspmode} use a common table.
\item[\Param{autospacing}\,=$^\ast$] [\verb+\autospacing+] \item[\Param{autoxspacing}\,=$^\ast$] [\verb+\autoxspacing+] \item[\Param{kanjiskip}\,=] [\verb+\kanjiskip+] \item[\Param{xkanjiskip}\,=] [\verb+\xkanjiskip+] \item[\Param{differentjfm}\,=$^\dagger$] Specify how glues/kerns are determined between two \textbf{JAchar}s whose JFMs (or sizes) are different. The allowed arguments are the following: \begin{description} \item[\texttt{average}] \item[\texttt{both}] \item[\texttt{large}] \item[\texttt{small}] \end{description} \item[\Param{jacharrange}\,=$^\ast$] \item[\Param{kansujichar}\,=\{, \}] [\verb+\kansujichar+] \end{list} \section{Other Primitives} \subsection{Primitives for Compatibility} The following primitives are implemented for compatibility with \pTeX: \begin{list}{}{\def\makelabel{\ttfamily\char92 }} \item[kuten] \item[jis] \item[euc] \item[sjis] \item[ucs] \item[kansuji] \end{list} \subsection{{\tt\char92 inhibitglue}} The primitive \verb+\inhibitglue+ suppresses the insertion of \textbf{JAglue}. The following is an example, using a special JFM in which a glue is inserted between the beginning of a box and `あ', and also between `あ' and `ウ'. \begin{LTXexample} \jfont\g=psft:Ryumin-Light:jfm=test \g あウあ\inhibitglue{}ウ\inhibitglue\par あ\par\inhibitglue{}あ \par\inhibitglue\hrule{}あoff\inhibitglue ice \end{LTXexample} With the help of this example, we remark on the specification of \verb+\inhibitglue+: \begin{itemize} \item A call of \verb+\inhibitglue+ in the (internal) vertical mode is effective at the beginning of the next paragraph. This is realized by hacking \verb+\everypar+. \item A call of \verb+\inhibitglue+ in the (restricted) horizontal mode is only effective on the spot; it does not carry over the boundary of paragraphs. Moreover, \verb+\inhibitglue+ cancels ligatures and kernings, as shown in line~4 of the above example. \item A call of \verb+\inhibitglue+ in math mode is just ignored.
\end{itemize} \section{Control Sequences for \LaTeXe} \subsection{Patch for NFSS2}\label{ssub-nfsspat} As described in Subsection~\ref{ssec-ltx}, \LuaTeX-ja simply adopted \texttt{plfonts.dtx} in \pLaTeXe\ for the Japanese patch for NFSS2. For convenience, we will describe commands which are not described in Subsection~\ref{ssub-chgfnt}. \begin{cslist}% \item[DeclareYokoKanjiEncoding\{\}\{\}\{\}] In NFSS2 under \LuaTeX-ja, the distinction between alphabetic font families and Japanese font families is made only by their encodings. For example, encodings OT1 and T1 are for alphabetic font families, and a Japanese font family cannot have these encodings. This command defines a new encoding scheme for Japanese font families (in horizontal direction). \item[DeclareKanjiEncodingDefaults\{\}\{\}] \item[DeclareKanjiSubstitution\{\}\{\}\{\}\{\}] \item[DeclareErrorKanjiFont\{\}\{\}\{\}\{\}\{\}] The above 3~commands are just the counterparts for \verb+\DeclareFontEncodingDefaults+ and~others. \item[reDeclareMathAlphabet\{\}\{\}\{\}] 和文・欧文の数式用フォントファミリを一度に変更する命令を作成する. 具体的には,欧文数式用フォントファミリ変更の命令と,和文数式用フォ ントファミリ変更の命令の2つを同時に行う命令として を(再)定義する.実際の使用ではに同じものを指定する,すなわち,に和文側も変 更させるようにするのが一般的と思われる. 本コマンドの使用については,\pLaTeX 配布中の\texttt{plfonts.dtx}に詳しく 注意点が述べられているので,そちらを参照されたい. \item[DeclareRelationFont\{\}\{\}\{\}\{\}\\ \hfill\{\}\{\}\{\}\{\}] %<*en> This command sets the `accompanied' alphabetic font family (given by the latter 4~arguments) with respect to a Japanese font family given by the former 4~arguments. % %<*ja> いわゆる「従属欧文」を設定するための命令である.前半の4引数で表される和文フォントファミリに対して, そのフォントに対応する「従属欧文」フォントファミリを後半の4引数により与える. % \item[SetRelationFont] This command is almost the same as \verb+\DeclareRelationFont+, except that this command does a local assignment, whereas \verb+\DeclareRelationFont+ does a global assignment.
\item[userelfont] Change the current alphabetic font encoding/family/\dots\ to the `accompanied' alphabetic font family with respect to the current Japanese font family, which was set by \verb+\DeclareRelationFont+ or \verb+\SetRelationFont+. Like \verb+\fontfamily+, \verb+\selectfont+ is required for this to take effect. \item[adjustbaseline] ... \item[fontfamily\{\}] {\let\item\origitem As in \LaTeXe, this command changes the current font family (alphabetic, Japanese,~\emph{or both}) to . Which family will be changed is determined as follows: \begin{itemize} \item Let the current encoding scheme for Japanese fonts be . The current Japanese font family will be changed to , if one of the following two conditions is met: \begin{itemize} \item The family under the encoding is already defined by \verb+\DeclareKanjiFamily+. \item A font definition named \texttt{.fd} (the filename is all lowercase) exists. \end{itemize} \item Let the current encoding scheme for alphabetic fonts be . For the alphabetic font family, the same criterion as above is used. \item There is a case in which none of the above applies, that is, the font family named seems to be defined neither under the encoding , nor under . In this case, the default family for font substitution is used for alphabetic and Japanese fonts. Note that the current encoding will not be set to , unlike the original implementation in \LaTeX. \end{itemize} } \end{cslist} To close this subsection, we shall introduce an example of \verb+\SetRelationFont+ and \verb+\userelfont+: \begin{LTXexample} \gtfamily{}あいうabc \SetRelationFont{JY3}{gt}{m}{n}{OT1}{pag}{m}{n} \userelfont\selectfont{}あいうabc \end{LTXexample} \subsection{Cropmark/`tombow'} \section{Extensions} \subsection{{\tt luatexja-fontspec.sty}} \subsection{{\tt luatexja-otf.sty}} This optional package supports typesetting characters in Adobe-Japan1.
{\tt luatexja-otf.sty} offers the following 2~low-level commands: \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}} \item[\char92CID\{\}] Typeset a character whose CID number is . \item[\char92UTF\{\}] Typeset a character whose character code is (in hexadecimal). This command is similar to \verb+\char"+,\ %" but please note the remarks below. \end{list} \paragraph{Remarks} Characters by \verb+\CID+ and \verb+\UTF+ commands are different from ordinary characters in the following points: \begin{itemize} \item Always treated as \textbf{JAchar}s. \item Processing codes for supporting OpenType features (\textit{e.g.}, glyph replacement and kerning) by the \Pkg{luaotfload} package are not applied to these characters. \end{itemize} \paragraph{Additional Syntax of JFM} {\tt luatexja-otf.sty} extends the syntax of JFM; the entries of the {\tt chars} table in JFM may now be strings of the form \verb+'AJ1-xxx'+, which stands for the character whose CID number in Adobe-Japan1 is \verb+xxx+. \part{Implementations}\label{part-imp} \section{Storing Parameters}\label{sec-para} \subsection{Used Dimensions, Attributes and whatsit nodes} The following is the list of dimensions and attributes which are used in \LuaTeX-ja. \begin{list}{}{% \def\makelabel{\ttfamily} \def\dim#1{\item[\char92 #1\ \textrm{(dimension)}]} \def\attr#1{\item[\char92 #1\ \textrm{(attribute)}]} } \dim{jQ} As explained in Subsection~\ref{ssec-plain}, \verb+\jQ+ is equal to $1\,\textrm{Q}=0.25\,\textrm{mm}$, where `Q'~(also called `級') is a unit used in Japanese phototypesetting. So one should not change the value of this dimension. \dim{jH} There is also a unit called `歯' which equals $0.25\,\textrm{mm}$ and is used in Japanese phototypesetting. The dimension \verb+\jH+ stores this length, similar to \verb+\jQ+. \dim{ltj@zw} A temporary register for the `full-width' of the current Japanese font.
\dim{ltj@zh} A temporary register for the `full-height' (usually the sum of height of imaginary body and its depth) of the current Japanese font. \attr{jfam} Current number of Japanese font family for math formulas. \attr{ltj@curjfnt} The font index of the current Japanese font. \attr{ltj@charclass} The character class of Japanese \textit{glyph\_node}. \attr{ltj@yablshift} The amount of shifting the baseline of alphabetic fonts in scaled points ($2^{-16}\,\textrm{pt}$). \attr{ltj@ykblshift} The amount of shifting the baseline of Japanese fonts in scaled points ($2^{-16}\,\textrm{pt}$). \attr{ltj@autospc} Whether the auto insertion of \Param{kanjiskip} is allowed at the node. \attr{ltj@autoxspc} Whether the auto insertion of \Param{xkanjiskip} is allowed at the node. \attr{ltj@icflag} An attribute for distinguishing `kinds' of a node. One of the following values is assigned to this attribute: \begin{description} \item[\textit{italic} (1)] Glues from an italic correction (\verb+\/+). This distinction of origins of glues (from explicit \verb+\kern+, or from \verb+\/+) is needed in the insertion process of \Param{xkanjiskip}. \item[\textit{packed} (2)] \item[\textit{kinsoku} (3)] Penalties inserted for the word-wrapping process of Japanese characters (\emph{kinsoku}). \item[\textit{from\_jfm} (4)] Glues/kerns from JFM. \item[\textit{line\_end} (5)] Kerns for ... \item[\textit{kanji\_skip} (6)] Glues for \Param{kanjiskip}. \item[\textit{xkanji\_skip} (7)] Glues for \Param{xkanjiskip}. \item[\textit{processed} (8)] Nodes which are already processed by ... \item[\textit{ic\_processed} (9)] Glues from an italic correction, but also already processed. \item[\textit{boxbdd} (15)] Glues/kerns that are inserted just at the beginning or the ending of an hbox or a paragraph. \end{description} \attr{ltj@kcat$i$} Where $i$~is a natural number which is less than~7. These 7~attributes store bit~vectors indicating which character block is regarded as a block of \textbf{JAchar}s.
\end{list} Furthermore, \LuaTeX-ja uses several `user-defined' whatsit nodes for typesetting. All those nodes store a natural number (hence the node's \texttt{type} is 100). \begin{description} \item[30111] Nodes for indicating that \verb+\inhibitglue+ is specified. The \texttt{value} field of these nodes doesn't matter. \item[30112] Nodes for \LuaTeX-ja's stack system (see the next subsection). The \texttt{value} field of these nodes is the current group. \item[30113] Nodes for Japanese characters to which the callback process of luaotfload won't be applied; the character code is stored in the \texttt{value} field. Each node having this \verb+user_id+ is converted to a `glyph\_node' \emph{after} the callback process of luaotfload. \end{description} These whatsits will be removed during the process of inserting \textbf{JAglue}s. \subsection{Stack System of \LuaTeX-ja}\label{ssec-stack} \paragraph{Background} \LuaTeX-ja has its own stack system, and most parameters of \LuaTeX-ja are stored in it. To clarify the reason, imagine the parameter \Param{kanjiskip} is stored by a skip, and consider the following source: \begin{LTXexample} \ltjsetparameter{kanjiskip=0pt}ふがふが.% \setbox0=\hbox{\ltjsetparameter{kanjiskip=5pt}ほげほげ} \box0.ぴよぴよ\par \end{LTXexample} As described in Part~\ref{part-ref}, the only effective value of \Param{kanjiskip} in an hbox is the latest value, so the value of \Param{kanjiskip} which is applied in the entire hbox should be 5\,pt. However, by the implementation method of \LuaTeX, this `5\,pt' cannot be known from any callbacks.
In the \texttt{tex/packaging.w} (which is a file in the source of \LuaTeX), there is the following code: \begin{lstlisting} void package(int c) { scaled h; /* height of box */ halfword p; /* first node in a box */ scaled d; /* max depth */ int grp; grp = cur_group; d = box_max_depth; unsave(); save_ptr -= 4; if (cur_list.mode_field == -hmode) { cur_box = filtered_hpack(cur_list.head_field, cur_list.tail_field, saved_value(1), saved_level(1), grp, saved_level(2)); subtype(cur_box) = HLIST_SUBTYPE_HBOX; \end{lstlisting} Notice that \verb+unsave+ is executed \emph{before} \verb+filtered_hpack+ (this is where the \verb+hpack_filter+ callback is executed): so `5\,pt' in the above source is orphaned at \verb+unsave+, and hence it can't be accessed from the \verb+hpack_filter+ callback. \paragraph{The method} The code of the stack system is based on that in a post of the Dev-luatex mailing list\footnote{% \texttt{[Dev-luatex] tex.currentgrouplevel}, a post at 2008/8/19 by Jonathan Sauer.}. There are two \TeX\ count registers for maintaining information: \verb+\ltj@@stack+ for the stack level, and \verb+\ltj@@group@level+ for the \TeX's group level when the last assignment was done. Parameters are stored in one big table named \texttt{charprop\_stack\_table}, where \texttt{charprop\_stack\_table[$i$]} stores data of stack level~$i$. If a new stack level is created by \verb+\ltjsetparameter+, all data of the previous level is copied. To resolve the problem mentioned in `Background' above, \LuaTeX-ja uses another thing: When a new stack level is about to be created, a whatsit node whose type, subtype and value are 44~(\textit{user\_defined}), 30112, and current group level respectively is appended to the current list (we refer to this node as \textit{stack\_flag}). This enables us to know whether an assignment is done just inside an hbox.
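The situations that a \textit{stack\_flag} node has to distinguish can be sketched with three boxes. The concrete values of \Param{kanjiskip} are arbitrary and serve only as an illustration.
\begin{verbatim}
% (a) no assignment inside the hbox
\setbox0=\hbox{ほげほげ}
% (b) an assignment directly inside the hbox group
\setbox0=\hbox{\ltjsetparameter{kanjiskip=5pt}ほげほげ}
% (c) an assignment only in a more deeply nested group
\setbox0=\hbox{{\ltjsetparameter{kanjiskip=5pt}ほげ}ほげ}
\end{verbatim}
In (a) and (c) the value of \Param{kanjiskip} at the end of the hbox is the one that was in effect outside the box, while in (b) it is 5\,pt.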
Suppose that the stack level is~$s$ and the \TeX's group level is~$t$ just after the hbox group, then: \begin{itemize} \item If there is no \textit{stack\_flag} node in the list of the hbox, then no assignment occurred inside the hbox. Hence values of parameters at the end of the hbox are stored in the stack level~$s$. \item If there is a \textit{stack\_flag} node whose value is~$t+1$, then an assignment occurred just inside the hbox group. Hence values of parameters at the end of the hbox are stored in the stack level~$s+1$. \item If there are \textit{stack\_flag} nodes but all of their values are more than~$t+1$, then an assignment occurred in the box, but it was done in a `more internal' group. Hence values of parameters at the end of the hbox are stored in the stack level~$s$. \end{itemize} Note that for this trick to work correctly, assignments to \verb+\ltj@@stack+ and \verb+\ltj@@group@level+ always have to be local, regardless of the value of \verb+\globaldefs+. This problem is resolved by using \hbox{\verb+\directlua{tex.globaldefs=0}+} (this assignment is local). \section{Linebreak after Japanese Character}\label{sec-lbreak} \subsection{Reference: Behavior in \pTeX} %<*en> In~\pTeX, a linebreak after a Japanese character doesn't emit a space, since words are not separated by spaces in Japanese writings. However, this feature isn't fully implemented in \LuaTeX-ja due to the specification of callbacks in~\LuaTeX. To clarify the difference between \pTeX~and~\LuaTeX, we briefly describe the handling of a linebreak in~\pTeX\ in this subsection. \pTeX's input processor can be described in terms of a finite state automaton, as that of~\TeX\ in~Section~2.5 of~\cite{texbytopic}. The internal states are as follows: \begin{itemize} \item State~$N$: new line \item State~$S$: skipping spaces \item State~$M$: middle of line \item State~$K$: after a Japanese character \end{itemize} The first three states---$N$, $S$~and~$M$---are the same as those of \TeX's input processor.
State~$K$ is similar to state~$M$, and is entered after Japanese characters.
The state transitions are shown in Figure~\ref{fig-ptexipro}.
Note that \pTeX\ doesn't leave state~$K$ after `beginning/ending of a group' characters.
%</en>
%<*ja>
欧文では文章の改行は単語間でしか行わない.そのため,\TeX では,(文字の直後の)改行は
空白文字と同じ扱いとして扱われる.一方,和文ではほとんどどこでも改行が可能なため,
\pTeX では和文文字の直後の改行は単純に無視されるようになっている.
このような動作は,\pTeX が\TeX からエンジンとして拡張されたことによって可能になったことである.
\pTeX の入力処理部は,\TeX におけるそれと同じように,有限オートマトンとして記述することができ,
以下に述べるような4状態を持っている.
\begin{itemize}
\item State~$N$: 行の開始.
\item State~$S$: 空白読み飛ばし.
\item State~$M$: 行中.
\item State~$K$: 行中(和文文字の後).
\end{itemize}
また,状態遷移は,図\ref{fig-ptexipro}のようになっており,図中の数字は
カテゴリーコードを表している.最初の3状態は\TeX の入力処理部と同じであり,
図中から状態$K$と「$j$」と書かれた矢印を取り除けば,\TeX の入力処理部と同
じものになる.
この図から分かることは,
\begin{quote}
行が和文文字(とグループ境界文字)で終わっていれば,改行は無視される
\end{quote}
ということである.
%</ja>

\begin{figure}[!tb]
\begin{gather*}
\def\sp{\text{\tt\char32}}
\xymatrix{&& {\text{scan a cs}}\ar@(r,ul)[dr]&\\
\ar[r]& *++[o][F-]{N}\ar[ur]^0\ar[dd]_{d,\ g}\ar[u]^{5\ (\text{\tt\char92par})}
 \ar@{->}@(d,l)[ddrr]_(0.45){j}&&
 *++[o][F-]{S}\ar@(l,dr)[ul]^0\ar@(l,ur)[ddll]_{d,\ g}\ar[u]_{5}
 \ar@{->}@(r,r)[dd]^{j}\\&\\&
 *++[o][F-]{M}\ar[uuur]^0\ar@(r,dl)[uurr]_(0.55){10\ (\sp)}
 \ar[d]_{5\ ({\sp})}\ar@{->}@(dr,dl)[rr]_{j}&&
 *++[o][F-]{K}\ar@{->}@(ul,d)[uuul]^0\ar@{->}[ll]^{d}
 \ar@{->}@(ur,dr)[uu]^{10\ (\sp)}\ar@{->}[d]_5\\
 &&&
}\\
d:=\{3,4,6,7,8,11,12,13\},\quad g:=\{1,2\},\quad j:=(\text{Japanese characters})
\end{gather*}
\begin{itemize}
\item Numbers represent category codes.
\item Category codes 9~(ignored), 14~(comment)~and~15~(invalid) are omitted in the above diagram.
\end{itemize}
\caption{State transitions of \pTeX's input processor.}
\label{fig-ptexipro}
\end{figure}

\subsection{Behavior in \LuaTeX-ja}
%<*en>
The states of the input processor of \LuaTeX\ are the same as those of \TeX,
and they can't be customized by any callbacks.
Hence, we can only use the \verb+process_input_buffer+ and \verb+token_filter+ callbacks
to suppress the space produced by a linebreak that follows a Japanese character.

However, the \verb+token_filter+ callback cannot be used either, since a character
of category code 5~(end-of-line) is converted into a space token
\emph{in the input processor}. So we can use only the \verb+process_input_buffer+ callback.
This means that suppressing a space must be done \emph{just before} an input line is read.

Considering these situations, the handling of an end-of-line in \LuaTeX-ja is as follows:
\begin{quote}
A character U+FFFFF (its category code is set to 14~(comment) by
\LuaTeX-ja) is appended to an input line, \emph{before \LuaTeX\ actually
processes it}, if and only if the following two conditions are satisfied:
\begin{enumerate}
\item The category code of the character $\langle${return}$\rangle$ (whose character code
is 13) is 5~(end-of-line).
\item The input line matches the following `regular expression':
\[
 (\text{any char})^*(\textbf{JAchar})
 \bigl(\{\text{catcode}=1\}\cup\{\text{catcode}=2\}\bigr)^*
\]
\end{enumerate}
\end{quote}

\paragraph{Remark}
The following example shows the major difference from the behavior of \pTeX:
\begin{LTXexample}
\ltjsetparameter{autoxspacing=false}
\ltjsetparameter{jacharrange={-6}}xあ
y\ltjsetparameter{jacharrange={+6}}zあ
u
\end{LTXexample}
\begin{itemize}
\item There is no space between `x' and `y', since line~2 ends with a \textbf{JAchar} `あ'
(this `あ' is considered a \textbf{JAchar} at the end of line~1).
\item There is a space between `あ' (in line~3) and `u', since line~3 ends with an \textbf{ALchar}
(the letter `あ' is considered an \textbf{ALchar} at the end of line~2).
\end{itemize}
%</en>
%<*ja>
\LuaTeX の入力処理部は\TeX のそれと全く同じであり,callbackによりユーザが
カスタマイズすることはできない.このため,改行抑制の目的でユーザが利用で
きそうなcallbackとしては,\verb+process_input_buffer+や
\verb+token_filter+に限られてしまう.しかし,\TeX の入力処理部をよく見る
と,後者も役には立たないことが分かる:改行文字は,入力処理部によってトー
クン化される時に,カテゴリーコード10の32番文字へと置き換えられてしまうた
め,\verb+token_filter+で非標準なトークン読み出しを行おうとしても,空白文
字由来のトークンと,改行文字由来のトークンは区別できないのだ.

すると,我々のとれる道は,\verb+process_input_buffer+を用いて
\LuaTeX の入力処理部に引き渡される前に入力文字列を編集するというものしかない.
以上を踏まえ,\LuaTeX-jaにおける「和文文字直後の改行抑制」の処理は,次のようになっている:
\begin{quote}
各入力行に対し,\textbf{その入力行が読まれる前の内部状態で}
以下の2条件が満たされている場合,\LuaTeX-jaはU+FFFFF番の文字
\footnote{この文字はコメント文字として扱われるように\LuaTeX-ja内部で設定をしている.}
を末尾に追加する.よって,その場合に改行は空白とは見做されないこととなる.
\begin{enumerate}
\item 改行文字(文字コード13番)のカテゴリーコードが5~(end-of-line)である.
\item 入力行は次の「正規表現」にマッチしている:
\[
 (\text{any char})^*(\textbf{JAchar})
 \bigl(\{\text{catcode}=1\}\cup\{\text{catcode}=2\}\bigr)^*
\]
\end{enumerate}
\end{quote}

この仕様は,前節で述べた\pTeX の仕様にできるだけ近づけたものとなっている.最初の条件は,
\texttt{verbatim}系環境などの日本語対応マクロを書かなくてすませるためのものである.
しかしながら,完全に同じ挙動が実現できたわけではない.
差異は,次の例が示すように,和文文字の範囲を変更した行の改行において見られる:
\begin{LTXexample}
\ltjsetparameter{autoxspacing=false}
\ltjsetparameter{jacharrange={-6}}xあ
y\ltjsetparameter{jacharrange={+6}}zあ
u
\end{LTXexample}
もし\pTeX とまったく同じ挙動を示すならば,出力は
「\hbox{\ltjsetparameter{autoxspacing=false}x yzあu}」となるべきである.しかし,実際には
上のように異なる挙動となっている.
\begin{itemize}
\item 2行目は「あ」という和文文字で終わる(2行目を処理する前の時点では,
「あ」は和文文字扱いである)ため,直後の改行文字は無視される.
\item 3行目は「あ」という欧文文字で終わる(2行目を処理する前の時点では,
「あ」は欧文文字扱いである)ため,直後の改行文字は空白に置き換わる.
\end{itemize}
このため,トラブルを避けるために,和文文字の範囲を\verb+\ltjsetparameter+で編集した場合,
その行はそこで改行するようにした方がいいだろう.
%</ja>

\section{Insertion of JFM glues, \Param{kanjiskip} and \Param{xkanjiskip}}
\subsection{Overview}
%<*en>
NOT COMPLETED
%</en>
%<*ja>
\LuaTeX-ja における和文処理グルーの挿入方法は,\pTeX のそれとは全く異なる.
\pTeX では次のような仕様であった:
\begin{itemize}
\item JFMグルーの挿入は,和文文字を表すトークンを元に水平リストに(文字を表す)を
追加する過程で行われる.
\item \Param{xkanjiskip}の挿入は,hboxへのパッケージングや行分割前に行われる.
\item \Param{kanjiskip}はノードとしては挿入されない.パッケージングや行分割の計算時に
「和文文字を表す2つのの間には\Param{kanjiskip}がある」ものとみなされる.
\end{itemize}
しかし,\LuaTeX-jaでは,hboxへのパッケージングや行分割前に全ての
\textbf{JAglue},即ちJFMグルー・\Param{xkanjiskip}・\Param{kanjiskip}の
3種類を一度に挿入することになっている.これは,\LuaTeX において欧文の合字・
カーニング処理がノードベースになったことに対応する変更である.

\LuaTeX-jaにおける\textbf{JAglue}挿入処理では,下の図\ref{fig-clu}のよう
に「塊」を単位にして行われる.大雑把にいうと,「塊」は文字とそれに付随す
るノード達(アクセント位置補正用のkernや,イタリック補正)をまとめたもの
であり,2つの塊の間には,ペナルティ,\verb+\vadjust+,whatsitなど,行組版
には関係しないものがある.そのため,……
%</ja>

% \begin{figure}[!tb]
% \unitlength=10mm
% \end{figure}

\subsection{Definition of a `cluster'}
\begin{defn}
A \emph{cluster} is a list of nodes in one of the following forms, together with its \textit{id}:
\begin{enumerate}
\item Nodes whose value of\ \verb+\ltj@icflag+ is in $[3,15)$.
These nodes come from an hbox that was already packaged and then unpackaged by \verb+\unhbox+.
The \textit{id} is \textit{id\_pbox}.
\item An inline math formula, including the two \textit{math\_node}s at its boundaries:
HOGE
The \textit{id} is \textit{id\_math}.
\item A \textit{glyph\_node} together with the nodes related to it:
HOGE
The \textit{id} is \textit{id\_jglyph} or \textit{id\_glyph}, according to whether the
\textit{glyph\_node} represents a Japanese character or not.
\item A box-like node, that is, an hbox, a vbox or a rule (\verb+\vrule+).
The \textit{id} is \textit{id\_hlist} if the node is an hbox which is not shifted vertically,
or \textit{id\_box\_like} otherwise.
\item A glue, a kern whose subtype is not 2~(\textit{accent}), or a discretionary break.
The \textit{id} is \textit{id\_glue}, \textit{id\_kern} and \textit{id\_disc}, respectively.
%Just a node which will \dots, \textit{i.e.}, a node which is \emph{not} one of the following:
%\textit{ins\_node}, \textit{mark\_node}, \textit{adjust\_node}, \textit{whatsit\_node}
%and \textit{penalty\_node}.
\end{enumerate}
We denote a cluster by \textit{Np}, \textit{Nq} and \textit{Nr}.
\end{defn}

Internally, a cluster is represented by a table $\textit{Np}$ with the following fields.
\begin{description}
\def\makelabel#1{\textbf{\textit{#1}}}
\item[first, last] The first/last node of the cluster.
\item[id] The \textit{id} in the above definition.
\item[nuc]
% jachar
\item[auto\_kspc, auto\_xspc]
\item[xspc\_before, xspc\_after]
% alchar, jachar
\item[pre, post]
\item[char]
\item[class]
\item[lend]
\item[met, var]
\end{description}
\end{document}