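Android Input Methods 02: OpenWnn Source Code Analysis 06: Candidate Word Generation

This article explains how candidate words are produced while typing with the OpenWnn input method. Because this series only studies the front-end Java code, only the relevant interfaces are covered. In practice the candidates come mainly from the back end (the part written in C), which involves input-method language models and related material that is not covered here.

The OpenWnn source trees circulating online ship the back end unprocessed (the C code has not been built into a .so file), so none of them can be built directly into a runnable APK. A copy of the source with the C code already compiled can be downloaded from: http://download.****.net/detail/xianming01/4308456.

I have recently seen my articles reposted online without attribution. Since this is a series, reposting one or two installments on their own will not necessarily make sense to readers. So I am leaving a note here: if you are reading a repost, you can visit my blog at http://blog.****.net/xianming01.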

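1. Sources of candidate words

Candidate words come from two kinds of conversion: those that need complex conversion and those that do not. The definitions used here are:

Complex conversion: requires the input-method language model and is performed by the input-method back end.

Simple conversion: requires neither the language model nor the back end, and can be done entirely in the front-end Java code.

The following sections cover each in turn. Quite a few classes are involved; they are listed below:

Simple conversion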

LetterConverter.java

Romkan.java

RomkanFullKatakana.java

RomkanHalfKatakana.java

KanaConverter.java

Complex conversion

WnnEngine.java

OpenWnnEngineJAJP.java

OpenWnnClauseConverterJAJP.java

OpenWnnDictionaryImpl.java

OpenWnnDictionaryImplJni.java

Helper class

ComposingText.java

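2. ComposingText

This is a key class that appears in much of the code, so it is worth introducing separately.

The class represents the string currently being edited, that is, the underlined part shown in the input field.

Its fields are defined as follows: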

    import java.util.ArrayList;

    /**
     * The container class of composing string.
     *
     * This interface is for the class includes information about the
     * input string, the converted string and its decoration.
     * {@link LetterConverter} and {@link WnnEngine} get the input string from it, and
     * store the converted string into it.
     *
     * @author Copyright (C) 2009 OMRON SOFTWARE CO., LTD.  All Rights Reserved.
     */
    public class ComposingText {
        /**
         * Text layer 0.
         *
         * This text layer holds key strokes.
         * (ex) Romaji in Japanese.  Parts of Hangul in Korean.
         */
        public static final int LAYER0 = 0;

        /**
         * Text layer 1.
         *
         * This text layer holds the result of the letter converter.
         * (ex) Hiragana in Japanese. Pinyin in Chinese. Hangul in Korean.
         */
        public static final int LAYER1 = 1;

        /**
         * Text layer 2.
         *
         * This text layer holds the result of the consecutive clause converter.
         * (ex) the result of Kana-to-Kanji conversion in Japanese,
         *      Pinyin-to-Kanji conversion in Chinese, Hangul-to-Hanja conversion in Korean language.
         */
        public static final int LAYER2 = 2;

        /** Maximum number of layers */
        public static final int MAX_LAYER = 3;

        /** Composing text's layer data */
        protected ArrayList<StrSegment>[] mStringLayer;

        /** Cursor position */
        protected int[] mCursor;

        // (methods omitted in this excerpt)
    }

The text is split into three layers. Taking Japanese as the example: the first layer (LAYER0) holds the key strokes you typed, the second layer (LAYER1) holds the kana, and the third layer (LAYER2) holds the kanji. The third layer is derived from the second, and the second from the first. For example, before the Convert (変換) key is pressed the composing text shows the second layer (kana); after it is pressed, it shows the third layer (kanji).
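The layering can be pictured with a much smaller toy class. This is only a sketch under assumptions: `ToyComposingText`, `setLayer`, `text` and `demo` are hypothetical names, plain strings stand in for `StrSegment`, and cursors are left out entirely. It is not the OpenWnn implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for ComposingText (not the OpenWnn class). */
class ToyComposingText {
    static final int LAYER0 = 0;   // key strokes, e.g. romaji
    static final int LAYER1 = 1;   // letter converter output, e.g. hiragana
    static final int LAYER2 = 2;   // clause converter output, e.g. kanji
    static final int MAX_LAYER = 3;

    // One list of segments per layer; plain strings replace StrSegment here.
    private final List<List<String>> layers = new ArrayList<>();

    ToyComposingText() {
        for (int i = 0; i < MAX_LAYER; i++) {
            layers.add(new ArrayList<>());
        }
    }

    /** Replaces the segments stored in one layer. */
    void setLayer(int layer, List<String> segments) {
        layers.get(layer).clear();
        layers.get(layer).addAll(segments);
    }

    /** Joins all segments of one layer into the string that would be displayed. */
    String text(int layer) {
        return String.join("", layers.get(layer));
    }

    /** Builds the example from the article: key strokes, then kana, then kanji. */
    static ToyComposingText demo() {
        ToyComposingText t = new ToyComposingText();
        t.setLayer(LAYER0, Arrays.asList("ka", "wa", "i"));
        t.setLayer(LAYER1, Arrays.asList("か", "わ", "い"));
        t.setLayer(LAYER2, Arrays.asList("可愛"));
        return t;
    }
}
```

Calling `ToyComposingText.demo().text(ToyComposingText.LAYER1)` yields the kana string, while `LAYER2` yields the converted result, which mirrors how each layer is derived from the one below it.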

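This class is also used for editing clauses (bunsetsu): pressing the left or right arrow key selects a different part of the whole string. I have not fully worked through the code for that part.

3. Complex conversion

Candidates that need complex conversion come from the input-method engine. Let us look at the engine interface for text conversion, that is, the interface through which candidates are obtained from the input. The file is WnnEngine: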

    /**
     * The interface of the text converter accessed from OpenWnn.
     *
     * The realization class of this interface should be an singleton class.
     *
     * @author Copyright (C) 2009, OMRON SOFTWARE CO., LTD.  All Rights Reserved.
     */
    public interface WnnEngine {
        /*
         * DEFINITION OF CONSTANTS
         */
        /** The identifier of the learning dictionary */
        public static final int DICTIONARY_TYPE_LEARN = 1;
        /** The identifier of the user dictionary */
        public static final int DICTIONARY_TYPE_USER = 2;

        /*
         * DEFINITION OF METHODS
         */
        /**
         * Initialize parameters.
         */
        public void init();

        /**
         * Close the converter.
         *
         * OpenWnn calls this method when it is destroyed.
         */
        public void close();

        /**
         * Predict words/phrases.
         *
         * @param text   The input string
         * @param minLen The minimum length of a word to predict (0 : no limit)
         * @param maxLen The maximum length of a word to predict (-1 : no limit)
         * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
         */
        public int predict(ComposingText text, int minLen, int maxLen);

        /**
         * Convert a string.
         *
         * This method is used to consecutive/single clause convert in
         * Japanese, Pinyin to Kanji convert in Chinese, Hangul to Hanja
         * convert in Korean, etc.
         *
         * The result of conversion is set into the layer 2 in the {@link ComposingText}.
         * To get other candidates of each clause, call {@link #makeCandidateListOf(int)}.
         *
         * @param text The input string
         * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
         */
        public int convert(ComposingText text);

        /**
         * Search words from the dictionaries.
         *
         * @param key The search key (stroke)
         * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
         */
        public int searchWords(String key);

        /**
         * Search words from the dictionaries.
         *
         * @param word A word to search
         * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
         */
        public int searchWords(WnnWord word);

        /**
         * Get a candidate.
         *
         * After {@link #predict(ComposingText, int, int)} or {@link #makeCandidateListOf(int)} or
         * {@code searchWords()}, call this method to get the
         * results.  This method will return a candidate in decreasing
         * frequency order for {@link #predict(ComposingText, int, int)} and
         * {@link #makeCandidateListOf(int)}, in increasing character code order for
         * {@code searchWords()}.
         *
         * @return The candidate; {@code null} if there is no more candidate.
         */
        public WnnWord getNextCandidate();
    }

The code quoted above lists only the methods that supply candidate words; the remaining methods of the interface are auxiliary and have been left out. From this we can see that candidates come from the following sources:

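Prediction
Clause conversion (single-clause or multi-clause; multi-clause conversion is also called whole-sentence conversion)
Dictionary search (prediction is in fact built on top of this)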
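The calling pattern implied by the interface comments (run a query, then call `getNextCandidate()` until it returns `null`) can be sketched as follows. Everything here is an assumption made for illustration: `ToyEngine`, `ToyListEngine` and `CandidateLoop` are hypothetical names, `predict` takes a plain `String` instead of a `ComposingText`, candidates are strings instead of `WnnWord`, and the toy engine does a linear prefix scan rather than calling a native back end.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical cut-down engine contract, modeled loosely on the methods quoted above. */
interface ToyEngine {
    /** Returns a positive value if there are candidates, 0 if there are none. */
    int predict(String reading);

    /** Returns the next candidate, or null when there are no more. */
    String getNextCandidate();
}

/** Toy engine backed by an in-memory list of {reading, word} pairs. */
class ToyListEngine implements ToyEngine {
    private final List<String[]> dictionary = new ArrayList<>();
    private Iterator<String> results = new ArrayList<String>().iterator();

    void addWord(String reading, String word) {
        dictionary.add(new String[] {reading, word});
    }

    @Override
    public int predict(String reading) {
        List<String> hits = new ArrayList<>();
        for (String[] entry : dictionary) {
            if (entry[0].startsWith(reading)) {
                hits.add(entry[1]);
            }
        }
        results = hits.iterator();
        return hits.size();
    }

    @Override
    public String getNextCandidate() {
        return results.hasNext() ? results.next() : null;
    }
}

class CandidateLoop {
    /** Runs a prediction, then drains candidates until the engine returns null. */
    static List<String> collect(ToyEngine engine, String reading) {
        List<String> candidates = new ArrayList<>();
        if (engine.predict(reading) > 0) {
            String candidate;
            while ((candidate = engine.getNextCandidate()) != null) {
                candidates.add(candidate);
            }
        }
        return candidates;
    }
}
```

The point of the pull-style loop is that the caller decides how many candidates to fetch; a candidate view only needs to ask for as many as it can display.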

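3.2 Prediction (dictionary search)

Dictionary search means what the name says. It does raise one issue, though: the dictionary format. My guess is that the OpenWnn dictionaries are first written as text files and then converted by some tool into their current form, so the dictionary files are essentially unreadable if you open them. Recently our company acquired a Japanese input method based on OpenWnn, and we now need to replace its open-source back end with our own. While looking into this we found that we could not decipher the dictionary format, so much of that work could not proceed.

I have not read the dictionary search code, but presumably it simply searches according to the dictionary format, which should be fairly simple to implement.
Here is the prediction code: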

    public int predict(ComposingText text, int minLen, int maxLen) {
        clearCandidates();
        if (text == null) { return 0; }

        /* set mInputHiragana and mInputRomaji */
        int len = setSearchKey(text, maxLen);

        /* set dictionaries by the length of input */
        setDictionaryForPrediction(len);

        /* search dictionaries */
        mDictionaryJP.setInUseState( true );

        if (len == 0) {
            /* search by previously selected word */
            return mDictionaryJP.searchWord(WnnDictionary.SEARCH_LINK, WnnDictionary.ORDER_BY_FREQUENCY,
                                            mInputHiragana, mPreviousWord);
        } else {
            if (mExactMatchMode) {
                /* exact matching */
                mDictionaryJP.searchWord(WnnDictionary.SEARCH_EXACT, WnnDictionary.ORDER_BY_FREQUENCY,
                                         mInputHiragana);
            } else {
                /* prefix matching */
                mDictionaryJP.searchWord(WnnDictionary.SEARCH_PREFIX, WnnDictionary.ORDER_BY_FREQUENCY,
                                         mInputHiragana);
            }
            return 1;
        }
    }

Prediction is really just a dictionary search. When the input is empty, the search is keyed on the previously selected word; otherwise there are two modes, exact match and prefix match. To illustrate with Pinyin: in exact-match mode, typing "zhongguo" returns the dictionary entries whose reading is exactly "zhongguo", such as "中国" and "种过". In prefix-match mode, typing "zhongguo" also finds entries such as "中国人民" and "中国人民共和国". Note that in the non-empty branch the method returns 1 without inspecting the result of the search.
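The difference between the two modes can be sketched with a sorted map keyed by reading. This is a minimal illustration under assumptions, not the OpenWnn dictionary: `ToyDictionary`, `searchExact` and `searchPrefix` are hypothetical names, the real search runs in the native back end against a binary dictionary, and frequency ordering is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory dictionary: reading -> words with that reading. */
class ToyDictionary {
    private final TreeMap<String, List<String>> entries = new TreeMap<>();

    void add(String reading, String word) {
        entries.computeIfAbsent(reading, k -> new ArrayList<>()).add(word);
    }

    /** Exact match: only words whose reading equals the key. */
    List<String> searchExact(String reading) {
        List<String> words = entries.get(reading);
        return words == null ? new ArrayList<>() : new ArrayList<>(words);
    }

    /** Prefix match: words whose reading starts with the key. */
    List<String> searchPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        // In a sorted map, all keys sharing a prefix are contiguous,
        // starting at the first key >= prefix.
        for (Map.Entry<String, List<String>> e : entries.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            out.addAll(e.getValue());
        }
        return out;
    }
}
```

With the readings from the example above, an exact search for "zhongguo" returns only "中国" and "种过", while a prefix search additionally returns "中国人民", stored under the longer reading "zhongguorenmin".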

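3.3 Clause conversion

Clause conversion comes in two forms: single-clause and multi-clause. Single-clause conversion means, for example, that the input "かわい" can be converted to the word "可愛". Multi-clause conversion applies when two or more clauses are typed in succession: each clause has many possible results on its own, and multi-clause conversion picks the most suitable combination as the result for the whole input.

This part depends heavily on the language itself and I have not fully understood it, so I will not go into more detail.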

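4. Simple conversion

This kind of conversion is done entirely in the front-end Java code. It has two main parts: (1) romaji input; (2) alphanumeric, full-width/half-width and katakana conversion.

4.1 Romaji input

Romaji input means that you type the romanized spelling and the input method generates the corresponding kana (similar to Pinyin input). For example, typing "kawai" displays "かわい", which involves converting ka into か. This input style is normally available on a full (QWERTY) keyboard. However, the OpenWnn source I compiled has no full keyboard, so I could not use this feature. Simeji, a Japanese input method developed on top of OpenWnn, does provide a full keyboard.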

This part is implemented in the following four classes:

LetterConverter.java

Romkan.java

RomkanFullKatakana.java

RomkanHalfKatakana.java

If you read the source, the conversion is simply a HashMap lookup. For example, the romkanTable map in Romkan.java contains entries such as:

    put("la", "ぁ");
    put("xa", "ぁ");
    put("a", "あ");
    put("li", "ぃ");
    put("lyi", "ぃ");
    put("xi", "ぃ");
    put("xyi", "ぃ");
    put("i", "い");
    put("yi", "い");

This is why typing li produces "ぃ".
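A table like this can drive a whole-string conversion with a greedy longest-match scan. The sketch below is an assumption about how such a lookup could work, not a copy of Romkan.java: `ToyRomkan`, `MAX_KEY_LEN` and `convert` are hypothetical, the entries for "ka" and "wa" are added only so the "kawai" example can run, and unmatched characters are passed through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical romaji-to-kana converter using a small lookup table. */
class ToyRomkan {
    private static final Map<String, String> TABLE = new HashMap<>();
    private static final int MAX_KEY_LEN = 3; // longest key in this toy table

    static {
        // Entries quoted in the article.
        TABLE.put("la", "ぁ");
        TABLE.put("xa", "ぁ");
        TABLE.put("a", "あ");
        TABLE.put("li", "ぃ");
        TABLE.put("lyi", "ぃ");
        TABLE.put("xi", "ぃ");
        TABLE.put("xyi", "ぃ");
        TABLE.put("i", "い");
        TABLE.put("yi", "い");
        // Extra entries assumed here so that "kawai" can be converted.
        TABLE.put("ka", "か");
        TABLE.put("wa", "わ");
    }

    /** Converts left to right, always trying the longest possible key first. */
    static String convert(String romaji) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < romaji.length()) {
            boolean matched = false;
            int longest = Math.min(MAX_KEY_LEN, romaji.length() - pos);
            for (int len = longest; len > 0; len--) {
                String kana = TABLE.get(romaji.substring(pos, pos + len));
                if (kana != null) {
                    out.append(kana);
                    pos += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No table entry: keep the raw key stroke.
                out.append(romaji.charAt(pos));
                pos++;
            }
        }
        return out.toString();
    }
}
```

Trying the longest key first matters because "lyi" must be matched as a whole before the shorter keys get a chance to consume its characters.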

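4.2 Alphanumeric, full-width/half-width and katakana conversion

This is implemented in the class KanaConverter.java.

I will not walk through the conversion process itself. Instead I will show a few of its HashMaps; once you have seen them, the rest should be clear.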

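mHanSuujiMap contains entries like:

    put( "あ", "1");
    put( "い", "11");
    put( "う", "111");
    put( "え", "1111");
    put( "お", "11111");
    put( "か", "2");
    put( "き", "22");
    put( "く", "222");
    put( "け", "2222");
    put( "こ", "22222");

mZenSuujiMap:

    put( "あ", "１");
    put( "い", "１１");
    put( "う", "１１１");

mHanKataMap:

    put( "あ", "ｱ");
    put( "い", "ｲ");
    put( "う", "ｳ");

In use, when you type "あ" and press the alphanumeric (英数) key, the candidates offered correspond to what these maps hold for "あ": the half-width digit from mHanSuujiMap, the full-width digit from mZenSuujiMap, the half-width katakana from mHanKataMap, and so on. That is the role these HashMaps play.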
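One plausible way to apply such maps to a whole input string is a character-by-character lookup that gives up if any character has no entry. The sketch below assumes exactly that; `ToyKanaTables` and `convert` are hypothetical names, only a few of the quoted entries are included, and the real KanaConverter may differ in its details.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-character table conversion in the spirit of KanaConverter. */
class ToyKanaTables {
    /** Hiragana -> half-width digits (12-key multi-tap strokes). */
    static final Map<String, String> HAN_SUUJI = new HashMap<>();
    /** Hiragana -> half-width katakana. */
    static final Map<String, String> HAN_KATA = new HashMap<>();

    static {
        HAN_SUUJI.put("あ", "1");
        HAN_SUUJI.put("い", "11");
        HAN_SUUJI.put("う", "111");
        HAN_SUUJI.put("か", "2");
        HAN_KATA.put("あ", "ｱ");
        HAN_KATA.put("い", "ｲ");
        HAN_KATA.put("う", "ｳ");
    }

    /**
     * Converts each character through the table and concatenates the results.
     * Returns null if any character has no entry, so no candidate is produced.
     */
    static String convert(String hiragana, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < hiragana.length(); i++) {
            String mapped = table.get(hiragana.substring(i, i + 1));
            if (mapped == null) {
                return null;
            }
            out.append(mapped);
        }
        return out.toString();
    }
}
```

Returning `null` for an incomplete mapping keeps half-converted strings out of the candidate list, which seems the safer choice for a sketch like this.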

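5. Conclusion

With the above, you should now have a reasonable picture of how candidate words are produced in the OpenWnn Japanese input method.

Two questions remain:

We now know these interfaces, but how does the input method actually use them to build the CandidatesView? That will be covered in a later article.
How does the Java code call the back-end C code? This involves JNI and will also be covered later.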
