Custom Japanese tokenization in Solr 4.0





http://ift.tt/1aRlG4o

Untokenized phrase

Tokenized phrase

Reading, or pronunciation

Part of speech
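The four items above are the columns of a single user dictionary entry, in that order. A minimal sketch of splitting one entry into those columns, assuming the comma-separated layout of the sample user dictionary that ships with Lucene's Kuromoji module (`UserEntry` and `parse_entry` are hypothetical names used only for illustration, not part of Solr or Lucene):

```python
from typing import List, NamedTuple


class UserEntry(NamedTuple):
    """One user dictionary row (hypothetical helper type)."""
    surface: str          # untokenized phrase
    segments: List[str]   # tokenized phrase, split on spaces
    readings: List[str]   # reading for each segment, split on spaces
    pos: str              # part of speech


def parse_entry(line: str) -> UserEntry:
    """Split one entry on commas into its four columns."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    surface, segmentation, readings, pos = fields
    return UserEntry(surface, segmentation.split(" "), readings.split(" "), pos)


# Sample row in the style of Lucene's bundled example dictionary.
entry = parse_entry("関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞")
print(entry.surface, entry.segments, entry.readings, entry.pos)
```

The spaces inside the second and third columns are meaningful: they mark where the phrase is cut and which reading belongs to which piece.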

Spaces around commas - The CSV parser is very picky about format. Never put spaces around the commas that separate fields.
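A quick check for stray whitespace around commas can be run before handing the file to Solr. This is only a sketch: `find_spacing_errors` is a hypothetical helper, and skipping blank lines and lines starting with `#` is an assumption based on the comment style of Lucene's sample dictionary.

```python
import re
from typing import Iterable, List, Tuple

# Any whitespace (including a full-width space) directly before or after a comma.
_SPACE_AROUND_COMMA = re.compile(r"\s,|,\s")


def find_spacing_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every entry with space next to a comma."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines are not entries.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if _SPACE_AROUND_COMMA.search(line):
            problems.append((number, line))
    return problems


sample = [
    "# custom entries",
    "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞",
    "朝青龍, 朝青龍,アサショウリュウ,カスタム人名",
]
for number, line in find_spacing_errors(sample):
    print(f"line {number}: space next to a comma: {line}")
```

Spaces inside a field, such as those separating segments in the tokenized phrase, are left alone; only whitespace touching a comma is reported.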



userdict.txt

org.apache.lucene.analysis.ja.dict.UserDictionary

org.apache.lucene.analysis.ja.dict.Dictionary

Compiling a custom dictionary for Kuromoji and Solr

http://ift.tt/1dqYaRB



