Life Long Programmer's Community Log: Custom Japanese tokenization in Solr 4.0

Please Visit: http://ift.tt/1ajReyV

Custom Japanese tokenization in Solr 4.0

http://ift.tt/1aRlG4o

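Each line of userdict.txt is one entry with four comma-separated fields:

1. Untokenized phrase
2. Tokenized phrase (the segments, separated by spaces)
3. Reading, or pronunciation (one reading per segment, separated by spaces)
4. Part of speech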
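To make the four-field layout concrete, here is a minimal parsing sketch in plain Java. It assumes one entry per line and no quoted fields. UserDictEntry is a hypothetical helper for illustration, not a Lucene or Solr class. Treating lines that start with # as comments and requiring exactly one reading per segment are assumptions built into the sketch.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of one userdict.txt entry (not part of Lucene or Solr),
 * assuming the four-field layout: surface, segments, readings, part of speech.
 */
final class UserDictEntry {
    final String surface;        // 1. untokenized phrase
    final List<String> segments; // 2. tokenized phrase, split on spaces
    final List<String> readings; // 3. one reading per segment
    final String partOfSpeech;   // 4. part of speech

    private UserDictEntry(String surface, List<String> segments,
                          List<String> readings, String partOfSpeech) {
        this.surface = surface;
        this.segments = segments;
        this.readings = readings;
        this.partOfSpeech = partOfSpeech;
    }

    /** Parses one line; returns null for blank lines and lines starting with '#'. */
    static UserDictEntry parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        // Plain split on commas: this sketch does not handle quoted fields.
        String[] fields = trimmed.split(",", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 fields, got " + fields.length + ": " + line);
        }
        List<String> segments = Arrays.asList(fields[1].split(" "));
        List<String> readings = Arrays.asList(fields[2].split(" "));
        // Assumption of this sketch: every segment needs exactly one reading.
        if (segments.size() != readings.size()) {
            throw new IllegalArgumentException("segment/reading count mismatch: " + line);
        }
        return new UserDictEntry(fields[0], segments, readings, fields[3]);
    }
}
```

For an illustrative entry such as `日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞`, parse returns the surface 日本経済新聞, the segments 日本, 経済, 新聞, and the matching readings ニホン, ケイザイ, シンブン.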

Spaces around commas: the CSV parser is very picky about format. Never put spaces around the commas that separate fields.
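Because a stray space next to a comma is easy to miss by eye, a small pre-flight check can flag offending lines before Solr loads the file. This is a hypothetical helper (UserDictLint is not part of Lucene or Solr), sketched under the assumption that the file has already been read into a list of lines, for example with java.nio.file.Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for userdict.txt; not part of Lucene or Solr. */
final class UserDictLint {
    // Whitespace immediately before or immediately after a comma.
    private static final Pattern SPACED_COMMA = Pattern.compile("\\s,|,\\s");

    /** Returns the 1-based numbers of lines with whitespace around a comma. */
    static List<Integer> linesWithSpacedCommas(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().startsWith("#")) {
                continue; // comment lines are not entries, so skip them
            }
            if (SPACED_COMMA.matcher(line).find()) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```

Spaces inside fields 2 and 3 are legitimate (they separate segments and readings), which is why the check only looks at whitespace directly adjacent to a comma.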

userdict.txt

org.apache.lucene.analysis.ja.dict.UserDictionary
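UserDictionary is the Kuromoji class that holds these entries. As an intuition aid only, the toy model below shows the effect an entry is meant to have: when the untokenized phrase appears in the input, it comes out as the tokens listed in field 2. The class name ToyUserDictSegmenter and its greedy longest-match strategy are assumptions for illustration. Kuromoji itself chooses the lowest-cost path through a lattice of dictionary matches, so real output, especially for text no entry covers, will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of user-dictionary segmentation.
 * Not Kuromoji's algorithm, which runs a lowest-cost search over a lattice.
 */
final class ToyUserDictSegmenter {
    // Untokenized phrase (field 1) -> its segments (field 2)
    private final Map<String, List<String>> entries = new HashMap<>();
    private int longestSurface = 0;

    void addEntry(String surface, String segmentation) {
        entries.put(surface, Arrays.asList(segmentation.split(" ")));
        longestSurface = Math.max(longestSurface, surface.length());
    }

    /** Greedy longest match, scanning left to right. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            List<String> match = null;
            int matchLength = 0;
            // Try the longest candidate first, shrinking one char at a time.
            for (int end = Math.min(text.length(), pos + longestSurface); end > pos; end--) {
                List<String> segments = entries.get(text.substring(pos, end));
                if (segments != null) {
                    match = segments;
                    matchLength = end - pos;
                    break;
                }
            }
            if (match != null) {
                tokens.addAll(match); // emit the entry's own segmentation
                pos += matchLength;
            } else {
                // Placeholder: where Kuromoji would consult its system dictionary,
                // this toy model just emits a single character.
                tokens.add(text.substring(pos, pos + 1));
                pos++;
            }
        }
        return tokens;
    }
}
```

With the entry 日本経済新聞 → 日本 経済 新聞 added, segment("日本経済新聞を読む") yields 日本, 経済, 新聞, を, 読, む in this toy model; the last three single characters are an artifact of the placeholder fallback, not what Kuromoji would produce.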

org.apache.lucene.analysis.ja.dict.Dictionary

Compiling a custom dictionary for Kuromoji and Solr

http://ift.tt/1dqYaRB

