The easiest way to install MeCab and ipadic on a Mac is with Homebrew, at least through macOS 10.13 High Sierra. I have an older Mac and stopped upgrading at 10.13, so YMMV on 10.11+ in general and on more recent releases in particular. This is much simpler than my previous steps for 10.9.2 and below, which required a lot of file editing. Below I also cover editing MeCab's configuration file to change dictionaries.
Install MeCab via Homebrew
$ brew install mecab
$ brew install mecab-ipadic
Yup, that's it. Now you also need to install the MeCab library for Python 3 if you want to be able to use it within your scripting (like the projects I've detailed elsewhere on this website):
$ pip install mecab-python3
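As a quick sanity check after the pip step, here is a minimal sketch that confirms the binding is importable and parses one sentence. The helper names `mecab_available` and `parse_sample` and the sample sentence are my own for illustration; `MeCab.Tagger` and its `parse` method come from mecab-python3.

```python
import importlib.util


def mecab_available() -> bool:
    """Return True if the MeCab module (installed by mecab-python3) can be found."""
    return importlib.util.find_spec("MeCab") is not None


def parse_sample(text: str = "すもももももももものうち") -> str:
    """Parse text with the default dictionary and return MeCab's raw output string."""
    import MeCab  # imported lazily so the check above works without it

    tagger = MeCab.Tagger()  # no options: uses the dictionary named in mecabrc
    return tagger.parse(text)
```

If `mecab_available()` returns True, calling `print(parse_sample())` should print one line per token followed by EOS. If the tagger fails to start, the dictionary path in mecabrc is the first thing to check.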
For reference and further information, see the MeCab homepage.
Change Your MeCab Dictionary
In this example, I'll change the dictionary to the kindai bungo unidic dictionary that I use most often (for modern, i.e. Meiji-Taisho period, written Japanese). You can find all of the NINJAL dictionaries for unidic here.
Open file /usr/local/etc/mecabrc and change:
dicdir = /usr/local/lib/mecab/dic/ipadic
to:
dicdir = /usr/local/lib/mecab/dic/unidic
Make sure you move your preferred dictionary into the unidic directory: copy everything that you downloaded from the NINJAL site into it, keeping the directory structure intact. Your unidic dictionary is now the default, and you can use it in MeCab Python (remember to pass -Owakati as the parser option, not -Ochasen) and also in RMeCab, if you're an R user.
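If you switch dictionaries often, the dicdir edit can be scripted. The following is only a sketch: `set_dicdir` is a hypothetical helper of my own, it works on the file's text rather than on the file itself, and it assumes the active setting is a line of the form `dicdir = ...` (lines commented out with `;` are left alone).

```python
import re


def set_dicdir(mecabrc_text: str, new_dicdir: str) -> str:
    """Return mecabrc_text with its first active dicdir line pointing at new_dicdir."""
    # Match an uncommented "dicdir = <anything>" line; keep the "dicdir = " prefix.
    pattern = re.compile(r"^([ \t]*dicdir[ \t]*=[ \t]*).*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_dicdir, mecabrc_text, count=1
    )
    if count == 0:
        raise ValueError("no active dicdir line found")
    return new_text


if __name__ == "__main__":
    sample = "; sample config\ndicdir = /usr/local/lib/mecab/dic/ipadic\n"
    print(set_dicdir(sample, "/usr/local/lib/mecab/dic/unidic"))
```

To apply it for real, you would read /usr/local/etc/mecabrc, pass its contents through `set_dicdir`, and write the result back. Keep a backup copy of the original file first.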
Using Custom Dictionaries (unidic)
To actually use kindai bungo, or another unidic dictionary, to process text, I needed to switch from the -Ochasen parsing option in MeCab (commonly shown in the tutorials and docs I've found) to -Owakati. I relied on Japanese-only writeups like this one (test.py Hatena blog) to learn which options do or don't work with non-default setups.
The -Owakati parsing option returns a single string of space-separated tokens, without the extra grammatical information per "word" that -Ochasen output includes, so you'll have to adjust what your script expects as input from this step compared with most tutorials. See the linked blog post for sample code that helped me a lot in various projects where I was tokenizing early 20th-century text as a first step. The author also covers the NLTKJP context.
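To make the difference concrete, here is a minimal sketch of wakati tokenization with mecab-python3. The function names `make_wakati_tagger` and `tokenize` are hypothetical; the only MeCab calls used are `MeCab.Tagger("-Owakati")` and `parse`. It assumes your default dictionary is already set in mecabrc.

```python
def make_wakati_tagger():
    """Build a tagger that emits space-separated tokens (-Owakati)."""
    import MeCab  # from the mecab-python3 package

    return MeCab.Tagger("-Owakati")


def tokenize(text, tagger):
    """Return the tokens of text as a list of surface forms.

    tagger.parse() gives back one string such as "私 は 学生 です \n",
    so splitting on whitespace is all the post-processing needed.
    """
    return tagger.parse(text).split()
```

A typical call would be `tokens = tokenize("私は学生です", make_wakati_tagger())`. Because `tokenize` only needs an object with a `parse` method, it is easy to try out with a stand-in tagger before MeCab is fully configured.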
(Deprecated) Install MeCab from source
This is a legacy set of instructions from OS 10.9.2, which I'm leaving up for posterity - but this did NOT actually work for me in the end.
First, I had to do this to make the C compiler work. This may or may not be the same for you.
sudo ln -s /usr/bin/gcc /usr/bin/gcc-apple-4.2
- Install MeCab:
- Get the MeCab source
- Switch to whatever directory you downloaded it to, then...
$ tar zxfv mecab-0.996.tar.gz
$ cd mecab-0.996
$ ./configure
$ make
$ make check
$ tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
$ cd mecab-ipadic-2.7.0-20070801
$ ./configure --with-charset=utf8
$ ./configure --with-mecab-config=~/usr/local/bin/mecab-config --prefix=~/usr/local/bin --with-charset=utf8
(This actually didn't work for me. I ended up using Homebrew to install instead.)
$ make
$ sudo make install
$ mecab
test
test 名詞,固有名詞,組織,*,*,*,*
EOS
If you get this output, you have successfully installed MeCab on your Mac.
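If you want to read this kind of output from a script, here is a rough sketch of a parser for it. `parse_default_output` is a hypothetical helper, and it assumes MeCab's default output format: one token per line, a tab between the surface form and the comma-separated features, and EOS closing the sentence. It stops at the first EOS and does not handle features that themselves contain quoted commas.

```python
def parse_default_output(output):
    """Turn MeCab's default output into a list of (surface, features) pairs."""
    tokens = []
    for line in output.splitlines():
        if line == "EOS":  # end of the sentence
            break
        if not line:
            continue
        # Surface form and feature string are separated by a single tab.
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


if __name__ == "__main__":
    sample = "test\t名詞,固有名詞,組織,*,*,*,*\nEOS\n"
    for surface, features in parse_default_output(sample):
        print(surface, features[0])
```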