Let’s start with the obvious question: what is a tokenizer? In Natural Language Processing (NLP), a tokenizer is a text preprocessing step that splits text into tokens. Tokens can be sentences, words, or any other units that make up a text.
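As a minimal illustration (not any particular library's implementation), here is a naive whitespace-based word tokenizer in Python. It shows the basic idea of splitting text into tokens, and also hints at why real tokenizers do more: punctuation stays attached to the neighboring word.

```python
def whitespace_tokenize(text):
    # Naive word tokenizer: split on runs of whitespace.
    # Note that punctuation is not separated from words.
    return text.split()

tokens = whitespace_tokenize("Tokenizers split text into tokens.")
print(tokens)  # ['Tokenizers', 'split', 'text', 'into', 'tokens.']
```

Real-world tokenizers additionally handle punctuation, abbreviations, and language-specific rules, which is where languages like Malayalam get tricky.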
Every NLP package comes with a word tokenizer. But Malayalam tokenization poses its own particular challenges.