OpenNLP POS Tagger Example (Maven + Eclipse)

By Dhiraj Ray, 11 July, 2017

In this article we will discuss the Apache OpenNLP POS tagger with an example. The example is a Maven-based project that uses the en-pos-maxent.bin model file to tag the parts of speech in a text. We will use the WhitespaceTokenizer provided by OpenNLP to tokenize the text.

What is Part-of-Speech Tagging

According to Wikipedia, POS tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, that is, its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Different POS Tags Meanings

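The English model used in this example emits tags from the Penn Treebank tag set. Some of the most common ones are NN (noun, singular or mass), NNS (noun, plural), VB (verb, base form), JJ (adjective), RB (adverb), DT (determiner) and IN (preposition or subordinating conjunction).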
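To make tagger output easier to read, the tags can be mapped to their meanings in code. The following is a minimal sketch, not part of OpenNLP: the class name PosTagMeanings and the method describe are made up for illustration, and the map covers only a subset of the Penn Treebank tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PosTagMeanings {

    // Subset of the Penn Treebank tag set; intentionally not the complete list.
    static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("CC", "Coordinating conjunction");
        MEANINGS.put("CD", "Cardinal number");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("JJ", "Adjective");
        MEANINGS.put("MD", "Modal");
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("NNS", "Noun, plural");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("PRP", "Personal pronoun");
        MEANINGS.put("RB", "Adverb");
        MEANINGS.put("TO", "to");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBG", "Verb, gerund or present participle");
        MEANINGS.put("VBN", "Verb, past participle");
        MEANINGS.put("VBP", "Verb, non-3rd person singular present");
        MEANINGS.put("VBZ", "Verb, 3rd person singular present");
    }

    // Returns the meaning of a tag, or a fallback message for tags not in the map.
    public static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        System.out.println(describe("NN"));  // prints "Noun, singular or mass"
        System.out.println(describe("VBZ")); // prints "Verb, 3rd person singular present"
    }
}
```

A helper like this could be called inside the tagging loop to print a readable description next to each tag.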


Maven Dependencies for OpenNLP

<dependencies>
    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Implementing POS Tagging using Apache OpenNLP

The following class takes a chunk of text as an input parameter and tags each word. In this example, we first use the sentence detector to split a paragraph into multiple sentences, and then tag each sentence with the OpenNLP POS tagger. The Sentence Detector is covered in detail in a separate article.

WhitespaceTokenizer uses white space to tokenize the input text. en-pos-maxent.bin is the maxent model with tag dictionary. Both en-pos-maxent.bin and en-sent.bin are loaded from the root of the classpath.

package com.devglan;

import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;

/**
 * Created by only2dhir on 11-07-2017.
 */
public class POSTaggingExample {

    POSTaggerME tagger = null;
    POSModel model = null;

    // Loads the POS model from the classpath and creates the tagger.
    public void initialize(String lexiconFileName) {
        try (InputStream modelStream = getClass().getResourceAsStream(lexiconFileName)) {
            model = new POSModel(modelStream);
            tagger = new POSTaggerME(model);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }

    // Splits the text into sentences, tokenizes each one and prints tag:word pairs.
    public void tag(String text) {
        initialize("/en-pos-maxent.bin");
        try {
            // Reuse the tagger created in initialize() instead of building a second one.
            if (tagger != null) {
                String[] sentences = detectSentences(text);
                for (String sentence : sentences) {
                    String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
                    String[] tags = tagger.tag(tokens);
                    for (int i = 0; i < tokens.length; i++) {
                        String word = tokens[i].trim();
                        String tag = tags[i].trim();
                        System.out.print(tag + ":" + word + " ");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Uses the en-sent.bin model to split a paragraph into sentences.
    public String[] detectSentences(String paragraph) throws IOException {
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            final SentenceModel sentenceModel = new SentenceModel(modelIn);
            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);
            String[] sentences = sentenceDetector.sentDetect(paragraph);
            for (String sent : sentences) {
                System.out.println(sent);
            }
            return sentences;
        }
    }
}
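One consequence of tokenizing on white space only is that punctuation stays attached to the neighbouring word, so the tagger receives "suite." rather than "suite" followed by ".". The sketch below illustrates this with plain Java string splitting. It is a stand-in for the behaviour, not OpenNLP's implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class WhitespaceSplitSketch {

    // Splits on runs of whitespace only; punctuation is left attached to words.
    public static String[] splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = splitOnWhitespace("you can combine them into a test suite.");
        // prints [you, can, combine, them, into, a, test, suite.]
        System.out.println(Arrays.toString(tokens));
    }
}
```

If punctuation should become separate tokens, OpenNLP also provides SimpleTokenizer and the model-based TokenizerME, which may be a better fit for text with a lot of punctuation.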

Testing OpenNLP POS Tagger

The following JUnit test class exercises the tagger class.

package com.devglan;

import org.junit.Test;

/**
 * Created by only2dhir on 11-07-2017.
 */
public class POSTaggerTest {

    @Test
    public void tag() {
        POSTaggingExample tagging = new POSTaggingExample();
        tagging.tag("If you have several test classes, you can combine them into a test suite. Running a test suite executes all test classes in that suite in the specified order. A test suite can also contain other test suites");
    }
}
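The test only works if en-pos-maxent.bin and en-sent.bin can be found on the classpath. In a Maven project that usually means placing them under src/main/resources, which is an assumption about the project layout here. getResourceAsStream returns null when a resource is missing, which tends to surface later as a less obvious error. The following sketch shows one way to fail fast with a clear message. The class ModelResources and its open method are hypothetical helpers, not part of OpenNLP.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ModelResources {

    // Opens a classpath resource, failing with a clear message if it is absent.
    public static InputStream open(String resourcePath) throws IOException {
        InputStream in = ModelResources.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }

    public static void main(String[] args) {
        // try-with-resources closes the stream once the model has been read.
        try (InputStream in = open("/en-pos-maxent.bin")) {
            System.out.println("Model found, ready to load.");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The stream returned by open could be passed to new POSModel(...) or new SentenceModel(...) in place of the direct getResourceAsStream calls.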




I hope this article gave you what you were looking for. If you have anything to add or share, please do so in the comment section below.

