OpenNLP POS Tagger Example (Maven + Eclipse)

By Dhiraj 11 July, 2017

In this article we discuss the Apache OpenNLP POS tagger with an example. The example is a Maven-based project that uses the en-pos-maxent.bin model file to tag parts of speech and the WhitespaceTokenizer provided by OpenNLP to tokenize the text.

What is Part-of-Speech Tagging

According to Wikipedia, part-of-speech (POS) tagging is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children as the identification of words as nouns, verbs, adjectives, adverbs, etc.

Other NLP Articles
Stanford NLP Named Entity Recognition
Apache OpenNLP Maven Eclipse Example
Stanford NLP Maven Example
Stanford NLP POS Tagger Example
Apache OpenNLP Named Entity Recognition Example

Different POS Tags Meanings

The following are the POS tags with their corresponding meanings.

[Image: table of POS tags and their meanings]
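The English OpenNLP POS models use the Penn Treebank tag set. As a quick text reference, the sketch below maps a handful of the most common tags to their meanings. The class name PosTagGlossary and the choice of tags are illustrative assumptions; this helper is not part of OpenNLP and the list is far from complete.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: a partial glossary of Penn Treebank tags.
class PosTagGlossary {

    private static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("CC", "Coordinating conjunction");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("JJ", "Adjective");
        MEANINGS.put("MD", "Modal");
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("NNS", "Noun, plural");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("PRP", "Personal pronoun");
        MEANINGS.put("RB", "Adverb");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBG", "Verb, gerund or present participle");
        MEANINGS.put("VBZ", "Verb, 3rd person singular present");
    }

    // Returns the meaning of a tag, or "Unknown tag" if it is not in this partial list.
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag");
    }

    public static void main(String[] args) {
        System.out.println("NN  = " + describe("NN"));
        System.out.println("VBZ = " + describe("VBZ"));
    }
}
```

A lookup like this can be handy when reading the tagger output, since the raw tags (NN, VBZ, and so on) are not self-explanatory.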

Maven Dependencies for OpenNLP

pom.xml
	
    <dependencies>
        <dependency>
            <groupId>org.apache.opennlp</groupId>
            <artifactId>opennlp-tools</artifactId>
            <version>1.8.1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

Implementing POS Tagging using Apache OpenNLP

The following class takes a chunk of text as an input parameter and tags each word. In this example, we first use the sentence detector to split a paragraph into multiple sentences, and then tag each sentence using the OpenNLP POS tagger. The Sentence Detector is covered in detail in a separate article.

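WhitespaceTokenizer splits the input text on white space. en-pos-maxent.bin is the maxent POS model with a tag dictionary, and en-sent.bin is the sentence detection model. The code loads both files from the root of the classpath, for example from src/main/resources in a Maven project.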
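Because the split happens only on white space, punctuation stays attached to the neighbouring word, so the tagger receives tokens such as "classes," or "suite." rather than a word and a separate punctuation mark. The sketch below approximates this behaviour with the JDK alone so it can be run without the model files; WhitespaceSplitSketch is a hypothetical stand-in, not the OpenNLP implementation.

```java
import java.util.Arrays;

// Illustrative stand-in for whitespace tokenization, written with the JDK only.
class WhitespaceSplitSketch {

    // Splits on runs of whitespace; punctuation stays attached to the neighbouring word.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("you can combine them into a test suite.");
        // The last token is "suite." (with the period), not "suite".
        System.out.println(Arrays.toString(tokens));
    }
}
```

If separate punctuation tokens are needed, OpenNLP also provides SimpleTokenizer and the model-based TokenizerME, which could be swapped in for WhitespaceTokenizer.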

POSTaggingExample.java
package com.devglan;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;

import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 11-07-2017.
 */
public class POSTaggingExample {

    private POSTaggerME tagger = null;

    // Loads the POS model from the classpath and builds the tagger.
    // The stream is closed automatically by try-with-resources.
    public void initialize(String modelFileName) {
        try (InputStream modelStream = getClass().getResourceAsStream(modelFileName)) {
            if (modelStream == null) {
                System.out.println("Model file not found on classpath: " + modelFileName);
                return;
            }
            POSModel model = new POSModel(modelStream);
            tagger = new POSTaggerME(model);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }

    public void tag(String text) {
        // Load the model only once; later calls reuse the same tagger.
        if (tagger == null) {
            initialize("/en-pos-maxent.bin");
        }
        if (tagger == null) {
            return; // the model could not be loaded
        }
        try {
            String[] sentences = detectSentences(text);
            for (String sentence : sentences) {
                // Split the sentence on whitespace, then tag all of its tokens at once.
                String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);
                String[] tags = tagger.tag(tokens);
                for (int i = 0; i < tokens.length; i++) {
                    System.out.print(tags[i] + ":" + tokens[i] + "  ");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public String[] detectSentences(String paragraph) throws IOException {
        // Load the sentence model from the classpath; the stream is closed automatically.
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            if (modelIn == null) {
                throw new IOException("Sentence model /en-sent.bin not found on classpath");
            }
            SentenceModel sentenceModel = new SentenceModel(modelIn);
            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);
            String[] sentences = sentenceDetector.sentDetect(paragraph);
            for (String sent : sentences) {
                System.out.println(sent);
            }
            return sentences;
        }
    }
}

Testing OpenNLP POS Tagger

The following test class exercises the tagger class. It only prints the tagged output and makes no assertions.

package com.devglan;

import org.junit.Test;

/**
 * Created by only2dhir on 11-07-2017.
 */
public class POSTaggerTest {

    @Test
    public void tag(){
        POSTaggingExample tagging = new POSTaggingExample();
        tagging.tag("If you have several test classes, you can combine them into a test suite. Running a test suite executes all test classes in that suite in the specified order. A test suite can also contain other test suites");
    }
}
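Since the test above only prints to the console, one way to make the result checkable is to build the tag:word line as a string and return it instead of printing it. The sketch below shows that pairing step in isolation, using plain arrays in place of the tokenizer and tagger output. TagFormatter and format are hypothetical names, and the tags in main are hand-written for illustration, not actual model output.

```java
// Hypothetical helper: pairs each token with its tag in a "TAG:word" layout
// similar to the one POSTaggingExample prints.
class TagFormatter {

    static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                line.append("  "); // two spaces between pairs
            }
            line.append(tags[i]).append(':').append(tokens[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration only, not actual tagger output.
        String[] tokens = {"A", "test", "suite"};
        String[] tags = {"DT", "NN", "NN"};
        System.out.println(format(tokens, tags));
    }
}
```

With a method like this, a JUnit test could compare the returned string against an expected value rather than relying on visual inspection of the console.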

Output

[Image: OpenNLP POS tagger output]

Conclusion

I hope this article gave you what you were looking for. If you have anything to add or share, please do so in the comment section below.
