Apache Open NLP Maven Eclipse Example

By Dhiraj 09 July, 2017

This tutorial shows how to set up Apache OpenNLP with Maven in Eclipse or IntelliJ IDEA. We will build an example using the Sentence Detector component provided by Apache OpenNLP. For this purpose we will use the en-sent.bin model file, which is trained on OpenNLP training data. So let us get started.


What is NLP

In this context, NLP stands for Natural Language Processing (not to be confused with Neuro-Linguistic Programming). It is the field concerned with enabling computers to process and analyze human language, for example by splitting text into sentences, tagging parts of speech, or recognizing named entities.

Many NLP libraries are available, such as NLTK, OpenNLP and Stanford CoreNLP, and they come with models already trained for the most common NLP tasks. In this post we discuss OpenNLP and provide a basic example that detects sentences, using Maven and the Eclipse IDE.

Project Structure

[Figure: project structure]

Maven Dependency

opennlp-tools: It provides concrete implementations of NLP algorithms such as sentence splitting, POS-tagging etc.

pom.xml

    <groupId>com.devglan</groupId>
    <artifactId>open-nlp-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.opennlp</groupId>
            <artifactId>opennlp-tools</artifactId>
            <version>1.8.1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

Implementing OpenNLP SentenceDetector

SentenceDetector splits a piece of raw text into sentences. OpenNLP provides a pre-trained model, en-sent.bin, which is trained to identify sentence boundaries in text. We keep this file in the /resources folder so that it is available on the classpath. Once the model is loaded, we can call sentDetect() to get the detected sentences as an array of strings.
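To make the task concrete, here is a minimal rule-based sketch that uses only the JDK's java.text.BreakIterator instead of OpenNLP. It is not the approach used in this tutorial; it only illustrates what splitting a paragraph into sentences looks like. The class name RuleBasedSentenceSplitter is made up for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of OpenNLP or of the demo project.
final class RuleBasedSentenceSplitter {

    // Splits text into sentences using the JDK's built-in boundary rules.
    static List<String> split(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        // Walk over consecutive boundaries; each [start, end) range is one sentence.
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

A trained model such as en-sent.bin is meant to cope with inputs that fixed rules tend to get wrong, for example periods inside abbreviations. The OpenNLP-based implementation follows.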

SentenceDetectorDemo.java
package com.devglan;

import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 08-07-2017.
 */

public class SentenceDetectorDemo {

    public String[] detectSentence(String paragraph) throws IOException {

        // Load the pre-trained model from the classpath.
        // try-with-resources closes the stream even if loading fails.
        final SentenceModel sentenceModel;
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            sentenceModel = new SentenceModel(modelIn);
        }

        SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);
        String[] sentences = sentenceDetector.sentDetect(paragraph);
        for (String sent : sentences) {
            System.out.println(sent);
        }
        return sentences;
    }

}
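One case the code above does not handle explicitly is a missing model file: getResourceAsStream() returns null when the resource is not on the classpath. The following is a minimal sketch of a fail-fast helper that turns this into a descriptive exception. The class name ModelResources and the method open() are hypothetical and not part of OpenNLP.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ModelResources {

    // Opens a classpath resource such as "/en-sent.bin" and fails with a
    // clear message instead of handing a null stream to the caller.
    static InputStream open(String resourcePath) throws IOException {
        InputStream in = ModelResources.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }
}
```

In detectSentence(), the stream could then be obtained with ModelResources.open("/en-sent.bin"), assuming the helper lives in the same module as the model file.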

Implementing OpenNLP SentencePosDetector

OpenNLP can also report where each sentence is located in the raw text. The sentPosDetect() method returns an array of Span objects, one per sentence, each holding the start and end character offsets of that sentence. Following is an example.

SentencePosDetectorDemo.java
package com.devglan;

import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 08-07-2017.
 */
 
public class SentencePosDetectorDemo {

    public Span[] detectSentencePos(String paragraph) throws IOException {

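        // Load the pre-trained model from the classpath.
        // try-with-resources closes the stream even if loading fails.
        final SentenceModel sentenceModel;
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            sentenceModel = new SentenceModel(modelIn);
        }

        SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);
        // Each Span holds the character offsets of one detected sentence
        Span[] spans = sentenceDetector.sentPosDetect(paragraph);
        for (Span span : spans) {
            System.out.println(span);
        }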
        return spans;
    }
}
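A Span exposes its offsets through getStart() and getEnd(), where the end offset is exclusive, so the sentence text can be recovered with String.substring(). The sketch below shows that mapping using plain int pairs in place of Span objects so that it runs without the OpenNLP jar. The class name SpanOffsetsSketch is hypothetical, and the offsets in the usage note are hand-picked for illustration rather than produced by the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; int pairs stand in for OpenNLP Span objects.
final class SpanOffsetsSketch {

    // Each row of offsets is {start, end}: start is inclusive, end is exclusive,
    // which matches the arguments expected by String.substring().
    static List<String> coveredText(String text, int[][] offsets) {
        List<String> sentences = new ArrayList<>();
        for (int[] offset : offsets) {
            sentences.add(text.substring(offset[0], offset[1]));
        }
        return sentences;
    }
}
```

For the text "Hello world. How are you?" and the offsets {0, 12} and {13, 25}, coveredText() returns "Hello world." and "How are you?". With real spans, the same idea would be paragraph.substring(span.getStart(), span.getEnd()).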

Testing the Application

Following are some test cases that detect sentences and their positions using Apache OpenNLP.

SentenceDetectorTest.java
package com.devglan;

import opennlp.tools.util.Span;
import org.junit.Assert;
import org.junit.Test;

import java.io.IOException;

/**
 * Created by only2dhir on 08-07-2017.
 */
public class SentenceDetectorTest {

    // The detector should return at least one sentence for a non-empty paragraph
    @Test
    public void detectsSentences() throws IOException {
        SentenceDetectorDemo sentenceDetector = new SentenceDetectorDemo();
        String[] sentences = sentenceDetector.detectSentence("If you have several test classes, you can combine them into a test suite. Running a test suite executes all test classes in that suite in the specified order. A test suite can also contain other test suites.");
        Assert.assertTrue(sentences != null && sentences.length > 0);
    }

    // The position detector should return at least one span for a non-empty paragraph
    @Test
    public void detectsSentencePositions() throws IOException {
        SentencePosDetectorDemo sentenceDetector = new SentencePosDetectorDemo();
        Span[] spans = sentenceDetector.detectSentencePos("If you have several test classes, you can combine them into a test suite. Running a test suite executes all test classes in that suite in the specified order. A test suite can also contain other test suites.");
        Assert.assertTrue(spans != null && spans.length > 0);
    }

}

Output

[Figure: console output of the test run]

Conclusion

I hope this article served the purpose you were looking for. If you have anything to add or share, please share it in the comment section below.
