Apache OpenNLP Maven Eclipse Example

By Dhiraj Ray, 09 July, 2017

This tutorial is about setting up Apache OpenNLP with Maven in Eclipse or IntelliJ IDEA. We will create an example using the sentence detector component provided by Apache OpenNLP. For this purpose we will use the en-sent.bin model file, which is trained on OpenNLP training data. So let us get started.


What is NLP

NLP stands for Natural Language Processing. It is a field of computer science and artificial intelligence concerned with enabling computers to process and analyze human (natural) language. Typical NLP tasks include splitting text into sentences, tokenization, part-of-speech tagging and named entity recognition.

There are many existing NLP libraries, such as NLTK, OpenNLP and Stanford CoreNLP, that ship with models already trained for the most common NLP tasks. In this post we will discuss OpenNLP and provide a basic example to get started with OpenNLP by detecting sentences using Maven and the Eclipse IDE.

Project Structure

[Figure: project structure of open-nlp-demo in the IDE]

Maven Dependency

opennlp-tools: It provides concrete implementations of NLP algorithms such as sentence splitting, POS-tagging etc.

pom.xml
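<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.devglan</groupId>
    <artifactId>open-nlp-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <!-- OpenNLP 1.8.x requires Java 8 or later -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- NLP algorithms: sentence detection, tokenization, POS tagging, ... -->
        <dependency>
            <groupId>org.apache.opennlp</groupId>
            <artifactId>opennlp-tools</artifactId>
            <version>1.8.1</version>
        </dependency>
        <!-- Unit testing -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>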

Implementing OpenNLP SentenceDetector

A SentenceDetector splits a piece of raw text into sentences. OpenNLP provides a pre-trained English sentence model, en-sent.bin, which can be downloaded from the OpenNLP models page. We have placed en-sent.bin inside the project's resources folder so that it is on the classpath and can be loaded with getResourceAsStream("/en-sent.bin"). Once the model is loaded, we can call sentDetect() to get the sentences of the text as a String array.

SentenceDetectorDemo.java
package com.devglan;

import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 08-07-2017.
 */
public class SentenceDetectorDemo {

    public String[] detectSentence(String paragraph) throws IOException {
        // Load the pre-trained English sentence model from the classpath root
        // (src/main/resources/en-sent.bin in a standard Maven layout).
        // try-with-resources closes the stream even if loading fails.
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            if (modelIn == null) {
                throw new FileNotFoundException("en-sent.bin not found on the classpath");
            }
            SentenceModel sentenceModel = new SentenceModel(modelIn);
            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);

            // Split the raw text into sentences
            String[] sentences = sentenceDetector.sentDetect(paragraph);
            for (String sent : sentences) {
                System.out.println(sent);
            }
            return sentences;
        }
    }
}
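If you want to see what kind of output sentDetect() produces before downloading en-sent.bin, the JDK's own java.text.BreakIterator offers a rule-based sentence splitter. The sketch below is only a point of comparison and is not OpenNLP. BreakIterator applies fixed boundary rules rather than a trained model, so it may split differently around abbreviations and other ambiguous periods. The class name BreakIteratorSentenceSketch is hypothetical. Its method returns a String[] to mirror the shape of sentDetect().

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Comparison sketch only: uses the JDK's rule-based BreakIterator,
// NOT OpenNLP's trained en-sent.bin model.
class BreakIteratorSentenceSketch {

    // Returns the sentences of `paragraph`, mirroring the String[] shape of sentDetect().
    static String[] detectSentences(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // BreakIterator keeps trailing whitespace with the sentence, so trim it.
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "If you have several test classes, you can combine them into a test suite. "
                + "Running a test suite executes all test classes in that suite in the specified order. "
                + "A test suite can also contain other test suites.";
        for (String sentence : detectSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

A trained model such as en-sent.bin exists precisely because fixed rules struggle with cases like "Dr. Smith" or "e.g.", so treat this only as a way to picture the String[] result.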

Detecting Sentence Positions with OpenNLP

OpenNLP can also detect the positions of sentences in raw text. sentPosDetect() returns an array of Span objects, each holding the start (inclusive) and end (exclusive) character offsets of one sentence. Following is an example.

SentencePosDetectorDemo.java
package com.devglan;

import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 08-07-2017.
 */
public class SentencePosDetectorDemo {

    public Span[] detectSentencePos(String paragraph) throws IOException {
        // Load the pre-trained English sentence model from the classpath root.
        try (InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin")) {
            if (modelIn == null) {
                throw new FileNotFoundException("en-sent.bin not found on the classpath");
            }
            SentenceModel sentenceModel = new SentenceModel(modelIn);
            SentenceDetector sentenceDetector = new SentenceDetectorME(sentenceModel);

            // Each Span holds the start (inclusive) and end (exclusive) offsets of one sentence
            Span[] spans = sentenceDetector.sentPosDetect(paragraph);
            for (Span span : spans) {
                System.out.println(span + " " + span.getCoveredText(paragraph));
            }
            return spans;
        }
    }
}
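A Span returned by sentPosDetect() exposes getStart() and getEnd(). The end offset is exclusive, so paragraph.substring(span.getStart(), span.getEnd()) recovers the sentence text. To illustrate that half-open [start, end) convention without the OpenNLP dependency, the sketch below uses a hypothetical OffsetSpan class and a deliberately naive splitter, NaiveSentencePositions. The splitter breaks after '.', '!' or '?' when followed by whitespace. That rule is an assumption made for illustration only. OpenNLP's SentenceDetectorME relies on its trained model instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.Span: a half-open [start, end) character range.
final class OffsetSpan {
    final int start; // inclusive
    final int end;   // exclusive

    OffsetSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // The characters this span covers in `text`.
    String coveredText(String text) {
        return text.substring(start, end);
    }

    @Override
    public String toString() {
        return "[" + start + ".." + end + ")";
    }
}

// Naive illustration only: ends a sentence at '.', '!' or '?' followed by whitespace
// or end of text. This is NOT how OpenNLP's trained SentenceDetectorME decides.
class NaiveSentencePositions {

    static List<OffsetSpan> detect(String text) {
        List<OffsetSpan> spans = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between sentences so it is not part of any span.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i == n) {
                break;
            }
            int start = i;
            // Advance past the terminator that ends this sentence (or to end of text).
            while (i < n) {
                char c = text.charAt(i);
                i++;
                if ((c == '.' || c == '!' || c == '?')
                        && (i == n || Character.isWhitespace(text.charAt(i)))) {
                    break;
                }
            }
            // Trim trailing whitespace in case the text ended without a terminator.
            int end = i;
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
                end--;
            }
            spans.add(new OffsetSpan(start, end));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hi there. How are you? Fine.";
        for (OffsetSpan span : detect(text)) {
            System.out.println(span + " " + span.coveredText(text));
        }
    }
}
```

For the input "Hi there. How are you? Fine." this sketch yields the spans [0..9), [10..22) and [23..28). The naive rule would wrongly split text such as "e.g. this", which is the kind of case a trained sentence model is meant to handle.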

Testing the Application

Following are some test cases that detect sentences and their positions using Apache OpenNLP.

SentenceDetectorTest.java
package com.devglan;

import opennlp.tools.util.Span;
import org.junit.Assert;
import org.junit.Test;

import java.io.IOException;

/**
 * Created by only2dhir on 08-07-2017.
 */
public class SentenceDetectorTest {

    // Sample paragraph shared by both tests
    private static final String PARAGRAPH =
            "If you have several test classes, you can combine them into a test suite. "
            + "Running a test suite executes all test classes in that suite in the specified order. "
            + "A test suite can also contain other test suites.";

    @Test
    public void detectsSentences() throws IOException {
        SentenceDetectorDemo sentenceDetector = new SentenceDetectorDemo();
        String[] sentences = sentenceDetector.detectSentence(PARAGRAPH);
        Assert.assertNotNull(sentences);
        Assert.assertTrue(sentences.length > 0);
    }

    @Test
    public void detectsSentencePositions() throws IOException {
        SentencePosDetectorDemo sentencePosDetector = new SentencePosDetectorDemo();
        Span[] spans = sentencePosDetector.detectSentencePos(PARAGRAPH);
        Assert.assertNotNull(spans);
        Assert.assertTrue(spans.length > 0);
    }
}

Output

[Figure: console output of the test run]

Conclusion

I hope this article gave you what you were looking for. If you have anything you want to add or share, please share it below in the comment section.
