OpenNLP Named Entity Recognition Example (Maven + Eclipse)

By Dhiraj Ray, 16 July, 2017

In this article we will discuss OpenNLP named entity recognition (NER) in a Maven and Eclipse project. We will use the NameFinderME class provided by OpenNLP for NER, together with different pre-trained model files such as en-ner-location.bin, en-ner-person.bin and en-ner-organization.bin.

What is Named Entity Recognition

As per Wikipedia, named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

Eclipse Project Structure

[Image: Eclipse project structure]

Maven Dependency

<dependencies>
    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Apache OpenNLP Named Entity Recognition

OpenNLP provides many pre-trained models, such as en-ner-person.bin, en-ner-location.bin, en-ner-organization.bin and en-ner-time.bin, to detect named entities such as persons, locations and organizations in a piece of text. The complete list of pre-trained models can be found on the OpenNLP Models page listed in the References.
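The model names above follow a visible pattern, en-ner-<type>.bin. As a small sketch, assuming the model files sit at the root of the classpath as in the rest of this article, a helper can build the resource path from the entity type. NerModelNames and resourcePath are names made up for this illustration and are not part of OpenNLP.

```java
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP: builds the classpath location of a
// pre-trained English NER model from its entity type, for example "person".
final class NerModelNames {

    private NerModelNames() {
    }

    // Assumes the naming pattern en-ner-<type>.bin seen in the model names above
    // and that the file is stored at the root of the classpath.
    static String resourcePath(String entityType) {
        if (entityType == null || entityType.trim().isEmpty()) {
            throw new IllegalArgumentException("entityType must not be blank");
        }
        return "/en-ner-" + entityType.trim().toLowerCase(Locale.ROOT) + ".bin";
    }
}
```

With this in place, NerModelNames.resourcePath("location") yields the same "/en-ner-location.bin" string that is otherwise typed by hand. The helper only builds the name; it does not check that the file actually exists.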

OpenNLP provides a common way to detect all these named entities. First, we load the pre-trained model file and instantiate a TokenNameFinderModel object from it. Following is an example.

InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin");
TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
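One thing to keep in mind with this pattern is that getResourceAsStream() returns null when the file is not on the classpath, which only surfaces later as a less obvious error. The following is a minimal sketch of a guard around the lookup. ModelResources and open are hypothetical names introduced here, not OpenNLP API.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical helper, not part of OpenNLP: opens a classpath resource and
// fails with a clear message when the resource is missing.
final class ModelResources {

    private ModelResources() {
    }

    static InputStream open(Class<?> owner, String resourcePath) throws FileNotFoundException {
        // getResourceAsStream returns null instead of throwing when nothing is found.
        InputStream in = owner.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Model not found on classpath: " + resourcePath);
        }
        return in;
    }
}
```

The returned stream can then be passed to the TokenNameFinderModel constructor in the same way as before, ideally inside a try-with-resources statement so that it is closed after the model has been read.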

After this, we initialise the NameFinderME class with the model and use its find() method to find the respective entities. This method works on the tokens of a text rather than on the raw string, so we first need to tokenise the text. Following is an example.

NameFinderME nameFinder = new NameFinderME(model);
String[] tokens = tokenize(paragraph);
Span[] nameSpans = nameFinder.find(tokens);
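Each Span returned by find() refers to token positions, not characters: getStart() is the index of the first token of the entity and getEnd() is the index one past its last token. The sketch below shows how such a range maps back to text. To stay runnable without the model files it takes plain int values instead of a Span, and SpanText and join are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of OpenNLP: turns a token range [start, end)
// into the text of the entity it covers.
final class SpanText {

    private SpanText() {
    }

    static String join(String[] tokens, int start, int end) {
        if (start < 0 || end > tokens.length || start > end) {
            throw new IllegalArgumentException("Invalid token range: " + start + ".." + end);
        }
        // The end index is exclusive, which matches Arrays.copyOfRange.
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }
}
```

With a real Span this would be called as SpanText.join(tokens, span.getStart(), span.getEnd()). For the tokens "Mike", "Smith", "lives", "in", "New", "York" and a range of 4 to 6, the result is "New York", whereas reading only tokens[4] would give "New".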

Finding Names Using OpenNLP

Based on the above understanding, following is the complete code to find person names in a text using OpenNLP.

package com.devglan;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class NameFinder {

    public void findName(String paragraph) throws IOException {
        // Load the pre-trained person model; try-with-resources closes the stream afterwards.
        try (InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
            NameFinderME nameFinder = new NameFinderME(model);
            String[] tokens = tokenize(paragraph);
            Span[] nameSpans = nameFinder.find(tokens);
            for (Span s : nameSpans) {
                // A span covers the tokens [start, end), so join them to print multi-word names in full.
                System.out.println(String.join(" ", Arrays.copyOfRange(tokens, s.getStart(), s.getEnd())));
            }
        }
    }

    public String[] tokenize(String sentence) throws IOException {
        // Load the pre-trained English tokenizer model and split the text into tokens.
        try (InputStream inputStreamTokenizer = getClass().getResourceAsStream("/en-token.bin")) {
            TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);
            TokenizerME tokenizer = new TokenizerME(tokenModel);
            return tokenizer.tokenize(sentence);
        }
    }
}

Finding Location Name using Apache OpenNLP

Similar to the name finder, following is an example to identify locations in a text using OpenNLP.

package com.devglan;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class LocationFinder {

    public void findLocation(String paragraph) throws IOException {
        // Load the pre-trained location model; try-with-resources closes the stream afterwards.
        try (InputStream inputStreamNameFinder = getClass().getResourceAsStream("/en-ner-location.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);
            NameFinderME locFinder = new NameFinderME(model);
            String[] tokens = tokenize(paragraph);
            Span[] nameSpans = locFinder.find(tokens);
            for (Span span : nameSpans) {
                // A span covers the tokens [start, end), so join them to print multi-word locations in full.
                String location = String.join(" ", Arrays.copyOfRange(tokens, span.getStart(), span.getEnd()));
                System.out.println("Position - " + span.toString() + " LocationName - " + location);
            }
        }
    }

    public String[] tokenize(String sentence) throws IOException {
        // Load the pre-trained English tokenizer model and split the text into tokens.
        try (InputStream inputStreamTokenizer = getClass().getResourceAsStream("/en-token.bin")) {
            TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);
            TokenizerME tokenizer = new TokenizerME(tokenModel);
            return tokenizer.tokenize(sentence);
        }
    }
}
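Both finder classes read their model files from the classpath on every call, which is fine for a demo but wasteful if the methods are called repeatedly. A possible refinement, sketched here under the assumption that a loaded model object can be reused across calls, is to keep loaded models in a small cache keyed by resource path. ModelCache is a hypothetical class written for this illustration; the loader function stands in for whatever code builds a TokenNameFinderModel or TokenizerModel from a path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of OpenNLP: loads each model at most once
// per resource path and returns the cached instance afterwards.
final class ModelCache<M> {

    private final Map<String, M> models = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    // The loader receives the resource path and returns the loaded model.
    // It cannot throw checked exceptions, so an IOException raised while
    // reading a model would have to be wrapped, e.g. in UncheckedIOException.
    ModelCache(Function<String, M> loader) {
        this.loader = loader;
    }

    M get(String resourcePath) {
        // computeIfAbsent only invokes the loader when the path is not cached yet.
        return models.computeIfAbsent(resourcePath, loader);
    }
}
```

A finder class could hold one such cache in a static field and create a fresh NameFinderME from the cached model on each call, so that only the expensive file reading is shared.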

Testing the Application

Following are some test cases to detect named entities using Apache OpenNLP.

NERTester.java
package com.devglan;

import org.junit.Test;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class NERTester {

    @Test
    public void nameFinderTest() throws Exception {
        NameFinder nameFinder = new NameFinder();
        nameFinder.findName("Where is Charlie and Mike.");
    }

    @Test
    public void locationFinderTest() throws Exception {
        LocationFinder locFinder = new LocationFinder();
        locFinder.findLocation("Charlie is in California but I don't about Mike.");
    }
}
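The tests above only print to the console, so they pass as long as no exception is thrown. One way to let a test check the result without changing the finder classes is to capture what they write to System.out. The following is a rough sketch of that idea. StdoutCapture and ThrowingAction are hypothetical names introduced here and are not part of JUnit or OpenNLP.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical test helper: runs an action and returns everything it printed
// to System.out while it was running.
final class StdoutCapture {

    // Like Runnable, but allowed to throw checked exceptions such as IOException.
    interface ThrowingAction {
        void run() throws Exception;
    }

    private StdoutCapture() {
    }

    static String capture(ThrowingAction action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement);
        try {
            action.run();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Captured action failed", e);
        } finally {
            // Always restore the real console stream, even when the action fails.
            replacement.flush();
            System.setOut(original);
        }
        return buffer.toString();
    }
}
```

In a test this could be used as String out = StdoutCapture.capture(() -> new NameFinder().findName("Where is Charlie and Mike.")); followed by an assertion on out. What to assert depends on what the model actually tags in the sentence, so the expected names should be taken from a real run rather than assumed.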

Output

[Image: console output of the test run]

Conclusion

I hope this article gave you what you were looking for. If you have anything to add or share, please post it in the comment section below.


References

OpenNLP Manual

OpenNLP Models

Wikipedia: Named-entity recognition
