OpenNLP Named Entity Recognition Example (Maven + Eclipse)

By Dhiraj 16 July, 2017

In this article we will discuss OpenNLP named entity recognition (NER) in a Maven and Eclipse project. We will use the NameFinderME class provided by OpenNLP together with pre-trained model files such as en-ner-location.bin, en-ner-person.bin, and en-ner-organization.bin.

What is Named Entity Recognition

As per Wikipedia, named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of time, quantities, monetary values, and percentages.

Eclipse Project Structure

[Image: Eclipse project structure of the OpenNLP NER example]

Maven Dependency

<dependencies>
    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.1</version>
    </dependency>

    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Apache OpenNLP Named Entity Recognition

OpenNLP provides several pre-trained models, such as en-ner-person.bin, en-ner-location.bin, en-ner-organization.bin, and en-ner-time.bin, to detect named entities such as persons, locations, and organizations in a piece of text. The complete list of pre-trained models can be found on the OpenNLP website.
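Because the model files named above all follow the same en-ner-<type>.bin pattern, choosing an entity type comes down to choosing a file name. The following is a minimal sketch under that assumption. EntityType and modelResource() are hypothetical names introduced here for illustration; they are not part of OpenNLP, and the sketch assumes the model files sit at the root of the classpath as in this project.

```java
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP: maps an entity type to the
// classpath resource name of its pre-trained English model file.
enum EntityType {
    PERSON, LOCATION, ORGANIZATION, TIME;

    // Builds names such as "/en-ner-person.bin", matching the files listed above.
    String modelResource() {
        return "/en-ner-" + name().toLowerCase(Locale.ROOT) + ".bin";
    }
}

class ModelResourceDemo {
    public static void main(String[] args) {
        // Print the resource name that would be loaded for each entity type.
        for (EntityType type : EntityType.values()) {
            System.out.println(type + " -> " + type.modelResource());
        }
    }
}
```

A value such as EntityType.LOCATION.modelResource() could then be passed to getResourceAsStream() instead of a hard-coded string.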

OpenNLP provides a common way to detect all of these named entities. First, we load the pre-trained model file and instantiate a TokenNameFinderModel object from it. Following is an example.

InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin");
TokenNameFinderModel model = new TokenNameFinderModel(inputStream);

Next, we initialise the NameFinderME class and call its find() method to locate the entities. This method takes the tokens of a text as input, so we first need to tokenise the text. Following is an example.

NameFinderME nameFinder = new NameFinderME(model);
String[] tokens = tokenize(paragraph);

Span[] nameSpans = nameFinder.find(tokens);
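Each Span returned by find() refers to token positions, with an inclusive start index and an exclusive end index, so an entity can cover more than one token. The sketch below illustrates how such positions map back to text. It deliberately avoids the OpenNLP dependency: TokenSpan is a hypothetical stand-in for opennlp.tools.util.Span, and the spans are written by hand rather than produced by a model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpanTextDemo {

    // Hypothetical stand-in for opennlp.tools.util.Span:
    // start is inclusive, end is exclusive, both are token indexes.
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by each span into one string per entity.
    static List<String> coveredText(String[] tokens, List<TokenSpan> spans) {
        List<String> result = new ArrayList<>();
        for (TokenSpan span : spans) {
            String[] covered = Arrays.copyOfRange(tokens, span.start, span.end);
            result.add(String.join(" ", covered));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"Where", "is", "Charlie", "Brown", "and", "Mike", "."};
        // Hand-written spans for illustration only, not real model output.
        List<TokenSpan> spans = Arrays.asList(
                new TokenSpan(2, 4, "person"),
                new TokenSpan(5, 6, "person"));
        System.out.println(coveredText(tokens, spans)); // prints [Charlie Brown, Mike]
    }
}
```

With the real library, the static method Span.spansToStrings(spans, tokens) performs the same join, which is why printing only tokens[span.getStart()] would show just the first word of a multi-word entity.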

Finding Names Using OpenNLP

Based on the above understanding, following is the complete code to find person names in a text using OpenNLP.

package com.devglan;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class NameFinder {

    public void findName(String paragraph) throws IOException {
        // Load the pre-trained person model from the classpath; the stream is closed automatically.
        try (InputStream inputStream = getClass().getResourceAsStream("/en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
            NameFinderME nameFinder = new NameFinderME(model);
            String[] tokens = tokenize(paragraph);

            Span[] nameSpans = nameFinder.find(tokens);
            // A span can cover several tokens (e.g. first and last name), so print the whole span.
            for (String name : Span.spansToStrings(nameSpans, tokens))
                System.out.println(name);
        }
    }

    public String[] tokenize(String sentence) throws IOException {
        // Load the pre-trained English tokenizer model and split the text into tokens.
        try (InputStream inputStreamTokenizer = getClass().getResourceAsStream("/en-token.bin")) {
            TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);
            TokenizerME tokenizer = new TokenizerME(tokenModel);
            return tokenizer.tokenize(sentence);
        }
    }
}

Finding Location Name using Apache OpenNLP

Similar to the name finder, following is an example that identifies locations in a text using OpenNLP.

package com.devglan;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import java.io.IOException;
import java.io.InputStream;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class LocationFinder {

    public void findLocation(String paragraph) throws IOException {
        // Load the pre-trained location model from the classpath; the stream is closed automatically.
        try (InputStream inputStreamNameFinder = getClass().getResourceAsStream("/en-ner-location.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);
            NameFinderME locFinder = new NameFinderME(model);
            String[] tokens = tokenize(paragraph);

            Span[] nameSpans = locFinder.find(tokens);
            // Join all tokens covered by each span so multi-word locations are printed in full.
            String[] locations = Span.spansToStrings(nameSpans, tokens);
            for (int i = 0; i < nameSpans.length; i++)
                System.out.println("Position - " + nameSpans[i] + "    LocationName - " + locations[i]);
        }
    }

    public String[] tokenize(String sentence) throws IOException {
        // Load the pre-trained English tokenizer model and split the text into tokens.
        try (InputStream inputStreamTokenizer = getClass().getResourceAsStream("/en-token.bin")) {
            TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer);
            TokenizerME tokenizer = new TokenizerME(tokenModel);
            return tokenizer.tokenize(sentence);
        }
    }
}

Testing the Application

Following are some test cases that detect named entities using Apache OpenNLP.

NERTester.java
package com.devglan;

import org.junit.Test;

/**
 * Created by only2dhir on 15-07-2017.
 */
public class NERTester {

    @Test
    public void nameFinderTest() throws Exception{
        NameFinder nameFinder = new NameFinder();
        nameFinder.findName("Where is Charlie and Mike.");
    }

    @Test
    public void locationFinderTest() throws Exception{
        LocationFinder locFinder = new LocationFinder();
        locFinder.findLocation("Charlie is in California but I don't know about Mike.");
    }
}
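The test methods above only print their results, so they pass as long as no exception is thrown. One possible way to turn the printed output into something a test can assert on is to capture standard output while the finder runs. The following is a minimal sketch of that idea using only the JDK. StdoutCapture, ThrowingAction, and capture() are hypothetical names introduced here, and the lambda in main is a stand-in for a call such as nameFinder.findName(...).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StdoutCapture {

    // Hypothetical functional interface so that actions may throw checked exceptions.
    @FunctionalInterface
    interface ThrowingAction {
        void run() throws Exception;
    }

    // Runs the action with System.out redirected to a buffer and returns what was printed.
    static String capture(ThrowingAction action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement);
        try {
            action.run();
        } catch (Exception e) {
            throw new IllegalStateException("Action failed while capturing output", e);
        } finally {
            // Always restore the real System.out, even if the action throws.
            replacement.flush();
            System.setOut(original);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Stand-in for: capture(() -> new NameFinder().findName("..."))
        String printed = capture(() -> System.out.println("Charlie"));
        System.out.println(printed.contains("Charlie")); // prints true
    }
}
```

In a JUnit test, the returned string could be checked with an assertion such as assertTrue(printed.contains("Charlie")). Having the finder methods return the detected entities instead of printing them would be a cleaner design, but that would change the classes shown above.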

Output

[Image: console output of the NER test cases]

Conclusion

I hope this article gave you what you were looking for. If you have anything to add or share, please post it in the comment section below.
