Standford NLP Named Entity Recognition (Maven + Eclipse)

By Dhiraj Ray, 16 July,2017  

In this article we will be discussing about Standford NLP Named Entity Recognition(NER) in a java project using Maven and Eclipse. The example shown here will be using different annotators such as tokenize, ssplit, pos, lemma, ner to create StanfordCoreNLP pipelines and run NamedEntityTagAnnotation on the input text for named entity recognition using standford NLP.

Annotations and Annotator in Standford NLP

Annotations are internal data structures of Standford NLP that holds results of annotators whereas Annotators are like functions, except that they operate over Annotations instead of Objects.They do things like tokenize, parse, or NER tag sentences. Annotators and Annotations are integrated by AnnotationPipelines, which create sequences of generic Annotators.There are many annotators provided by Standford. For complete list visit - Standford CoreNLP Annotators

Other NLP Articles Apache OpenNLP Named Entity Recognition Example Apache OpenNLP Maven Eclipse Example Standford NLP Maven Example OpenNLP POS Tagger Example Standford NLP POS Tagger Example

Maven Dependencies for Standford NLP

<dependencies> <dependency> <groupId>edu.stanford.nlp</groupId> <artifactId>stanford-corenlp</artifactId> <version>3.5.0</version> </dependency> <dependency> <groupId>edu.stanford.nlp</groupId> <artifactId>stanford-corenlp</artifactId> <version>3.5.0</version> <classifier>models</classifier> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.12</version> <scope>test</scope> </dependency> </dependencies>

Initializing StanfordCoreNLP using Annotators

StanfordCoreNLP is initiliazed using a set of properties.These properties contain different annotators such as tokenize, ssplit, pos, lemma, ner. Following is an example.

Properties props = new Properties(); props.put("annotators", "tokenize, ssplit, pos, lemma, ner"); StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

It is also possible to give other properties to CoreNLP. Following is an example.

StanfordCoreNLP pipeline = new StanfordCoreNLP( PropertiesUtils.asProperties( "annotators", "tokenize,ssplit,pos,lemma,parse,natlog", "ssplit.isOneSentence", "true", "parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz", "tokenize.language", "en"));

Running Annotators in Standford NLP

Once the pipeline is initialized, we basically run different annotators provided by Standford on any piece of text to extract corresponding information. Following is an example.

Annotation document = new Annotation(text); pipeline.annotate(document); List sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

Interpreting the output

After these annotators are executed on the text, we basically interpret the information. All these informations are available in the annotations provided by Standford. Following is an example to interpret different sentences of a text after applying SentenceAnnotation.

List sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

Named Entity Recognition using Standford NLP

After the sentences are extracted, we first tokenise the sentence and then extract named entities from the tokens.Following is an example.

for (CoreMap sentence : sentences) { for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) { String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class); if (!inEntity) { if (!"O".equals(ne)) { inEntity = true; currentEntity = ""; currentEntityType = ne; } } if (inEntity) { if ("O".equals(ne)) { inEntity = false; switch (currentEntityType) { case "PERSON": System.out.println("Extracted Person - " + currentEntity.trim()); break; case "ORGANIZATION": System.out.println("Extracted Organization - " + currentEntity.trim()); break; case "LOCATION": System.out.println("Extracted Location - " + currentEntity.trim()); break; case "DATE": System.out.println("Extracted Date " + currentEntity.trim()); break; } }else{ currentEntity += " " + token.originalText(); } } } }

Output

For a sample text such as Charlie is working as Software Engineer in CenturyLink India Pvt. Ltd., Bangalore from October, 2014 to till date, following is the output.

standford-nlp-ner-output

Conclusion

I hope this article served you that you were looking for. If you have anything that you want to add or share then please share it below in the comment section.

References

Standford CoreNLP

Standford Annotators

Stanford Tagger DOC

Suggest more topics in suggestion section or write your own article and share with your colleagues.

Is this page helpful to you? Please give us your feedback below. We would love to hear your thoughts on these articles, it will help us improve further our learning process.

Further Reading: