Named Entity Recognition (NER) Example with spaCy

       Named Entity Recognition (NER) is a method that extracts entities from text and categorizes them into predefined classes. These entities can include individuals, locations, dates, monetary values, and more. NER plays an important role in information extraction, transforming raw text into structured data.

    In this blog post, we'll go over the concept of NER and how to apply it with the spaCy library in Python. The tutorial covers:
  1. The concept of NER
  2. NER implementation with spaCy
  3. Visualizing NER results
  4. Conclusion

     Let's get started.

 

The concept of NER  

    In natural language processing (NLP), a key challenge is teaching machines to extract meaningful information from text. NER addresses part of this challenge by locating the spans of unstructured text that name specific things, such as people, organizations, locations, and dates, and assigning each span a category.

    NER is usually treated as a supervised machine learning task: a model is trained on labeled datasets in which the named entities in each text have been annotated.
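
    To make the idea of annotated text concrete, here is a minimal sketch of what one labeled example might look like. The layout (character offsets plus a label, stored under an "entities" key) is one common convention and an assumption of this sketch, not a requirement; the annotate() helper is hypothetical and only exists to avoid counting offsets by hand.

```python
# One labeled example: the raw text plus (start, end, label) spans.
# Offsets are character positions into the text; the end is exclusive.
text = "John Doe has been working for Abcd Inc. since 2010 in California."


def annotate(text, phrase, label):
    """Hypothetical helper: span for the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


annotations = [
    annotate(text, "John Doe", "PERSON"),
    annotate(text, "Abcd Inc.", "ORG"),
    annotate(text, "2010", "DATE"),
    annotate(text, "California", "GPE"),
]

# A training set would be a list of many such (text, annotations) pairs.
training_example = (text, {"entities": annotations})

# Check that every span really covers the phrase it is meant to label.
for start, end, label in annotations:
    print(f"{text[start:end]!r} [{start}:{end}] -> {label}")
```

    A model trained on many such pairs learns to predict the spans and labels for text it has not seen before.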

     NER is used in a wide range of applications, from information retrieval and question-answering systems to chatbots and content categorization. It is a cornerstone of understanding unstructured text data.


Implementation of NER with Python

    Let's look at a practical implementation of NER using the spaCy library. If you haven't installed spaCy and its English model yet, you can do so with the following commands:

 
pip install spacy 
python -m spacy download en_core_web_sm 
 

    Once installed, we load spaCy and the 'en_core_web_sm' model, a small pre-trained English model provided by spaCy, as shown in the example below. Then we process a given text with spaCy and extract the named entities. The doc.ents attribute provides access to the named entities recognized in the processed text; each entity exposes its text (ent.text) and its entity type (ent.label_).

 
import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = """John Doe has been working for Abcd Inc. as
a senior engineer since 2010 in California."""
 
# Process the text with spaCy
doc = nlp(text)

# Extract named entities
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Print the named entities
print("Named Entities:")
for ent_text, ent_label in entities:
    print(f"{ent_text}: {ent_label}")

 

The output for the extracted entities is as follows:

 
Named Entities:
John Doe: PERSON
Abcd Inc.: ORG
2010: DATE
California: GPE
 
 
The entities in this result are identified and categorized into different types. "PERSON" refers to the names of people, "ORG" refers to names of companies, institutions, or any organized entity, "DATE" refers to expressions that represent dates or periods, and "GPE" (geopolitical entity) refers to names of countries, cities, or states.
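
Since the point of NER is to turn raw text into structured data, a natural next step is to group the extracted entities by type. The sketch below does this in plain Python, using the (text, label) pairs copied from the output above as input; the group_by_label() helper is a hypothetical name introduced for illustration.

```python
from collections import defaultdict

# (text, label) pairs copied from the output shown above
entities = [
    ("John Doe", "PERSON"),
    ("Abcd Inc.", "ORG"),
    ("2010", "DATE"),
    ("California", "GPE"),
]


def group_by_label(pairs):
    """Collect entity texts under their label, keeping first-seen order."""
    grouped = defaultdict(list)
    for ent_text, ent_label in pairs:
        grouped[ent_label].append(ent_text)
    return dict(grouped)


grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(f"{label}: {texts}")
```

If a label is unfamiliar, spacy.explain(label) returns a short description of it, for example for "GPE".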
 
    Note that spaCy also provides larger English models, such as 'en_core_web_md' and 'en_core_web_lg', which add word vectors and a more extensive vocabulary and may perform better on certain tasks, at the cost of a larger download. You can choose the model based on your specific requirements.
 

Visualizing NER Results 

    spaCy provides a visual way to understand NER results using displaCy. The example below shows how to visualize the extracted entities. The serve() function starts a local web server; you can set any available port and then open the page in a browser (here, http://localhost:7002).
 
 
from spacy import displacy

# Visualize NER
displacy.serve(doc, style="ent", port=7002)
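
If starting a web server is not convenient, displacy.render() can return the same visualization as an HTML string, which you can save to a file. The sketch below is one possible way to do that. To stay independent of a downloaded model, it passes the entities by hand using displaCy's manual mode; the span() helper and the output file name are hypothetical. With the doc object from the earlier example, you would pass doc directly and drop manual=True.

```python
from spacy import displacy

text = ("John Doe has been working for Abcd Inc. as "
        "a senior engineer since 2010 in California.")


def span(phrase, label):
    """Hypothetical helper: build a manual-mode entity dict for phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Manual-mode input: the text plus entity spans sorted by start offset.
manual_doc = {
    "text": text,
    "ents": [
        span("John Doe", "PERSON"),
        span("Abcd Inc.", "ORG"),
        span("2010", "DATE"),
        span("California", "GPE"),
    ],
    "title": None,
}

# page=True wraps the markup in a full HTML page;
# jupyter=False forces the function to return the markup as a string.
html = displacy.render(manual_doc, style="ent", manual=True,
                       page=True, jupyter=False)

# Hypothetical output path; open the file in a browser to view it.
with open("ner_entities.html", "w", encoding="utf-8") as f:
    f.write(html)
```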
 

Conclusion
 
    In this tutorial, we've explored the fundamentals of Named Entity Recognition and demonstrated its application using spaCy. NER plays a pivotal role in enhancing the understanding of textual data, and spaCy's user-friendly interface makes it accessible for developers across various skill levels.
 
 


