Stanza bad performance on Named Entity Identification

Asked 1 year, 2 months ago

Modified 1 year, 2 months ago

I'm using both spaCy and Stanza to identify named entities in very short strings (brand names and business names):

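import spacy
import stanza

# BUILDING THE MODELS

# ----- stanza
sen = stanza.Pipeline("en")
smlp = stanza.MultilingualPipeline()

# ----- spacy
# Model names must be passed as strings; the installed package versions
# were en_core_web_lg 3.7.1 and en_core_web_trf 3.7.3.
spl = spacy.load("en_core_web_lg")
spt = spacy.load("en_core_web_trf")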

# TESTING THE MODELS
name = 'The Port of Peri Peri'
print(name)

# spacy
print('\n SPACY------------------------------------------------------------------')
print('spacy spl')
doc = spl(name)
for token in doc:
    print(token.text, token.is_oov, token.shape_, token.tag_, token.pos_, token.dep_, token.ent_type_, token.ent_iob_)
print('-----------------')

print('spacy trf')
doc = spt(name)
for token in doc:
    print(token.text, token.is_oov, token.shape_, token.tag_, token.pos_, token.dep_, token.ent_type_, token.ent_iob_)

#stanza
print('\n STANZA----------------------------------------------------------------')
print('stanza sen')
doc = sen(name)
for sent in doc.sentences:
    for token in sent.tokens:
        for word in token.words:
            print(word.text, word.xpos, word.upos, word.deprel, token.ner)
print('-----------------')

print('stanza smlp')
doc = smlp(name)
for sent in doc.sentences:
    for token in sent.tokens:
        for word in token.words:
            print(word.text, word.xpos, word.upos, word.deprel, token.ner)
print('-----------------')

Output:

The Port of Peri Peri

 SPACY------------------------------------------------------------------
spacy spl
The False Xxx DT DET det ORG B
Port False Xxxx NNP PROPN ROOT ORG I
of False xx IN ADP prep ORG I
Peri False Xxxx NNP PROPN compound ORG I
Peri False Xxxx NNP PROPN pobj ORG I
-----------------
spacy trf
The True Xxx DT DET det FAC B
Port True Xxxx NNP PROPN ROOT FAC I
of True xx IN ADP prep FAC I
Peri True Xxxx NNP PROPN compound FAC I
Peri True Xxxx NNP PROPN pobj FAC I

 STANZA----------------------------------------------------------------
stanza sen
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
stanza smlp
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
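
The per-token Stanza tags above follow the BIOES scheme (B = begin, I = inside, E = end, S = single-token entity, O = outside), so the five lines form one entity span. As a minimal sketch of how such tags collapse into spans, the helper below works on plain (text, tag) pairs copied from the output; `bioes_to_spans` is a hypothetical name used for illustration, not part of Stanza.

```python
def bioes_to_spans(tagged_tokens):
    """Collapse (text, tag) pairs with BIOES tags into (entity_text, entity_type) spans."""
    spans = []
    words, ent_type = [], None  # state of the span currently being built
    for text, tag in tagged_tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity: emit immediately.
            spans.append((text, label))
            words, ent_type = [], None
        elif prefix == "B":
            # Start a new span.
            words, ent_type = [text], label
        elif prefix in ("I", "E") and ent_type == label:
            # Continue the open span; close it on E.
            words.append(text)
            if prefix == "E":
                spans.append((" ".join(words), label))
                words, ent_type = [], None
        else:
            # "O" or a malformed sequence: discard any partial span.
            words, ent_type = [], None
    return spans


# Tags copied from the "stanza sen" output above.
stanza_tags = [
    ("The", "B-PERSON"),
    ("Port", "I-PERSON"),
    ("of", "I-PERSON"),
    ("Peri", "I-PERSON"),
    ("Peri", "E-PERSON"),
]
print(bioes_to_spans(stanza_tags))  # [('The Port of Peri Peri', 'PERSON')]
```

With a live pipeline the same span-level view should also be available directly from `doc.ents` in Stanza (each entity has `.text` and `.type`) and from `doc.ents` in spaCy (each span has `.text` and `.label_`), which makes the models easier to compare than per-token dumps.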

If you look at the ends of the output lines, spaCy labels the name as ORG (organization) or FAC (facility), while both Stanza pipelines label it as PERSON. That seems ridiculous: person names do not usually start with "The" or contain "of". My question is: are there any improvements I can make to the Stanza models, or is this as good as they get?
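
The intuition about "The" and "of" can be written down as a simple post-processing check on predicted spans. This is a minimal sketch of a hypothetical heuristic, not a Stanza feature and not a fix for the model itself; the function names and the `UNKNOWN` fallback label are assumptions made for illustration.

```python
def is_implausible_person(entity_text):
    """Return True if a span labeled PERSON starts with 'the' or contains 'of'."""
    words = entity_text.lower().split()
    if not words:
        return False
    return words[0] == "the" or "of" in words


def relabel_implausible_persons(spans, fallback="UNKNOWN"):
    """Replace PERSON with a fallback label on spans that fail the check.

    `spans` is a list of (entity_text, entity_type) pairs; the fallback
    label is an arbitrary placeholder, not a label from any NER tag set.
    """
    return [
        (text, fallback if label == "PERSON" and is_implausible_person(text) else label)
        for text, label in spans
    ]


predicted = [("The Port of Peri Peri", "PERSON"), ("Peri Peri", "PERSON")]
print(relabel_implausible_persons(predicted))
```

A rule this blunt also flags legitimate names such as "Catherine of Aragon", so it is better suited to marking spans for review than to overriding the model outright.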

edited Sep 24, 2024 at 17:08

asked Sep 24, 2024 at 16:07

LearningScholar
