$\begingroup$

I'm using both spaCy and Stanza to identify named entities in very short strings (brand names and business names):

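# BUILDING THE MODELS
import spacy
import stanza

#-----stanza
sen = stanza.Pipeline("en")
smlp = stanza.MultilingualPipeline()

#----spacy
# spacy.load takes the package name as a string; installed versions noted alongside
spl = spacy.load("en_core_web_lg")   # en_core_web_lg-3.7.1
spt = spacy.load("en_core_web_trf")  # en_core_web_trf-3.7.3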

# TESTING THE MODELS
name = 'The Port of Peri Peri'
print(name)

# spacy
print('\n SPACY------------------------------------------------------------------')
print('spacy spl')
doc = spl(name)
for token in doc:
    print(token.text, token.is_oov, token.shape_, token.tag_, token.pos_, token.dep_, token.ent_type_, token.ent_iob_)
print('-----------------')

print('spacy trf')
doc = spt(name)
for token in doc:
    print(token.text, token.is_oov, token.shape_, token.tag_, token.pos_, token.dep_, token.ent_type_, token.ent_iob_)

#stanza
print('\n STANZA----------------------------------------------------------------')
print('stanza sen')
doc = sen(name)
for sent in doc.sentences:
    for token in sent.tokens:
        for word in token.words:
            print(word.text, word.xpos, word.upos, word.deprel, token.ner)
print('-----------------')

print('stanza smlp')
doc = smlp(name)
for sent in doc.sentences:
    for token in sent.tokens:
        for word in token.words:
            print(word.text, word.xpos, word.upos, word.deprel, token.ner)
print('-----------------')

Output:

The Port of Peri Peri

 SPACY------------------------------------------------------------------
spacy spl
The False Xxx DT DET det ORG B
Port False Xxxx NNP PROPN ROOT ORG I
of False xx IN ADP prep ORG I
Peri False Xxxx NNP PROPN compound ORG I
Peri False Xxxx NNP PROPN pobj ORG I
-----------------
spacy trf
The True Xxx DT DET det FAC B
Port True Xxxx NNP PROPN ROOT FAC I
of True xx IN ADP prep FAC I
Peri True Xxxx NNP PROPN compound FAC I
Peri True Xxxx NNP PROPN pobj FAC I

 STANZA----------------------------------------------------------------
stanza sen
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
stanza smlp
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------

So, if you look at the ends of the output lines, spaCy labels the name as ORG (organization) or FAC (facility), while both Stanza pipelines label it as PERSON, which is ridiculous: person names do not usually start with "The" or contain "of". My question is: are there any improvements I can make to the Stanza models, or is this as good as they get?
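
To make the complaint concrete, here is a minimal sketch of the kind of sanity check I have in mind. It is only a post-processing heuristic, not a change to the Stanza models themselves. It assumes the per-token BIOES tags printed above; the names `bioes_to_spans` and `is_implausible_person`, and the two word lists, are my own illustrative choices and are not part of Stanza or spaCy.

```python
# Hypothetical post-processing helpers; not part of Stanza or spaCy.
from typing import List, Optional, Tuple

# Assumed heuristic word lists, based only on the observation above.
LEADING_DETERMINERS = {"the"}
INTERNAL_FUNCTION_WORDS = {"of"}


def bioes_to_spans(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse (word, BIOES tag) pairs into (entity text, label) spans.

    Spans that are opened but never closed with an E- or S- tag are dropped.
    """
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None
    for word, tag in tagged:
        if tag == "O":
            words, label = [], None
            continue
        prefix, _, ent_type = tag.partition("-")
        # Start a fresh span on B-/S- tags or when the entity type changes.
        if prefix in ("B", "S") or ent_type != label:
            words, label = [], ent_type
        words.append(word)
        # E- and S- tags close the current span.
        if prefix in ("E", "S"):
            spans.append((" ".join(words), ent_type))
            words, label = [], None
    return spans


def is_implausible_person(text: str, label: str) -> bool:
    """Flag PERSON spans that start with 'The' or contain 'of' after the first word."""
    if label != "PERSON":
        return False
    tokens = text.lower().split()
    if not tokens:
        return False
    return tokens[0] in LEADING_DETERMINERS or any(
        t in INTERNAL_FUNCTION_WORDS for t in tokens[1:]
    )


# Tags copied from the "stanza sen" output above.
stanza_output = [
    ("The", "B-PERSON"),
    ("Port", "I-PERSON"),
    ("of", "I-PERSON"),
    ("Peri", "I-PERSON"),
    ("Peri", "E-PERSON"),
]

for text, label in bioes_to_spans(stanza_output):
    print(text, label, is_implausible_person(text, label))
```

A check like this would only flag the suspicious PERSON span for review or relabelling; it does not tell me what the correct label should be, which is why I am asking whether the models themselves can be improved.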

$\endgroup$
