We'll import the adaptnlp EasyTokenTagger class:
from adaptnlp import EasyTokenTagger
from pprint import pprint
Let's write some simple example text and instantiate an EasyTokenTagger:
example_text = '''Novetta Solutions is the best. Albert Einstein used to be employed at Novetta Solutions.
The Wright brothers loved to visit the JBF headquarters, and they would have a chat with Albert.'''
tagger = EasyTokenTagger()
First we will use some Transformers models, specifically BERT. We'll search HuggingFace for the model we want; in this case we'll use sshleifer's tiny-dbmdz-bert model:
from adaptnlp import HFModelHub
hub = HFModelHub()
model = hub.search_model_by_name('sshleifer/tiny-dbmdz-bert', user_uploaded=True)[0]; model
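If you don't know a model's exact name, you can broaden the query and skim the results before picking one. A minimal sketch (the 'bert' query string below is only an illustrative example):
# Browse a handful of user-uploaded models matching a broader query
# ('bert' is just an example query, not a recommendation)
candidates = hub.search_model_by_name('bert', user_uploaded=True)
for m in candidates[:5]:
    print(m)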
Next we'll use our tagger to generate some sentences:
sentences = tagger.tag_text(text=example_text, model_name_or_path = model)
And then look at some of our results:
print("List string outputs of tags:\n")
for sen in sentences['tags']:
    pprint(sen)
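Note that tag_text also accepts a plain model identifier string (as we'll do later in this tutorial), so if you already know the model id you can skip the hub search. A minimal sketch of the same call:
# Equivalent call passing the model id directly instead of a hub search result
sentences = tagger.tag_text(
    text=example_text,
    model_name_or_path='sshleifer/tiny-dbmdz-bert',
)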
With Flair we can follow a similar setup to earlier, searching HuggingFace for valid ner models. In our case we'll use Flair's ner-english-ontonotes-fast model:
from adaptnlp import FlairModelHub
hub = FlairModelHub()
model = hub.search_model_by_name('ontonotes-fast')[0]; model
Then we'll tag the string:
sentences = tagger.tag_text(text = example_text, model_name_or_path = model)
And we can get back a JSON of each word and its entities:
pprint(sentences[0]['entities'][:5])
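If you want to work with the entities programmatically rather than just printing them, you can iterate over the list of dicts. The 'text' key used below is an assumption; the exact schema can vary between adaptnlp versions, so check the pprint output above first:
# Pull out just the entity strings (assumes each entity dict has a 'text' key;
# confirm against the pprint output above for your adaptnlp version)
entity_texts = [ent.get('text') for ent in sentences[0]['entities']]
print(entity_texts)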
Next let's look at part-of-speech tagging. We could simply pass in "pos", but let's use our search API to find an English POS tagger:
hub.search_model_by_task('pos')
We'll use the pos-english-fast model:
model = hub.search_model_by_name('pos-english-fast')[0]; model
sentences = tagger.tag_text(text = example_text, model_name_or_path = model)
Then just as before, we get a JSON of our POS:
pprint(sentences[0]['entities'][:5])
Next, let's search for chunking models:
models = hub.search_model_by_task('chunk'); models
We'll use the fast model again:
model = models[0]; model
sentences = tagger.tag_text(text = example_text, model_name_or_path = model)
Let's view our results:
pprint(sentences[0]['entities'][:5])
models = hub.search_model_by_task("frame"); models
Again we will use the "fast" model:
model = models[0]; model
sentences = tagger.tag_text(text = example_text, model_name_or_path = model)
pprint(sentences[0]['entities'][:5])
Note: Pay attention to the "fast" versus regular naming. "fast" models are designed to be extremely efficient on the CPU, and are worth checking out.
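As a rough illustration (not a proper benchmark), you could time the fast and regular OntoNotes NER models back to back; both model ids appear elsewhere in this tutorial, and the first call for each also includes download and load time:
import time

# Rough comparison of the fast vs. regular OntoNotes NER models on CPU.
# First calls include model download/load time, so rerun for a fairer picture.
for model_id in ('flair/ner-english-ontonotes-fast', 'flair/ner-english-ontonotes'):
    start = time.perf_counter()
    tagger.tag_text(text=example_text, model_name_or_path=model_id)
    print(f"{model_id}: {time.perf_counter() - start:.2f}s")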
Tag Tokens with All Loaded Models At Once
As different taggers are loaded into memory, we can tag with all of them at once. For example, we'll make a new EasyTokenTagger and load in a ner and pos tagger:
tagger = EasyTokenTagger()
_ = tagger.tag_text(text=example_text, model_name_or_path="flair/ner-english-ontonotes")
_ = tagger.tag_text(text=example_text, model_name_or_path="pos")
Before finally using both at once:
sentences = tagger.tag_all(text=example_text)
And now we can look at the tagged entities of each kind:
sentences[0][:5]
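The exact structure of tag_all's output can vary between adaptnlp versions; assuming it's a list you can iterate (as the indexing above suggests), a simple way to inspect everything is:
# Pretty-print every result returned by tag_all
for result in sentences:
    pprint(result)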