Sequence Classification (or Text Classification) is the NLP task of predicting a label for a sequence of words.
For example, the string "That movie was terrible because the acting was bad" could be tagged with the label negative, while the string "That movie was great because the acting was good" could be tagged with the label positive.
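To make the task concrete, here is a purely illustrative, rule-based sketch of sentiment classification. The word lists are invented for the example; the neural models used in the rest of this tutorial learn these associations from data rather than from hand-written rules.

```python
# Toy rule-based sentiment "classifier": predicts one label for a whole
# sequence of words. Illustrative only -- not how neural models work.
NEGATIVE_WORDS = {"terrible", "bad", "awful"}
POSITIVE_WORDS = {"great", "good", "excellent"}

def toy_classify(text: str) -> str:
    words = {w.strip(".,!?").lower() for w in text.split()}
    neg = len(words & NEGATIVE_WORDS)
    pos = len(words & POSITIVE_WORDS)
    return "negative" if neg > pos else "positive"

print(toy_classify("That movie was terrible because the acting was bad"))  # negative
print(toy_classify("That movie was great because the acting was good"))    # positive
```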
A model that can predict sentiment from text is called a sentiment classifier, which is an example of a sequence classification model.
Below, we'll walk through how we can use AdaptNLP's EasySequenceClassification module to easily do the following:
- Load pre-trained models and tag data using mini-batched inference
- Train and fine-tune a pre-trained model on your own dataset
- Evaluate your model
We'll first get started by importing the EasySequenceClassifier class from AdaptNLP and instantiating an EasySequenceClassifier object.
```python
from adaptnlp import EasySequenceClassifier
from pprint import pprint

classifier = EasySequenceClassifier()
```
With this class we can dynamically load models and run inference with them.
Let's use the HFModelHub to search for some pre-trained sequence classification models to use:
```python
from adaptnlp import HFModelHub

hub = HFModelHub()
```
We can search either by task or by model name. Below is an example listing the models Hugging Face hosts for a given task:
```python
hub.search_model_by_task('text-classification')
```
For this example, though, we will tag some text with a model that NLP Town has trained, called nlptown/bert-base-multilingual-uncased-sentiment. Let's find it in the model hub:
```python
model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]; model
```
This is a multilingual model that predicts how many stars (1-5) a text review gives a product. More information can be found via the Transformers model card on the Hugging Face model hub.
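Since the model predicts a star rating rather than a plain positive/negative label, it can be handy to map its predictions into a coarse sentiment bucket. A small helper, assuming the predicted labels look like "4 stars" (check the model card to confirm the exact label format):

```python
# Convert a star-rating label (e.g. "4 stars") into a coarse sentiment
# bucket. The label format here is an assumption based on the model card.
def stars_to_sentiment(label: str) -> str:
    stars = int(label.split()[0])  # "4 stars" -> 4
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"

print(stars_to_sentiment("1 star"))   # negative
print(stars_to_sentiment("5 stars"))  # positive
```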
Next we can perform some inference. First let's write some example text:
```python
example_text = "This didn't work at all"
```
Then we can tell our classifier to tag some text with tag_text:
```python
sentences = classifier.tag_text(
    text=example_text,
    model_name_or_path=model,
    mini_batch_size=1
)
```
Now let's look at our outputs:
It's easy to pass in multiple sentences at once as well, as a list. Let's try that now:
```python
multiple_text = ["This didn't work well at all.",
                 "I really liked it.",
                 "It was really useful.",
                 "It broke after I bought it."]
```
We'll pass it into the classifier just like before:
```python
sentences = classifier.tag_text(
    text=multiple_text,
    model_name_or_path=model,
    mini_batch_size=2
)
```
And we can check the outputs again:
Note: You can set the mini_batch_size parameter to run mini-batched inference against your data for a faster run time. You can set model_name_or_path to any of Transformers' or Flair's pre-trained sequence classification models.
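Mini-batched inference simply feeds the model fixed-size slices of the input list instead of one example at a time (or the whole dataset at once). The batching itself can be sketched independently of any model:

```python
# A minimal sketch of mini-batching: split the inputs into slices of
# size mini_batch_size, which the classifier would process one at a time.
from typing import Iterator, List

def mini_batches(items: List[str], mini_batch_size: int) -> Iterator[List[str]]:
    for start in range(0, len(items), mini_batch_size):
        yield items[start:start + mini_batch_size]

texts = ["This didn't work well at all.",
         "I really liked it.",
         "It was really useful.",
         "It broke after I bought it."]
print(list(mini_batches(texts, 2)))  # two batches of two sentences each
```

Larger batches generally run faster on a GPU, at the cost of more memory per forward pass.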
Let's tag some text with another model, specifically Oliver Guhr's German sentiment model, oliverguhr/german-sentiment-bert.
First we'll write some German text:
```python
german_text = ["Das hat überhaupt nicht gut funktioniert.",  # "This didn't work well at all."
               "Ich mochte es wirklich.",                    # "I really liked it."
               "Es war wirklich nützlich.",                  # "It was really useful."
               "Es ist kaputt gegangen, nachdem ich es gekauft habe."]  # "It broke after I bought it."
```
And then tag it:
```python
sentences = classifier.tag_text(
    german_text,
    model_name_or_path="oliverguhr/german-sentiment-bert",
    mini_batch_size=1
)
```
Note: You can pass in a model object found through the ModelHub classes, or you can directly pass in the string name of the model you want.
Let's look at the output:
Don't forget you can still quickly run inference with the multilingual review-sentiment model you loaded in earlier (memory permitting)! Just change the model_name_or_path parameter to the model you used before.