Performing Sequence Classification with AdaptNLP

Sequence Classification (or Text Classification) is the NLP task of predicting a label for a sequence of words.

For example, the string "That movie was terrible because the acting was bad" could be tagged with the label negative, while the string "That movie was great because the acting was good" could be tagged with the label positive.

A model that can predict sentiment from text is called a sentiment classifier, which is an example of a sequence classification model.
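To make the input/output contract concrete, here is a toy rule-based "classifier" in plain Python. The keyword lists are invented for illustration; a real sequence classification model learns these associations from data rather than relying on hand-written rules:

```python
def toy_sentiment_classifier(text: str) -> str:
    """Toy illustration of the task: a sequence of words in, a label out.

    The keyword sets below are made up for this example; a trained model
    learns such associations from labeled data instead.
    """
    negative_words = {"terrible", "bad", "awful"}
    positive_words = {"great", "good", "excellent"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    neg = len(words & negative_words)
    pos = len(words & positive_words)
    return "negative" if neg > pos else "positive"

print(toy_sentiment_classifier("That movie was terrible because the acting was bad"))
print(toy_sentiment_classifier("That movie was great because the acting was good"))
```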

Below, we'll walk through how we can use AdaptNLP's EasySequenceClassification module to easily do the following:

  1. Load pre-trained models and tag data using mini-batched inference
  2. Train and fine-tune a pre-trained model on your own dataset
  3. Evaluate your model

Loading Pre-Trained Models and Tagging Data Using Mini-Batched Inference

We'll first get started by importing the EasySequenceClassifier class from AdaptNLP and instantiating it.

from adaptnlp import EasySequenceClassifier
from pprint import pprint

classifier = EasySequenceClassifier()

With this class we can dynamically load models and run inference with them.

Let's use the HFModelHub to search for some pre-trained sequence classification models to use:

from adaptnlp.model_hub import HFModelHub
hub = HFModelHub()

We can search either by task or by model name. Below are some of the models Hugging Face provides for text classification:

hub.search_model_by_task('text-classification')
[Model Name: distilbert-base-uncased-finetuned-sst-2-english, Tasks: [text-classification],
 Model Name: roberta-base-openai-detector, Tasks: [text-classification],
 Model Name: roberta-large-mnli, Tasks: [text-classification],
 Model Name: roberta-large-openai-detector, Tasks: [text-classification]]

For this example, though, we will tag some text with a model trained by NLP Town called nlptown/bert-base-multilingual-uncased-sentiment. Let's find it in the model hub:

model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]; model
Model Name: nlptown/bert-base-multilingual-uncased-sentiment, Tasks: [text-classification]

This is a multilingual model that predicts how many stars (1-5) a text review gives a product. More information can be found via the Transformers model card here.

Next we can perform some inference. First let's write some example text:

example_text = "This didn't work at all"

Then we can tell our classifier to tag some text with tag_text:

sentences = classifier.tag_text(
    text=example_text,
    model_name_or_path=model,
    mini_batch_size=1
)
2021-04-20 19:15:43,548 loading file nlptown/bert-base-multilingual-uncased-sentiment
/opt/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2073: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
  warnings.warn(

Now let's look at our outputs:

print("Tag Score Outputs:\n")
for sentence in sentences:
    pprint({sentence.to_original_text(): sentence.labels})
Tag Score Outputs:

{"This didn't work at all": [1 star (0.8421),
                             2 stars (0.1379),
                             3 stars (0.018),
                             4 stars (0.0012),
                             5 stars (0.0007)]}
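As an aside, one way to collapse five star scores like these into a single rating is a probability-weighted average. This is purely an illustration of working with the scores, not something AdaptNLP computes for you:

```python
# Collapse the per-star scores from the output above into one expected
# rating via a probability-weighted average (illustration only).
scores = {1: 0.8421, 2: 0.1379, 3: 0.018, 4: 0.0012, 5: 0.0007}
expected_stars = sum(stars * p for stars, p in scores.items())
print(round(expected_stars, 2))  # roughly 1.2 stars
```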

It's easy to pass in multiple sentences at once as well (in a list). Let's try that now:

multiple_text = ["This didn't work well at all.",
                 "I really liked it.",
                 "It was really useful.",
                 "It broke after I bought it."]
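With a mini_batch_size of 2, a list like this is conceptually processed in chunks of two texts per forward pass. A plain-Python sketch of that chunking (the real batching inside the classifier also handles padding and tensor conversion):

```python
def mini_batches(texts, mini_batch_size):
    # Yield successive chunks of `mini_batch_size` texts.
    for i in range(0, len(texts), mini_batch_size):
        yield texts[i:i + mini_batch_size]

multiple_text = ["This didn't work well at all.",
                 "I really liked it.",
                 "It was really useful.",
                 "It broke after I bought it."]
for batch in mini_batches(multiple_text, mini_batch_size=2):
    print(batch)  # two batches of two texts each
```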

We'll pass it into the classifier just like before:

sentences = classifier.tag_text(
    text=multiple_text,
    model_name_or_path=model,
    mini_batch_size=2
)

And we can check the outputs again:

print("Tag Score Outputs:\n")
for sentence in sentences:
    pprint({sentence.to_original_text(): sentence.labels})
Tag Score Outputs:

{"This didn't work well at all.": [1 star (0.622),
                                   2 stars (0.3356),
                                   3 stars (0.0403),
                                   4 stars (0.0016),
                                   5 stars (0.0005)]}
{'I really liked it.': [1 star (0.0032),
                        2 stars (0.0048),
                        3 stars (0.054),
                        4 stars (0.4813),
                        5 stars (0.4567)]}
{'It was really useful.': [1 star (0.006),
                           2 stars (0.0093),
                           3 stars (0.0701),
                           4 stars (0.4136),
                           5 stars (0.501)]}
{'It broke after I bought it.': [1 star (0.4489),
                                 2 stars (0.3935),
                                 3 stars (0.1416),
                                 4 stars (0.0121),
                                 5 stars (0.0039)]}
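Note that each sentence's five scores sum to roughly 1: they are softmax probabilities computed from the model's raw output logits. A plain-Python sketch of softmax (the logits below are made up for illustration):

```python
import math

def softmax(logits):
    # Exponentiate each logit and normalize so the outputs sum to 1.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])  # made-up logits
print([round(p, 4) for p in probs])
```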

You can set model_name_or_path to any of Transformers' or Flair's pre-trained sequence classification models.

Let's tag some text with another model, specifically Oliver Guhr's German sentiment model called oliverguhr/german-sentiment-bert.

First we'll write some German text:

german_text = ["Das hat überhaupt nicht gut funktioniert.",
               "Ich mochte es wirklich.",
               "Es war wirklich nützlich.",
               "Es ist kaputt gegangen, nachdem ich es gekauft habe."]

And then tag it:

sentences = classifier.tag_text(
    german_text,
    model_name_or_path="oliverguhr/german-sentiment-bert",
    mini_batch_size=1
)

Let's look at the output:

print("Tag Score Outputs:\n")
for sentence in sentences:
    pprint({sentence.to_original_text(): sentence.labels})
Tag Score Outputs:

{'Das hat überhaupt nicht gut funktioniert.': [positive (0.0008),
                                               negative (0.9991),
                                               neutral (0.0)]}
{'Ich mochte es wirklich.': [positive (0.7023),
                             negative (0.2029),
                             neutral (0.0947)]}
{'Es war wirklich nützlich.': [positive (0.9813),
                               negative (0.0184),
                               neutral (0.0002)]}
{'Es ist kaputt gegangen, nachdem ich es gekauft habe.': [positive (0.0042),
                                                          negative (0.9957),
                                                          neutral (0.0001)]}

Don't forget you can still quickly run inference with the multi-lingual review sentiment model you loaded in earlier (memory permitting)! Just change the model_name_or_path param to the model you used before.

Let's release the German sentiment model to free up some memory for our next step...training!

classifier.release_model(model_name_or_path="oliverguhr/german-sentiment-bert")

Train and Fine-Tune a Pre-Trained Model on Your Own Dataset

Let's imagine you have your own dataset with text/label pairs you'd like to create a sequence classification model for.

With the easy sequence classifier, you can take advantage of transfer learning by fine-tuning pre-trained models on your own custom datasets.

Note: The EasySequenceClassifier is integrated heavily with the datasets.Dataset and transformers.Trainer class objects, so please check out the datasets and transformers documentation for more information.

We'll first need a "custom" dataset to start training our model. Our EasySequenceClassifier.train() method can run with either datasets.Dataset objects or CSV data file paths. Since the datasets library makes it so easy, we'll use the datasets.load_dataset() method to load in the IMDB Sentiment dataset. We'll show an example with a CSV later.

from datasets import load_dataset

train_dataset, eval_dataset = load_dataset('imdb', split=['train[:1%]', 'test[:1%]'])

# Uncomment below if you want to use all the data so you don't spend an hour+ on training and evaluation
#train_dataset, eval_dataset = load_dataset('imdb', split=['train', 'test'])

pprint(vars(train_dataset.info))
Reusing dataset imdb (/root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/90099cb476936b753383ba2ae6ab2eae419b2e87f71cd5189cb9c8e5814d12a3)
{'builder_name': 'imdb',
 'citation': '@InProceedings{maas-EtAl:2011:ACL-HLT2011,\n'
             '  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  '
             'Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, '
             'Christopher},\n'
             '  title     = {Learning Word Vectors for Sentiment Analysis},\n'
             '  booktitle = {Proceedings of the 49th Annual Meeting of the '
             'Association for Computational Linguistics: Human Language '
             'Technologies},\n'
             '  month     = {June},\n'
             '  year      = {2011},\n'
             '  address   = {Portland, Oregon, USA},\n'
             '  publisher = {Association for Computational Linguistics},\n'
             '  pages     = {142--150},\n'
             '  url       = {http://www.aclweb.org/anthology/P11-1015}\n'
             '}\n',
 'config_name': 'plain_text',
 'dataset_size': 133190346,
 'description': 'Large Movie Review Dataset.\n'
                'This is a dataset for binary sentiment classification '
                'containing substantially more data than previous benchmark '
                'datasets. We provide a set of 25,000 highly polar movie '
                'reviews for training, and 25,000 for testing. There is '
                'additional unlabeled data for use as well.',
 'download_checksums': {'http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz': {'checksum': 'c40f74a18d3b61f90feba1e17730e0d38e8b97c05fde7008942e91923d1658fe',
                                                                                           'num_bytes': 84125825}},
 'download_size': 84125825,
 'features': {'label': ClassLabel(num_classes=2, names=['neg', 'pos'], names_file=None, id=None),
              'text': Value(dtype='string', id=None)},
 'homepage': 'http://ai.stanford.edu/~amaas/data/sentiment/',
 'license': '',
 'post_processed': None,
 'post_processing_size': None,
 'size_in_bytes': 217316171,
 'splits': {'test': SplitInfo(name='test', num_bytes=32650697, num_examples=25000, dataset_name='imdb'),
            'train': SplitInfo(name='train', num_bytes=33432835, num_examples=25000, dataset_name='imdb'),
            'unsupervised': SplitInfo(name='unsupervised', num_bytes=67106814, num_examples=50000, dataset_name='imdb')},
 'supervised_keys': None,
 'version': 1.0.0}

Let's take a brief look at what the IMDB Sentiment dataset looks like. We can see that the label column has two classes, 0 and 1. You can see the class names mapped to these integers with train_dataset.features["label"].names.

train_dataset.set_format(type="pandas", columns=["text", "label"])
train_dataset[:]
label text
0 1 Bromwell High is a cartoon comedy. It ran at t...
1 1 Homelessness (or Houselessness as George Carli...
2 1 Brilliant over-acting by Lesley Ann Warren. Be...
3 1 This is easily the most underrated film inn th...
4 1 This is not the typical Mel Brooks film. It wa...
... ... ...
245 1 That hilarious line is typical of what these n...
246 1 Faith and Mortality... viewed through the lens...
247 1 The unlikely duo of Zero Mostel and Harry Bela...
248 1 *some spoilers*<br /><br />I was pleasantly su...
249 1 ... and I DO mean it. If not literally (after ...

250 rows × 2 columns
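The ClassLabel feature in the dataset info above maps each integer label to a name (names=['neg', 'pos'], so 0 is neg and 1 is pos). A plain-Python equivalent of that mapping:

```python
# Plain-Python equivalent of the ClassLabel int-to-name mapping shown in
# the dataset info: index in the names list == integer label.
names = ["neg", "pos"]
int2str = {i: name for i, name in enumerate(names)}
print(int2str[1])  # the label 1 rows above are 'pos' reviews
```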

Let's reformat it back into a more "pythonic" dataset:

train_dataset.set_format(columns=["text", "label"])

Uncomment below to see training done with CSV files. The cell below will just save the datasets.Dataset objects you have in train_dataset and eval_dataset as CSVs and will train the model with the CSV file paths. Skip this cell to continue straight to training.

#eval_dataset.set_format(type="pandas", columns=["text", "label"])

#train_dataset[:].to_csv("./IMDB train.csv", index=False)
#eval_dataset[:].to_csv("./IMDB eval.csv", index=False)

#train_dataset = "./IMDB train.csv"
#eval_dataset = "./IMDB eval.csv"
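If you do go the CSV route, train() expects a file with a text column and a label column. A minimal sketch of that layout using only the standard library (the rows here are made up; an in-memory buffer stands in for the file):

```python
import csv
import io

# Sketch of the text/label CSV layout: a header row, then one example
# per row. The rows are invented for illustration.
rows = [("This didn't work at all", 0),
        ("I really liked it.", 1)]
buffer = io.StringIO()  # stands in for a file like "./IMDB train.csv"
writer = csv.writer(buffer)
writer.writerow(["text", "label"])
writer.writerows(rows)
print(buffer.getvalue())
```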

One of the first things we'll need to specify before we start training are the training arguments. Training arguments consist mainly of the hyperparameters we want to provide the model. These may include batch size, initial learning rate, number of epochs, etc.

We will be using the transformers.TrainingArguments data class to store our training args. These are compatible with the transformers.Trainer as well as AdaptNLP's train methods. For more documentation on the TrainingArguments class, please look here. There are a lot of arguments available, but we will pass in the important ones and use default values for the rest.

The training arguments below specify the output directory for your model and checkpoints.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./models',
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    evaluation_strategy="steps",
    logging_dir='./logs',
    save_steps=100
)
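Among these, warmup_steps=500 linearly ramps the learning rate up from zero over the first 500 steps, after which the scheduler's decay takes over. A plain-Python sketch of a linear warmup-then-decay schedule (the peak learning rate and total step count are assumptions for illustration, not values AdaptNLP fixes for you):

```python
def lr_at_step(step, peak_lr=5e-5, warmup_steps=500, total_steps=2000):
    # Linear warmup from 0 to peak_lr over `warmup_steps`, then linear
    # decay back to 0 by `total_steps`. peak_lr/total_steps are assumed
    # values for this sketch.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(lr_at_step(250))   # halfway through warmup
print(lr_at_step(500))   # peak learning rate
print(lr_at_step(2000))  # fully decayed
```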

Now we can run the built-in train() method by passing in the training arguments. The training method is also where you specify your data arguments, which include your train and eval datasets, the pre-trained model ID (this should have been loaded in your earlier cells, but can be loaded dynamically), the text column name, the label column name, and the ordered label names (only required when passing CSV data file paths as the dataset args).

Please check out AdaptNLP's package reference here for more information.

classifier.train(training_args=training_args,
                 train_dataset=train_dataset,
                 eval_dataset=eval_dataset,
                 model_name_or_path="nlptown/bert-base-multilingual-uncased-sentiment",
                 text_col_nm="text",
                 label_col_nm="label",
                 label_names=["negative","positive"]  # ordered to match the dataset's 0=neg, 1=pos encoding
                )

Evaluate Your Model

After training, you can evaluate the model with the eval dataset you passed in for training.

classifier.evaluate(model_name_or_path="nlptown/bert-base-multilingual-uncased-sentiment")
[63/63 00:02]
{'eval_loss': 0.017184646800160408,
 'eval_accuracy': 1.0,
 'eval_f1': array([1.]),
 'eval_precision': array([1.]),
 'eval_recall': array([1.]),
 'eval_runtime': 2.7201,
 'eval_samples_per_second': 91.91,
 'epoch': 1.0}
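The accuracy, precision, recall, and F1 values above come from comparing the model's predicted labels against the true labels of the eval set. A plain-Python sketch of how those metrics are computed for a binary case (the labels and predictions below are made up):

```python
def binary_metrics(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives for the
    # positive class, then derive the standard metrics from them.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up true labels and predictions for illustration.
print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```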

It's a little odd that we're still referring to our fine-tuned model by the model_name_or_path of the pre-trained model we transfer-learned from. We can release that model and then load the fine-tuned one back in from the directory we serialized it to.

classifier.release_model(model_name_or_path="nlptown/bert-base-multilingual-uncased-sentiment")
sentences = classifier.tag_text(
    multiple_text,
    model_name_or_path="./models",
    mini_batch_size=1
)

print("Tag Score Outputs:\n")
for sentence in sentences:
    pprint({sentence.to_original_text(): sentence.labels})
2021-04-20 19:43:56,203 loading file ./models
/opt/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2073: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
  warnings.warn(
Tag Score Outputs:

{"This didn't work well at all.": [neg (0.263), pos (0.737)]}
{'I really liked it.': [neg (0.1309), pos (0.8691)]}
{'It was really useful.': [neg (0.184), pos (0.816)]}
{'It broke after I bought it.': [neg (0.2716), pos (0.7284)]}

And we're done!