Tuning a Base Language Model on the IMDB Dataset



## Introduction

In this tutorial we will be showing an end-to-end example of fine-tuning a Transformer language model on a custom dataset in CSV file format.

By the end of this you should be able to:

1. Build a dataset with the LanguageModelDatasets class and create DataLoaders from it
2. Build a LanguageModelTuner quickly, find a good learning rate, and train with the One-Cycle Policy
3. Save that model away, to be used with deployment or other HuggingFace libraries
4. Run inference using both the Tuner's own predict function and the EasyTextGenerator class within AdaptNLP

## Installing the Library

This tutorial uses the latest AdaptNLP version, as well as parts of the fastai library. Please run the code below to install them:

!pip install adaptnlp -U


(or pip3)

## Getting the Dataset

First we need a dataset. We will use the fastai library to download the IMDB_SAMPLE dataset, a subset of IMDB Movie Reviews.

from fastai.data.external import URLs, untar_data


URLs holds a namespace of many data endpoints, and untar_data is a function that can download and extract any data from a given URL.

data_path = untar_data(URLs.IMDB_SAMPLE)


If we look at what was downloaded, we will find a texts.csv file:

data_path.ls()

(#1) [Path('/root/.fastai/data/imdb_sample/texts.csv')]

This is the data we want to use. The CSV has three columns: label, text, and is_valid, where is_valid indicates whether a row belongs to the validation set.
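To make the column layout concrete, here is a minimal sketch that reads a CSV with the same three columns using only the standard library. The rows are made up for illustration and are not taken from texts.csv. Note that later in this tutorial the split is made with split_pct rather than with the is_valid column, so this only illustrates the file format.

```python
import csv
import io

# Hypothetical three-row sample mimicking the layout of texts.csv
# (columns: label, text, is_valid); the reviews here are made up.
sample_csv = io.StringIO(
    "label,text,is_valid\n"
    "negative,A dull and confusing plot.,False\n"
    'positive,"Funny, warm, and well acted.",False\n'
    "positive,One of the best films this year.,True\n"
)

rows = list(csv.DictReader(sample_csv))

# The csv module yields strings, so compare against the text "True"
train_rows = [r for r in rows if r["is_valid"] != "True"]
valid_rows = [r for r in rows if r["is_valid"] == "True"]

print(len(train_rows), len(valid_rows))
print(valid_rows[0]["text"])
```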

Now that we have the dataset and know its format, let's pick a viable model to train with.

## Picking a Model with the Hub

AdaptNLP has an HFModelHub class that lets you communicate with the HuggingFace Hub and pick a model from it, as well as an HF_TASKS namespace class listing the valid tasks we can search by.

Let's try to find one suitable for text generation.

First we need to import the class and generate an instance of it:

from adaptnlp import HFModelHub, HF_TASKS

hub = HFModelHub()


Next we can search for a model:

models = hub.search_model_by_task(HF_TASKS.TEXT_GENERATION)


Let's look at a few:

models[:10]

[Model Name: distilgpt2, Tasks: [text-generation],
Model Name: xlnet-large-cased, Tasks: [text-generation]]

These are models specifically tagged with the text-generation tag, so you may not see a few models you would expect, such as bert-base-cased.

We'll use that first model, distilgpt2:

model = models[0]

model

Model Name: distilgpt2, Tasks: [text-generation]

Now that we have picked a model, let's use the data API to prepare our data.

## Building TaskDatasets with LanguageModelDatasets

Each task has a high-level data wrapper around the TaskDatasets class. In our case this is the LanguageModelDatasets class:

from adaptnlp import LanguageModelDatasets


There are multiple different constructors for the LanguageModelDatasets class, and you should never call the main constructor directly.

We will be using from_csvs, which wraps around the from_dfs constructor:

#### LanguageModelDatasets.from_csvs

LanguageModelDatasets.from_csvs(train_csv:Path, text_col:str, tokenizer_name:str, block_size:int=128, masked_lm:bool=False, valid_csv:Path=None, split_func:callable=None, split_pct:float=0.1, tokenize_kwargs:dict={}, auto_kwargs:dict={}, **kwargs)

Builds LanguageModelDatasets from a single csv or set of csvs. A convenience constructor for from_dfs.

Parameters:

• train_csv : <class 'pathlib.Path'>

A training csv file

• text_col : <class 'str'>

The name of the text column

• tokenizer_name : <class 'str'>

The name of the tokenizer

• block_size : <class 'int'>, optional

The size of each block

• masked_lm : <class 'bool'>, optional

Whether the language model is a MLM

• valid_csv : <class 'pathlib.Path'>, optional

An optional validation csv

• split_func : <built-in function callable>, optional

Optionally a splitting function similar to RandomSplitter

• split_pct : <class 'float'>, optional

What % to split the df between training and validation

• tokenize_kwargs : <class 'dict'>, optional

kwargs for the tokenize function

• auto_kwargs : <class 'dict'>, optional

kwargs for the AutoTokenizer.from_pretrained constructor

• kwargs : <class 'inspect._empty'>

Anything you would normally pass to the tokenizer call (such as max_length, padding) should go in tokenize_kwargs, and anything going to the AutoTokenizer.from_pretrained constructor should be passed to the auto_kwargs.

In our case we only have a train_csv and a tokenizer name. We also want a 90%/10% train/validation split, which is the default.

We will also set a block_size of 128 and leave masked_lm at its default of False, since this is not a masked language model:

dsets = LanguageModelDatasets.from_csvs(
    train_csv=data_path/'texts.csv',
    text_col='text',
    tokenizer_name=model.name,
    block_size=128,
)

No value for max_length set, automatically adjusting to the size of the model and including truncation
Sequence length set to: 1024
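To give some intuition for block_size, the sketch below shows a common way language-model datasets are prepared: token ids from all texts are concatenated and then cut into fixed-size blocks. This is an assumption about the general technique, not a description of AdaptNLP's internals, and group_into_blocks is a hypothetical helper. The show_batch output later in this tutorial, where one review runs straight into the next, is consistent with texts being concatenated.

```python
from typing import List


def group_into_blocks(tokenized_texts: List[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate token ids from all texts and cut them into equal blocks.

    Any trailing remainder shorter than block_size is dropped.
    """
    flat = [tok for text in tokenized_texts for tok in text]
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# Made-up token ids standing in for three short tokenized reviews
tokenized = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
blocks = group_into_blocks(tokenized, block_size=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Notice that the second block mixes tokens from the first and second text; block boundaries do not respect review boundaries in this scheme.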



Finally, we turn it into some AdaptiveDataLoaders.

These are fastai's DataLoaders with a few functions overridden so that they work nicely with HuggingFace's Dataset class.

#### LanguageModelDatasets.dataloaders

LanguageModelDatasets.dataloaders(batch_size=8, shuffle_train=True, collate_fn=default_data_collator, mlm_probability:float=0.15, path='.', device=None)

Build DataLoaders from self

Parameters:

• batch_size : <class 'int'>, optional

A batch size

• shuffle_train : <class 'bool'>, optional

Whether to shuffle the training dataset

• collate_fn : <class 'function'>, optional

A custom collation function

• mlm_probability : <class 'float'>, optional

• path : <class 'str'>, optional

• device : <class 'NoneType'>, optional

dls = dsets.dataloaders(batch_size=8)


Finally, let's view a batch of data with the show_batch function:

dls.show_batch()

Input Labels
0 2".<br /><br />It starts out trying to borrow its comic relief style of Star Wars, but mercifully (since the humor doesn't work) gives up on comedy and plays it serious. In that sense, it's superior to the Star Wars franchise, which started with a clever sense of humor, and eventually deteriorated to Jar-Jar's annoying silliness.<br /><br />The agricultural details were apparently drawn by someone who had never seen a farm. The harvester was driving through the unharvested middle of a field, dumping silage onto unharvested crops, rather than working from one side to the other and dumping the silage onto already-harvested rows or into a truck. Corn (maize) was pouring out the grain chute, but the farm lands were drawn like a wheat field.<br /><br />When it was time for Kim's father had to face his fate, there wasn't any dramatic weight to the scene. That could have been partly the fault of the English-language voice actor, but the drawings didn't show much weight either. Kim's reactions in that scene were similarly unconvincing.<br /><br />Similarly, when a character named Henderson was killed, Chris showed very little reaction, even though they were apparently supposed to have been close. (Henderson's death is no spoiler; his name isn't revealed until his death scene.) She seems to promptly forget him. Someone's expression of sympathy shows more feeling than she does. I think the voice actor deserves most of the blame in that case; there's at least a hint of feeling in the drawings of Chris.<br /><br />On several occasions, villains fail to accomplish their orders. A villain leader often punishes those failures with miserable deaths. I can't say whether that's lifted from Star Wars, or if that comes from an earlier source -- possibly the Lensman books.<br /><br />There's a scene where a space ship crash-lands. As it plunges toward the ground, parts are break off the ship. 
But so many pieces are fall off that there should be nothing left of it by the time it lands.<br /><br />While in most cases Chris seems like a competent, tough space hero, there's a scene where she shrieks like an incompetent damsel in distress. Someone tough enough to get over Henderson's death so quickly should at least be able to shout, "help, it's got me and I can't 2".<br /><br />It starts out trying to borrow its comic relief style of Star Wars, but mercifully (since the humor doesn't work) gives up on comedy and plays it serious. In that sense, it's superior to the Star Wars franchise, which started with a clever sense of humor, and eventually deteriorated to Jar-Jar's annoying silliness.<br /><br />The agricultural details were apparently drawn by someone who had never seen a farm. The harvester was driving through the unharvested middle of a field, dumping silage onto unharvested crops, rather than working from one side to the other and dumping the silage onto already-harvested rows or into a truck. Corn (maize) was pouring out the grain chute, but the farm lands were drawn like a wheat field.<br /><br />When it was time for Kim's father had to face his fate, there wasn't any dramatic weight to the scene. That could have been partly the fault of the English-language voice actor, but the drawings didn't show much weight either. Kim's reactions in that scene were similarly unconvincing.<br /><br />Similarly, when a character named Henderson was killed, Chris showed very little reaction, even though they were apparently supposed to have been close. (Henderson's death is no spoiler; his name isn't revealed until his death scene.) She seems to promptly forget him. Someone's expression of sympathy shows more feeling than she does. I think the voice actor deserves most of the blame in that case; there's at least a hint of feeling in the drawings of Chris.<br /><br />On several occasions, villains fail to accomplish their orders. 
A villain leader often punishes those failures with miserable deaths. I can't say whether that's lifted from Star Wars, or if that comes from an earlier source -- possibly the Lensman books.<br /><br />There's a scene where a space ship crash-lands. As it plunges toward the ground, parts are break off the ship. But so many pieces are fall off that there should be nothing left of it by the time it lands.<br /><br />While in most cases Chris seems like a competent, tough space hero, there's a scene where she shrieks like an incompetent damsel in distress. Someone tough enough to get over Henderson's death so quickly should at least be able to shout, "help, it's got me and I can't
3 I thought this film was alright; much better than I expected it to be. I was skeptical at first - the idea of a computer virus that can also infect people seemed a little ludicrous to me. But in the end, I thought the film handled the concept well (even if some scenes were a little clichéd).<br /><br />The cast was quite good, and the two leads seemed to take their roles very seriously. I couldn't help thinking, though, that Janine Turner is a bit of a Geena Davis look-a-like. Maybe it's just her face or the make-up, hair and clothes she had in this movie but it just kept nagging at the back of my mind the whole time.<br /><br />While it's not a'must see' or a great film by any standard, 'Fatal Error' is an entertaining flick that will keep you watching until the end.While I count myself as a fan of the Babylon 5 television series, the original movie that introduced the series was a weak start. Although many of the elements that would later mature and become much more compelling in the series are there, the pace of The Gathering is slow, the makeup somewhat inadequate, and the plot confusing. Worse, the characterization in the premiere episode is poor. Although the ratings chart shows that many fans are willing to overlook these problems, I remember The Gathering almost turned me off off what soon grew into a spectacular series.How unfortunate, to have so many of my "a" list, and good "b" list actors agree to do this movie, but they did, and that is what sucked me into watching it. I had never heard of this movie, but there was Cuba Gooding Jr. right on the DVD cover, and James Woods in the background how bad can it be? In a word Very! This movie starts o.k. has some twists and turns, then just lays an egg. The ending was so weak, it was as if the writer got called away and his 4 year old son sat down at the type writer and hacked out the ending. How ironic a for a movie titled "The end game" to have such a poor one. 
These are the types of movies that can move "a" list actors to the "b" list in hurry. I hope Cuba Gooding JR, and James Woods don't make a habit of this.A definite no. A resounding NO. This movie is an absolute dud.<br /><br />Having I thought this film was alright; much better than I expected it to be. I was skeptical at first - the idea of a computer virus that can also infect people seemed a little ludicrous to me. But in the end, I thought the film handled the concept well (even if some scenes were a little clichéd).<br /><br />The cast was quite good, and the two leads seemed to take their roles very seriously. I couldn't help thinking, though, that Janine Turner is a bit of a Geena Davis look-a-like. Maybe it's just her face or the make-up, hair and clothes she had in this movie but it just kept nagging at the back of my mind the whole time.<br /><br />While it's not a'must see' or a great film by any standard, 'Fatal Error' is an entertaining flick that will keep you watching until the end.While I count myself as a fan of the Babylon 5 television series, the original movie that introduced the series was a weak start. Although many of the elements that would later mature and become much more compelling in the series are there, the pace of The Gathering is slow, the makeup somewhat inadequate, and the plot confusing. Worse, the characterization in the premiere episode is poor. Although the ratings chart shows that many fans are willing to overlook these problems, I remember The Gathering almost turned me off off what soon grew into a spectacular series.How unfortunate, to have so many of my "a" list, and good "b" list actors agree to do this movie, but they did, and that is what sucked me into watching it. I had never heard of this movie, but there was Cuba Gooding Jr. right on the DVD cover, and James Woods in the background how bad can it be? In a word Very! This movie starts o.k. has some twists and turns, then just lays an egg. 
The ending was so weak, it was as if the writer got called away and his 4 year old son sat down at the type writer and hacked out the ending. How ironic a for a movie titled "The end game" to have such a poor one. These are the types of movies that can move "a" list actors to the "b" list in hurry. I hope Cuba Gooding JR, and James Woods don't make a habit of this.A definite no. A resounding NO. This movie is an absolute dud.<br /><br />Having

When training a language model, the input and the labels are identical, so there is no noticeable difference between the two columns here.
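Identical inputs and labels still produce a next-token prediction task because, for causal models in HuggingFace Transformers, the labels are shifted by one position inside the model when the loss is computed. The following sketch makes that shift explicit in plain Python; next_token_pairs is a hypothetical helper and the token ids are made up.

```python
from typing import List, Tuple


def next_token_pairs(input_ids: List[int]) -> List[Tuple[List[int], int]]:
    """Pair each prefix of the sequence with the token that follows it."""
    labels = list(input_ids)  # labels start out as a copy of the inputs
    # Shift by one: the context ending at position i predicts labels[i + 1]
    return [(input_ids[: i + 1], labels[i + 1]) for i in range(len(input_ids) - 1)]


ids = [10, 20, 30, 40]  # made-up token ids
for context, target in next_token_pairs(ids):
    print(context, "->", target)
```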

## Building Tuner

Next we need to build a compatible Tuner for our problem. These tuners contain good defaults for our problem space, including loss functions and metrics.

First let's import the LanguageModelTuner and view its documentation:

from adaptnlp import LanguageModelTuner


## class LanguageModelTuner

LanguageModelTuner(dls:DataLoaders, model_name, tokenizer=None, language_model_type:LMType='causal', loss_func=CrossEntropyLoss(), metrics=[<fastai.metrics.Perplexity object at 0x7faf54c22070>], opt_func=Adam, additional_cbs=None, expose_fastai_api=False, **kwargs) :: AdaptiveTuner

An AdaptiveTuner with good defaults for Language Model fine-tuning.

Valid kwargs and defaults:

• lr:float = 0.001
• splitter:function = trainable_params
• cbs:list = None
• path:Path = None
• model_dir:Path = 'models'
• wd:float = None
• wd_bn_bias:bool = False
• train_bn:bool = True
• moms: tuple(float) = (0.95, 0.85, 0.95)

Parameters:

• dls : <class 'fastai.data.core.DataLoaders'>

• model_name : <class 'inspect._empty'>

A HuggingFace model

• tokenizer : <class 'NoneType'>, optional

A HuggingFace tokenizer

• language_model_type : <class 'fastcore.basics.LMType'>, optional

The type of language model to use

• loss_func : <class 'fastai.losses.CrossEntropyLossFlat'>, optional

A loss function

• metrics : <class 'list'>, optional

Metrics to monitor the training with

• opt_func : <class 'function'>, optional

A fastai or torch Optimizer

• additional_cbs : <class 'NoneType'>, optional

Additional Callbacks to have always tied to the Tuner

• expose_fastai_api : <class 'bool'>, optional

Whether to expose the fastai API

• kwargs : <class 'inspect._empty'>

Next we'll pass in our DataLoaders, the name of our model, and the tokenizer:

tuner = LanguageModelTuner(dls, model.name, dls.tokenizer)


We can see that, by default, it uses CrossEntropyLoss as the loss function and Perplexity as the metric:

tuner.loss_func

FlattenedLoss of CrossEntropyLoss()

_ = [print(m.name) for m in tuner.metrics]

perplexity


Finally we just need to train our model!

## Fine-Tuning

To fine-tune, AdaptNLP's tuner class provides only a few methods to work with. The important ones are tune and lr_find.

As the Tuner uses fastai under the hood, lr_find calls fastai's Learning Rate Finder to help us pick a learning rate. Let's do that now:

#### AdaptiveTuner.lr_find

AdaptiveTuner.lr_find(start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggest_funcs=valley)

Runs fastai's LR Finder

Parameters:

• start_lr : <class 'float'>, optional

• end_lr : <class 'int'>, optional

• num_it : <class 'int'>, optional

• stop_div : <class 'bool'>, optional

• show_plot : <class 'bool'>, optional

• suggest_funcs : <class 'function'>, optional

tuner.lr_find()

/opt/venv/lib/python3.8/site-packages/fastai/callback/schedule.py:270: UserWarning: color is redundantly defined by the 'color' keyword argument and the fmt string "ro" (-> color='r'). The keyword argument will take precedence.
ax.plot(val, idx, 'ro', label=nm, c=color)

SuggestedLRs(valley=6.30957365501672e-05)

The valley suggestion is roughly 6.3e-5, so we will round down slightly and use 5e-5.

lr = 5e-5
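For intuition about what the finder does with start_lr, end_lr, and num_it: a learning rate range test trains for a small number of mini-batches while increasing the learning rate exponentially, recording the loss at each step. The sketch below only generates such a schedule. The exact spacing formula is an assumption for illustration, and lr_range_test_schedule is a hypothetical helper, not part of fastai or AdaptNLP.

```python
from typing import List


def lr_range_test_schedule(start_lr: float = 1e-7, end_lr: float = 10.0, num_it: int = 100) -> List[float]:
    """Learning rates spaced evenly on a log scale from start_lr to end_lr.

    num_it must be at least 2.
    """
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


lrs = lr_range_test_schedule()
print(f"first={lrs[0]:.1e}, middle={lrs[len(lrs) // 2]:.1e}, last={lrs[-1]:.1e}")
```

Because the rates grow exponentially, most of the iterations are spent at small learning rates, and the loss typically blows up only near the end of the sweep.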


Let's look at the documentation for tune:

#### AdaptiveTuner.tune

AdaptiveTuner.tune(epochs:int, lr:float=None, strategy:Strategy='fit_one_cycle', callbacks:list=[], **kwargs)

Fine tune self.model for epochs with an lr and strategy

Parameters:

• epochs : <class 'int'>

Number of iterations to train for

• lr : <class 'float'>, optional

If None, finds a new learning rate and uses suggestion_method

• strategy : <class 'fastcore.basics.Strategy'>, optional

A fitting method

• callbacks : <class 'list'>, optional

Extra fastai Callbacks

• kwargs : <class 'inspect._empty'>

We can pass in a number of epochs, a learning rate, a strategy, and additional fastai callbacks to call.

Valid strategies live in the Strategy namespace class:

from adaptnlp import Strategy


In this tutorial we will train with the One-Cycle policy, which is generally a strong default scheduler.
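As a rough picture of what a one-cycle schedule looks like, the sketch below warms the learning rate up to a peak with cosine interpolation and then anneals it to a much smaller value. The defaults used here (pct_start=0.25, div=25, div_final=1e5) are assumed to mirror fastai's fit_one_cycle defaults, the momentum schedule is omitted, and both helper functions are hypothetical.

```python
import math
from typing import List


def cos_anneal(start: float, end: float, pos: float) -> float:
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2


def one_cycle_lrs(lr_max: float, n_steps: int, pct_start: float = 0.25,
                  div: float = 25.0, div_final: float = 1e5) -> List[float]:
    """Warm up from lr_max/div to lr_max, then anneal to lr_max/div_final."""
    lrs = []
    for step in range(n_steps):
        pos = step / max(n_steps - 1, 1)  # training progress in [0, 1]
        if pos < pct_start:
            lrs.append(cos_anneal(lr_max / div, lr_max, pos / pct_start))
        else:
            lrs.append(cos_anneal(lr_max, lr_max / div_final,
                                  (pos - pct_start) / (1 - pct_start)))
    return lrs


schedule = one_cycle_lrs(lr_max=5e-5, n_steps=101)
print(f"start={schedule[0]:.2e}, peak={max(schedule):.2e}, end={schedule[-1]:.2e}")
```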

tuner.tune(3, lr, strategy=Strategy.OneCycle)

epoch train_loss valid_loss perplexity time
0 4.061049 3.879425 48.396393 00:55
1 3.973648 3.857744 47.358402 00:54
2 3.900359 3.858645 47.401066 00:54
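The perplexity column is the exponential of the validation cross-entropy loss, which can be checked directly against the table above. This is a small sanity check using only the standard library, with the loss values copied from the table.

```python
import math

# Validation losses copied from the training table above
valid_losses = [3.879425, 3.857744, 3.858645]

for epoch, loss in enumerate(valid_losses):
    print(f"epoch {epoch}: valid_loss={loss:.6f} -> perplexity={math.exp(loss):.2f}")
```

A lower perplexity means the model is, on average, less surprised by the next token in the validation set.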

## Saving Model

Now that we have a trained model, let's save those weights away.

Calling tuner.save will save both the model and the tokenizer in the same format that HuggingFace uses:

#### AdaptiveTuner.save

AdaptiveTuner.save(save_directory)

Save a pretrained model to a save_directory

Parameters:

• save_directory : <class 'inspect._empty'>

A folder to save our model to

tuner.save('good_model')

'good_model'

## Performing Inference

There are two ways to get predictions. The first is the .predict method on our tuner, which is handy when you have just finished training and want to see how your model performs on some new data. The second is AdaptNLP's inference API, which we will show afterwards.

### In Tuner

First let's write a sentence to test with:

sentence = "Hugh Jackman is a terrible "


And then predict with it:

#### LanguageModelTuner.predict

LanguageModelTuner.predict(text:Union[List[str], str], bs:int=64, num_tokens_to_produce:int=50, **kwargs)

Predict some text for sequence classification with the currently loaded model

Parameters:

• text : typing.Union[typing.List[str], str]

Some text or list of texts to do inference with

• bs : <class 'int'>, optional

A batch size to use for multiple texts

• num_tokens_to_produce : <class 'int'>, optional

Number of tokens to generate

• kwargs : <class 'inspect._empty'>

tuner.predict(sentence, num_tokens_to_produce=13)

100.00% [1/1 00:00<00:00]
{'generated_text': ["Hugh Jackman is a terrible icky, and I'm not sure if he's a good actor"]}
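To illustrate what num_tokens_to_produce controls, here is a toy sketch of autoregressive generation: the model predicts one token, appends it to the sequence, and repeats. The lookup table below is a hypothetical stand-in for a trained model and has nothing to do with how AdaptNLP or distilgpt2 choose tokens in practice.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained model: maps the last token to the
# single most likely next token. A real model scores every vocabulary entry.
toy_next_token: Dict[str, str] = {
    "the": "movie", "movie": "was", "was": "not", "not": "great", "great": "the",
}


def generate(prompt: List[str], num_tokens_to_produce: int) -> List[str]:
    """Greedily append one predicted token at a time."""
    tokens = list(prompt)
    for _ in range(num_tokens_to_produce):
        tokens.append(toy_next_token[tokens[-1]])
    return tokens


print(" ".join(generate(["the"], num_tokens_to_produce=4)))  # the movie was not great
```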

### With the Inference API

Next we will use the EasyTextGenerator class, which AdaptNLP offers:

from adaptnlp import EasyTextGenerator


We simply construct the class:

classifier = EasyTextGenerator()


And call the generate method, passing in the sentence, the location of our saved model, and the number of tokens to produce:

classifier.generate(
    sentence,
    model_name_or_path='good_model',
    num_tokens_to_produce=13
)

100.00% [1/1 00:00<00:00]
{'generated_text': ["Hugh Jackman is a terrible icky, and I'm not sure if he's a good actor"]}

And we got the exact same output!