Tuning a Base Language Model on the IMDB Dataset
 

Introduction

In this tutorial we will walk through an end-to-end example of fine-tuning a Transformer language model on a custom dataset stored as text files.

By the end of this tutorial you should be able to:

  1. Build a dataset with the LanguageModelDatasets class and create its DataLoaders
  2. Build a LanguageModelTuner quickly, find a good learning rate, and train with the One-Cycle Policy
  3. Save the model for deployment or for use with other HuggingFace libraries
  4. Run inference using both the Tuner's built-in function and the EasyTextGenerator class within AdaptNLP

Installing the Library

This tutorial uses the latest AdaptNLP version, as well as parts of the fastai library. Please run the code below to install them:

!pip install adaptnlp -U

(or pip3)

Getting the Dataset

First we need a dataset. We will use the fastai library to download the full IMDB Movie Reviews dataset:

from fastai.data.external import URLs, untar_data

URLs holds a namespace of many data endpoints, and untar_data is a function that can download and extract any data from a given URL.

Combining both, we can download the data:

data_path = untar_data(URLs.IMDB)

If we look at what was downloaded, we will find train and test folders:

data_path.ls()
(#7) [Path('/root/.fastai/data/imdb/test'),Path('/root/.fastai/data/imdb/README'),Path('/root/.fastai/data/imdb/train'),Path('/root/.fastai/data/imdb/imdb.vocab'),Path('/root/.fastai/data/imdb/tmp_clas'),Path('/root/.fastai/data/imdb/unsup'),Path('/root/.fastai/data/imdb/tmp_lm')]

Each of them contains folders that separate the text files by class:

(data_path/'train').ls()
(#4) [Path('/root/.fastai/data/imdb/train/pos'),Path('/root/.fastai/data/imdb/train/neg'),Path('/root/.fastai/data/imdb/train/unsupBow.feat'),Path('/root/.fastai/data/imdb/train/labeledBow.feat')]

As a result, the dataset has the following layout:

  • train
    • class_a
      • text1.txt
      • text2.txt
      • ...
    • class_b
      • text1.txt
      • ...
  • test (or valid)
    • class_a
      • text1.txt
      • ...
    • class_b
      • text1.txt
      • ...
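To check that a dataset on disk matches this layout, a small sketch like the one below can count the text files in each class folder. The helper name count_texts_per_class is hypothetical (it is not part of AdaptNLP or fastai) and it assumes every class folder holds plain .txt files.

```python
from pathlib import Path
from collections import Counter


def count_texts_per_class(split_dir):
    """Count the .txt files inside each class folder of one split (e.g. train)."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        # Skip loose files such as labeledBow.feat; only folders count as classes
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.txt"))
    return dict(counts)
```

For example, count_texts_per_class(data_path/'train') would return a dictionary keyed by the class folder names, such as pos and neg.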

Now that we have the dataset and know its format, let's pick a viable model to train with.

Picking a Model with the Hub

AdaptNLP has an HFModelHub class that allows you to communicate with the HuggingFace Hub and pick a model from it, as well as an HF_TASKS namespace class listing the valid tasks we can search by.

Let's try to find a model suitable for text generation.

First we need to import the class and generate an instance of it:

from adaptnlp import HFModelHub, HF_TASKS
hub = HFModelHub()

Next we can search for a model:

models = hub.search_model_by_task(HF_TASKS.TEXT_GENERATION)

Let's look at a few:

models[:10]
[Model Name: distilgpt2, Tasks: [text-generation],
 Model Name: gpt2-large, Tasks: [text-generation],
 Model Name: gpt2-medium, Tasks: [text-generation],
 Model Name: gpt2-xl, Tasks: [text-generation],
 Model Name: gpt2, Tasks: [text-generation],
 Model Name: openai-gpt, Tasks: [text-generation],
 Model Name: transfo-xl-wt103, Tasks: [text-generation],
 Model Name: xlnet-base-cased, Tasks: [text-generation],
 Model Name: xlnet-large-cased, Tasks: [text-generation]]

These are only the models tagged with the text-generation tag, so some models you might expect, such as bert-base-cased, will not appear.

We'll use that first model, distilgpt2:

model = models[0]
model
Model Name: distilgpt2, Tasks: [text-generation]

Now that we have picked a model, let's use the data API to prepare our data.

Each task has a high-level data wrapper around the TaskDatasets class. In our case this is the LanguageModelDatasets class:

from adaptnlp import LanguageModelDatasets

The LanguageModelDatasets class has several constructors, and you should never call the main constructor directly.

We will be using the from_folders method:

LanguageModelDatasets.from_folders[source]

LanguageModelDatasets.from_folders(train_path:Path, tokenizer_name:str, block_size:int=128, masked_lm:bool=False, valid_path:Path=None, split_func:callable=None, split_pct:float=0.1, tokenize_kwargs:dict={}, auto_kwargs:dict={})

Builds LanguageModelDatasets from a folder or group of folders

Parameters:

  • train_path : <class 'pathlib.Path'>

    The path to the training data

  • tokenizer_name : <class 'str'>

    The name of the tokenizer

  • block_size : <class 'int'>, optional

    The size of each block

  • masked_lm : <class 'bool'>, optional

    Whether the language model is a MLM

  • valid_path : <class 'pathlib.Path'>, optional

    An optional validation path

  • split_func : <built-in function callable>, optional

    Optionally a splitting function similar to RandomSplitter

  • split_pct : <class 'float'>, optional

    What % to split the df between training and validation

  • tokenize_kwargs : <class 'dict'>, optional

    kwargs for the tokenize function

  • auto_kwargs : <class 'dict'>, optional

    kwargs for the AutoTokenizer.from_pretrained constructor

Anything you would normally pass to the tokenizer call (such as max_length, padding) should go in tokenize_kwargs, and anything going to the AutoTokenizer.from_pretrained constructor should be passed to the auto_kwargs.
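To make split_pct concrete, here is a minimal sketch of a percentage-based random split, in the spirit of fastai's RandomSplitter. The helper name random_split_indices and its seed argument are hypothetical and not part of AdaptNLP; the library's own splitting logic may differ.

```python
import random


def random_split_indices(n_items, split_pct=0.1, seed=None):
    """Return (train_idxs, valid_idxs), holding out split_pct of the items."""
    idxs = list(range(n_items))
    # Shuffle with a private generator so the global random state is untouched
    random.Random(seed).shuffle(idxs)
    n_valid = int(n_items * split_pct)
    return idxs[n_valid:], idxs[:n_valid]
```

With 100 items and split_pct=0.1, this holds out 10 items for validation and keeps the other 90 for training.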

In our case we have a train_path and valid_path

from fastcore.basics import patch

Also, we will set a block_size of 128 and set masked_lm=False, since distilgpt2 is not a masked language model:

dsets = LanguageModelDatasets.from_folders(
    train_path=data_path/'train',
    valid_path=data_path/'test',
    tokenizer_name=model.name,
    block_size=128,
    masked_lm=False
)
Using custom data configuration default-1f2b71eec4880b46
Reusing dataset text_no_new_line (/root/.cache/huggingface/datasets/text_no_new_line/default-1f2b71eec4880b46/0.0.0)
Using custom data configuration default-04d8fbd2bd2108a0
Reusing dataset text_no_new_line (/root/.cache/huggingface/datasets/text_no_new_line/default-04d8fbd2bd2108a0/0.0.0)
No value for `max_length` set, automatically adjusting to the size of the model and including truncation
Sequence length set to: 1024
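A common way to prepare text for causal language modeling is to concatenate the tokenized documents and cut the result into fixed-size blocks. The sketch below illustrates that general idea with plain Python lists. The helper group_into_blocks is hypothetical, and reading block_size this way is an assumption based on common preprocessing practice, not a description of AdaptNLP's internals.

```python
def group_into_blocks(tokenized_texts, block_size=128):
    """Concatenate token-id lists and cut them into equal blocks.

    Any trailing remainder shorter than block_size is dropped.
    """
    flat = [tok for text in tokenized_texts for tok in text]
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


blocks = group_into_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note how a block can span document boundaries, so one training example may contain the end of one review and the start of the next.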




And finally, we turn it into AdaptiveDataLoaders.

These are fastai's DataLoaders with a few functions overridden so that they work nicely with HuggingFace's Dataset class:

LanguageModelDatasets.dataloaders[source]

LanguageModelDatasets.dataloaders(batch_size=8, shuffle_train=True, collate_fn=default_data_collator, mlm_probability:float=0.15, path='.', device=None)

Build DataLoaders from self

Parameters:

  • batch_size : <class 'int'>, optional

    A batch size

  • shuffle_train : <class 'bool'>, optional

    Whether to shuffle the training dataset

  • collate_fn : <class 'function'>, optional

    A custom collation function

  • mlm_probability : <class 'float'>, optional

    Token masking probability for Masked Language Models

  • path : <class 'str'>, optional

  • device : <class 'NoneType'>, optional

dls = dsets.dataloaders(batch_size=8)

Finally, let's view a batch of data with the show_batch function:

dls.show_batch()
Input Label
0 dad."<br /><br />This film makes periodic appearances on TV but today my teenage son and I saw it in a theater with quite a few youngsters present. It was great to see computer-besotted kids in an affluent community respond with cheers and applause to special effects that must seem primitive to them.<br /><br />"Thief of Bagdad" is a pre-war Hollywood classic from a time when strong production values often resulted in enduringly attractive and important releases. This is one of the best of its kind.<br /><br />9/10.This movie is simply incredible! I had expected something quite different form the film that I actually saw. However, it is very insightful in that it shows the aggressive nature of human sexuality and its linkage with animal behavior. Let me warn those among the readers of this article who are easily offended by content that is all too sexual, for the explicit sexual nature of this film feels like a high-brow sort of pornography. It even features a scene that comes extremely close to rape.<br /><br />Meanwhile, I strongly suggest seeing this rare work of "sexual art". Every minute of the picture breathes the sexual spirit of the seventies, by the way. One should not forget how times have changed!<br /><br />Go see it! It´s worth your money and time!If you have ever read and enjoyed a novel by Tom Robbins you will appreciate this movie as a whole-hearted attempt to translate his outrageously unconventional writing style into a workable piece of big screen art. The actors and the direction of this film are both good. <br /><br />The only trouble with the film, as I can see it, is that Robbins can relate ideas and sentiments with his words that were still beyond Hollywood's capabilities at the time this film was shot.<br /><br />Given both the irreverence of today's movies, as well as the willingness and abilityof today's audiences to delve into the bizarre, I think "Even Cowgirls... 
would receive a better reception today than it did when it was originally released.With Iphigenia, Mikhali Cacoyannis is perhaps the first film director to have successfully brought the feel of ancient Greek theatre to the screen. His own screenplay, an adaptation of Euripides' tragedy, was far from easy, compared to that of the other two films of the trilogy he directed. The story has been very carefully deconstructed from Euripides' version and dad."<br /><br />This film makes periodic appearances on TV but today my teenage son and I saw it in a theater with quite a few youngsters present. It was great to see computer-besotted kids in an affluent community respond with cheers and applause to special effects that must seem primitive to them.<br /><br />"Thief of Bagdad" is a pre-war Hollywood classic from a time when strong production values often resulted in enduringly attractive and important releases. This is one of the best of its kind.<br /><br />9/10.This movie is simply incredible! I had expected something quite different form the film that I actually saw. However, it is very insightful in that it shows the aggressive nature of human sexuality and its linkage with animal behavior. Let me warn those among the readers of this article who are easily offended by content that is all too sexual, for the explicit sexual nature of this film feels like a high-brow sort of pornography. It even features a scene that comes extremely close to rape.<br /><br />Meanwhile, I strongly suggest seeing this rare work of "sexual art". Every minute of the picture breathes the sexual spirit of the seventies, by the way. One should not forget how times have changed!<br /><br />Go see it! It´s worth your money and time!If you have ever read and enjoyed a novel by Tom Robbins you will appreciate this movie as a whole-hearted attempt to translate his outrageously unconventional writing style into a workable piece of big screen art. 
The actors and the direction of this film are both good. <br /><br />The only trouble with the film, as I can see it, is that Robbins can relate ideas and sentiments with his words that were still beyond Hollywood's capabilities at the time this film was shot.<br /><br />Given both the irreverence of today's movies, as well as the willingness and abilityof today's audiences to delve into the bizarre, I think "Even Cowgirls... would receive a better reception today than it did when it was originally released.With Iphigenia, Mikhali Cacoyannis is perhaps the first film director to have successfully brought the feel of ancient Greek theatre to the screen. His own screenplay, an adaptation of Euripides' tragedy, was far from easy, compared to that of the other two films of the trilogy he directed. The story has been very carefully deconstructed from Euripides' version and
1 up his seat for one of the intended victims, flees with his tail in-between his legs, rather than face immanent death with the school kids he's promised not to leave behind.<br /><br />It's more of character study, and a come to Jesus moment for one character, than a story about the genocide in "RAWANDA". This movie didn't have to take place in RAWANDA, it could have taken place any one of the Genocidal hell holes going around this world at any given time.This is one seriously disturbed movie. Even Though the boys deserved some of what they got.....the sadistic gruesome executions were "slightly" over the top. The only character showing some conscience early in the hunt was killed off before he could offer some help to the sad plot.<br /><br />At the beginning of the movie, there looked to be some promise of a mediocre affair, but this was just a ploy to lull the viewers into a false sense of security, before the joy of what was to come. <br /><br />The only thing that could have saved the movie for me was if Jack Nicholson had jumped out of the bushes and yelled, "and, where is the batman?". Kim Basinger could have screamed. <br /><br />Now that would have been cool!I stopped by BB and picked up 4 zombie flicks to watch over the weekend. Now, I understand that the effects will be cheesy, the acting will be sub-par, and the sets will be suspect. So I'm not expecting much. But it should at least have a story. Stories don't cost a thing except time.....apparently, they didn't have any time either.<br /><br />"Zombie Nation" had 5 zombies that appeared near the end of the movie that all looked like new wave hookers. The picture of the zombie on the front cover NEVER appears in the movie. It was absolutely agonizing to watch and had nothing to offer the genre.<br /><br />The running time is only 81 minutes but it felt like 2 hours. 
According to my wife (who could only hear the movie since she was on the computer in another room), it sounded like zombie porn....which if you think about, sounds kinda gross.....but it wasn't even that good.<br /><br />The only suggestion I can make is that maybe the writer tried to do too many things and ended up with an incoherent mess.<br /><br up his seat for one of the intended victims, flees with his tail in-between his legs, rather than face immanent death with the school kids he's promised not to leave behind.<br /><br />It's more of character study, and a come to Jesus moment for one character, than a story about the genocide in "RAWANDA". This movie didn't have to take place in RAWANDA, it could have taken place any one of the Genocidal hell holes going around this world at any given time.This is one seriously disturbed movie. Even Though the boys deserved some of what they got.....the sadistic gruesome executions were "slightly" over the top. The only character showing some conscience early in the hunt was killed off before he could offer some help to the sad plot.<br /><br />At the beginning of the movie, there looked to be some promise of a mediocre affair, but this was just a ploy to lull the viewers into a false sense of security, before the joy of what was to come. <br /><br />The only thing that could have saved the movie for me was if Jack Nicholson had jumped out of the bushes and yelled, "and, where is the batman?". Kim Basinger could have screamed. <br /><br />Now that would have been cool!I stopped by BB and picked up 4 zombie flicks to watch over the weekend. Now, I understand that the effects will be cheesy, the acting will be sub-par, and the sets will be suspect. So I'm not expecting much. But it should at least have a story. Stories don't cost a thing except time.....apparently, they didn't have any time either.<br /><br />"Zombie Nation" had 5 zombies that appeared near the end of the movie that all looked like new wave hookers. 
The picture of the zombie on the front cover NEVER appears in the movie. It was absolutely agonizing to watch and had nothing to offer the genre.<br /><br />The running time is only 81 minutes but it felt like 2 hours. According to my wife (who could only hear the movie since she was on the computer in another room), it sounded like zombie porn....which if you think about, sounds kinda gross.....but it wasn't even that good.<br /><br />The only suggestion I can make is that maybe the writer tried to do too many things and ended up with an incoherent mess.<br /><br
2 and how they painted the ones on their side. It was the ending that I hated. I was disappointed that it was earth but 150k years back. But to travel all that way just to start over? Are you kidding me? 38k people that fought for their very existence and once they get to paradise, they abandon technology? No way. Sure they were eating paper and rationing food, but that is over. They can live like humans again. They only have one good doctor. What are they going to do when someone has a tooth ache never mind giving birth... yea right. No one would have made that choice.I have to agree with some of the other comments and even go a step further. <br /><br />Nothing about this film worked, absolutely nothing. Delmar our central character makes the decision to become a surrogate mother in order to earn enough money to buy a restaurant but along the way fall for a wise ex-jailbird. At the same time her friend Hortense is trying to get her lawyer boyfriend to finally marry her. She also happens to be sleeping with Marlon who is desperately in love with her. Then there's Delmar's brother Jethro who gets involved with a former coke addict, Missy who reveals she was sexually abused by her adopted father. On the sidelines we also have the eccentricmother who has an assortment of equally odd friends, one of whom dies on the couch at the beginning of the film. So far so good but after introducing these characters and story lines addressing life, death, grief and love in the first half, the film simply loses direction. <br /><br />If the writer had only selected one or two characters and allowed us to follow their stories maybe things would have been fine but equal screen time is given to all with the result that no one story or character is fully developed. For instance, why does Delmar think she will be able to hand over her child in exchange for money, especially when the prospective parents are a creepy bigoted lawyer and his semi alcoholic and depressed wife? 
Why is Hortense so desperate to marry a man who is a jerk and clearly doesn't love her? How is it Missy manages to kick her coke habit overnight? Is Jethro regularly drawn to women with overwhelming problems, or is Missy the exception? Has Delmar and Jethro's mother always been on the eccentric side, or is it a more recent development? Why is Jethro so keen on Cadillacs that he and how they painted the ones on their side. It was the ending that I hated. I was disappointed that it was earth but 150k years back. But to travel all that way just to start over? Are you kidding me? 38k people that fought for their very existence and once they get to paradise, they abandon technology? No way. Sure they were eating paper and rationing food, but that is over. They can live like humans again. They only have one good doctor. What are they going to do when someone has a tooth ache never mind giving birth... yea right. No one would have made that choice.I have to agree with some of the other comments and even go a step further. <br /><br />Nothing about this film worked, absolutely nothing. Delmar our central character makes the decision to become a surrogate mother in order to earn enough money to buy a restaurant but along the way fall for a wise ex-jailbird. At the same time her friend Hortense is trying to get her lawyer boyfriend to finally marry her. She also happens to be sleeping with Marlon who is desperately in love with her. Then there's Delmar's brother Jethro who gets involved with a former coke addict, Missy who reveals she was sexually abused by her adopted father. On the sidelines we also have the eccentricmother who has an assortment of equally odd friends, one of whom dies on the couch at the beginning of the film. So far so good but after introducing these characters and story lines addressing life, death, grief and love in the first half, the film simply loses direction. 
<br /><br />If the writer had only selected one or two characters and allowed us to follow their stories maybe things would have been fine but equal screen time is given to all with the result that no one story or character is fully developed. For instance, why does Delmar think she will be able to hand over her child in exchange for money, especially when the prospective parents are a creepy bigoted lawyer and his semi alcoholic and depressed wife? Why is Hortense so desperate to marry a man who is a jerk and clearly doesn't love her? How is it Missy manages to kick her coke habit overnight? Is Jethro regularly drawn to women with overwhelming problems, or is Missy the exception? Has Delmar and Jethro's mother always been on the eccentric side, or is it a more recent development? Why is Jethro so keen on Cadillacs that he
3 corner.<br /><br />Peter's fate ultimately lies with the heavenly court and American prosecutor (Raymond Massey), whose jury consists of several deceased war heroes and posh British delegates. The surreal trial, which dissolves from b/w back into rich Technicolor, once the verdict is announced, may well be a dream, but the final shot in the hospital validates the predictable outcome.<br /><br />The abstract, frame filling "stairway to heaven" (the American title of the film) is used twice: the first time in b/w, when it elevates Peter and his enigmatic French guardian upwards, crossing giant statues of Peter's potential attorneys for the trial, including Abraham Lincoln and Plato. The second time, the softly lit colour stairway provides the setting for what is an iconic image in cinema - Peter and June frozen side-by-side, their marvelled eyes fixed forward in the frame, their fate sealed.<br /><br />The unlikely affection shared between Peter and June never turns mushy or verbose; it's treated with nobility and the perception that the couple are already suitable enough to be married and simply need to convince people of their love, so it can keep them together. <br /><br />The French Conductor, who can freeze time and people's bodies, obtrudes many of their key moments together, lecturing Peter about history and among his mischievous tricks, pinching Peter's 'Top 100 Game Tricks' book and his coffee cup.<br /><br />As visually inspired as other Powell/Pressburger collaborations, this was the first time they combined colour with b/w – the latter having a cheerful quality when used for the heaven scenes, and both are equally captivating. <br /><br />The outstanding script more than matches the imaginative set design, with dialogue that sounds so immediate that is doesn't feel like it was written or performed for the screen. 
Amusing and witty, Powell/Pressburger's writing deserves equal acclamation with their forte for colour and composition.<br /><br />Made in 1946, "A Matter of Life and Death" is one of those films that defies it age, looking fresh and inventive, even in this age where CGI would vamp up its artificial effects, probably stripping them of their emotional wonder. <br /><br />Other jarring changes would include the need for reduced average seconds for cutting and the inevitable plea to shorten dialogue so it can preserve corner.<br /><br />Peter's fate ultimately lies with the heavenly court and American prosecutor (Raymond Massey), whose jury consists of several deceased war heroes and posh British delegates. The surreal trial, which dissolves from b/w back into rich Technicolor, once the verdict is announced, may well be a dream, but the final shot in the hospital validates the predictable outcome.<br /><br />The abstract, frame filling "stairway to heaven" (the American title of the film) is used twice: the first time in b/w, when it elevates Peter and his enigmatic French guardian upwards, crossing giant statues of Peter's potential attorneys for the trial, including Abraham Lincoln and Plato. The second time, the softly lit colour stairway provides the setting for what is an iconic image in cinema - Peter and June frozen side-by-side, their marvelled eyes fixed forward in the frame, their fate sealed.<br /><br />The unlikely affection shared between Peter and June never turns mushy or verbose; it's treated with nobility and the perception that the couple are already suitable enough to be married and simply need to convince people of their love, so it can keep them together. 
<br /><br />The French Conductor, who can freeze time and people's bodies, obtrudes many of their key moments together, lecturing Peter about history and among his mischievous tricks, pinching Peter's 'Top 100 Game Tricks' book and his coffee cup.<br /><br />As visually inspired as other Powell/Pressburger collaborations, this was the first time they combined colour with b/w – the latter having a cheerful quality when used for the heaven scenes, and both are equally captivating. <br /><br />The outstanding script more than matches the imaginative set design, with dialogue that sounds so immediate that is doesn't feel like it was written or performed for the screen. Amusing and witty, Powell/Pressburger's writing deserves equal acclamation with their forte for colour and composition.<br /><br />Made in 1946, "A Matter of Life and Death" is one of those films that defies it age, looking fresh and inventive, even in this age where CGI would vamp up its artificial effects, probably stripping them of their emotional wonder. <br /><br />Other jarring changes would include the need for reduced average seconds for cutting and the inevitable plea to shorten dialogue so it can preserve
4 carry out for the 3 months that Bill is in New York, while Bill meets with Cleo and another woman. At the end, love is in the air for Bill and one other.............<br /><br />The picture quality and sound quality are poor in this film. The story is interspersed with musical numbers but the songs are bad and Kathryn Crawford has a terrible voice. Rogers isn't that good either. He's pleasant enough but only really comes to life when playing the drums or trombone. There is a very irritating character who plays a cab driver (Roscoe Karns) and the film is just dull.i've seen a movie thats sort of like this, were a transsexual drugs woman and he then picks there nose with a knife and rips there nose to peaces. he then slices there tongue and eats it.<br /><br />the most gruesome part of the movie is were he cuts there left eye out and starts dancing with it. he then starts to eat the woman naked.<br /><br />(i'm not sure what the movies called but i know it's a cult movie and that it was made in Germany).<br /><br />anyway THE NOSE PICKER is fairly crap.<br /><br />its a crap movie and the picture and volume quality is very rubbish.<br /><br />please don't waste you're time buying and watching this movie its totally crap.<br /><br />i prefer DAY OF THE WOMAN also known as I SPIT ON YOUR GRAVE (its one of the best cult movies ever) check out this link http://www.imdb.com/title/tt0077713/Having searched for this movie high and low, I actually found it when I least expected, playing on the Sundance Channel very early in the morning one day. Why I searched endlessly for a small vanity project that Chuck Barris that was made during the last waning years of the TV show, I haven't a clue. The film is simply put horrible. The scripted part that deals with a week that is. Of course the highlight of the film is seeing the real performers that were "too hot for TV" or rejected for some reason or other. 
That part is still horrid, but campy bad which was enjoyable in it's own way. Now that I saw what I sought after for so long will I watch it again in my lifetime? Resoundingly NO!! Do yourself a favor and just watch the MUCH carry out for the 3 months that Bill is in New York, while Bill meets with Cleo and another woman. At the end, love is in the air for Bill and one other.............<br /><br />The picture quality and sound quality are poor in this film. The story is interspersed with musical numbers but the songs are bad and Kathryn Crawford has a terrible voice. Rogers isn't that good either. He's pleasant enough but only really comes to life when playing the drums or trombone. There is a very irritating character who plays a cab driver (Roscoe Karns) and the film is just dull.i've seen a movie thats sort of like this, were a transsexual drugs woman and he then picks there nose with a knife and rips there nose to peaces. he then slices there tongue and eats it.<br /><br />the most gruesome part of the movie is were he cuts there left eye out and starts dancing with it. he then starts to eat the woman naked.<br /><br />(i'm not sure what the movies called but i know it's a cult movie and that it was made in Germany).<br /><br />anyway THE NOSE PICKER is fairly crap.<br /><br />its a crap movie and the picture and volume quality is very rubbish.<br /><br />please don't waste you're time buying and watching this movie its totally crap.<br /><br />i prefer DAY OF THE WOMAN also known as I SPIT ON YOUR GRAVE (its one of the best cult movies ever) check out this link http://www.imdb.com/title/tt0077713/Having searched for this movie high and low, I actually found it when I least expected, playing on the Sundance Channel very early in the morning one day. Why I searched endlessly for a small vanity project that Chuck Barris that was made during the last waning years of the TV show, I haven't a clue. The film is simply put horrible. 
The scripted part that deals with a week that is. Of course the highlight of the film is seeing the real performers that were "too hot for TV" or rejected for some reason or other. That part is still horrid, but campy bad which was enjoyable in it's own way. Now that I saw what I sought after for so long will I watch it again in my lifetime? Resoundingly NO!! Do yourself a favor and just watch the MUCH

When training a language model, the input and the label are exactly the same sequence, so the two columns above show no difference.
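Identical inputs and labels still give a useful training signal because causal language models in HuggingFace Transformers shift the labels by one position inside the model, so each position is scored on predicting the next token. Here is a minimal sketch of that pairing, using a hypothetical helper next_token_pairs on a toy token list:

```python
def next_token_pairs(input_ids):
    """Pair each context token with the token it should predict."""
    labels = list(input_ids)  # labels start as a copy of the inputs
    # Drop the last input and the first label so position i predicts token i + 1
    return list(zip(input_ids[:-1], labels[1:]))


tokens = ["This", "movie", "is", "great"]
print(next_token_pairs(tokens))
# [('This', 'movie'), ('movie', 'is'), ('is', 'great')]
```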

Building Tuner

Next we need to build a compatible Tuner for our problem. These tuners contain good defaults for our problem space, including loss functions and metrics.

First let's import the LanguageModelTuner and view its documentation:

from adaptnlp import LanguageModelTuner

class LanguageModelTuner[source]

LanguageModelTuner(dls:DataLoaders, model_name, tokenizer=None, language_model_type:LMType='causal', loss_func=CrossEntropyLoss(), metrics=[<fastai.metrics.Perplexity object at 0x7fcb71071790>], opt_func=Adam, additional_cbs=None, expose_fastai_api=False, **kwargs) :: AdaptiveTuner

An AdaptiveTuner with good defaults for Language Model fine-tuning Valid kwargs and defaults:

  • lr:float = 0.001
  • splitter:function = trainable_params
  • cbs:list = None
  • path:Path = None
  • model_dir:Path = 'models'
  • wd:float = None
  • wd_bn_bias:bool = False
  • train_bn:bool = True
  • moms: tuple(float) = (0.95, 0.85, 0.95)

Parameters:

  • dls : <class 'fastai.data.core.DataLoaders'>

    A set of DataLoaders or AdaptiveDataLoaders

  • model_name : <class 'inspect._empty'>

    A HuggingFace model

  • tokenizer : <class 'NoneType'>, optional

    A HuggingFace tokenizer

  • language_model_type : <class 'fastcore.basics.LMType'>, optional

    The type of language model to use

  • loss_func : <class 'fastai.losses.CrossEntropyLossFlat'>, optional

    A loss function

  • metrics : <class 'list'>, optional

    Metrics to monitor the training with

  • opt_func : <class 'function'>, optional

    A fastai or torch Optimizer

  • additional_cbs : <class 'NoneType'>, optional

    Additional Callbacks to have always tied to the Tuner

  • expose_fastai_api : <class 'bool'>, optional

    Whether to expose the fastai API

  • kwargs : <class 'inspect._empty'>

Next we'll pass in our DataLoaders, the name of our model, and the tokenizer:

tuner = LanguageModelTuner(dls, model.name, dls.tokenizer)

By default, we can see that it uses CrossEntropyLoss as our loss function and Perplexity as our metric:

tuner.loss_func
FlattenedLoss of CrossEntropyLoss()
_ = [print(m.name) for m in tuner.metrics]
perplexity

Finally we just need to train our model!

Fine-Tuning

All that's left is to tune. There are currently only 4 or 5 functions you can call on our tuner, and this is by design, to keep it simple. If you don't want to be boxed in, pass expose_fastai_api=True to our earlier call: it will expose the entirety of Learner to you, so you can call fit_one_cycle, lr_find, and everything else, since the Tuner uses fastai under the hood.

First, let's call lr_find, which uses fastai's Learning Rate Finder to help us pick a learning rate.

AdaptiveTuner.lr_find[source]

AdaptiveTuner.lr_find(start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggest_funcs=valley)

Runs fastai's LR Finder

Parameters:

  • start_lr : <class 'float'>, optional

  • end_lr : <class 'int'>, optional

  • num_it : <class 'int'>, optional

  • stop_div : <class 'bool'>, optional

  • show_plot : <class 'bool'>, optional

  • suggest_funcs : <class 'function'>, optional

tuner.lr_find()
/opt/venv/lib/python3.8/site-packages/fastai/callback/schedule.py:270: UserWarning: color is redundantly defined by the 'color' keyword argument and the fmt string "ro" (-> color='r'). The keyword argument will take precedence.
  ax.plot(val, idx, 'ro', label=nm, c=color)
SuggestedLRs(valley=7.585775892948732e-05)

It suggests a learning rate of about 7.6e-5. We will use the slightly more conservative 5e-5, which is in the same range.

lr = 5e-5
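The finder arrives at its suggestion by running a short mock training in which the learning rate grows from start_lr to end_lr over num_it iterations while the loss is recorded. A rough sketch of such an exponential sweep, using the default arguments from the signature above; lr_sweep is a hypothetical helper, and the exact spacing fastai uses internally may differ slightly:

```python
def lr_sweep(start_lr=1e-7, end_lr=10, num_it=100):
    """Exponentially spaced learning rates running from start_lr to end_lr."""
    ratio = end_lr / start_lr
    # i / (num_it - 1) runs from 0 to 1, so the sweep starts at start_lr and ends at end_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


lrs = lr_sweep()
print(f"first={lrs[0]:.1e}  middle={lrs[len(lrs) // 2]:.1e}  last={lrs[-1]:.1e}")
```

Since the sweep covers eight orders of magnitude, the plot uses a logarithmic x-axis, and values such as 5e-5 and 7.6e-5 sit very close together on it.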

Let's look at the documentation for tune:

AdaptiveTuner.tune[source]

AdaptiveTuner.tune(epochs:int, lr:float=None, strategy:Strategy='fit_one_cycle', callbacks:list=[], **kwargs)

Fine tune self.model for epochs with an lr and strategy

Parameters:

  • epochs : <class 'int'>

    Number of iterations to train for

  • lr : <class 'float'>, optional

    If None, finds a new learning rate and uses suggestion_method

  • strategy : <class 'fastcore.basics.Strategy'>, optional

    A fitting method

  • callbacks : <class 'list'>, optional

    Extra fastai Callbacks

  • kwargs : <class 'inspect._empty'>

We can pass in a number of epochs, a learning rate, a strategy, and additional fastai callbacks to call.

Valid strategies live in the Strategy namespace class, which can be imported from adaptnlp:

from adaptnlp import Strategy

In this tutorial we will train with the One-Cycle policy, a widely used schedule that is also the default strategy of tune.
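To give an idea of what this policy does to the learning rate, here is a small sketch of the schedule: a warm-up from a low value to the peak, followed by a long anneal to almost zero. It assumes fastai's usual fit_one_cycle defaults (div=25, div_final=1e5, pct_start=0.25) and cosine interpolation; one_cycle_lr and cos_anneal are hypothetical helpers written for illustration, not AdaptNLP or fastai functions:

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2


def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at fraction pct (0..1) of training under a one-cycle schedule."""
    if pct < pct_start:
        # Warm-up phase: lr_max / div -> lr_max
        return cos_anneal(lr_max / div, lr_max, pct / pct_start)
    # Annealing phase: lr_max -> lr_max / div_final
    return cos_anneal(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))


for pct in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"{pct:4.0%} of training: lr = {one_cycle_lr(pct, 5e-5):.2e}")
```

With the Tuner, the strategy argument selects this schedule, so none of it needs to be written by hand: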

tuner.tune(3, lr, strategy=Strategy.OneCycle)
epoch train_loss valid_loss perplexity time
0 3.907161 3.809843 45.143364 31:15
1 3.814265 3.767976 43.292336 31:22
2 3.766881 3.760747 42.980507 31:02
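The perplexity column is the exponential of the validation loss, which is why the two fall together. A quick check with the numbers copied from the table above:

```python
import math

# (valid_loss, reported perplexity) copied from the training table above
rows = [
    (3.809843, 45.143364),
    (3.767976, 43.292336),
    (3.760747, 42.980507),
]

for valid_loss, reported in rows:
    computed = math.exp(valid_loss)
    print(f"exp({valid_loss}) = {computed:.4f}  (reported: {reported})")
```

A perplexity of about 43 can be read loosely as the model being, on average, as uncertain as if it were choosing uniformly among roughly 43 tokens at each step.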

Saving Model

Now that we have a trained model, let's save those weights away.

Calling tuner.save will save both the model and the tokenizer in the same format that HuggingFace uses:

AdaptiveTuner.save[source]

AdaptiveTuner.save(save_directory)

Save a pretrained model to a save_directory

Parameters:

  • save_directory : <class 'inspect._empty'>

    A folder to save our model to

tuner.save('good_model')
'good_model'
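Because the folder is in HuggingFace format, it should also be loadable with the transformers library directly. A minimal sketch, assuming transformers is installed and that the fine-tuned model is a causal language model (the default language_model_type); load_saved_model is a hypothetical helper, not part of AdaptNLP:

```python
def load_saved_model(save_directory="good_model"):
    """Load the fine-tuned model and its tokenizer with plain transformers."""
    # Imported lazily so that defining this helper does not itself require transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(save_directory)
    model = AutoModelForCausalLM.from_pretrained(save_directory)
    return model, tokenizer
```

Calling load_saved_model('good_model') after the save step above would return the model and tokenizer, ready for use outside of AdaptNLP.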

Performing Inference

There are two ways to get predictions. The first is with the .predict method in our tuner, which is great if you just finished training and want to see how your model performs on some new data. The second is with AdaptNLP's inference API, which we will show afterwards.

In Tuner

First let's write a sentence to test with

sentence = "Hugh Jackman is a terrible"

And then predict with it:

LanguageModelTuner.predict[source]

LanguageModelTuner.predict(text:Union[List[str], str], bs:int=64, num_tokens_to_produce:int=50, **kwargs)

Predict some text for sequence classification with the currently loaded model

Parameters:

  • text : typing.Union[typing.List[str], str]

    Some text or list of texts to do inference with

  • bs : <class 'int'>, optional

    A batch size to use for multiple texts

  • num_tokens_to_produce : <class 'int'>, optional

    Number of tokens to generate

  • kwargs : <class 'inspect._empty'>

tuner.predict(sentence, num_tokens_to_produce=13)
100.00% [1/1 00:00<00:00]
{'generated_text': ["Hugh Jackman is a terrible actor, and I'm not sure if he's a good actor"]}

With the Inference API

Next we will use the EasyTextGenerator class, which AdaptNLP offers:

from adaptnlp import EasyTextGenerator

We simply construct the class:

classifier = EasyTextGenerator()

And call the generate method, passing in the sentence, the location of our saved model, and the number of tokens to produce:

classifier.generate(
    sentence,
    model_name_or_path='good_model',
    num_tokens_to_produce=13
)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
100.00% [1/1 00:00<00:00]
{'generated_text': ["Hugh Jackman is a terrible actor, and I'm not sure if he's a good actor"]}

And we got the exact same output!
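Both calls return a dictionary whose generated_text entry is a list of strings, each containing the prompt followed by the continuation. A small sketch of comparing the two results and separating the new text from the prompt, with the outputs copied verbatim from above (the variable names are only illustrative):

```python
sentence = "Hugh Jackman is a terrible"

# Results copied verbatim from the two calls above
tuner_result = {'generated_text': ["Hugh Jackman is a terrible actor, and I'm not sure if he's a good actor"]}
api_result = {'generated_text': ["Hugh Jackman is a terrible actor, and I'm not sure if he's a good actor"]}

# Confirm that both inference paths produced the same text
same = tuner_result['generated_text'] == api_result['generated_text']

# Strip the prompt to keep only the newly generated part
full_text = api_result['generated_text'][0]
continuation = full_text[len(sentence):] if full_text.startswith(sentence) else full_text

print(same)
print(repr(continuation))
```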