Performing document summarization with AdaptNLP

Summarization is the NLP task of compressing one or more documents while still retaining the input's original context and meaning.

Below, we'll walk through how we can use AdaptNLP's EasySummarizer module to summarize large amounts of text with state-of-the-art models.

Getting Started with EasySummarizer

First we'll import our EasySummarizer class:

from adaptnlp import EasySummarizer

Next we need some example text to use:

text = ["""Einstein’s education was disrupted by his father’s repeated failures at business. In 1894, after his company failed to get an important 
          contract to electrify the city of Munich, Hermann Einstein moved to Milan to work with a relative. Einstein was left at a boardinghouse in 
          Munich and expected to finish his education. Alone, miserable, and repelled by the looming prospect of military duty when he turned 16, Einstein 
          ran away six months later and landed on the doorstep of his surprised parents. His parents realized the enormous problems that he faced as a 
          school dropout and draft dodger with no employable skills. His prospects did not look promising.
          Fortunately, Einstein could apply directly to the Eidgenössische Polytechnische Schule (“Swiss Federal Polytechnic School”; in 1911, 
          following expansion in 1909 to full university status, it was renamed the Eidgenössische Technische Hochschule, or “Swiss Federal 
          Institute of Technology”) in Zürich without the equivalent of a high school diploma if he passed its stiff entrance examinations. His marks 
          showed that he excelled in mathematics and physics, but he failed at French, chemistry, and biology. Because of his exceptional math scores, 
          he was allowed into the polytechnic on the condition that he first finish his formal schooling. He went to a special high school run by 
          Jost Winteler in Aarau, Switzerland, and graduated in 1896. He also renounced his German citizenship at that time. (He was stateless until 1901, 
          when he was granted Swiss citizenship.) He became lifelong friends with the Winteler family, with whom he had been boarding. (Winteler’s 
          daughter, Marie, was Einstein’s first love; Einstein’s sister, Maja, would eventually marry Winteler’s son Paul; and his close friend Michele 
          Besso would marry their eldest daughter, Anna.)""",
       """Einstein would write that two “wonders” deeply affected his early years. The first was his encounter with a compass at age five. 
          He was mystified that invisible forces could deflect the needle. This would lead to a lifelong fascination with invisible forces. 
          The second wonder came at age 12 when he discovered a book of geometry, which he devoured, calling it his 'sacred little geometry 
          book'. Einstein became deeply religious at age 12, even composing several songs in praise of God and chanting religious songs on 
          the way to school. This began to change, however, after he read science books that contradicted his religious beliefs. This challenge 
          to established authority left a deep and lasting impression. At the Luitpold Gymnasium, Einstein often felt out of place and victimized 
          by a Prussian-style educational system that seemed to stifle originality and creativity. One teacher even told him that he would 
          never amount to anything."""]

And finally, we'll instantiate the summarizer:

summarizer = EasySummarizer()

Summarizing with summarize

Now that the summarizer is instantiated, we can load a model and summarize the text with the built-in summarize() method.

This method takes the parameters text, model_name_or_path, and mini_batch_size, as well as optional keyword arguments accepted by the Transformers PreTrainedModel.generate() method (such as num_beams, min_length, max_length, and early_stopping).
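To make the split between summarize()'s own parameters and the generation keyword arguments more concrete, here is a minimal sketch of the general "consume some arguments, forward the rest" pattern. It is not AdaptNLP's implementation: `summarize_sketch` and `fake_generate` are hypothetical stand-ins, model loading is omitted entirely, and the fake generator simply truncates each document to `max_length` words instead of decoding.

```python
from typing import Any, Dict, List


def fake_generate(batch: List[str], **generate_kwargs: Any) -> List[str]:
    # Hypothetical stand-in for a model's generate(): it keeps the first
    # `max_length` words of each document and ignores every other kwarg.
    max_length = generate_kwargs.get("max_length", 20)
    return [" ".join(doc.split()[:max_length]) for doc in batch]


def summarize_sketch(text: List[str], mini_batch_size: int = 32,
                     **generate_kwargs: Any) -> Dict[str, List[str]]:
    # `text` and `mini_batch_size` are consumed here; every other keyword
    # argument (num_beams, max_length, early_stopping, ...) is forwarded as-is.
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be at least 1")
    summaries: List[str] = []
    # Process the documents in chunks of `mini_batch_size`.
    for start in range(0, len(text), mini_batch_size):
        batch = text[start:start + mini_batch_size]
        summaries.extend(fake_generate(batch, **generate_kwargs))
    return {"summaries": summaries}


out = summarize_sketch(["one two three four", "a b c d e"], mini_batch_size=1,
                       num_beams=4, max_length=2, early_stopping=True)
print(out["summaries"])  # ['one two', 'a b']
```

With mini_batch_size=1 and two documents, the stand-in generator is called twice, once per document; a larger mini_batch_size trades memory for fewer calls.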

Our first example will be with the t5-small model:

summaries = summarizer.summarize(text=text, model_name_or_path="t5-small", mini_batch_size=1,
                                 num_beams=4, min_length=0, max_length=100, early_stopping=True)

And we can see its output below:

print("Summaries:\n")
for s in summaries['summaries']:
    print(s, "\n")
Summaries:

Hermann Einstein was left at a boardinghouse and expected to finish his education . he ran away six months later and landed on the doorstep of his surprised parents . he could apply directly to the Eidgenössische Polytechnische Schule without the equivalent of a high school diploma . 

Einstein was mystified that invisible forces could deflect the needle . the second wonder came at age 12 when he discovered a book of geometry . he became deeply religious at age 12 . 
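Since summarization is about compression, it can help to measure how much shorter each summary is than its source. The helper below is a hypothetical sketch, not part of AdaptNLP, and it uses whitespace-separated word counts as a rough proxy for length rather than the model's own tokenization. With the objects above it could be called as `compression_ratios(text, summaries['summaries'])`.

```python
from typing import List


def compression_ratios(sources: List[str], summaries: List[str]) -> List[float]:
    # Ratio of summary words to source words for each document pair;
    # lower values mean more aggressive compression.
    if len(sources) != len(summaries):
        raise ValueError("expected one summary per source document")
    ratios: List[float] = []
    for src, summ in zip(sources, summaries):
        src_words = len(src.split())
        if src_words == 0:
            raise ValueError("source document is empty")
        ratios.append(len(summ.split()) / src_words)
    return ratios


ratios = compression_ratios(
    ["the quick brown fox jumps over the lazy dog", "a b c d"],
    ["fox jumps over dog", "a b"],
)
print([round(r, 2) for r in ratios])  # [0.44, 0.5]
```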

Next we'll use Facebook's bart-large-cnn model. We could simply pass facebook/bart-large-cnn to the summarizer, or we can use HFModelHub to search for it on the Hugging Face model hub. Let's try the search approach:

from adaptnlp import HFModelHub
hub = HFModelHub()
models = hub.search_model_by_name('facebook/bart-large', user_uploaded=True); models

[Model Name: facebook/bart-large-cnn, Tasks: [summarization],
 Model Name: facebook/bart-large-mnli, Tasks: [zero-shot-classification],
 Model Name: facebook/bart-large-xsum, Tasks: [summarization],
 Model Name: facebook/bart-large, Tasks: []]
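The search returns a list, and picking an entry by its position depends on the order in which results come back. A slightly more defensive option is to select by exact name and task. The sketch below uses a hypothetical `ModelResult` dataclass that only mirrors the two fields visible in the printed results (name and tasks); AdaptNLP's own result objects may expose these under different attribute names, so inspect them before adapting this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelResult:
    # Hypothetical stand-in mirroring the printed repr
    # ("Model Name: ..., Tasks: [...]"); not AdaptNLP's own class.
    name: str
    tasks: List[str] = field(default_factory=list)


def pick_model(results: List[ModelResult], name: str, task: str) -> Optional[ModelResult]:
    # Return the first result whose name matches exactly and that lists
    # the requested task, or None if nothing matches.
    for result in results:
        if result.name == name and task in result.tasks:
            return result
    return None


results = [
    ModelResult("facebook/bart-large-cnn", ["summarization"]),
    ModelResult("facebook/bart-large-mnli", ["zero-shot-classification"]),
    ModelResult("facebook/bart-large-xsum", ["summarization"]),
    ModelResult("facebook/bart-large", []),
]
chosen = pick_model(results, "facebook/bart-large-cnn", "summarization")
print(chosen.name if chosen else "no match")  # facebook/bart-large-cnn
```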

The first result is facebook/bart-large-cnn, so let's use it:

model = models[0]

And directly pass it into summarizer.summarize:

summaries = summarizer.summarize(text=text, model_name_or_path=model, mini_batch_size=1,
                                 num_beams=2, min_length=40, max_length=300, early_stopping=True)
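This run uses different generation settings from the t5-small example. Because loading a large model can be slow, a quick pre-flight check can catch inconsistent combinations before calling summarize(). The helper below is hypothetical (it is not part of AdaptNLP or Transformers) and only checks three rules; it also assumes num_beams defaults to 1 when omitted, although a model's own config may set a different default.

```python
from typing import Any, List


def check_generation_kwargs(**kwargs: Any) -> List[str]:
    # Hypothetical helper: flags a few easy-to-miss combinations of
    # generation settings. It does not validate every generate() option.
    problems: List[str] = []
    num_beams = kwargs.get("num_beams", 1)  # assumed default; model configs may differ
    min_length = kwargs.get("min_length", 0)
    max_length = kwargs.get("max_length")
    if num_beams < 1:
        problems.append("num_beams must be at least 1")
    if max_length is not None and min_length > max_length:
        problems.append("min_length is greater than max_length")
    if kwargs.get("early_stopping") and num_beams == 1:
        # early_stopping only affects beam search, so it does nothing here.
        problems.append("early_stopping has no effect when num_beams == 1")
    return problems


# The settings used for the t5-small and bart-large-cnn runs both pass.
print(check_generation_kwargs(num_beams=4, min_length=0, max_length=100, early_stopping=True))  # []
print(check_generation_kwargs(num_beams=2, min_length=40, max_length=300, early_stopping=True))  # []
```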

And finally we can view the results:

print("Summaries:\n")
for s in summaries['summaries']:
    print(s, "\n")
Summaries:

Einstein’s education was disrupted by his father’S repeated failures at business. In 1894, after his company failed to get an important contract, Einstein moved to Milan to work with a relative. Einstein was left at a boardinghouse in Munich and expected to finish his education. 

Einstein would write that two ‘wonders’ deeply affected his early years. The first was his encounter with a compass at age five. The second wonder came at age 12 when he discovered a book of geometry.