Summarization is the NLP task of compressing one or more documents while retaining the input's original context and meaning.
Below, we'll walk through how to use AdaptNLP's EasySummarizer module to summarize large amounts of text with state-of-the-art models.
Getting Started with EasySummarizer
First we'll import the EasySummarizer class:
from adaptnlp import EasySummarizer
Next we need some example text to use:
text = ["""Einstein’s education was disrupted by his father’s repeated failures at business. In 1894, after his company failed to get an important
contract to electrify the city of Munich, Hermann Einstein moved to Milan to work with a relative. Einstein was left at a boardinghouse in
Munich and expected to finish his education. Alone, miserable, and repelled by the looming prospect of military duty when he turned 16, Einstein
ran away six months later and landed on the doorstep of his surprised parents. His parents realized the enormous problems that he faced as a
school dropout and draft dodger with no employable skills. His prospects did not look promising.
Fortunately, Einstein could apply directly to the Eidgenössische Polytechnische Schule (“Swiss Federal Polytechnic School”; in 1911,
following expansion in 1909 to full university status, it was renamed the Eidgenössische Technische Hochschule, or “Swiss Federal
Institute of Technology”) in Zürich without the equivalent of a high school diploma if he passed its stiff entrance examinations. His marks
showed that he excelled in mathematics and physics, but he failed at French, chemistry, and biology. Because of his exceptional math scores,
he was allowed into the polytechnic on the condition that he first finish his formal schooling. He went to a special high school run by
Jost Winteler in Aarau, Switzerland, and graduated in 1896. He also renounced his German citizenship at that time. (He was stateless until 1901,
when he was granted Swiss citizenship.) He became lifelong friends with the Winteler family, with whom he had been boarding. (Winteler’s
daughter, Marie, was Einstein’s first love; Einstein’s sister, Maja, would eventually marry Winteler’s son Paul; and his close friend Michele
Besso would marry their eldest daughter, Anna.)""",
"""Einstein would write that two “wonders” deeply affected his early years. The first was his encounter with a compass at age five.
He was mystified that invisible forces could deflect the needle. This would lead to a lifelong fascination with invisible forces.
The second wonder came at age 12 when he discovered a book of geometry, which he devoured, calling it his 'sacred little geometry
book'. Einstein became deeply religious at age 12, even composing several songs in praise of God and chanting religious songs on
the way to school. This began to change, however, after he read science books that contradicted his religious beliefs. This challenge
to established authority left a deep and lasting impression. At the Luitpold Gymnasium, Einstein often felt out of place and victimized
by a Prussian-style educational system that seemed to stifle originality and creativity. One teacher even told him that he would
never amount to anything."""]
And finally we'll instantiate the summarizer:
summarizer = EasySummarizer()
Summarizing with summarize
Now that we have the summarizer instantiated, we are ready to load a model and compress the text with the built-in summarize() method.
This method takes the parameters text, model_name_or_path, and mini_batch_size, as well as optional keyword arguments accepted by the Transformers PreTrainedModel.generate() method.
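To make the role of mini_batch_size more concrete, here is a minimal sketch of splitting a list of texts into mini-batches. It assumes that mini_batch_size sets how many texts are processed together in one pass; the helper iter_mini_batches is a hypothetical name used only for illustration and is not AdaptNLP's internal code.

```python
from typing import Iterator, List


def iter_mini_batches(texts: List[str], mini_batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` with at most `mini_batch_size` items each."""
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be at least 1")
    for start in range(0, len(texts), mini_batch_size):
        yield texts[start:start + mini_batch_size]


# Illustration only: placeholder documents, not the Einstein passages.
documents = ["first document", "second document", "third document"]
batches = list(iter_mini_batches(documents, mini_batch_size=2))
print(batches)
```

Under this assumption, a mini_batch_size of 1 means each text is handled on its own, while a larger value trades memory for fewer passes.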
Our first example uses the t5-small model:
summaries = summarizer.summarize(text=text, model_name_or_path="t5-small", mini_batch_size=1, num_beams=4, min_length=0, max_length=100, early_stopping=True)
And we can see its output below:
print("Summaries:\n")
for s in summaries['summaries']:
    print(s, "\n")
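Since summarization is about compression, it can help to check how much shorter each summary is than its input. The sketch below assumes the result is a dict with a 'summaries' key (as used in the loop above) holding one summary string per input text in the same order; compression_report is a hypothetical helper, and the sample data is made up rather than real model output.

```python
from typing import Dict, List


def compression_report(texts: List[str], result: Dict[str, List[str]]) -> List[float]:
    """Return summary length divided by input length, counted in whitespace-separated words."""
    summaries = result["summaries"]
    if len(summaries) != len(texts):
        raise ValueError("expected one summary per input text")
    ratios = []
    for source, summary in zip(texts, summaries):
        source_words = len(source.split())
        summary_words = len(summary.split())
        # Guard against empty inputs to avoid dividing by zero.
        ratios.append(summary_words / source_words if source_words else 0.0)
    return ratios


# Made-up stand-in data, not real model output.
example_texts = ["one two three four five six seven eight", "alpha beta gamma delta"]
example_result = {"summaries": ["one two", "alpha"]}
ratios = compression_report(example_texts, example_result)
print(ratios)
```

With the real objects from this walkthrough, the call would be compression_report(text, summaries), provided the assumption about the output shape holds.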
Next we'll use the bart-large-cnn model from Facebook. We can either pass facebook/bart-large-cnn directly to our summarizer, or we can use the HFModelHub to search for it. Let's try that now:
from adaptnlp import HFModelHub
hub = HFModelHub()
models = hub.search_model_by_name('facebook/bart-large', user_uploaded=True); models
We can see that the first result is bart-large-cnn, so let's use it:
model = models[0]
We can pass it directly into summarizer.summarize:
summaries = summarizer.summarize(text=text, model_name_or_path=model, mini_batch_size=1, num_beams=2, min_length=40, max_length=300, early_stopping=True)
And finally we can view the results:
print("Summaries:\n")
for s in summaries['summaries']:
    print(s, "\n")
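Both calls in this walkthrough pass the same kinds of generation arguments (num_beams, min_length, max_length, early_stopping) with different values. One possible way to keep such settings organized is to store them in dicts and unpack them with `**`. The names T5_SMALL_KWARGS, BART_LARGE_CNN_KWARGS, and with_overrides below are hypothetical and chosen only for this sketch; the values are copied from the two calls above.

```python
from typing import Any, Dict

# Generation arguments copied from the two summarize() calls in this walkthrough.
T5_SMALL_KWARGS: Dict[str, Any] = {
    "num_beams": 4,
    "min_length": 0,
    "max_length": 100,
    "early_stopping": True,
}
BART_LARGE_CNN_KWARGS: Dict[str, Any] = {
    "num_beams": 2,
    "min_length": 40,
    "max_length": 300,
    "early_stopping": True,
}


def with_overrides(base: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of `base` with selected settings replaced, leaving `base` untouched."""
    merged = dict(base)
    merged.update(overrides)
    return merged


# Example: same t5-small settings, but allowing longer summaries.
longer = with_overrides(T5_SMALL_KWARGS, max_length=150)
print(longer)

# Hypothetical usage, assuming `summarizer` and `text` from earlier in this walkthrough:
# summaries = summarizer.summarize(text=text, model_name_or_path="t5-small",
#                                  mini_batch_size=1, **longer)
```

Keeping the settings in one place makes it easier to see which values changed between runs when comparing models.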