Using the Question Answering API within AdaptNLP

Question Answering

Question Answering is the NLP task of producing an answer given two text inputs: a context and a question about that context.

A common family of Question Answering models is span-based: the model outputs a start and end index that mark the relevant "answer" within the provided context. With these models, we can extract answers to a wide range of questions and queries about any unstructured text.
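To make the idea of a span concrete, here is a minimal sketch of how a start/end pair can be picked from per-token scores. The logits are made up for illustration, and best_span is a hypothetical helper, not part of AdaptNLP; real models work on subword tokens and apply further filtering.

```python
from typing import List, Tuple


def best_span(start_logits: List[float],
              end_logits: List[float],
              max_answer_len: int = 15) -> Tuple[int, int]:
    """Return the inclusive (start, end) pair with the highest summed logit."""
    best = (0, 0)
    best_score = float("-inf")
    for start, s_logit in enumerate(start_logits):
        # Only consider spans that end at or after the start token
        # and contain at most max_answer_len tokens.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = s_logit + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


tokens = "Amazon was founded by Jeff Bezos on July 5, 1994".split()

# Made-up logits, one start score and one end score per token.
start_logits = [0.1, 0.0, 0.2, 0.1, 2.8, 0.3, 0.0, 0.1, 0.0, 0.2]
end_logits = [0.0, 0.1, 0.0, 0.1, 0.2, 2.5, 0.1, 0.0, 0.1, 0.3]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)
```

The answer is never generated word by word; it is always a contiguous slice of the context, which is why span-based models cannot answer questions whose answer does not appear in the text.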

Below, we'll walk through how we can use AdaptNLP's EasyQuestionAnswering module to extract span-based text answers from unstructured text using state-of-the-art question answering models.

Getting Started

You can use EasyQuestionAnswering to run span-based question answering models.

Given a context and a query, we get back the top n_best_size answer predictions, along with their token span indices and probability scores.

First we'll import the EasyQuestionAnswering class from AdaptNLP and instantiate it:

from adaptnlp import EasyQuestionAnswering
qa_model = EasyQuestionAnswering()

Next we'll write some example context to use:

context = """Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology company based in Seattle, 
Washington that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. 
It is considered one of the Big Four technology companies along with Google, Apple, and Facebook.[7][8][9] 
Amazon is known for its disruption of well-established industries through technological innovation and mass 
scale.[10][11][12] It is the world's largest e-commerce marketplace, AI assistant provider, and cloud computing 
platform[13] as measured by revenue and market capitalization.[14] Amazon is the largest Internet company by 
revenue in the world.[15] It is the second largest private employer in the United States[16] and one of the world's 
most valuable companies. Amazon is the second largest technology company by revenue. Amazon was founded by Jeff Bezos 
on July 5, 1994, in Bellevue, Washington. The company initially started as an online marketplace for books but later 
expanded to sell electronics, software, video games, apparel, furniture, food, toys, and jewelry. In 2015, Amazon 
surpassed Walmart as the most valuable retailer in the United States by market capitalization.[17] In 2017, Amazon 
acquired Whole Foods Market for $13.4 billion, which vastly increased Amazon's presence as a brick-and-mortar 
retailer.[18] In 2018, Bezos announced that its two-day delivery service, Amazon Prime, had surpassed 100 million 
subscribers worldwide"""

Finally, we'll query the context with the predict_qa method.

For our example, we'll run inference with Hugging Face Transformers' DistilBERT model, fine-tuned on the SQuAD dataset:

top_prediction, all_nbest_json = qa_model.predict_qa(query="What does Amazon do?", context=context, n_best_size=5, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 35.47it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 4702.13it/s]

We can then take a peek at the results: the top prediction, followed by the highest-scoring n-best candidates:

disruption of well-established industries
[OrderedDict([('text', 'disruption of well-established industries'),
              ('probability', 0.6033452454300696),
              ('start_logit', 6.112517),
              ('end_logit', 4.161786),
              ('start_index', 45),
              ('end_index', 48)]),
 OrderedDict([('text', 'disruption'),
              ('probability', 0.2976601300650915),
              ('start_logit', 6.112517),
              ('end_logit', 3.4552488),
              ('start_index', 45),
              ('end_index', 45)])]
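The fields in each n-best entry can be sanity-checked by hand. The sketch below uses only the numbers printed above and rests on two assumptions inferred from that output, not confirmed against AdaptNLP's source: that the probabilities are a softmax over the summed start and end logits of the n-best candidates, and that start_index and end_index count whitespace-separated tokens of the context, inclusive. The context here is just the opening sentences of the example passage.

```python
import math

# Values copied from the two n-best entries printed above.
best = {"probability": 0.6033452454300696, "start_logit": 6.112517,
        "end_logit": 4.161786, "start_index": 45, "end_index": 48}
second = {"probability": 0.2976601300650915, "start_logit": 6.112517,
          "end_logit": 3.4552488}


def span_score(entry):
    """Score of a candidate span: start logit plus end logit."""
    return entry["start_logit"] + entry["end_logit"]


# Under a softmax, the ratio of two probabilities equals exp(score difference),
# regardless of how many other candidates share the normalization.
logit_ratio = math.exp(span_score(best) - span_score(second))
prob_ratio = best["probability"] / second["probability"]
print(logit_ratio, prob_ratio)

# Opening sentences of the example context (enough to cover token 48).
context_start = (
    "Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology "
    "company based in Seattle, Washington that focuses on e-commerce, cloud "
    "computing, digital streaming, and artificial intelligence. It is considered "
    "one of the Big Four technology companies along with Google, Apple, and "
    "Facebook.[7][8][9] Amazon is known for its disruption of well-established "
    "industries through technological innovation and mass scale."
)

# Assumption: indices refer to whitespace tokens, with end_index inclusive.
tokens = context_start.split()
answer = " ".join(tokens[best["start_index"]:best["end_index"] + 1])
print(answer)
```

Both candidates share the same start token and differ only in where the span ends, which is why their start_logit values are identical.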

We can also pass in multiple questions at once:

questions = ["What does Amazon do?",
             "What happened July 5, 1994?",
             "How much did Amazon acquire Whole Foods for?"]

Just make sure to pass in one context per question. Here we repeat the same context three times:

top_prediction, all_nbest_json = qa_model.predict_qa(query=questions, context=[context]*3, n_best_size=5, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
convert squad examples to features: 100%|██████████| 3/3 [00:00<00:00, 46.30it/s]
add example index and unique id: 100%|██████████| 3/3 [00:00<00:00, 15847.50it/s]
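Since each question needs its own context entry, it can be safer to derive the repeat count from the question list than to hard-code it. A minimal sketch in plain Python, where the short context string is a placeholder for the full passage:

```python
questions = ["What does Amazon do?",
             "What happened July 5, 1994?",
             "How much did Amazon acquire Whole Foods for?"]

# Placeholder standing in for the full example context.
context = "Amazon was founded by Jeff Bezos on July 5, 1994, in Bellevue, Washington."

# One context per question; the list holds references to the same string,
# so repeating it does not copy the text.
contexts = [context] * len(questions)

# Each question is paired with the context at the same position.
pairs = list(zip(questions, contexts))
for question, ctx in pairs:
    print(question, "->", len(ctx), "characters of context")
```

With this pattern, adding a fourth question does not require touching the context argument.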

Our new results, showing the top answer for each question followed by n-best candidates for the second question:

OrderedDict([('0', 'disruption of well-established industries'), ('1', 'Jeff Bezos'), ('2', '$13.4 billion')])
[OrderedDict([('text', 'Jeff Bezos'),
              ('probability', 0.5127517857731716),
              ('start_logit', 2.8254287),
              ('end_logit', 0.6868366),
              ('start_index', 119),
              ('end_index', 120)]),
 OrderedDict([('text', 'Amazon was founded by Jeff Bezos'),
              ('probability', 0.47153574087191846),
              ('start_logit', 2.7416315),
              ('end_logit', 0.6868366),
              ('start_index', 115),
              ('end_index', 120)])]
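The top predictions come back keyed by '0', '1' and '2'. Assuming these keys are the positions of the questions in the input list, as the output above suggests, a small sketch for matching answers back to their questions could look like this. The OrderedDict is copied from the printed output rather than produced by a live model call.

```python
from collections import OrderedDict

questions = ["What does Amazon do?",
             "What happened July 5, 1994?",
             "How much did Amazon acquire Whole Foods for?"]

# Copied from the output shown above.
top_prediction = OrderedDict([("0", "disruption of well-established industries"),
                              ("1", "Jeff Bezos"),
                              ("2", "$13.4 billion")])

# Assumption: key str(i) holds the answer to questions[i].
answers = {question: top_prediction[str(i)]
           for i, question in enumerate(questions)}

for question, answer in answers.items():
    print(f"{question} {answer}")
```

Note that the model returns the most likely span, not necessarily a well-formed answer: for "What happened July 5, 1994?" it picks "Jeff Bezos", with the longer span "Amazon was founded by Jeff Bezos" a close second.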