Using the Question Answering API within AdaptNLP

Question Answering

Question Answering is the NLP task of producing a legible answer given two text inputs: a context and a question about that context.

Span-based models are one example of Question Answering models: they output a start and an end index that mark the relevant "answer" within the provided context. With these models, we can extract answers to a variety of questions and queries about any unstructured text.
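To make the start/end index idea concrete, here is a minimal sketch of how a span could be chosen once a model has scored every token. It is an illustration only, not AdaptNLP's implementation: the helper name best_spans, the toy tokens, and the logit values are all invented for this example, and the scoring rule (start logit plus end logit) is an assumption about how a typical span-based model ranks candidates.

```python
from typing import List, Tuple


def best_spans(
    tokens: List[str],
    start_logits: List[float],
    end_logits: List[float],
    n_best: int = 3,
    max_answer_len: int = 10,
) -> List[Tuple[str, float]]:
    """Rank candidate spans by start_logit + end_logit (hypothetical helper)."""
    candidates = []
    for start, s_logit in enumerate(start_logits):
        # A span's end index may not come before its start index.
        for end in range(start, min(start + max_answer_len, len(tokens))):
            score = s_logit + end_logits[end]
            candidates.append((" ".join(tokens[start:end + 1]), score))
    # Highest combined score first.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:n_best]


# Toy inputs: the logits below are made up, not produced by a real model.
tokens = ["Amazon", "was", "founded", "by", "Jeff", "Bezos"]
start_logits = [0.1, -1.0, 0.5, -0.5, 4.0, 1.0]
end_logits = [-1.0, -2.0, -0.5, -1.5, 0.5, 5.0]

top = best_spans(tokens, start_logits, end_logits)
for text, score in top:
    print(f"{score:5.1f}  {text}")
```

Real models work on subword tokens and apply extra filtering, so treat this only as a picture of what "start and end index" means.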

Below, we'll walk through how we can use AdaptNLP's EasyQuestionAnswering module to extract span-based text answers from unstructured text using state-of-the-art question answering models.

Getting Started

You can use EasyQuestionAnswering to run span-based question answering models.

Given a context and a query, we get back the top n_best_size answer predictions. At the higher detail levels shown later, the output also includes token span indices and probability scores.

First we'll import the EasyQuestionAnswering class from AdaptNLP and instantiate it:

from adaptnlp import EasyQuestionAnswering
qa_model = EasyQuestionAnswering()

Next we'll write some example context to use:

context = """Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology company based in Seattle, 
Washington that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. 
It is considered one of the Big Four technology companies along with Google, Apple, and Facebook.[7][8][9] 
Amazon is known for its disruption of well-established industries through technological innovation and mass 
scale.[10][11][12] It is the world's largest e-commerce marketplace, AI assistant provider, and cloud computing 
platform[13] as measured by revenue and market capitalization.[14] Amazon is the largest Internet company by 
revenue in the world.[15] It is the second largest private employer in the United States[16] and one of the world's 
most valuable companies. Amazon is the second largest technology company by revenue. Amazon was founded by Jeff Bezos 
on July 5, 1994, in Bellevue, Washington. The company initially started as an online marketplace for books but later 
expanded to sell electronics, software, video games, apparel, furniture, food, toys, and jewelry. In 2015, Amazon 
surpassed Walmart as the most valuable retailer in the United States by market capitalization.[17] In 2017, Amazon 
acquired Whole Foods Market for $13.4 billion, which vastly increased Amazon's presence as a brick-and-mortar 
retailer.[18] In 2018, Bezos announced that its two-day delivery service, Amazon Prime, had surpassed 100 million 
subscribers worldwide
"""

And then finally we'll query the data with the predict_qa method.

For our example, we'll run inference with the DistilBERT model from the Transformers library, which was fine-tuned on the SQuAD dataset:

results = qa_model.predict_qa(query="What does Amazon do?", context=context, n_best_size=5, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 40.63it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 5146.39it/s]

And we can take a peek at the results:

results
{'queries': ['What does Amazon do?'],
 'best_answers': [OrderedDict([(0,
                'disruption of well-established industries'),
               (1, 'disruption'),
               (2, 'its disruption of well-established industries'),
               (3, 'its disruption'),
               (4, 'Amazon is known for its disruption')])]}
results['best_answers']
[OrderedDict([(0, 'disruption of well-established industries'),
              (1, 'disruption'),
              (2, 'its disruption of well-established industries'),
              (3, 'its disruption'),
              (4, 'Amazon is known for its disruption')])]

We can also pass in multiple questions at once:

questions = ["What does Amazon do?",
             "What happened July 5, 1994?",
             "How much did Amazon acquire Whole Foods for?"]

Just make sure to pass in one context per question (here, the same context repeated three times):

results = qa_model.predict_qa(
    query=questions, 
    context=[context]*3,
    mini_batch_size=1, 
    model_name_or_path="distilbert-base-uncased-distilled-squad"
)
convert squad examples to features: 100%|██████████| 3/3 [00:00<00:00, 38.89it/s]
add example index and unique id: 100%|██████████| 3/3 [00:00<00:00, 15439.16it/s]
Warning! `n_best_size` 5 is greater than the actual number of answers 4, only returning 4 answers

Our new results (the second question returns only four answers, as the warning above reported):

results['best_answers']
[OrderedDict([(0, 'disruption of well-established industries'),
              (1, 'disruption'),
              (2, 'its disruption of well-established industries'),
              (3, 'its disruption'),
              (4, 'Amazon is known for its disruption')]),
 OrderedDict([(0, 'Jeff Bezos'),
              (1, 'Amazon was founded by Jeff Bezos'),
              (2, 'founded by Jeff Bezos'),
              (3, 'Bezos')]),
 OrderedDict([(0, '$13.4 billion'),
              (1, '13.4 billion'),
              (2, '$13.4 billion,'),
              (3, '13.4 billion,'),
              (4, '$')])]
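When several questions are asked at once, it is often handy to line each question up with its top-ranked answer. The sketch below does this on a hand-built dictionary shaped like the output above. The helper name top_answers is hypothetical, and it assumes that results['queries'] lists the questions in the same order as results['best_answers'], which the single-question output suggests but which is not confirmed here for the multi-question case.

```python
from collections import OrderedDict

# Hand-built sample mirroring the structure printed above (answers abbreviated).
results = {
    "queries": [
        "What does Amazon do?",
        "What happened July 5, 1994?",
        "How much did Amazon acquire Whole Foods for?",
    ],
    "best_answers": [
        OrderedDict([(0, "disruption of well-established industries"),
                     (1, "disruption")]),
        OrderedDict([(0, "Jeff Bezos"),
                     (1, "Amazon was founded by Jeff Bezos")]),
        OrderedDict([(0, "$13.4 billion"),
                     (1, "13.4 billion")]),
    ],
}


def top_answers(results):
    """Map each question to its rank-0 answer (hypothetical helper)."""
    return {
        question: answers.get(0)  # None if no answer was returned
        for question, answers in zip(results["queries"], results["best_answers"])
    }


for question, answer in top_answers(results).items():
    print(f"{question} -> {answer}")
```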

If we want more information, we can pass in a detail_level: either a DetailLevel value or one of the strings low, medium, and high.

This returns a dictionary with additional items to look at. Our earlier results used the default, DetailLevel.Low.

from adaptnlp import DetailLevel
results = qa_model.predict_qa(
    query="What does Amazon do?",
    context=context,
    model_name_or_path="distilbert-base-uncased-distilled-squad",
    detail_level=DetailLevel.Medium
)
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 57.00it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 5077.85it/s]
results
{'queries': ['What does Amazon do?'],
 'best_answers': [OrderedDict([(0,
                'disruption of well-established industries'),
               (1, 'disruption'),
               (2, 'its disruption of well-established industries'),
               (3, 'its disruption'),
               (4, 'Amazon is known for its disruption')])],
 'pairings': OrderedDict([('What does Amazon do?',
               (('disruption of well-established industries',
                 'disruption',
                 'its disruption of well-established industries',
                 'its disruption',
                 'Amazon is known for its disruption'),
                tensor([0.6033, 0.2977, 0.0585, 0.0289, 0.0116])))]),
 'context': "Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology company based in Seattle, \nWashington that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. \nIt is considered one of the Big Four technology companies along with Google, Apple, and Facebook.[7][8][9] \nAmazon is known for its disruption of well-established industries through technological innovation and mass \nscale.[10][11][12] It is the world's largest e-commerce marketplace, AI assistant provider, and cloud computing \nplatform[13] as measured by revenue and market capitalization.[14] Amazon is the largest Internet company by \nrevenue in the world.[15] It is the second largest private employer in the United States[16] and one of the world's \nmost valuable companies. Amazon is the second largest technology company by revenue. Amazon was founded by Jeff Bezos \non July 5, 1994, in Bellevue, Washington. The company initially started as an online marketplace for books but later \nexpanded to sell electronics, software, video games, apparel, furniture, food, toys, and jewelry. In 2015, Amazon \nsurpassed Walmart as the most valuable retailer in the United States by market capitalization.[17] In 2017, Amazon \nacquired Whole Foods Market for $13.4 billion, which vastly increased Amazon's presence as a brick-and-mortar \nretailer.[18] In 2018, Bezos announced that its two-day delivery service, Amazon Prime, had surpassed 100 million \nsubscribers worldwide\n"}

As we can see, the medium detail level returns not only our queries and answers, but also a pairing of each question with its top answers and their softmax probabilities.

Along with this, it returns the context that was passed in with the question.
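One use for the pairings entry is to keep only the answers whose probability clears a threshold. The following is a rough sketch on a hand-copied sample. The helper name confident_answers and the 0.1 threshold are arbitrary choices for illustration. The probabilities are written as a plain list here; the real output holds a PyTorch tensor, which could be converted with its tolist() method first.

```python
from collections import OrderedDict

# Sample copied from the medium-detail output above, with the tensor
# replaced by a plain list of floats.
pairings = OrderedDict([
    ("What does Amazon do?",
     (("disruption of well-established industries",
       "disruption",
       "its disruption of well-established industries",
       "its disruption",
       "Amazon is known for its disruption"),
      [0.6033, 0.2977, 0.0585, 0.0289, 0.0116])),
])


def confident_answers(pairings, threshold=0.1):
    """Keep (answer, probability) pairs at or above threshold (hypothetical helper)."""
    kept = {}
    for question, (answers, probabilities) in pairings.items():
        kept[question] = [
            (answer, prob)
            for answer, prob in zip(answers, probabilities)
            if prob >= threshold
        ]
    return kept


filtered = confident_answers(pairings)
for answer, prob in filtered["What does Amazon do?"]:
    print(f"{prob:.4f}  {answer}")
```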

And now let's look at the highest detail level:

results = qa_model.predict_qa(
    query="What does Amazon do?",
    context=context,
    model_name_or_path="distilbert-base-uncased-distilled-squad",
    detail_level='high'
)
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 53.25it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 5592.41it/s]
results
{'queries': ['What does Amazon do?'],
 'best_answers': [OrderedDict([(0,
                'disruption of well-established industries'),
               (1, 'disruption'),
               (2, 'its disruption of well-established industries'),
               (3, 'its disruption'),
               (4, 'Amazon is known for its disruption')])],
 'pairings': OrderedDict([('What does Amazon do?',
               (('disruption of well-established industries',
                 'disruption',
                 'its disruption of well-established industries',
                 'its disruption',
                 'Amazon is known for its disruption'),
                tensor([0.6033, 0.2977, 0.0585, 0.0289, 0.0116])))]),
 'context': "Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology company based in Seattle, \nWashington that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. \nIt is considered one of the Big Four technology companies along with Google, Apple, and Facebook.[7][8][9] \nAmazon is known for its disruption of well-established industries through technological innovation and mass \nscale.[10][11][12] It is the world's largest e-commerce marketplace, AI assistant provider, and cloud computing \nplatform[13] as measured by revenue and market capitalization.[14] Amazon is the largest Internet company by \nrevenue in the world.[15] It is the second largest private employer in the United States[16] and one of the world's \nmost valuable companies. Amazon is the second largest technology company by revenue. Amazon was founded by Jeff Bezos \non July 5, 1994, in Bellevue, Washington. The company initially started as an online marketplace for books but later \nexpanded to sell electronics, software, video games, apparel, furniture, food, toys, and jewelry. In 2015, Amazon \nsurpassed Walmart as the most valuable retailer in the United States by market capitalization.[17] In 2017, Amazon \nacquired Whole Foods Market for $13.4 billion, which vastly increased Amazon's presence as a brick-and-mortar \nretailer.[18] In 2018, Bezos announced that its two-day delivery service, Amazon Prime, had surpassed 100 million \nsubscribers worldwide\n",
 'squad_example': [<transformers.data.processors.squad.SquadExample at 0x7f1d0a7aac40>],
 'n_best_json': OrderedDict([('0',
               [OrderedDict([('text',
                              'disruption of well-established industries'),
                             ('probability', 0.6033453867567354),
                             ('start_logit', 6.112513),
                             ('end_logit', 4.161786),
                             ('start_index', 45),
                             ('end_index', 48)]),
                OrderedDict([('text', 'disruption'),
                             ('probability', 0.2976593481770998),
                             ('start_logit', 6.112513),
                             ('end_logit', 3.4552462),
                             ('start_index', 45),
                             ('end_index', 45)]),
                OrderedDict([('text',
                              'its disruption of well-established industries'),
                             ('probability', 0.0585262472890109),
                             ('start_logit', 3.779499),
                             ('end_logit', 4.161786),
                             ('start_index', 44),
                             ('end_index', 48)]),
                OrderedDict([('text', 'its disruption'),
                             ('probability', 0.02887381755406165),
                             ('start_logit', 3.779499),
                             ('end_logit', 3.4552462),
                             ('start_index', 44),
                             ('end_index', 45)]),
                OrderedDict([('text', 'Amazon is known for its disruption'),
                             ('probability', 0.011595200223092338),
                             ('start_logit', 2.8671546),
                             ('end_logit', 3.4552462),
                             ('start_index', 40),
                             ('end_index', 45)])])])}

The DetailLevel.High option also returns the squad_example result, as well as the original n_best_json with detailed information about each predicted answer.
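The logits in n_best_json also show where the probabilities come from. In the output above, each probability is consistent with a softmax over the sum start_logit + end_logit across the five candidates. The sketch below checks this using values copied from that output. It is an observation about these printed numbers, not a description of AdaptNLP's internals, and the softmax helper is written here purely for illustration.

```python
import math

# (start_logit, end_logit) pairs copied from the n_best_json output above.
logit_pairs = [
    (6.112513, 4.161786),
    (6.112513, 3.4552462),
    (3.779499, 4.161786),
    (3.779499, 3.4552462),
    (2.8671546, 3.4552462),
]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Score each candidate span as start_logit + end_logit, then normalize.
scores = [start + end for start, end in logit_pairs]
probabilities = softmax(scores)

for prob in probabilities:
    print(f"{prob:.4f}")
```

Because the softmax runs only over the returned candidates, the probabilities sum to 1 across the n_best_size answers rather than across every possible span in the context.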