Question Answering is the NLP task of producing a readable answer from two text inputs: a context and a question about that context.
Span-based models are one example of Question Answering models: they output a start and an end index that mark the relevant "answer" within the provided context. With these models, we can extract answers to questions and queries about any unstructured text.
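To make the start/end index idea concrete, here is a minimal sketch of how a span can be picked from per-token scores. The `best_span` helper, the toy tokens, and the logit values are all made up for illustration; this is not AdaptNLP's implementation, only one common way to select the highest-scoring valid span.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end.

    Hypothetical helper: scores a span as start_logit + end_logit and
    limits the span to max_answer_len tokens.
    """
    best = (0, 0, float("-inf"))
    for start, start_logit in enumerate(start_logits):
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_logit + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Toy inputs (invented values, not real model output).
tokens = ["Amazon", "was", "founded", "by", "Jeff", "Bezos"]
start_logits = [0.1, -1.0, -0.5, -0.2, 2.8, 0.3]
end_logits = [-0.5, -1.2, -0.8, -1.0, 0.2, 0.7]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)  # prints "Jeff Bezos"
```

The answer text is simply the slice of context tokens between the two indices, which is why these models can only return text that already appears in the context.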
Below, we'll walk through how to use AdaptNLP's EasyQuestionAnswering module to extract span-based text answers from unstructured text using state-of-the-art question answering models.
You can use EasyQuestionAnswering to run span-based question answering models. Given a context and a query, we get the top n_best_size answer predictions along with token span indices and probability scores.
First we'll import the EasyQuestionAnswering class from AdaptNLP and instantiate it:
from adaptnlp import EasyQuestionAnswering

qa_model = EasyQuestionAnswering()
Next we'll write some example context to use:
context = """Amazon.com, Inc. (/ˈæməzɒn/), is an American multinational technology company based in Seattle, Washington that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is considered one of the Big Four technology companies along with Google, Apple, and Facebook. Amazon is known for its disruption of well-established industries through technological innovation and mass scale. It is the world's largest e-commerce marketplace, AI assistant provider, and cloud computing platform as measured by revenue and market capitalization. Amazon is the largest Internet company by revenue in the world. It is the second largest private employer in the United States and one of the world's most valuable companies. Amazon is the second largest technology company by revenue. Amazon was founded by Jeff Bezos on July 5, 1994, in Bellevue, Washington. The company initially started as an online marketplace for books but later expanded to sell electronics, software, video games, apparel, furniture, food, toys, and jewelry. In 2015, Amazon surpassed Walmart as the most valuable retailer in the United States by market capitalization. In 2017, Amazon acquired Whole Foods Market for $13.4 billion, which vastly increased Amazon's presence as a brick-and-mortar retailer. In 2018, Bezos announced that its two-day delivery service, Amazon Prime, had surpassed 100 million subscribers worldwide """
And then finally we'll query the data with the predict_qa method.
For our example we'll run inference on the DistilBERT model from Hugging Face Transformers, which was fine-tuned on the SQuAD dataset:
top_prediction, all_nbest_json = qa_model.predict_qa(query="What does Amazon do?", context=context, n_best_size=5, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 35.47it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 4702.13it/s]
And we can take a peek at the results, first top_prediction and then all_nbest_json:
disruption of well-established industries
[OrderedDict([('text', 'disruption of well-established industries'), ('probability', 0.6033452454300696), ('start_logit', 6.112517), ('end_logit', 4.161786), ('start_index', 45), ('end_index', 48)]), OrderedDict([('text', 'disruption'), ('probability', 0.2976601300650915), ('start_logit', 6.112517), ('end_logit', 3.4552488), ('start_index', 45), ('end_index', 45)])]
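As a small sketch of how these n-best entries can be read, the snippet below walks over the two candidates shown above and prints their fields. The values are copied from the output; the interpretation that each probability comes from a softmax over the summed start and end logits of the n-best candidates is our assumption, not something stated by AdaptNLP here. Since only some of the candidates are shown, we check the ratio between the two probabilities rather than the probabilities themselves.

```python
import math
from collections import OrderedDict

# The two candidates shown above, copied verbatim from the output.
all_nbest_json = [
    OrderedDict([
        ("text", "disruption of well-established industries"),
        ("probability", 0.6033452454300696),
        ("start_logit", 6.112517),
        ("end_logit", 4.161786),
        ("start_index", 45),
        ("end_index", 48),
    ]),
    OrderedDict([
        ("text", "disruption"),
        ("probability", 0.2976601300650915),
        ("start_logit", 6.112517),
        ("end_logit", 3.4552488),
        ("start_index", 45),
        ("end_index", 45),
    ]),
]

# Print each candidate with its span indices and probability.
for rank, candidate in enumerate(all_nbest_json, start=1):
    print(
        f"{rank}. {candidate['text']!r} "
        f"(tokens {candidate['start_index']}-{candidate['end_index']}, "
        f"p={candidate['probability']:.3f})"
    )

# Assumption: score = start_logit + end_logit, probability = softmax(score).
# Under that assumption, the probability ratio equals exp(score difference).
first, second = all_nbest_json
logit_gap = (first["start_logit"] + first["end_logit"]) - (
    second["start_logit"] + second["end_logit"]
)
prob_ratio = first["probability"] / second["probability"]
print(math.exp(logit_gap), prob_ratio)
```

The two printed numbers agree closely, which is consistent with the softmax assumption. Note that both candidates share the same start_index and differ only in where the span ends.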
We can also pass in multiple questions at once:
questions = ["What does Amazon do?", "What happened July 5, 1994?", "How much did Amazon acquire Whole Foods for?"]
Just make sure to pass in one context per question, here by repeating the same context for each of them:
top_prediction, all_nbest_json = qa_model.predict_qa(query=questions, context=[context]*3, n_best_size=5, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
convert squad examples to features: 100%|██████████| 3/3 [00:00<00:00, 46.30it/s]
add example index and unique id: 100%|██████████| 3/3 [00:00<00:00, 15847.50it/s]
Our new results:
OrderedDict([('0', 'disruption of well-established industries'), ('1', 'Jeff Bezos'), ('2', '$13.4 billion')])
[OrderedDict([('text', 'Jeff Bezos'), ('probability', 0.5127517857731716), ('start_logit', 2.8254287), ('end_logit', 0.6868366), ('start_index', 119), ('end_index', 120)]), OrderedDict([('text', 'Amazon was founded by Jeff Bezos'), ('probability', 0.47153574087191846), ('start_logit', 2.7416315), ('end_logit', 0.6868366), ('start_index', 115), ('end_index', 120)])]
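With several questions, the top predictions come back keyed by '0', '1', '2'. A minimal sketch for pairing each answer with its question follows; it assumes the keys are the string positions of the questions in input order, which matches the output above but is not confirmed here as a guarantee of the API. The data is copied from the example, so the snippet runs without loading a model.

```python
from collections import OrderedDict

questions = [
    "What does Amazon do?",
    "What happened July 5, 1994?",
    "How much did Amazon acquire Whole Foods for?",
]

# Top predictions as shown above, copied verbatim.
top_prediction = OrderedDict([
    ("0", "disruption of well-established industries"),
    ("1", "Jeff Bezos"),
    ("2", "$13.4 billion"),
])

# Assumption: key str(i) holds the answer to questions[i].
answers_by_question = {
    question: top_prediction[str(i)] for i, question in enumerate(questions)
}

for question, answer in answers_by_question.items():
    print(f"{question} -> {answer}")
```

Reading the pairs side by side also makes it easier to spot weaker answers: for "What happened July 5, 1994?" the model returns "Jeff Bezos", a span near the date rather than the founding event itself.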