Replication experiment results for CovidQA, MSMARCO document and MSMARCO subset #102
Colab Environment:
OS: Ubuntu-18.04.3 LTS
Java: 11.0.8
Python: 3.6.9
GPU: Tesla P100
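For anyone repeating these runs on a different machine, the environment details above can be collected with a short script. This is only a minimal sketch: the helper names `run_quiet` and `describe_environment` are made up for illustration and are not part of the project, and it assumes `java` and `nvidia-smi` may or may not be on the PATH.

```python
import platform
import shutil
import subprocess
import sys


def run_quiet(cmd):
    """Run a command and return its combined output, or None if it is unavailable or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # `java -version` writes to stderr
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip() or None


def describe_environment():
    """Collect OS, Python, Java and GPU details as a dict of strings."""
    java = run_quiet(["java", "-version"])
    gpu = run_quiet(["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"])
    return {
        "OS": platform.platform(),
        "Python": sys.version.split()[0],
        "Java": java.splitlines()[0] if java else "not found",
        "GPU": gpu.splitlines()[0] if gpu else "not found",
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Missing tools are reported as "not found" rather than raising, so the same script works on machines without a GPU or a Java install.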
CovidQA
With Random:
NL Question:
Keyword Query:
With BM25:
NL Question:
Keyword Query:
With MonoT5:
NL Question:
Keyword Query:
MSMARCO Document
First Half:
Second Half:
MSMARCO Passage Subset:
With monoBERT:
With monoT5:
Comments
No issues were encountered. For the CovidQA dataset, though, it might be beneficial to add a section on data preparation. I figured out how to get the `indexes/lucene-index-cord19-paragraph-2020-05-12` index, but it required some poking around the project, and the steps could probably be summarized quite quickly.

I'll also try replicating the entire dev set for MS MARCO Passage, but since it needs to run for such a long time, I'll try to fit that in later.
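Until a data preparation section exists, a quick pre-flight check can at least confirm that the index is in place before starting the CovidQA run. This is a rough sketch, not part of the project: the helper name `looks_like_lucene_index` is hypothetical, and it relies only on the fact that a Lucene index directory contains a `segments_N` file. It does not download anything.

```python
from pathlib import Path

# Path taken from the comment above, relative to the project root.
INDEX_DIR = Path("indexes/lucene-index-cord19-paragraph-2020-05-12")


def looks_like_lucene_index(path):
    """Return True if `path` is a directory holding a Lucene segments file."""
    path = Path(path)
    if not path.is_dir():
        return False
    return any(p.name.startswith("segments_") for p in path.iterdir())


if __name__ == "__main__":
    if looks_like_lucene_index(INDEX_DIR):
        print(f"Found index at {INDEX_DIR}")
    else:
        print(f"No Lucene index at {INDEX_DIR}; download and extract it first")
```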