Simplify monoT5 and monoBERT boilerplate #80

lintool · 2020-09-09T13:15:49Z

There's a lot of boilerplate here: https://github.com/castorini/pygaggle#a-simple-reranking-example

Can we fold all of that into the constructor of the class? E.g., so we're left with:

reranker = monoT5()

or

reranker = monoBERT()

Make model_name, tokenizer_name, etc. configurable, with sensible defaults.
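To make the idea concrete, here is a minimal sketch of a constructor with defaults. The class name MonoBERTSketch, the device parameter, and the rule that tokenizer_name falls back to model_name are assumptions for illustration only, not pygaggle's actual API. The default checkpoint name is the one mentioned later in this thread.

```python
from typing import Optional


class MonoBERTSketch:
    """Hypothetical wrapper illustrating constructor defaults (not pygaggle's API)."""

    # Checkpoint name taken from the discussion in this thread.
    DEFAULT_MODEL = "castorini/monobert-large-msmarco"

    def __init__(self,
                 model_name: str = DEFAULT_MODEL,
                 tokenizer_name: Optional[str] = None,
                 device: str = "cpu"):
        self.model_name = model_name
        # Assumption: use the model's own tokenizer unless one is given.
        self.tokenizer_name = tokenizer_name or model_name
        self.device = device

    def load(self):
        """Load the model and tokenizer; this downloads weights on first use."""
        # Heavy imports are deferred so that constructing the object stays cheap.
        import torch
        from transformers import (AutoModelForSequenceClassification,
                                  AutoTokenizer)

        model = AutoModelForSequenceClassification.from_pretrained(self.model_name)
        model = model.to(torch.device(self.device)).eval()
        tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name)
        return model, tokenizer
```

With something like this, MonoBERTSketch() uses the defaults, while MonoBERTSketch(model_name='some/other-checkpoint', device='cuda') overrides them without any extra setup code at the call site.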

So simple reranking gets boiled down to

from pyserini.search import SimpleSearcher
from pygaggle.rerank.base import Query, hits_to_texts

query = Query('who proposed the geocentric theory')
searcher = SimpleSearcher('/path/to/msmarco/index/')
reranker = monoBERT()

hits = searcher.search(query.text)
reranked = reranker.rerank(query, hits_to_texts(hits))
reranked.sort(key=lambda x: x.score, reverse=True)

@rodrigonogueira4 thoughts?

ronakice · 2020-09-09T13:21:49Z

Yeah, I think we should do that!

rodrigonogueira4 · 2020-09-10T16:02:56Z

I agree!

yuxuan-ji · 2020-09-10T18:22:28Z

EDIT: Outdated comment

I'm a bit confused by the naming here. Is the goal to have a set of predefined rerankers with defaults, e.g.:

from pygaggle.rerank.pretrained import monoBERT, monoT5
reranker = monoBERT()  # -> Reranker, defaults to castorini/monobert-large-msmarco
reranker = monoT5()    # -> Reranker, defaults to castorini/monot5-base-msmarco

# similar to how huggingface transformers has:
AutoModel.from_pretrained('monoBERT')

or just to create general constructors for each reranker similar to what's in:
https://github.com/castorini/pygaggle/blob/master/pygaggle/run/evaluate_document_ranker.py#L103
i.e.

def construct_seq_class_transformer(options: DocumentRankingEvaluationOptions
                                    ) -> Reranker:
    model = AutoModelForSequenceClassification.from_pretrained(options.model, from_tf=options.from_tf)
    device = torch.device(options.device)
    model = model.to(device).eval()
    tokenizer = AutoTokenizer.from_pretrained(options.tokenizer_name)
    return SequenceClassificationTransformerReranker(model, tokenizer)

TL;DR: why name it monoBERT rather than construct_seq_class_transformer?

yuxuan-ji mentioned this issue Sep 10, 2020

Simplify boilerplate for monoT5 and monoBERT #83

Merged

ronakice closed this as completed in #83 Sep 13, 2020
