Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add doc&scripts for msmarco experiments on tpu #100

Merged
merged 4 commits into from
Oct 20, 2020
Merged

add doc&scripts for msmarco experiments on tpu #100

merged 4 commits into from
Oct 20, 2020

Conversation

MXueguang
Copy link
Member

No description provided.

```
conda install -c conda-forge httptools jsonnet --yes
pip install tensorflow==1.15 tensorflow-text==1.15 t5[gcp]
git clone https://github.com/ronakice/mesh.git
Copy link
Member

@rodrigonogueira4 rodrigonogueira4 Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the one in https://github.com/castorini/mesh? Also, make sure it is working? It's been a long time since I used it..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kelvin-jiang made some change for the mesh to fit tf2.0 recently. I think it is safer for him to update this part later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the same changes (that were in ronak's/castorini's fork) on top of the latest https://github.com/tensorflow/mesh.git, which allows us to use TF 2.0 like so:

conda init
conda create --y --name py36 python=3.6
conda activate py36
conda install -c conda-forge httptools jsonnet --yes
pip install tensorflow tensorflow-text t5[gcp]
git clone -b tf2 https://github.com/kelvin-jiang/mesh.git
pip install --editable mesh
./ctpu up --name=tpu-$URA-$EXPNO --project=$PROJECT --zone=europe-west4-a --tpu-size=v3-8 --tpu-only --noconf

I'm not sure if we've all adopted/used this, so I don't know if we should include it in these docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Could you please merge your code into https://github.com/castorini/mesh?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have write access to repo, I opened a PR instead! Let me know when it's merged (or if you have any concerns) and I can try running with it once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks! @MXueguang, could you please update the instructions to point to the castorini repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodrigonogueira4 do we still keep tf1.15 for now or change to tf2.0

Copy link
Member

@rodrigonogueira4 rodrigonogueira4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I've added a couple of suggestions. Once this is checked in, let's adds instructions for training?

@rodrigonogueira4
Copy link
Member

@MXueguang, would let me know when this PR is ready for another review?

@ronakice ronakice self-requested a review October 17, 2020 18:00
@ronakice
Copy link
Member

Tag me too, when you are done :), thanks for this!

@rodrigonogueira4
Copy link
Member

@MXueguang, I normally had this memory error when the input file had too many examples. Hence, maybe give as input a file with only a few lines, instead of 1M, and see if it goes away.

If that does not work, maybe try doubling the VM's RAM memory?

@kelvin-jiang
Copy link
Member

Yeah I've also occasionally run into that error. It usually resolved itself when I re-setup my VM instance, but @rodrigonogueira4's comment is probably a more robust solution ^^

@MXueguang
Copy link
Member Author

yeah, i think the 16gb ram can just fit the requirement. If there is some other process using ram, it might OOM.

Also, @rodrigonogueira4 the output scores by tf2 is slightly different from tf1.15. I am still waiting for all prediction finish to see if the score stays same. ill let you know.

@MXueguang
Copy link
Member Author

tf1.15
#####################
MRR @10: 0.3980858575521913
QueriesRanked: 6980
#####################

tf2.3
#####################
MRR @10: 0.3983799517896949
QueriesRanked: 6980
#####################

The score just changed a little bit (increased slightly). So i think we are good to update to tf2. @ronakice @rodrigonogueira4

@rodrigonogueira4
Copy link
Member

tf1.15
#####################
MRR @10: 0.3980858575521913
QueriesRanked: 6980
#####################

tf2.3
#####################
MRR @10: 0.3983799517896949
QueriesRanked: 6980
#####################

The score just changed a little bit (increased slightly). So i think we are good to update to tf2. @ronakice @rodrigonogueira4

Yeah, I think so!

@rodrigonogueira4
Copy link
Member

Can I merge this PR?

Copy link
Member

@ronakice ronakice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rodrigonogueira4 rodrigonogueira4 merged commit f552b5d into castorini:master Oct 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants