-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add doc&scripts for msmarco experiments on tpu #100
add doc&scripts for msmarco experiments on tpu #100
Conversation
docs/experiments-monot5-tpu.md
Outdated
``` | ||
conda install -c conda-forge httptools jsonnet --yes | ||
pip install tensorflow==1.15 tensorflow-text==1.15 t5[gcp] | ||
git clone https://github.com/ronakice/mesh.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the one in https://github.com/castorini/mesh
? Also, make sure it is working? It's been a long time since I used it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelvin-jiang made some change for the mesh to fit tf2.0 recently. I think it is safer for him to update this part later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the same changes (that were in ronak's/castorini's fork) on top of the latest https://github.com/tensorflow/mesh.git, which allows us to use TF 2.0 like so:
conda init
conda create --y --name py36 python=3.6
conda activate py36
conda install -c conda-forge httptools jsonnet --yes
pip install tensorflow tensorflow-text t5[gcp]
git clone -b tf2 https://github.com/kelvin-jiang/mesh.git
pip install --editable mesh
./ctpu up --name=tpu-$URA-$EXPNO --project=$PROJECT --zone=europe-west4-a --tpu-size=v3-8 --tpu-only --noconf
I'm not sure if we've all adopted/used this, so I don't know if we should include it in these docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Could you please merge your code into https://github.com/castorini/mesh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have write access to repo, I opened a PR instead! Let me know when it's merged (or if you have any concerns) and I can try running with it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks! @MXueguang, could you please update the instructions to point to the castorini repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rodrigonogueira4 do we still keep tf1.15 for now or change to tf2.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I've added a couple of suggestions. Once this is checked in, let's adds instructions for training?
@MXueguang, would let me know when this PR is ready for another review? |
Tag me too, when you are done :), thanks for this! |
@MXueguang, I normally had this memory error when the input file had too many examples. Hence, maybe give as input a file with only a few lines, instead of 1M, and see if it goes away. If that does not work, maybe try doubling the VM's RAM memory? |
Yeah I've also occasionally run into that error. It usually resolved itself when I re-setup my VM instance, but @rodrigonogueira4's comment is probably a more robust solution ^^ |
yeah, i think the 16gb ram can just fit the requirement. If there is some other process using ram, it might OOM. Also, @rodrigonogueira4 the output scores by tf2 is slightly different from tf1.15. I am still waiting for all prediction finish to see if the score stays same. ill let you know. |
tf1.15 tf2.3 The score just changed a little bit (increased slightly). So i think we are good to update to tf2. @ronakice @rodrigonogueira4 |
Yeah, I think so! |
Can I merge this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
No description provided.