v1 and v2 E2E tests appear to be stomping on each other #748
So here's the problem. We give each workflow its own directory, named after the workflow, in the shared NFS directory. But the artifacts for all the workflows end up being copied to the same GCS bucket for Gubernator because they are part of the same Prow job.
As a result, the junit files end up clobbering each other.
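For illustration only, here is a minimal sketch of why the collision happens (the paths, workflow names, and GCS layout below are assumptions, not the actual test infrastructure): each workflow writes a junit file with the same name, and every copy targets the same object under the shared Prow job's artifacts prefix, so the last upload wins.

```python
# Hypothetical illustration (not the actual test infrastructure) of why the
# junit files collide: every workflow emits a file with the same name, and all
# of them map to the same object under the shared Prow job's GCS artifacts
# prefix, so the last upload overwrites the earlier ones.
ARTIFACTS_PREFIX = "gs://<bucket>/pr-logs/pull/<repo>/<pr>/<job>/<build>/artifacts"  # assumed layout

for workflow in ["tfjob-e2e-v1alpha1", "tfjob-e2e-v1alpha2"]:
    local_junit = f"/mnt/nfs/{workflow}/output/artifacts/junit_e2e.xml"
    gcs_dest = f"{ARTIFACTS_PREFIX}/junit_e2e.xml"  # identical for every workflow
    print(f"{local_junit} -> {gcs_dest}")
```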
jlewi added a commit to jlewi/testing that referenced this issue on Jul 23, 2018:
* See kubeflow/trainer#748
* A test can run multiple instances of a workflow but with different parameters.
* In this case we need to make sure the junit files and other artifacts copied to GCS for gubernator have unique names.
* One way to make this easier is to have copy-artifacts automatically append a unique suffix to each file before copying it to GCS.
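The last point in the commit message above could look roughly like the following sketch. This is a hedged illustration, assuming a suffixing scheme based on a short random hex string; the function names, paths, and the scheme itself are not the actual kubeflow/testing copy-artifacts code.

```python
import os
import uuid


def make_unique(name):
    """Append a short random suffix to a file name, e.g.
    junit_e2e.xml -> junit_e2e-3f2a1c.xml (illustrative scheme only)."""
    base, ext = os.path.splitext(name)
    return f"{base}-{uuid.uuid4().hex[:6]}{ext}"


def copy_artifacts(local_dir, gcs_prefix):
    """Hypothetical copy-artifacts step: give each artifact a unique name
    before copying so files from different workflows can't clobber each other."""
    for name in os.listdir(local_dir):
        src = os.path.join(local_dir, name)
        dest = f"{gcs_prefix}/{make_unique(name)}"
        # In the real infrastructure this would be a gsutil or GCS client copy.
        print(f"copy {src} -> {dest}")
```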
jlewi added a commit to jlewi/k8s that referenced this issue on Jul 24, 2018:
* It turns out that although we were running the v1alpha2 tests, failures were not being properly reported in Prow because the junit xml files had the same names for the v2 pipeline as for the v1 pipeline, and the v2 results were being clobbered by v1.
* Ensure the artifacts for each run of the E2E test have a unique name based on the TFJob version so that the E2E tests for the different TFJob versions won't clobber each other.
* Log the exception in wait for condition.
* Need to pass --tfjob_version to the tests so it uses the proper client.
* The run_gpu and run_test stages need to use a v1alpha2 version of the test workflow.
* Update the tf_smoke program to accept chief as a valid worker type so that it works with v1alpha2.
* In v1alpha2 we need to terminate all workers. It looks like there was a regression in v1alpha2 (kubeflow#751) and we require all workers to terminate as opposed to just worker 0.
* Delete a bunch of environments for the test app that shouldn't have been committed.

Fix kubeflow#748
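A minimal sketch of the version-based naming described in the commit message above. The helper name, directory layout, and exact file-name pattern are assumptions for illustration; the point is simply that including the TFJob version in the junit file name keeps the v1alpha1 and v1alpha2 results from overwriting each other.

```python
import os


def junit_path(artifacts_dir, test_name, tfjob_version):
    # Assumed naming scheme: include the TFJob API version in the junit file
    # name so the v1alpha1 and v1alpha2 runs write distinct files.
    return os.path.join(artifacts_dir, f"junit_{test_name}-{tfjob_version}.xml")


print(junit_path("/mnt/nfs/artifacts", "e2e", "v1alpha1"))  # .../junit_e2e-v1alpha1.xml
print(junit_path("/mnt/nfs/artifacts", "e2e", "v1alpha2"))  # .../junit_e2e-v1alpha2.xml
```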
k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue on Jul 24, 2018:
…x. (#183)
* Copy artifacts should make the file names unique by appending a suffix.
* See kubeflow/trainer#748
* A test can run multiple instances of a workflow but with different parameters.
* In this case we need to make sure the junit files and other artifacts copied to GCS for gubernator have unique names.
* One way to make this easier is to have copy-artifacts automatically append a unique suffix to each file before copying it to GCS.
* Fix lint.
k8s-ci-robot pushed a commit that referenced this issue on Jul 25, 2018:
…749)
* Prevent multiple versions of an E2E test from clobbering each other.
* It turns out that although we were running the v1alpha2 tests, failures were not being properly reported in Prow because the junit xml files had the same names for the v2 pipeline as for the v1 pipeline, and the v2 results were being clobbered by v1.
* Ensure the artifacts for each run of the E2E test have a unique name based on the TFJob version so that the E2E tests for the different TFJob versions won't clobber each other.
* Log the exception in wait for condition.
* Need to pass --tfjob_version to the tests so it uses the proper client.
* The run_gpu and run_test stages need to use a v1alpha2 version of the test workflow.
* Update the tf_smoke program to accept chief as a valid worker type so that it works with v1alpha2.
* In v1alpha2 we need to terminate all workers. It looks like there was a regression in v1alpha2 (#751) and we require all workers to terminate as opposed to just worker 0.
* Delete a bunch of environments for the test app that shouldn't have been committed.
* Fix #748
* Use kubeflow/testing@HEAD rather than the hack of pinning PR kubeflow/testing#183 which I was using to test.
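For the --tfjob_version flag mentioned in the commit message above, here is a hedged sketch of how a test runner might select which API version its client targets. The flag name comes from the commit message; the group/version mapping and everything else in this snippet are assumptions for illustration, not the actual test code.

```python
import argparse

# Assumed mapping from --tfjob_version to the CRD group/version the test
# client should target; the flag name comes from the commit message above,
# the mapping itself is an assumption for illustration.
TFJOB_API_VERSIONS = {
    "v1alpha1": ("kubeflow.org", "v1alpha1"),
    "v1alpha2": ("kubeflow.org", "v1alpha2"),
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="E2E test runner sketch")
    parser.add_argument("--tfjob_version", default="v1alpha1",
                        choices=sorted(TFJOB_API_VERSIONS))
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    group, version = TFJOB_API_VERSIONS[args.tfjob_version]
    print(f"Using TFJob API {group}/{version}")
```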
Here's a postsubmit test:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/kubeflow_tf-operator/745/kubeflow-tf-operator-presubmit/872
We run 2 separate workflows for v1 and v2, but we only see test results for one of them.
The logs for the v1alpha2 test appear to indicate a problem running the job.