Releases: kubeflow/trainer
Initial release of the TFJob operator
v0.1.0 (2018-03-29)
Closed issues:
- [v1alpha2] Implement condition update #502
- E2E tests timing out; job appears to remain in running state even though job is done. #500
- [v1alpha2] TF_CONFIG should be configurable by user #499
- [test] All log is 404 in argo #496
- Presubmit shows succeeded, but some test actually failed. #479
- Waiting pods start too long #461
- [test] Add unit test for pkg/controller #455
- Create a suitable OWNERS file in /dashboard #443
- Tide is misconfigured for this repository. #433
- CI failed to setup the cluster #420
- [docs] Add dashboard readme #411
- Make coverall results advisory and not report as failure #406
- Presubmits failing due to lint #404
- [enhancement] Fix go vet errors which not caught by the compilers #395
- User facing website for Kubeflow that details how to choose a stack #371
- [discussion] How to set clusterspec #369
- [enhancement] Rename the cmd/tf_operator to cmd/tf-operator #363
- Local releaser fails due to version_tag #360
- Helm test failure not reported to gubernator #355
- [discussion] Whether to create CRD in helm charts #353
- Should resourcelock be in the same namespace as controller? #352
- Helm test tf-job does not pass validation #351
- Move tensorflow/k8s to kubeflow/tf-operator #350
- Get rid of TensorBoard replica #347
- Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs #346
- Deprecate the ENV MY_POD_NAMESPACE and MY_POD_NAME #341
- [feature] Does tfJob support setting different label/envVar for each worker(replicas >1)? #340
- [Discussion] Time to start tagging releases for the TF operator? #339
- [discussion] Should group name be tensorflow.org or kubeflow.io or kubeflow.org? #337
- dashboard silent error during calling non-existent tfjob #335
- in dashboard, silent error when nonexistent namespace is specified #334
- Deprecate the IsDefaultPS field #329
- [Convention] Replace Tf with TF in CRD #328
- Standardise labels for issues and PRs #326
- Manage Pods directly instead of using Job controllers #325
- TfJobs dashboard not showing jobs #324
- TfJobs dashboard doesn't work with K8s API server proxy or envoy proxy #323
- Recreating a failed/successful job with same name doesn't work #322
- Releaser incorrectly tags images as "dirty" #321
- Reenable the releaser #320
- E2E tests are not isolated #318
- Need to mark prow job as failed if any tests fail #315
- Remove outdated branch wbuchwalter-patch-1 #311
- E2E test delete and recreate job with same name #310
- TrainingJob.reconcile not called periodically #309
- rename master to chief #306
- Assign resource quota for TensorBoard #304
- Jobs evicted for lack of memory, potentially add resource field to tf-job prototype #301
- [Discussion] Operators vs. controller pattern #300
- [bug] Add a default pod template for PS #297
- Bunch of pylint error messages #294
- Fix Head #293
- Operator deployment fails post-v20180108-190394d #292
- Promote last known good release #290
- [bug] metadata.ownerReferences.apiVersion is not set #288
- fail to run example job. invalid job spec: tfReplicaSpec.TfPort can't be nil #284
- [bug] Build log 404 in https://prow.k8s.io/?repo=tensorflow%2Fk8s #282
- [feature] Separate the CRD and controller #281
- Gaps in test coverage #280
- Regression in flag name: controller-config-file #279
- [bug] glog before flag.Parse() #275
- build new code to new image and find some problem #274
- Fix the releaser so we can build new images #270
- deploy.py gives gcloud api error '... Version "1.8.1-gke.1" is invalid.' #268
- Pods terminated without waiting #267
- Attach appropriate header (copyright) to go files #266
- suppose i've install the tfjob in my k8s cluster #265
- what's the folder pkg for? #264
- Build failing because of lint issues #256
- what's the main change between version 0.2 and version 0.3? #247
- SetupCluster failures unexpected keyword argument 'client_configuration' #242
- GPU test marked as succeeded but airflow step is failing #240
- Use Kubeflow & ksonnet to install TfJob #239
- tf_smoke.py distributed computing doesn't work on minikube #238
- example-job can not work in private k8s cluster #233
- Test failures aren't properly reported in Gubernator #229
- [CRD] Request for input and output dirs in TFJobSpec #224
- TfJob should be marked as failed if setup fails #218
- panic: runtime error: invalid memory address or nil pointer dereference can not run in k8s 1.8.5 #212
- Rethink the TFJob CRD #209
- ksonnet configs for deploying the TfJob CRD & Controller #208
- Make default TfImage configurable by users #207
- refactor the TfJob to use Informer and Controller #206
- Use Argo workflow engine for CI/CD or releases #205
- Potential issue with Tensorboard / value of simple best-practices example with tboard #202
- Investigate using buildah to...