Prepare cloud dataset #269

Merged 3 commits on Aug 7, 2017
8 changes: 5 additions & 3 deletions README.md
@@ -68,12 +68,14 @@ English tutorials(comming soon...)
 To test or visit the website, find out the kubernetes ingress IP
 addresses, or the NodePort.

-Then open your browser and visit http://<ingress-ip-address>, or
-http://<any-node-ip-address>:<NodePort>
+Then open your browser and visit `http://<ingress-ip-address>`, or
+`http://<any-node-ip-address>:<NodePort>`

 - Prepare public dataset

-You can create a Kubernetes Job for preparing the public dataset and cluster trainer files.
+You can create a Kubernetes Job for preparing the public cloud dataset with RecordIO files. You should modify the YAML file as your environment:
+- `<DATACENTER>`, Your cluster datacenter
+- `<MONITOR_ADDR>`, Ceph monitor address
 ```bash
 kubectl create -f k8s/prepare_dataset.yaml
 ```
13 changes: 5 additions & 8 deletions k8s/prepare_dataset.yaml
@@ -1,29 +1,26 @@
 apiVersion: batch/v1
 kind: Job
 metadata:
-  name: prepare-dataset
+  name: prepare-cloud-dataset
 spec:
   template:
     metadata:
-      name: prepare-dataset
+      name: prepare-cloud-dataset
     spec:
       volumes:
       - name: data-storage
         cephfs:
           monitors:
-          - 172.19.32.166:6789
+          - <MONITOR_ADDR>
           path: "/public"
           user: "admin"
           secretRef:
             name: ceph-secret
       containers:
       - name: prepare
         image: yancey1989/paddlecloud-job
-        env:
-        - name: CURRENT_DATACENTER
-          value: "meiyan"
-        command: ["python", "-c", "\"import pcloud.dataset.common as common; common.fetch_all()\""]
+        command: ["sh", "-c", "python -c \"import paddle.v2.dataset as dataset; dataset.common.convert('/pfs/<DATACENTER>/public/dataset')\""]
         volumeMounts:
         - name: data-storage
-          mountPath: /pfs/meiyan/public
+          mountPath: /pfs/<DATACENTER>/public
       restartPolicy: Never
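
With this change, `k8s/prepare_dataset.yaml` carries two placeholders, `<DATACENTER>` and `<MONITOR_ADDR>`, that have to be filled in for the target cluster before the Job is created. A minimal sketch of one way to do that is shown here. `render_manifest` is a hypothetical helper, not part of this repository, and the example values are simply the ones that appear on the removed side of the diff.

```python
import re


def render_manifest(template_text, datacenter, monitor_addr):
    """Fill the <DATACENTER> and <MONITOR_ADDR> placeholders in a manifest.

    Hypothetical helper for illustration only; it performs plain text
    substitution and does not parse the YAML.
    """
    rendered = (
        template_text
        .replace("<DATACENTER>", datacenter)
        .replace("<MONITOR_ADDR>", monitor_addr)
    )
    # Refuse to return a manifest that still contains an unfilled
    # <UPPER_CASE> placeholder, so a typo does not reach the cluster.
    leftover = re.findall(r"<[A-Z_]+>", rendered)
    if leftover:
        raise ValueError("unfilled placeholders: %s" % ", ".join(leftover))
    return rendered


# Example values taken from the old side of the diff.
SAMPLE = "monitors:\n- <MONITOR_ADDR>\nmountPath: /pfs/<DATACENTER>/public\n"
rendered_sample = render_manifest(SAMPLE, "meiyan", "172.19.32.166:6789")
```

The rendered text can then be written to a file and passed to `kubectl create -f`, as in the README step above.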