Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink Kueue.Queue end purpose #113

Closed
ArangoGutierrez opened this issue Mar 10, 2022 · 15 comments
Closed

Rethink Kueue.Queue end purpose #113

ArangoGutierrez opened this issue Mar 10, 2022 · 15 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Comments

@ArangoGutierrez
Copy link
Contributor

After #87 , there is not much point on having a controller for kueue.queue.

  • PendingWorkloads are now tracked under ClusterQueue
  • AddmittedWorkloads are now tracked under ClusterQueue
  • Kueue.Queue.Spec is basically a pointer to ClusterQueue

If someone wants to monitor/track a workload, this would have to be done against ClusterQueue, due the heap being there. We should consider removing pkg/controller/core/queue.controller.go and api/v1aplha1/queue_types.go

@ArangoGutierrez ArangoGutierrez added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Mar 10, 2022
@alculquicondor
Copy link
Contributor

alculquicondor commented Mar 10, 2022

#80 (comment)

Also discussed in http://bit.ly/kueue-apis :)

@ArangoGutierrez
Copy link
Contributor Author

I agree with the fact that having kueue.queue at the user lvl can introduce security risks or resource hijacking risks, a user could create 1 queue per job, and skip the line.

If Heap is at ClusterQueue, let's work on enhancing that model

@alculquicondor
Copy link
Contributor

Eventually, there will be a heap in each Queue too. And only the workloads that, added up, respect the queue limits will enter the ClusterQueue.

The use case is explained in the comment linked above.

@alculquicondor
Copy link
Contributor

For now, the only use of kueue.Queue is discoverability.

@ArangoGutierrez ArangoGutierrez changed the title Delete Kueue.Queue Rethink Kueue.Queue end purpose Mar 10, 2022
@ArangoGutierrez
Copy link
Contributor Author

re-Titled to better reflect this conversation

@ahg-g
Copy link
Contributor

ahg-g commented Mar 15, 2022

Queue can have three purposes:

  1. Discoverability; tenants will only have list access to their own namespaces only.
  2. A source of defaults for workloads (e.g., default priority, default max runtime etc.)
  3. A way to define usage limits at "experiment" level

@ArangoGutierrez
Copy link
Contributor Author

ArangoGutierrez commented Mar 15, 2022

I have been thinking about this as well, and here are my thoughts:

For 1 ClusterQueue
A user can submit multiple batches (Currently Queues) that are composed by job steps. (currently QueuedWorkloads)

Example:
A bioInfo user wants to run some genetics analysis on 3 subjects(Dog, Cat, Duck), and each analysis that requires 3 steps, A,B,C that depend on each other.
The user then creates 3 queuedworkloads

  dog  := queue{queuedworkloadStepA,queuedworkloadStepB, queuedworkloadStepC}
  cat  := queuedworkload{queuedworkloadStepA,queuedworkloadStepB, queuedworkloadStepC}
  duck  := queuedworkload{queuedworkloadStepA,queuedworkloadStepB, queuedworkloadStepC}

And queue them to the cluster queue

AddOrUpdateClusterQueue{dog}
AddOrUpdateClusterQueue{cat}
AddOrUpdateClusterQueue{duck}

Then the Jobs reconcile will grab each Queue head and push it to the ClusterQueue, where it will go to the Scheduler reconciler and jump into the APIServer. This gives a sense of queuing independence to every queue, maybe the dog queue is applied first, but the cat queue has a "smaller" data set and will finish first as example.

The nice thing of looking it like this, is that a Step or as is known currently QueuedWorkload could be many batchv1.Job that need to run in parallel (e.g MPI). Those steps that don't have parallel dependencies could be then set as another QueuedWorkload object inside a Queue.

Right now you can create a Job without creating a QueuedWorkload, the job reconciler will generate a generic one for you, what if, a user could create a single QueuedWorkload, and QueuedWorkload reconciler will create a generic Queue (a Queue of 1 ) for that single step QueuedWorkload.

This adds a more Batch like feeling to Kueue.

Thoughts?

@ahg-g
Copy link
Contributor

ahg-g commented Mar 16, 2022

I am not sure I follow your proposal and what exactly is being suggested we change. Is it that a Queue could represent a workflow? do you find the list in #113 (comment) not satisfactory in explaining the purpose of Queue?

@ArangoGutierrez
Copy link
Contributor Author

Maybe I was a bit confused, but to my point, is that I would like to add to your list, that queues are a way to define workload steps, kind of job dependency. "Do this, then this" . the list is satisfactory, I was just adding to it.

@ahg-g
Copy link
Contributor

ahg-g commented Mar 16, 2022

We discussed initially that workflows are not in Kueue scope because there are multiple commonly used frameworks for that (like Argo and NextFlow).

We could have a Queue configuration to act the way you described (e.g., strictly one workload should run at a time), but I fear this will grow into more complex requirements that turns it into a workflow manager.

But, if we think of this in the context of better support for existing workflow managers (#74), then perhaps we can strike that balance between separation of concern and a rich feature set.

@ArangoGutierrez
Copy link
Contributor Author

Sounds good. let's re circle to this topic after 0.0.1

@ArangoGutierrez
Copy link
Contributor Author

I think my head was into workloads when I commented here. now that I give it an extra thought

@ahg-g
Copy link
Contributor

ahg-g commented Apr 4, 2022

can we close this? it seems to me Queue has a well defined scope

@ArangoGutierrez
Copy link
Contributor Author

/close

@k8s-ci-robot
Copy link
Contributor

@ArangoGutierrez: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
Projects
None yet
Development

No branches or pull requests

4 participants