Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Workflow Interface] Refactor FLSpec and Runtime to enhance modularity #1363

Open
wants to merge 60 commits into
base: develop
Choose a base branch
from

Conversation

ishant162
Copy link
Collaborator

@ishant162 ishant162 commented Feb 11, 2025

Background
Proposal: #1317

Change Description

  • Updates to FLSpec, LocalRuntime, FederatedRuntime and Aggregator classes.
  • Shift the run() method implementation from FLSpec to Runtime.
  • Runtime now passes collaborator information to FLSpec, enabling foreach operations, necessitating changes to the execution flow.
  • Updated WorkspaceExport functionality to align with the refactor changes.
  • Documentation has been updated to align with the refactored implementation.

Verification

  • Verified all testcases from tests/github/experimental/workflow/LocalRuntime.
  • Verified all testcases from tests/github/experimental/workflow/FederatedRuntime.
  • Verified Tutorials 101_MNIST of LocalRuntime and 101_MNIST, 301_MNIST_Watermarking of FederatedRuntime.

File Modifications

  • docs/about/features_index/workflowinterface.rst
  • openfl-tutorials/experimental/workflow/101_MNIST.ipynb
  • openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST/workspace/101_MNIST_FederatedRuntime.ipynb
  • openfl-tutorials/experimental/workflow/FederatedRuntime/301_MNIST_Watermarking/workspace/MNIST_Watermarking.ipynb
  • openfl/experimental/workflow/component/aggregator/aggregator.py
  • openfl/experimental/workflow/interface/fl_spec.py
  • openfl/experimental/workflow/runtime/federated_runtime.py
  • openfl/experimental/workflow/runtime/local_runtime.py
  • openfl/experimental/workflow/utilities/runtime_utils.py
  • openfl/experimental/workflow/workspace_export/export.py
  • All testcases from tests/github/experimental/workflow/LocalRuntime.
  • All testcases from tests/github/experimental/workflow/FederatedRuntime.

Changelog:

  • FLSpec Refactor.
  • LocalRuntime Refactor.
  • Update LocalRuntime Tutorial (101_MNIST) & Testcases.
  • FederatedRuntime Refactor.
  • Update FederatedRuntime Tutorial (101_MNIST, 301_MNIST_Watermarking) & Testcases.

Known Issues

  • With ray backend, prints are appearing in previous cell (observed in the openfl/develop branch)

Note:
Following changes will be done in a follow-up PR:

  • Remaining LocalRuntime & FederatedRuntime Tutorials.
  • Validate FederatedRuntime on distributed infrastructure.
  • Update CI Pipeline for Workflow API.

Signed-off-by: Ishant Thakare <[email protected]>
Signed-off-by: Ishant Thakare <[email protected]>
Signed-off-by: Ishant Thakare <[email protected]>
Signed-off-by: Ishant Thakare <[email protected]>
Signed-off-by: Ishant Thakare <[email protected]>
Signed-off-by: Ishant Thakare <[email protected]>
Signed-off-by: Ishant Thakare <[email protected]>
Copy link
Collaborator

@teoparvanov teoparvanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments/questions from my first read-through. It's looking good overall!

Copy link
Collaborator

@teoparvanov teoparvanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, thanks for addressing my comments, @ishant162 !

Copy link
Collaborator

@theakshaypant theakshaypant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Are the failed CIs out of scope for this PR?

@ishant162
Copy link
Collaborator Author

ishant162 commented Feb 27, 2025

Hi @theakshaypant,

The following pipeline has been tested locally with the changes and should work once the PR is committed:

  • OpenFL PR Pipeline / Workflow Interface 101 MNIST Notebook / WF Local Without TLS (pull_request)

The following pipeline requires rework to adapt test cases and will be addressed in a follow-up PR:

  • OpenFL PR Pipeline / Workflow Functional E2E / WF Functional Without TLS (3.10) (pull_request)

I am currently debugging the following pipeline failure, which appears to be a timeout issue:

  • OpenFL PR Pipeline / Workflow Interface Tests / build (pull_request)

Copy link
Contributor

@psfoley psfoley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking temporarily so I can thoroughly review the change. My review will be completed by EOD 2/28.

Copy link
Contributor

@psfoley psfoley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ishant162 - I've reviewed the PR. No issue with the code implementation, but we need to discuss further if there is a way to get updated information about the collaborator list with the API change proposed. We can discuss in our meeting on 3/3.

Comment on lines +33 to +34
aggregator begins the flow with :code:`start` task and optionally passed in model and optimizer. The list of collaborators in the federation, :code:`self.collaborators`,
is automatically populated by the Runtime infrastructure. It serves as the participant list for executing tasks listed in :code:`self.next` and :code:`aggregated_model_validation`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this add a reserved keyword collaborators to the flow spec. I assume this is set statically at the start of the workflow execution? Is there any way to get an updated collaborator from inside the workload? Use case - for the FederatedRuntime if additional envoys registered with the director, how could they join the federation after experiment start (assume TLS is setup and handled)? With the current API (self.collaborators = self.runtime.collaborators), the flow could (in a future release) get access to this list by querying the runtime. This is a capability of the RANO fork of OpenFL used in the real world; where not all collaborators will necessarily be ready (or known) at the start of an experiment.

Copy link
Collaborator

@theakshaypant theakshaypant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also suggest making the required changes that are made in #1421 for this refactor.

Signed-off-by: Ishant Thakare <[email protected]>
@psfoley
Copy link
Contributor

psfoley commented Mar 5, 2025

This was discussed further offline. This change will result in limiting the ability to add participants dynamically in the future. This PR should not be merged until the implications are well understood. Aim to reach a decision on this by Friday 3/7.

@scngupta-dsp
Copy link
Contributor

Would also suggest making the required changes that are made in #1421 for this refactor.

@theakshaypant : Changes made in #1421 are not related to refactoring of either the FLSpec or Runtime. Therefore they should NOT be included in this PR. Please see the updated disposition in #1421

@ishant162
Copy link
Collaborator Author

This PR should be kept on hold due to a blocking issue found while updating tutorials for a follow-up PR. Investigation is ongoing, with an update expected early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants