> Minimum hardware requirement for tutorial code

What is the minimum GPU memory requirement for running the tutorial code?
I am getting some odd runtime errors from the sites (such as `ProcessExecutor - INFO - process finished with execution code: -9`), even when I run with 2 clients instead of 3. I suspect this has something to do with the 8 GB GPU on my machine.
Any help would be appreciated!

Posted by: unalakunal @ Aug. 4, 2022, 6:53 p.m.

We tested running all 3 clients on a single 16 GB GPU without issue, but never tried an 8 GB model. I believe the error code you referenced is from the server; what is happening client-side? I would expect a Python error if the GPU were the problem.
Can you run the example with a single client? Be sure to change the minimum number of clients from 3 to 1 in two locations -

Posted by: challenge-organizer @ Aug. 4, 2022, 9:23 p.m.

Thank you for the quick response!
Firstly, I tried the single client run, and it works without any issues.

The error is actually on the client side; it hits 2 random sites when I run with 3 sites. In this case, site-1 and site-3 went down with the error `ProcessExecutor - INFO - process finished with execution code: -9`, while site-2 kept running.
On the server, I just see this log repeatedly:
FederatedServer - INFO - Fetch task requested from client: site-2 (2409b7ed-85e5-4ef1-8d1c-0dbc055667d5)
ServerRunner - INFO - [run=1, wf=scatter_gather_ctl, peer=site-2, peer_run=1]: got task request from client
ServerRunner - INFO - [run=1, wf=scatter_gather_ctl, peer=site-2, peer_run=1]: no task currently for client - asked client to try again later
FederatedServer - INFO - Return task:__try_again__ to client:site-2 --- (2409b7ed-85e5-4ef1-8d1c-0dbc055667d5)

And I see this log repeatedly on the remaining client (site-2):
ClientRunner - INFO - [run=1]: fetching task from server ...
FederatedClient - INFO - Starting to fetch execute task.
Communicator - INFO - Received from fl_project server (407 Bytes). getTask time: 0.006459474563598633 seconds
FederatedClient - INFO - pull_task completed. Task name:__try_again__ Status:True
ClientRunner - INFO - [run=1, peer=fl_project, peer_run=1]: server asked to try again - will try in 2 secs

Posted by: unalakunal @ Aug. 4, 2022, 10:10 p.m.

Hi, could you please file an issue at https://github.com/Project-MONAI/tutorials and post your complete server and client logs there?

Posted by: hroth @ Aug. 5, 2022, 3:09 p.m.

Another hint: please look further up in your log files. The "process finished with execution code: -9" message typically appears because an error occurred earlier in the log.
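
To make "look further up" concrete, here is a minimal sketch that scans a log for suspicious lines preceding the exit message. The keyword list, the `context` window, and the function name `find_problems_before_exit` are assumptions for illustration and are not part of NVFlare or the tutorial; adjust them to your actual log format.

```python
import re

# Text of the exit message quoted in this thread.
EXIT_MARKER = "process finished with execution code"

# Assumed keywords that usually indicate a problem; extend as needed.
PROBLEM_PATTERN = re.compile(
    r"ERROR|WARNING|Traceback|Exception|out of memory", re.IGNORECASE
)


def find_problems_before_exit(lines, context=200):
    """Return sorted (line_number, text) pairs for suspicious lines that
    appear within `context` lines before any exit message."""
    hits = {}
    for idx, line in enumerate(lines):
        if EXIT_MARKER not in line:
            continue
        # Look only at the lines above the exit message.
        for j in range(max(0, idx - context), idx):
            if PROBLEM_PATTERN.search(lines[j]):
                hits[j + 1] = lines[j].rstrip()
    return sorted(hits.items())


# Example usage (the file name is hypothetical):
# with open("log.txt") as f:
#     for number, text in find_problems_before_exit(f.readlines()):
#         print(number, text)
```

An empty result means none of the assumed keywords appear above the exit message, which is itself useful information for the issue report.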

Posted by: hroth @ Aug. 5, 2022, 5:26 p.m.

You can find the issue here: https://github.com/Project-MONAI/tutorials/issues/852
I looked through the logs once again, but couldn't see any errors or warnings.
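
One possible reason for the missing errors, assuming the code reported by `ProcessExecutor` is a Python `subprocess` return code (an assumption, not confirmed in this thread): on POSIX a negative return code -N means the child was terminated by signal N, and signal 9 is SIGKILL. A process killed this way cannot catch the signal, so it never gets the chance to log a traceback. On Linux, one common source of SIGKILL is the kernel's out-of-memory killer, which reacts to host RAM rather than GPU memory; checking `dmesg` may confirm or rule that out. The helper below is a sketch for decoding such codes, and its name is hypothetical.

```python
import signal


def describe_exit_code(code):
    """Describe a subprocess-style return code.

    Assumes the POSIX convention used by Python's subprocess module:
    a negative value -N means the child was terminated by signal N.
    """
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by signal {-code} ({name})"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(-9))
```

On Linux this reports signal 9 as SIGKILL.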

Posted by: unalakunal @ Aug. 5, 2022, 7:22 p.m.