University of Navarra (TECNUN) - February 2025
For the practicals, as well as the take-home project, you will be using Flower, an open-source framework for Federated AI. With Flower you can configure and run arbitrarily large federations using either its Simulation Engine or its Deployment Engine. In this course we’ll use the former.
Setting up Your Machine
Flower works best on UNIX-based systems like Ubuntu or macOS. If you are on Windows, it is highly recommended to make use of WSL to run your Flower apps. With WSL you can also make use of the NVIDIA GPU in your Windows host. You’ll need a recent version of Windows 10 or Windows 11.
In this course you’ll need to use a GPU. If you have a laptop with a modern GPU you are welcome to use it for the practicals and the graded project. If that’s the case, you just need VSCode and Python 3.10 or 3.11. Then, skip to the section Flower Hello World.
Alternatively, if you don’t have easy access to a GPU or your GPU isn’t powerful enough, you can use a GPU instance via vast.ai. What follows are the instructions for accessing it:
- Go to vast.ai and create an account using your `@alumni.unav.es` email. Click on the [SIGN IN] button on the top-right corner.
- Send me an email with the address you used to register in the step above. I will transfer $30 in credits to your account. This amount should last for about 100 GPU hours when using an instance with an RTX 3090.
- Use this vast.ai template, which comes with VSCode, Python 3.11 and CUDA 12 pre-installed. You should see the same `VSCode CUDA` template as shown in the image below under the red shade.
- Then, optionally increase: the disk space allocated (anything between 64GB and 100GB), the host reliability (over 99%), and the max instance duration (approx. 1 month). For reference, see the areas shaded in green on the screenshot above.
- In vast.ai you have a wide selection of GPUs, from low-end gaming GPUs (e.g., GTX 1050Ti) all the way to the latest data center GPUs (e.g., H200). For this course you likely won’t need more than an RTX 3090. In fact, you might want to choose a low-end GPU (e.g., RTX 3060) for the practicals and use a more powerful GPU later for the assignment, when you have all or most of your code ready. Note you can also choose an instance with multiple GPUs. This is great if you want to run your Flower experiments fast. You can filter through all available GPUs using the drop-down menu shaded in orange in the screenshot above.
- At a high level, instances might seem identical. Below are two instances that we could choose from; both have an RTX 3060 and both cost about $0.1/hour. However, if you look closely you’ll see one has far more CPUs (in red) and RAM (in blue) than the other. Take these specifications into account when choosing your instance. A higher CPU count and more RAM will give you more headroom to run your Flower simulations.
- To aid with filtering the instances, you can use the panel on the left (as you did earlier). Scroll through it and set:
    - Min CUDA Version: `12.3`
    - Machine Resources:
        - CPU Cores: anything between 10-32
        - CPU RAM: anything between 32-64GB
- After applying the filters, choose an instance of your liking (e.g., with an RTX 3060/80/90) that has some decent upload/download internet speeds (500Mbps/500Mbps or higher would be ok) and click on [RENT] 🚀
Connecting to your Vast.ai Instance
⚠️ If you don’t stop your instance, you will continue being charged. When the instance is stopped, you’ll still be charged (but at a reduced rate) for the disk space used. All in all, if you want to make the most out of your $ then: (1) stop the instance when you aren’t planning to work on your project anymore; (2) consider downloading the code you wrote to your laptop (so you can reuse it later) and destroy the instance.
- If you have already rented an instance, you should see it in the `INSTANCES` panel, as shown in the screenshot below. Note it is already running (it might take <5 mins to be ready from the moment you clicked on [RENT]). Take note of which buttons stop and destroy your instance. Also note the `(i)` at the bottom-right corner of your instance: it shows a detailed breakdown of the running costs associated with your instance (including those incurred when your instance isn’t running).
- Click on the IP address shown in a blue rectangle. A small window will pop up as shown in the screenshot below. Copy the IP and PORT shown (they will be different from the ones in the screenshot). This is the address to connect to the VSCode server running in your instance. You’ll be able to use VSCode from your browser.
- Copy the address into your browser. If you see a warning, it is safe to ignore it and proceed with the connection. Depending on the browser you use, you might need to click on `Advanced > Proceed`, or `Show Details > Visit this Website`, or something similar. Once you get past this warning, you’ll see the VSCode editor as shown below.
- Set up VSCode to your liking (e.g., choose a theme, install extensions) and/or continue with installing Flower.
Remember to stop (or even terminate) your instances when you are not using them. Additionally, if you have stopped your instance, others could rent it and (temporarily) prevent you from connecting to it while it’s being used. They won’t have access to your files. In summary, it’s recommended to download your files when you stop your instance (more on this in the next section).
Flower Hello World
To start with Flower, all you have to do is install the `flwr` package.
If you are using the Vast.ai setup you can execute the command below right away by opening a terminal: click on `[☰] > Terminal > New Terminal`. If you are using your own setup you might want to do this in a new virtual environment (e.g., using Conda or pyenv).
pip install flwr
With Flower installed, you can create your first App. You can make use of one of the built-in App templates. Execute the command below and choose the PyTorch template when prompted. You can use your name when asked about your Flower username:
flwr new my-app
You’ll notice a new directory has appeared with the name `my-app`. Access it from the terminal, install the dependencies and run the app. You can follow the steps shown at the end of your `flwr new` command:
# Enter into the app directory
cd my-app
# Install the app dependencies (i.e. PyTorch etc)
# Inspect the pyproject.toml to see what will be installed
pip install -e .
# Run the app
flwr run .
You can find a detailed walkthrough of the code in the Flower Documentation.
Sending/Retrieving files from a remote VSCode
If you want to download files from your VSCode instance in Vast.ai you can right-click on them and click on Download. However, this won’t work for directories. The workaround is to first compress the directory into a `.zip` file and then download that file individually. Let’s see how to do this with the terminal:
# Follow this syntax
zip -r <NAME-FOR-YOUR-ZIP-FILE>.zip path/to/directory/to/compress
# For example, if your directory is named `my-app` and is in the current directory and
# you want to compress its content into a `myapp_backup.zip`, do:
zip -r myapp_backup.zip my-app/
Then you can download the generated `.zip` file.
If instead you want to upload files to VSCode, they can be dragged onto the explorer directly. This also works for directories. If you are dragging a `.zip` file, you’ll need to uncompress it after uploading, using the terminal like this:
unzip path/to/my/file.zip
Practicals
Practical A
This practical builds on top of the Flower Hello World section above. You'll dive into the code, make some small modifications and run it a few times.
- Complete the Vast.ai setup if you wish to use this platform.
- Complete the Flower Hello World steps.
- Check the files in your Flower App. Can you identify:
    - The model being federated and the datasets being used?
    - How many nodes are being simulated?
    - What’s the aggregation strategy and how is it configured?
    - How many rounds of FL will run by default?
- Create a new federation with 20 nodes in your `pyproject.toml`, name it `second-federation` and run it with `flwr run . second-federation`. Do you observe double the number of nodes being sampled now?
- Add a new configuration value under the `[tool.flwr.app.config]` section in your app's `pyproject.toml` (for example, a `lr = 0.1` that defines the learning rate for the `ClientApp`), and ensure this config value is read when building the `ClientApp` and is passed to the optimizer.
- Run the app overriding the run config. Learn how to do this by reading the help message you get from `flwr run --help`.
- Create a Jupyter Notebook in your VSCode and visualize the partitions created by your Flower App. As dataset downloading and partitioning is done with Flower Datasets, you may get inspiration from the visualization tutorial.
Practical B
In this practical you will further customize your Flower app by replacing the dataset and creating a new Flower strategy that adds useful functionality to your `ServerApp`.
- Create a new app again using `flwr new` and the PyTorch template; give it any name you want.
- Replace the default dataset with FashionMNIST. Take a look at its repository on Hugging Face Datasets. To use it in your Flower App you’ll need to:
    - Update the dataset preparation (see the `load_data` function in `task.py`).
    - Update the keys used to access the elements in the batch (e.g., in `train` and `test`).
    - Update the model so it’s ready to work with 28x28 greyscale images as opposed to 32x32 RGB (which is what CIFAR-10 uses).
- Run the code. Is the loss going down?
- Is your app making use of the GPU?
    - You can check by opening a second terminal and running the command `watch -n 1 nvidia-smi`.
    - Read the Flower Documentation to learn how to set the CPU/GPU resources available to each `ClientApp`.
- Add a callback to your `ServerApp` strategy:
    - Introduce a callback to the strategy so the `accuracy` is also aggregated and displayed. You may follow how the `ServerApp` in the pytorch-quickstart example does it.
    - Adjust the hyperparameters so the performance goes to over 70%.
    - (optional) Get the performance to over 85%.
- Create a custom strategy for your `ServerApp` that enables it to do the following:
    - Creates a directory for the run with the format `outputs/<current-date-and-time>/` and saves results to it.
    - Saves the evaluate metrics each round to a JSON file. This is important so you can complete Practical C.
    - (optional) Saves a checkpoint of the global model each time a new best is found.
    - (optional) Logs metrics to Weights & Biases (or equivalent).
For the last few points in this practical you may use the advanced-pytorch example for inspiration.
Practical C
In this practical you will extend the Flower App you created in Practical B and you'll analyze how it performs under different scenarios of varying complexity (i.e., different numbers of nodes, increasingly heterogeneous data distributions).
- Copy the app created in Practical-B and give it a different name.
- Update the code so that from the config (i.e. via `--run-config`) you can:
    - Change the partitioner to `DirichletPartitioner` or `PathologicalPartitioner`.
    - Change the parameterization of the partitioner (e.g., `alpha` in `DirichletPartitioner` or `num_classes_per_partition` for `PathologicalPartitioner`).
- Run experiments and analyses when:
    - Increasing the number of nodes in the federation (you may want to create several federations in your `pyproject.toml`).
    - Changing `fraction_fit`.
    - Using different partitioners and their parameterizations.
- Create a report as a Jupyter Notebook that: (1) describes the experiments you ran; (2) loads the JSON results created by your experiments; and (3) creates several plots showing the performance of the federated model across rounds for each scenario you considered.
- Submit your report (as a Jupyter notebook, markdown file or PDF), together with all the code and results generated in Practical C, to ADI (actividad Federated Learning Assignment) as a single ZIP file.
Adjust hyperparameters to the best of your ability. For example, highly non-IID setups might need smaller learning rates or more rounds to reach the same performance as (unrealistic) IID scenarios. Try your best. Create a short report with plots showing how federated accuracy/loss behave when varying parameters across the above dimensions.
Project
For the project you'll design your own Federated AI setup with a model and dataset of your choice and run an analysis just like in Practical C. Since you have access to GPUs, you may want to make use of a larger model than the ones used in Practicals A, B and C. The following are some ideas, but feel free to do something else; you must, however, first consult with me during class about other project ideas.
Depending on the type of project, the evaluation will be slightly different. However, in all cases you will need to submit to ADI (actividad Federated Learning Assignment) all the code you wrote, a report (as a Jupyter notebook, markdown file or PDF) about the analysis you conducted, as well as a README indicating how the code can be run via `flwr run`. Compress everything as a ZIP file. I will be running your code.