
EFFGAN: Ensembles of fine-tuned federated GANs

15 November 2022, 08:18

Ebba Ekblom, Edvin Listo Zec and Olof Mogren

As technology has grown more powerful, we have also started generating vast amounts of data. This data could be leveraged to improve performance and user experience in a multitude of areas. However, depending on how and where it is collected, the data is often highly private. For instance, information from our personal phones or medical records cannot be shared and collected centrally, which poses a challenge for training data-hungry machine learning models. It is in this setting that federated learning has become a paradigm shift for training on distributed data. However, since most research in federated learning has focused on supervised tasks, unsupervised learning in this framework remains relatively understudied. To bridge this gap, in a recent paper we study generative adversarial networks in a federated setting.

Distributed machine learning tackles the problem of learning useful models when data is distributed among several clients. The most prevalent decentralized setting today is federated learning (FL), where a central server orchestrates the learning among clients. The central server holds a global model, and at the start of training a copy of this model is sent out to each client. After a period of local training, the updated models are sent back to the central server, where they are aggregated into a new global model. This procedure is repeated for a number of communication rounds until some stopping criterion is met. A challenge arises, however, when the data is heterogeneously distributed among the clients, i.e. when the data is non-iid. This may lead to client drift during local training, meaning that the local models move in different directions away from the global optimum. The difficulty then lies in aggregating these models, since the common approach of federated averaging (FedAvg) does not necessarily result in a meaningful model.
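For concreteness, the FedAvg aggregation step amounts to a weighted average of the client parameters, with each client weighted by the size of its local dataset. The snippet below is a minimal sketch in Python/NumPy, not the implementation used in our paper; the parameter-dictionary format and the helper name fedavg_aggregate are illustrative assumptions.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of client parameter dictionaries,
    where each client's weight is proportional to its local dataset size."""
    total = sum(client_sizes)
    return {
        name: sum((n / total) * weights[name]
                  for weights, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Toy usage: two clients with a single parameter tensor each.
clients = [{"w": np.ones((2, 2))}, {"w": np.zeros((2, 2))}]
new_global = fedavg_aggregate(clients, client_sizes=[300, 100])  # "w" becomes 0.75 everywhere
```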

In our work (to be presented at IEEE Big Data 2022) we study and propose a new method for training generative adversarial networks (GANs) in the FL framework for non-iid data. A GAN is a generative machine learning model whose goal is to generate data that is indistinguishable from samples drawn from an existing real dataset. Training is unsupervised and involves two networks competing in a two-player minimax game: a generator, which produces samples from a noise input, and a discriminator, which distinguishes real samples from generated ones. Since the objective of the generator is to maximize the error made by the discriminator, the model learns without any labels.
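As a reminder, the original GAN objective (Goodfellow et al., 2014) can be written as the following minimax value function, where D is the discriminator, G the generator, and p_z the noise distribution; in practice, variants such as the non-saturating generator loss are often used.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```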

Our method (EFFGAN) for training a GAN on distributed data results in an ensemble of client generators Gm, m = 1, …, M, as opposed to a single global model. Training still involves aggregating local models using FedAvg, but with the addition of a fine-tuning step before the ensemble is formed. Note that data generation still happens at the central server, meaning that no extra computation is needed at the clients. The step-by-step procedure of EFFGAN is as follows (a minimal code sketch is given after the list).

  1. Train a global GAN using FedAvg.
  2. Do one last round of local training on the clients (fine-tuning).
  3. Send the fine-tuned models back to the server and keep all of them as an ensemble.
  4. Sample a random generator in the ensemble to generate data.
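The sketch below illustrates steps 2–4 in Python. It assumes the global generator has already been trained with FedAvg (step 1), and names such as client.fine_tune and EffganEnsemble are illustrative rather than taken from the paper's code.

```python
import copy
import random

class EffganEnsemble:
    """Ensemble of fine-tuned client generators G_m, m = 1, ..., M."""

    def __init__(self, global_generator, clients, fine_tune_epochs):
        # Steps 2-3: one last round of local training, keeping every client copy.
        self.generators = []
        for client in clients:
            g_m = copy.deepcopy(global_generator)           # start from the FedAvg-trained model
            client.fine_tune(g_m, epochs=fine_tune_epochs)  # local GAN training on private data
            self.generators.append(g_m)

    def generate(self, noise):
        # Step 4: sample one generator uniformly at random and generate from it.
        g_m = random.choice(self.generators)
        return g_m(noise)
```

Since the ensemble lives at the central server, sampling a random generator per batch adds no communication or computation cost on the client side.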

An illustration of EFFGAN is shown in Figure 1.

Figure 1. Training of EFFGAN.

We compare EFFGAN with a single globally aggregated model (known as FedGAN in the literature). The latter is trained in the same way as EFFGAN but without the final fine-tuning and ensembling, i.e. stopping after step 1. A qualitative analysis of the generated data shows a clear difference, as demonstrated in Figure 2: EFFGAN generates clear and varied output, while the images from FedGAN are blurry. We believe this is due to the previously mentioned client drift, where the aggregated model is no longer meaningful once the local models have drifted too far apart.

Figure 2. Training with a local training period of 50 local epochs for EFFGAN (top) and FedGAN (bottom).

The Fréchet Inception Distance (FID) is used as a quantitative measure of performance; it compares the distribution of generated samples with the distribution of the real samples used to train the generators. A lower FID score indicates higher-quality images, and in our case EFFGAN outperformed FedGAN in all experiments with heterogeneously distributed data. In one experiment, the length of the local training period was varied, and the lowest FID score, 18.7, was reached by EFFGAN with 5 local epochs. Furthermore, the performance of EFFGAN remained relatively stable as the local training period grew, while the FID score of FedGAN increased with the number of local epochs. With 50 local epochs, EFFGAN reached a minimum FID of 22.0, while FedGAN scored 72.1, which corresponds well to the difference in quality observed in the generated images in Figure 2.
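For reference, FID fits a Gaussian to Inception-network features of the real and generated samples, with means μ_r, μ_g and covariances Σ_r, Σ_g, and computes the Fréchet distance between the two Gaussians:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```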

To conclude, we note that although FedAvg seems to have good transfer learning capabilities that are beneficial during training, the ensemble in EFFGAN manages to mitigate the issue of client drift and thus generates higher-quality data.

For more information, see the article.
