# RevFRF - Revocable Federated Random Forest

## Background

Considering the whole lifecycle of a machine learning model, a practical federated learning (FL) framework should satisfy at least the following requirements.

- **Collaboration Privacy.** The original data of a participant cannot be revealed to others during model construction, especially in the gradient aggregation process.

- **Usage Privacy.** The machine learning model built by the FL technique is sometimes treated as a publicly available “infrastructure” of the learning federation, e.g., [Federated AI Technology Enabler (FATE)](https://fate.fedai.org/). Therefore, there are two security requirements for usage privacy: 1) ensuring that no original data are revealed in the usage stage; 2) protecting the privacy of usage request content.

- **Revocation Privacy.** A revoked participant can choose whether to leave its data in the learning federation. If it chooses not to, its data in the trained model should no longer be available to the remaining participants. Correspondingly, for fairness, the revoked participant cannot continue using the resources of the learning federation.

To date, most FL frameworks achieve collaboration privacy, and some also consider usage privacy. Nevertheless, few of them resolve the problems brought by participant revocation.

![rf](https://drive.google.com/uc?export=view&id=1UcDwOjsgw29a_8A-cJIBy_AJzpln1q_T)

## Contributions of RevFRF

- **Secure RF Construction.** RevFRF implements federated random forest (RF) construction without revealing private data. Unlike traditional privacy-preserving RF schemes, the tree nodes of an RF in RevFRF come from different participants and are encrypted with different public keys, which is the basis for secure participant revocation.

- **Secure RF Prediction.** Based on the homomorphic encryption technique, RevFRF ensures that RF prediction can be completed without revealing any information about the prediction request and the model parameters, which meets the security requirements for usage privacy.

- **Revocable Federated RF.** RevFRF extends the practicality of FL in real-world scenarios by introducing the concept of participant revocation. Based on specifically designed participant revocation protocols, RevFRF implements two levels of revocation. With first-level revocation, RevFRF ensures that the data of an honest revoked participant in the trained RF model cannot be used by the remaining participants. With second-level revocation, RevFRF further ensures that a dishonest revoked participant cannot come back and use the data of the remaining participants memorized by the trained RF model.
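
The following is a minimal sketch of how per-participant node ownership makes revocation a tree traversal. It is not the RevFRF protocol itself: the `Node` class, the `revoke` function, and the use of raw bytes as a stand-in for a homomorphic ciphertext are all assumptions made for illustration, and the step that rebuilds destroyed nodes is omitted.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; not a structure defined by RevFRF."""
    owner: str              # participant whose key protects this split
    enc_threshold: bytes    # stand-in for an HE ciphertext of the threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    destroyed: bool = False


def revoke(root: Optional[Node], participant: str, second_level: bool = False) -> int:
    """Mark every node owned by `participant` as destroyed.

    With second_level=True, the stored value is also overwritten with random
    bytes, mimicking the "refresh with random values" idea. Returns the number
    of nodes affected.
    """
    affected = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if node.owner == participant:
            node.destroyed = True
            if second_level:
                # Assumption: random bytes model a refreshed, unusable split.
                node.enc_threshold = secrets.token_bytes(len(node.enc_threshold))
            affected += 1
        stack.extend((node.left, node.right))
    return affected


# Usage: a three-node tree where u1 owns the root and the right child.
tree = Node("u1", b"\x00" * 16,
            left=Node("u2", b"\x01" * 16),
            right=Node("u1", b"\x02" * 16))
print(revoke(tree, "u1", second_level=True))
```

In a real deployment the destroyed nodes would then be rebuilt by the remaining participants, and the ciphertexts would come from an actual HE scheme rather than placeholder bytes.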

## Design of RevFRF

RevFRF comprises four kinds of entities, namely a center server (CS), a set of participants (UD), a computation service provider (CC) and a key generation center (KGC).

![revfrf](https://drive.google.com/uc?export=view&id=1LNp0ERlxF0MKeyrMqvJb121Khz0JYoxL)

- **Center Server.** CS is usually the initiator of a RevFRF learning federation. It takes on most of the computation tasks in RevFRF and manages the usage of the trained RF model. CS is also a data provider: it holds the ground truths, i.e., the labels for classification or the prediction targets for regression.

- **Normal Participant.** RevFRF involves more than one normal participant, UD $= \{u_1, u_2, \ldots\}$. Each $u_i \in$ UD holds one or more dimensions (features) of the data used for RF construction.

- **Computation Service Provider.** CC is a third-party entity that is only responsible for helping CS complete the complex computations of homomorphic encryption (HE). Usually, a cloud computing platform that has no business relationship with CS or UD plays the role of CC. To some extent, CC restricts the power of CS over model usage.

- **Key Generation Center.** KGC is only tasked with key generation and distribution. Depending on the application's key management requirements, KGC expires old keys and distributes new keys to all entities.

## RevFRF Framework

RevFRF contains three stages: secure RF construction, secure RF prediction, and secure participant revocation.

![over](https://drive.google.com/uc?export=view&id=1DEXskcvNd697sA_NzUydULWKa1nIal5M)

- **Secure RF Construction.** RevFRF constructs an RF in two steps, following a common RF training method. However, the node information of the trained RF is not stored in plaintext but encrypted with its contributor's key. Thus, RevFRF can reduce the first-level revocation problem to key revocation.
  - **Key Setup.** KGC generates HE keys and distributes them to the corresponding participants.
  - **Federated Tree Growth.** To avoid leaking private data, tree growth in RevFRF is implemented by iteratively invoking a crafted HE-based leaf expansion protocol. In the protocol, CS first randomly chooses a subset of all features. Then, the normal participants that own the data of the chosen features recommend candidate splits and send the split results (vectors containing only 0 and 1) to CS. Finally, CS assesses all collected split results and asks the participant who provided the highest-quality split vector to upload the encrypted best split (a specific feature threshold). The encrypted best split is stored as a new decision tree node.

- **Secure RF Prediction.** Secure RF prediction securely processes a prediction request that contains all dimensions of features. This request type is commonly discussed in privacy-preserving RF frameworks. In this stage, the requester sends the encrypted prediction request to CS. Then, CS iteratively traverses the RF with a secure RF prediction protocol. The prediction results are encrypted across different domains. Therefore, the plaintext prediction result is only available while all RF model providers remain in the federation, that is, while none of them has been revoked.

- **Secure Participant Revocation.** Secure participant revocation first guarantees that the data of revoked participants are removed from the RF and are no longer available to the remaining participants. When a normal participant wants to quit a learning federation, CS, together with CC, traverses the decision trees in the trained RF and destroys the splits provided by that participant. Then, the protocol used for RF construction is invoked to rebuild the destroyed decision tree nodes. In most cases, these steps are enough to provide forward security for participant revocation, because the decision tree nodes in RevFRF are always kept in encrypted form and can only be used with the involvement of their providers. Nonetheless, consider the situation where the revoked participant is malicious and still wants to use the RF model after leaving the learning federation. The above revocation is then no longer secure from the "backward" perspective: by colluding with CS, the revoked participant can operate the old RF without being noticed by the other honest participants. Therefore, we further provide a second-level revocation to ensure backward security. Compared to first-level revocation, second-level revocation involves extra computation to refresh the revoked splits with random values. In this way, we ensure that the revoked data are no longer available even to their provider.
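
To make the split selection in the Federated Tree Growth step more concrete, the sketch below shows how CS, which holds the labels, could rank the 0/1 split vectors it receives. This is a plaintext illustration under stated assumptions: the text above does not name the quality metric, so Gini impurity reduction is assumed here, and the function names (`gini`, `split_quality`, `best_split`) are hypothetical. All encryption is omitted.

```python
from typing import Dict, Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[Hashable, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_quality(labels: Sequence[Hashable], split: Sequence[int]) -> float:
    """Impurity reduction achieved by a 0/1 split vector.

    split[i] == 0 sends sample i to the left child, 1 to the right child.
    """
    left = [y for y, s in zip(labels, split) if s == 0]
    right = [y for y, s in zip(labels, split) if s == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_split(labels: Sequence[Hashable],
               candidates: Dict[str, Sequence[int]]) -> str:
    """Return the key of the candidate split vector with the highest quality."""
    return max(candidates, key=lambda k: split_quality(labels, candidates[k]))


# Usage: CS holds the labels; each participant submits one 0/1 split vector.
labels = [0, 0, 1, 1]
candidates = {
    "u1": [0, 0, 1, 1],  # separates the two classes perfectly
    "u2": [0, 1, 0, 1],  # leaves both children fully mixed
}
print(best_split(labels, candidates))  # u1
```

Because only the 0/1 vectors reach CS in this step, the participant's actual feature threshold stays private until CS requests the encrypted best split from the winning participant.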

# References

- Y. Liu, Z. Ma, Y. Yang, X. Liu, J. Ma, and K. Ren, “RevFRF: Enabling Cross-Domain Random Forest Training With Revocable Federated Learning,” IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 6, pp. 3671–3685, 2022. [[Paper](https://ieeexplore.ieee.org/abstract/document/9514457)]


