This tutorial shows how to use [KnockoffGAN](https://openreview.net/forum?id=ByeZ5jC5YQ): "*generating knockoffs for feature selection using generative adversarial networks*". This implementation uses both R and Python 3, including TensorFlow. Please make sure that all dependencies are installed by following the installation procedure.

We generate a data file with synthetic data, so that the relation between the explanatory variables and the response variable is known (see also [knockoffs](https://web.stanford.edu/group/candes/knockoffs/software/knockoffs)):

In [None]:
fn_data_csv = "data.csv"
target = "label"
fn_json = "generated_data_properties.json"
!Rscript gen_data.r  -o {fn_data_csv} --target {target} --ojson {fn_json}
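The idea behind the synthetic data can be illustrated with a small Python sketch: draw random features, let only a few of them drive the response, and remember which ones those were. This is not what `gen_data.r` does internally; the function `make_synthetic`, its parameters, the column names `x0`, `x1`, ... and the logistic model for a binary `label` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_synthetic(n=500, p=20, k=5, amplitude=3.0, seed=0):
    """Hypothetical generator: Gaussian features, k of which drive a binary label."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))

    # Pick the k truly relevant columns and give only them a nonzero coefficient.
    relevant = np.sort(rng.choice(p, size=k, replace=False))
    beta = np.zeros(p)
    beta[relevant] = amplitude

    # Assumed logistic model: P(label = 1) = sigmoid(X @ beta).
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, prob)

    cols = ["x{}".format(j) for j in range(p)]
    df = pd.DataFrame(X, columns=cols)
    df["label"] = y
    return df, [cols[j] for j in relevant]


toy_df, toy_relevant = make_synthetic()
print(toy_df.shape)
print("relevant variables: {}".format(toy_relevant))
```

Because the relevant columns are returned alongside the data, any feature selection result can later be checked against them.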

Show first five rows of generated data:

In [None]:
import pandas as pd
df = pd.read_csv(fn_data_csv)
df.head()

Show the relevant variables retrieved from the properties file:

In [None]:
import json
with open(fn_json, "r") as fp:
    features_gen_data = json.load(fp)
print('relevant variables:{}'.format(features_gen_data['features_selected']))

In [None]:
niter = 2000 # number of GAN training iterations
rep = 20 # number of repeated runs from which the selected features are collected
false_discovery_rate = 0.1 # target false discovery rate
stat = "glm" # importance statistic based on glmnet_coefdiff ("glm")
fn_json_ko = "result_knockoff_gan.json" # output file with the selected features
python_exe = "python3" # on some systems this is "python"

!Rscript knockoffgan.r -i {fn_data_csv} --target {target} --it {niter}  --fdr {false_discovery_rate} --replication {rep}  -o {fn_json_ko} --stat {stat} --exe {python_exe}
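To give some intuition for the `--fdr` option, the sketch below shows how a knockoff filter typically turns importance statistics into a selection. Each variable gets a statistic W_j that is large and positive when the original variable looks more important than its knockoff. The knockoff+ rule picks the smallest threshold t for which (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) does not exceed the target rate. Whether `knockoffgan.r` applies exactly this rule is an assumption; the function names and the toy statistics are hypothetical.

```python
import numpy as np


def knockoff_threshold(W, fdr=0.1, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr.

    offset=1 gives the knockoff+ rule; returns inf when no t qualifies.
    """
    W = np.asarray(W, dtype=float)
    # Only the nonzero magnitudes of W need to be tried as thresholds.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        ratio = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= fdr:
            return t
    return np.inf


def knockoff_select(W, fdr=0.1, offset=1):
    """Indices of the variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_threshold(W, fdr, offset)
    return np.flatnonzero(W >= tau)


# Toy statistics (made up): 12 clearly positive values and 4 small ones near zero.
W_toy = np.concatenate([np.linspace(5, 10, 12), [0.3, -0.2, 0.1, -0.4]])
tau_toy = knockoff_threshold(W_toy, fdr=0.1)
selected_toy = knockoff_select(W_toy, fdr=0.1)
print("threshold:", tau_toy)
print("selected indices:", selected_toy.tolist())
```

Note that with `offset=1` and a target rate of 0.1, at least ten statistics must clear the threshold before anything is selected, which is one reason a low target rate can yield an empty selection on small problems.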

Compare the KnockoffGAN results with the properties of the generated data:

In [None]:
import json

fn_json_ko = "result_knockoff_gan.json"

with open(fn_json_ko,'r') as fp:
    result = json.load(fp)

# Variables that were both generated as relevant and selected by KnockoffGAN
agree_set = set(result['features_selected']).intersection(set(features_gen_data['features_selected']))
# Variables selected by KnockoffGAN that are not relevant in the generated data
disagree_set = set(result['features_selected']) - set(features_gen_data['features_selected'])

print('relevant explanatory variables: {}\n'.format(result['features_selected']))
print('agreement between generated and detected explanatory variables ({}): {}'.format(len(agree_set), agree_set))
print('disagreement: {}'.format(disagree_set if len(disagree_set) else '-'))
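The agreement and disagreement sets can also be summarised as two numbers: the false discovery proportion (the share of selected variables that are not truly relevant) and the power (the share of truly relevant variables that were selected). The helper `selection_metrics` below is a hypothetical addition, not part of the KnockoffGAN code, and the variable names in the example are made up.

```python
def selection_metrics(selected, relevant):
    """Return the false discovery proportion and power of a selection."""
    selected, relevant = set(selected), set(relevant)
    true_pos = selected & relevant
    false_pos = selected - relevant
    # max(1, ...) avoids division by zero when a set is empty.
    fdp = len(false_pos) / max(1, len(selected))
    power = len(true_pos) / max(1, len(relevant))
    return {"fdp": fdp, "power": power}


# Made-up example: three of the four relevant variables found, one false discovery.
example = selection_metrics(["V1", "V2", "V3", "V9"], ["V1", "V2", "V3", "V4"])
print(example)
```

In this notebook it could be called as `selection_metrics(result['features_selected'], features_gen_data['features_selected'])`.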


Validate results:

In [None]:
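# False discoveries may not exceed false_discovery_rate times the number of relevant variables, plus one
assert len(disagree_set) <= (false_discovery_rate * len(features_gen_data['features_selected']) + 1)
# Every relevant variable from the generated data must have been selected
assert len(agree_set) == len(features_gen_data['features_selected'])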
print('pass')