How to understand invariants? #5

Open
pwayner opened this issue May 29, 2020 · 0 comments

I've been experimenting with feeding a single county (Dallas) into the E2E code using the attached .ini file. The data is synthetic: 2.36m people and 2.36m households. (They all live alone.) The MDF output is skewed: the UNIT output has 2.36m lines, but roughly 10m lines appear in the PER directory.

Do you have any suggestions for understanding this problem? Is there some good process for debugging the results? Are there better settings for the .ini file?

Thx.
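For reference, here is roughly how I counted the output lines (a minimal sketch; it assumes the writer shards plain-text part files under the MDF_PER and MDF_UNIT directories from the [writer] section, skipping marker files like _SUCCESS):

```python
import os

def count_records(dirpath):
    """Count total lines across all plain-text part files under dirpath."""
    total = 0
    for root, _dirs, files in os.walk(dirpath):
        for name in files:
            # Skip Spark/HDFS marker and checksum files (_SUCCESS, .crc, ...)
            if name.startswith(("_", ".")):
                continue
            with open(os.path.join(root, name)) as fh:
                total += sum(1 for _ in fh)
    return total

# Hypothetical usage against the paths from the [writer] section:
# print(count_records("/home/pcw/Census/census-das-e2e/output/MDF_PER"))
# print(count_records("/home/pcw/Census/census-das-e2e/output/MDF_UNIT"))
```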


-------------------------

# The configuration file for DAS as run in standalone mode by das_decennial/run_1940_standalone.sh
# This is a modified copy of das_decennial/E2E_1940_CONFIG.sh

## This is further modified by PCW to include three extended fields.

[DEFAULT]
# root specifies the root location for all files
# For the demo, the root is the current directory
name: 2018DAS_E2E_1940
root: .
loglevel: INFO
logfolder: logs


[ENVIRONMENT]
DAS_FRAMEWORK_VERSION: 0.0.1
GRB_ISV_NAME: Census
GRB_APP_NAME: DAS
GRB_Env3: 0
GRB_Env4:

[geodict]
# Names of smallest to largest geocode (no spaces)
geolevel_names: Block,Blockgrp,Tract,County,State,National
# Largest geocode length to smallest, put 1 for top level (no spaces)
geolevel_leng: 17,13,10,6,2,1

[setup]
setup: programs.das_setup.setup

# Spark config stuff
spark.name: DAS_E2E
#local[6] tells spark to run locally with 6 threads
#spark.master: local[9]
#Error , only writes to log if there is an error (INFO, DEBUG, ERROR)
spark.loglevel: ERROR

[reader]
# ipums_file: $EXT1940USCB
# ipums_file: /home/pcw/Census/EXT1940USCB.dat
ipums_file: /home/pcw/Census/converted-reconstructed-extended-Dallas.dat
# package(s).module_name.class_name of the reader module
reader: programs.reader.e2e_1940_reader_extended17.reader

###
### List of tables
### These tables have decennial Census specific process methods
### Table class methods will likely need to be rewritten for other applications
### 
tables: PersonData UnitData

privacy_table: PersonData
constraint_tables: UnitData

# table_name.path - location of the directory or filename

PersonData.geography: geocode
PersonData.histogram: hhgq age18plus hispanic race

UnitData.geography: geocode
UnitData.histogram: hhgq

[engine]
engine: programs.engine.topdown_engine.engine

# should we delete the true data after making DP measurements (1 for True, 0 for False)
delete_raw: 1

[budget]
epsilon_budget_total: 0.2499
global_sensitivity: 2.0

# budget proportion for each geolevel in top-down order (one value per geolevel; the values should sum to 1)
geolevel_budget_prop: 0.95,0.01,0.01,0.01,0.01,0.01

# detailed query proportion of budget (a float between 0 and 1)
detailedprop: 0.1

queriesfile: programs.engine.queries1940.QueriesCreator1940
DPqueries: hhgq, va_hisp_race
queriesprop: 0.225, 0.675


[constraints]
# the invariants created (no spaces)
theInvariants.Block: gqhh_vect,gqhh_tot
##theInvariants.Block:

theInvariants.Enumdist: gqhh_vect,gqhh_tot
theInvariants.State: tot
invariants: programs.reader.invariants1940.InvariantsCreator1940

#these are the info to build cenquery.constraint objects
theConstraints.Block: hhgq_total_lb,hhgq_total_ub

theConstraints.Enumdist: hhgq_total_lb,hhgq_total_ub
theConstraints.State: total,hhgq_total_lb,hhgq_total_ub
constraints: programs.reader.constraints1940.ConstraintsCreator1940

minimalSchema: hhgq

[gurobi]
gurobi_lic: /home/pcw/gurobi_client.lic
gurobi_logfile_name: $HOME/E2E_1940_GUROBI.log 
OutputFlag: 1
OptimalityTol: 1e-9
BarConvTol: 1e-8
BarQCPConvTol: 0 
BarIterLimit: 1000 
FeasibilityTol: 1e-9
Threads: 1
Presolve: -1
NumericFocus: 3


[writer]
writer: programs.writer.e2e_1940_writer_extended17.writer

# Variables Re-used by multiple writers
# Where the data gets written:
#per_path:  s3://uscb-decennial-ite-das/sexto015/tmp/1940/per2
#unit_path: s3://uscb-decennial-ite-das/sexto015/tmp/1940/unit2
per_path: /home/pcw/Census/census-das-e2e/output/MDF_PER 
unit_path: /home/pcw/Census/census-das-e2e/output/MDF_UNIT


[validator]
validator: programs.validator.validator
error_privacy_budget: 1e-4
error_dp_confidence_level: 0.9
certificate: no
certificate_path: /home/pcw/Census/census-das-e2e/output

[assessment]

[takedown]
takedown: programs.takedown.takedown
delete_output: True

[error_metrics]
error_metrics: programs.metrics.das_error_metrics.error_metrics
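As a quick consistency check on the [budget] section (in case the skewed output is a budget-allocation issue), the proportions should partition the total epsilon. This is just an arithmetic sketch using the values copied from the .ini above, not the engine's actual validation logic:

```python
import math

# Values copied from the [budget] section of the .ini above
epsilon_total = 0.2499
geolevel_prop = [0.95, 0.01, 0.01, 0.01, 0.01, 0.01]  # one value per geolevel
query_prop = [0.225, 0.675]                           # hhgq, va_hisp_race
detailed_prop = 0.1                                   # detailed query proportion

# The geolevel proportions should sum to 1 (they partition epsilon_total)
assert math.isclose(sum(geolevel_prop), 1.0)

# Within a geolevel, the DP query proportions plus the detailed
# query proportion should also sum to 1
assert math.isclose(sum(query_prop) + detailed_prop, 1.0)

# Example: epsilon actually spent on the detailed query at the first
# (0.95) geolevel -- a small slice of an already small total budget
eps_detailed = epsilon_total * geolevel_prop[0] * detailed_prop
print(eps_detailed)
```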



