How to understand invariants? #5

Open
pwayner opened this issue May 29, 2020 · 0 comments

I've been experimenting with feeding a single county (Dallas) into the E2E code using the attached .ini file. The data is synthetic: 2.36m people and 2.36m households. (They all live alone.) The MDF output is skewed: the UNIT output has 2.36m lines, but roughly 10m lines appear in the PER directory.

Do you have any suggestions for understanding this problem? Is there some good process for debugging the results? Are there better settings for the .ini file?

Thx.
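For reference, here is roughly how I counted the output lines (a minimal sketch; it assumes the writer shards plain-text part files under the MDF_PER and MDF_UNIT directories from the [writer] section, skipping marker files like _SUCCESS):

```python
import os

def count_records(dirpath):
    """Count total lines across all plain-text part files under dirpath."""
    total = 0
    for root, _dirs, files in os.walk(dirpath):
        for name in files:
            # Skip Spark/HDFS marker and checksum files (_SUCCESS, .crc, ...)
            if name.startswith(("_", ".")):
                continue
            with open(os.path.join(root, name)) as fh:
                total += sum(1 for _ in fh)
    return total

# Hypothetical usage against the paths from the [writer] section:
# print(count_records("/home/pcw/Census/census-das-e2e/output/MDF_PER"))
# print(count_records("/home/pcw/Census/census-das-e2e/output/MDF_UNIT"))
```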


-------------------------

# The configuration file for DAS as run in standalone mode by das_decennial/run_1940_standalone.sh
# This is a modified copy of das_decennial/E2E_1940_CONFIG.sh

## This is further modified by PCW to include three extended fields.

[DEFAULT]
# root specifies the root location for all files
# For the demo, the root is the current directory
name: 2018DAS_E2E_1940
root: .
loglevel: INFO
logfolder: logs


[ENVIRONMENT]
DAS_FRAMEWORK_VERSION: 0.0.1
GRB_ISV_NAME: Census
GRB_APP_NAME: DAS
GRB_Env3: 0
GRB_Env4:

[geodict]
# Names of smallest to largest geocode (no spaces)
geolevel_names: Block,Blockgrp,Tract,County,State,National
# Largest geocode length to smallest, put 1 for top level (no spaces)
geolevel_leng: 17,13,10,6,2,1

[setup]
setup: programs.das_setup.setup

# Spark config stuff
spark.name: DAS_E2E
#local[6] tells spark to run locally with 6 threads
#spark.master: local[9]
#Error , only writes to log if there is an error (INFO, DEBUG, ERROR)
spark.loglevel: ERROR

[reader]
# ipums_file: $EXT1940USCB
# ipums_file: /home/pcw/Census/EXT1940USCB.dat
ipums_file: /home/pcw/Census/converted-reconstructed-extended-Dallas.dat
# package(s).module_name.class_name of the reader module
reader: programs.reader.e2e_1940_reader_extended17.reader

###
### List of tables
### These tables have decennial Census specific process methods
### Table class methods will likely need to be rewritten for other applications
### 
tables: PersonData UnitData

privacy_table: PersonData
constraint_tables: UnitData

# table_name.path - location of the directory or filename

PersonData.geography: geocode
PersonData.histogram: hhgq age18plus hispanic race

UnitData.geography: geocode
UnitData.histogram: hhgq

[engine]
engine: programs.engine.topdown_engine.engine

# should we delete the true data after making DP measurements (1 for True, 0 for False)
delete_raw: 1

[budget]
epsilon_budget_total: 0.2499
global_sensitivity: 2.0

# budget proportion for each geolevel in top-down order (one value per geolevel; the values should sum to 1)
geolevel_budget_prop: 0.95,0.01,0.01,0.01,0.01,0.01

# detailed query proportion of budget (a float between 0 and 1)
detailedprop: 0.1

queriesfile: programs.engine.queries1940.QueriesCreator1940
DPqueries: hhgq, va_hisp_race
queriesprop: 0.225, 0.675


[constraints]
# the invariants created (no spaces)
theInvariants.Block: gqhh_vect,gqhh_tot
##theInvariants.Block:

theInvariants.Enumdist: gqhh_vect,gqhh_tot
theInvariants.State: tot
invariants: programs.reader.invariants1940.InvariantsCreator1940

#these are the info to build cenquery.constraint objects
theConstraints.Block: hhgq_total_lb,hhgq_total_ub

theConstraints.Enumdist: hhgq_total_lb,hhgq_total_ub
theConstraints.State: total,hhgq_total_lb,hhgq_total_ub
constraints: programs.reader.constraints1940.ConstraintsCreator1940

minimalSchema: hhgq

[gurobi]
gurobi_lic: /home/pcw/gurobi_client.lic
gurobi_logfile_name: $HOME/E2E_1940_GUROBI.log 
OutputFlag: 1
OptimalityTol: 1e-9
BarConvTol: 1e-8
BarQCPConvTol: 0 
BarIterLimit: 1000 
FeasibilityTol: 1e-9
Threads: 1
Presolve: -1
NumericFocus: 3


[writer]
writer: programs.writer.e2e_1940_writer_extended17.writer

# Variables Re-used by multiple writers
# Where the data gets written:
#per_path:  s3://uscb-decennial-ite-das/sexto015/tmp/1940/per2
#unit_path: s3://uscb-decennial-ite-das/sexto015/tmp/1940/unit2
per_path: /home/pcw/Census/census-das-e2e/output/MDF_PER 
unit_path: /home/pcw/Census/census-das-e2e/output/MDF_UNIT


[validator]
validator: programs.validator.validator
error_privacy_budget: 1e-4
error_dp_confidence_level: 0.9
certificate: no
certificate_path: /home/pcw/Census/census-das-e2e/output

[assessment]

[takedown]
takedown: programs.takedown.takedown
delete_output: True

[error_metrics]
error_metrics: programs.metrics.das_error_metrics.error_metrics
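As a quick consistency check on the [budget] section (in case the skewed output is a budget-allocation issue), the proportions should partition the total epsilon. This is just an arithmetic sketch using the values copied from the .ini above, not the engine's actual validation logic:

```python
import math

# Values copied from the [budget] section of the .ini above
epsilon_total = 0.2499
geolevel_prop = [0.95, 0.01, 0.01, 0.01, 0.01, 0.01]  # one value per geolevel
query_prop = [0.225, 0.675]                           # hhgq, va_hisp_race
detailed_prop = 0.1                                   # detailed query proportion

# The geolevel proportions should sum to 1 (they partition epsilon_total)
assert math.isclose(sum(geolevel_prop), 1.0)

# Within a geolevel, the DP query proportions plus the detailed
# query proportion should also sum to 1
assert math.isclose(sum(query_prop) + detailed_prop, 1.0)

# Example: epsilon actually spent on the detailed query at the first
# (0.95) geolevel -- a small slice of an already small total budget
eps_detailed = epsilon_total * geolevel_prop[0] * detailed_prop
print(eps_detailed)
```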



