I've been experimenting with feeding a single county (Dallas) into the E2E code using the attached .ini file. The data is synthetic, with 2.36m people and 2.36m households. (They all live alone.) The MDF output is skewed. The UNIT output has 2.36m lines, but roughly 10m lines appear in the PER directory.
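For anyone wanting to reproduce the line counts, here is a minimal sketch of how they can be tallied. It assumes each output directory holds Spark-style part files with one record per line, and that bookkeeping files such as `_SUCCESS` and `.crc` checksums should be skipped. The helper name `count_lines` is hypothetical; the two paths are the `per_path` and `unit_path` values from the .ini file below.

```python
from pathlib import Path


def count_lines(directory):
    """Total line count over the regular files directly inside `directory`.

    Assumption: files whose names start with "_" or "." (for example
    _SUCCESS or .part-00000.crc) are bookkeeping files, not data.
    """
    total = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.name.startswith(("_", ".")):
            continue
        # Read as bytes so odd encodings cannot break the count.
        with path.open("rb") as fh:
            total += sum(1 for _ in fh)
    return total


if __name__ == "__main__":
    # Paths taken from the [writer] section of the .ini file.
    outputs = {
        "PER": "/home/pcw/Census/census-das-e2e/output/MDF_PER",
        "UNIT": "/home/pcw/Census/census-das-e2e/output/MDF_UNIT",
    }
    for label, directory in outputs.items():
        if Path(directory).is_dir():
            print(label, count_lines(directory))
        else:
            print(label, "directory not found:", directory)
```

With one person per household in the input, the two counts would be expected to match if each output line is one record.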
Do you have any suggestions for understanding this problem? Is there some good process for debugging the results? Are there better settings for the .ini file?
Thx.
-------------------------
# The configuration file for DAS as run in standalone mode by das_decennial/run_1940_standalone.sh
# This is a modified copy of das_decennial/E2E_1940_CONFIG.sh
## This is further modified by PCW to include three extended fields.
[DEFAULT]
# root specifies the root location for all files
# For the demo, the root is the current directory
name: 2018DAS_E2E_1940
root: .
loglevel: INFO
logfolder: logs
[ENVIRONMENT]
DAS_FRAMEWORK_VERSION: 0.0.1
GRB_ISV_NAME: Census
GRB_APP_NAME: DAS
GRB_Env3: 0
GRB_Env4:
[geodict]:
# Names of smallest to largest geocode (no spaces)
geolevel_names: Block,Blockgrp,Tract,County,State,National
# Largest geocode length to smallest, put 1 for top level (no spaces)
geolevel_leng: 17,13,10,6,2,1
[setup]
setup: programs.das_setup.setup
# Spark config stuff
spark.name: DAS_E2E
#local[6] tells spark to run locally with 6 threads
#spark.master: local[9]
#ERROR: only writes to the log if there is an error (options: INFO, DEBUG, ERROR)
spark.loglevel: ERROR
[reader]
# ipums_file: $EXT1940USCB
# ipums_file: /home/pcw/Census/EXT1940USCB.dat
ipums_file: /home/pcw/Census/converted-reconstructed-extended-Dallas.dat
# package(s).module_name.class_name of the reader module
reader: programs.reader.e2e_1940_reader_extended17.reader
###
### List of tables
### These tables have decennial Census specific process methods
### Table class methods will likely need to be rewritten for other applications
###
tables: PersonData UnitData
privacy_table: PersonData
constraint_tables: UnitData
# table_name.path - location of dir of filename
PersonData.geography: geocode
PersonData.histogram: hhgq age18plus hispanic race
UnitData.geography: geocode
UnitData.histogram: hhgq
[engine]
engine: programs.engine.topdown_engine.engine
# should we delete the true data after making DP measurements (1 for True or 0 for False)
delete_raw: 1
[budget]
epsilon_budget_total: 0.2499
global_sensitivity: 2.0
#budget in topdown order (e.g. County, Tract, Block Group, Block)
geolevel_budget_prop:0.95,0.01,0.01,0.01,0.01,0.01
# detailed query proportion of budget (a float between 0 and 1)
detailedprop: 0.1
queriesfile: programs.engine.queries1940.QueriesCreator1940
DPqueries: hhgq, va_hisp_race
queriesprop: 0.225, 0.675
[constraints]
#the invariants created, (no spaces)
theInvariants.Block: gqhh_vect,gqhh_tot
##theInvariants.Block:
theInvariants.Enumdist: gqhh_vect,gqhh_tot
theInvariants.State: tot
invariants: programs.reader.invariants1940.InvariantsCreator1940
#these are the info to build cenquery.constraint objects
theConstraints.Block: hhgq_total_lb,hhgq_total_ub
theConstraints.Enumdist: hhgq_total_lb,hhgq_total_ub
theConstraints.State: total,hhgq_total_lb,hhgq_total_ub
constraints: programs.reader.constraints1940.ConstraintsCreator1940
minimalSchema: hhgq
[gurobi]
gurobi_lic: /home/pcw/gurobi_client.lic
gurobi_logfile_name: $HOME/E2E_1940_GUROBI.log
OutputFlag: 1
OptimalityTol: 1e-9
BarConvTol: 1e-8
BarQCPConvTol: 0
BarIterLimit: 1000
FeasibilityTol: 1e-9
Threads: 1
Presolve: -1
NumericFocus: 3
[writer]
writer: programs.writer.e2e_1940_writer_extended17.writer
# Variables Re-used by multiple writers
# Where the data gets written:
#per_path: s3://uscb-decennial-ite-das/sexto015/tmp/1940/per2
#unit_path: s3://uscb-decennial-ite-das/sexto015/tmp/1940/unit2
per_path: /home/pcw/Census/census-das-e2e/output/MDF_PER
unit_path: /home/pcw/Census/census-das-e2e/output/MDF_UNIT
[validator]
validator: programs.validator.validator
error_privacy_budget: 1e-4
error_dp_confidence_level: 0.9
certificate: no
certificate_path: /home/pcw/Census/census-das-e2e/output
[assessment]
[takedown]
takedown: programs.takedown.takedown
delete_output: True
[error_metrics]
error_metrics: programs.metrics.das_error_metrics.error_metrics
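-------------------------
As a first debugging step, the budget and geography settings above can be checked for internal consistency. The sketch below is only that: the function name `check_budget` is hypothetical, and the rules it tests are assumptions about what the DAS code expects rather than documented requirements. Those assumed rules are that there is one budget proportion and one geocode length per geolevel, that the geolevel proportions sum to 1, and that `detailedprop` plus the `queriesprop` values sum to 1. The sample text repeats the relevant values from the .ini file.

```python
import configparser
import math

# Relevant values repeated from the .ini file above.
SAMPLE = """
[geodict]
geolevel_names: Block,Blockgrp,Tract,County,State,National
geolevel_leng: 17,13,10,6,2,1

[budget]
geolevel_budget_prop:0.95,0.01,0.01,0.01,0.01,0.01
detailedprop: 0.1
DPqueries: hhgq, va_hisp_race
queriesprop: 0.225, 0.675
"""


def _split(value):
    """Split a comma-separated config value into stripped items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_budget(ini_text):
    """Return a list of human-readable problems (empty if none found).

    The rules checked here are assumptions, not documented DAS behavior.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    names = _split(cfg["geodict"]["geolevel_names"])
    lengths = _split(cfg["geodict"]["geolevel_leng"])
    level_props = [float(x) for x in _split(cfg["budget"]["geolevel_budget_prop"])]
    queries = _split(cfg["budget"]["DPqueries"])
    query_props = [float(x) for x in _split(cfg["budget"]["queriesprop"])]
    detailed = float(cfg["budget"]["detailedprop"])

    problems = []
    if not (len(names) == len(lengths) == len(level_props)):
        problems.append("geolevel_names, geolevel_leng and "
                        "geolevel_budget_prop differ in length")
    if not math.isclose(sum(level_props), 1.0, abs_tol=1e-9):
        problems.append("geolevel_budget_prop does not sum to 1")
    if len(queries) != len(query_props):
        problems.append("DPqueries and queriesprop differ in length")
    if not math.isclose(detailed + sum(query_props), 1.0, abs_tol=1e-9):
        problems.append("detailedprop + queriesprop does not sum to 1")
    return problems


if __name__ == "__main__":
    for problem in check_budget(SAMPLE) or ["no problems found"]:
        print(problem)
```

The values posted above pass all four checks, so under these assumptions the budget arithmetic itself is consistent and the extra PER lines would have to come from somewhere else.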