Week8

Stats on the results.

Run experiments on your SMO tool. Vary the sample sizes, and compare against random selection.

Report baselines and ceiling.

Baseline = value without any treatments (the centroid of the original data set)
Ceiling = If we abandoned all the principles of this subject and evaluated everything, how good can we get?
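The baseline row pairs a centroid (mid) with a spread (div). Here is a minimal sketch of how those might be computed, assuming mid/div are mean/sample standard deviation for numeric columns and mode/entropy for symbolic columns. Those definitions, and the function names below, are assumptions for illustration and are not taken from the course code.

```python
import math
from collections import Counter

def mid_num(values):
    """Central tendency of a numeric column (assumed here to be the mean)."""
    return sum(values) / len(values)

def div_num(values):
    """Spread of a numeric column (assumed here to be the sample standard deviation)."""
    n = len(values)
    if n < 2:
        return 0.0
    mu = mid_num(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))

def mid_sym(values):
    """Central tendency of a symbolic column (assumed here to be the mode)."""
    return Counter(values).most_common(1)[0][0]

def div_sym(values):
    """Spread of a symbolic column (assumed here to be the entropy, in bits)."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())
```

Unknown cells (such as '?') would need to be skipped before calling these helpers; that filtering is left out of the sketch.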

Report current date, random number seed, and some summary stats on the input data.
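One possible way to build that header is sketched below. The function name `header`, its parameters, and the day/month/year date format are assumptions; only the field labels come from the sample output.

```python
from datetime import datetime

def header(file, rows, repeats=20, seed=31210):
    """Return the report header as a list of 'key : value' lines.
    `rows` is assumed to be a list of equal-length lists (one per data row)."""
    return [
        f"date : {datetime.now():%d/%m/%Y %H:%M:%S}",  # date format is an assumption
        f"file : {file}",
        f"repeats  : {repeats}",
        f"seed : {seed}",
        f"rows : {len(rows)}",
        f"cols : {len(rows[0]) if rows else 0}",
    ]

if __name__ == "__main__":
    for line in header("example.csv", [[1, 2], [3, 4], [5, 6]]):
        print(line)
```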

For each treatment, repeat the run 20 times. During the run, print some progress statement so observers know you have not crashed.
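The repeat-and-report-progress loop could look something like the following sketch. The name `repeat_runs` and the shape of `treatments` (a dict from treatment name to a zero-argument function returning one score) are hypothetical choices made for this example.

```python
import random
import sys

def repeat_runs(treatments, repeats=20, seed=31210, out=sys.stderr):
    """Run every treatment `repeats` times and collect the scores.
    Prints '#name' as each treatment starts, so observers can see progress."""
    random.seed(seed)  # one seed for the whole experiment, for repeatability
    results = {}
    for name, fun in treatments.items():
        print(f"#{name}", end=" ", file=out, flush=True)
        results[name] = [fun() for _ in range(repeats)]
    print(file=out)
    return results

if __name__ == "__main__":
    scores = repeat_runs({"rand": random.random}, repeats=5)
    print(len(scores["rand"]))
```

Progress goes to standard error here so that it does not get mixed into a report redirected from standard output; that is a design choice, not a requirement of the assignment.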

Use stats to group the results into statistically distinguishable groups.

Part One (no stats)

Reproduce the following output.

Before summarizing the results of many runs, first show details (very useful for debugging).

In the following, we show the baseline centroid (mid) and the variability around that centroid (div). We then run SMO 20 times (smo9) with a budget of 9 (peek at 4 to find the initial best and rest, then look at 5 more).

Then we compare to "just grab any 50 at random" (any50).

Finally, we abandon all the principles of this subject and evaluate everything (100%).

date : 08/02/2024 07:42:53
file : ../data/auto93.csv
repeats  : 20
seed : 31210
rows : 398
cols : 8
names                       	['Clndrs'   'Volume'   'HpX'    'Model'   'origin'   'Lbs-'    'Acc+'   'Mpg+']  	D2h-
mid                         	[5.45       193.43     104.47   76.01     1          2970.42   15.57    23.84]   	0.54
div                         	[1.7        104.27     38.49    3.7       1.33       846.84    2.76     8.34]    	0.16
#
smo9                        	[4          90         48       78        2          1985      21.5     40]      	0.19
smo9                        	[4          90         48       78        2          1985      21.5     40]      	0.19
smo9                        	[4          90         48       78        2          1985      21.5     40]      	0.19
smo9                        	[4          90         48       80        2          2085      21.7     40]      	0.2
smo9                        	[4          85         65       81        3          1975      19.4     40]      	0.24
smo9                        	[4          85         65       81        3          1975      19.4     40]      	0.24
smo9                        	[4          85         65       81        3          1975      19.4     40]      	0.24
smo9                        	[4          85         65       80        3          2110      19.2     40]      	0.25
smo9                        	[4          79         58       77        2          1825      18.6     40]      	0.26
smo9                        	[4          98         70       82        1          2125      17.3     40]      	0.31
smo9                        	[4          85         52       76        1          2035      22.2     30]      	0.31
smo9                        	[4          98         65       81        1          2380      20.7     30]      	0.34
smo9                        	[4          98         '?'      71        1          2046      19       30]      	0.36
smo9                        	[4          98         68       77        3          2045      18.5     30]      	0.37
smo9                        	[4          112        88       82        1          2605      19.6     30]      	0.38
smo9                        	[4          98         80       79        1          1915      14.4     40]      	0.39
smo9                        	[4          97         92       72        3          2288      17       30]      	0.41
smo9                        	[4          97         92       72        3          2288      17       30]      	0.41
smo9                        	[4          135        84       82        1          2525      16       30]      	0.44
smo9                        	[4          97         88       73        3          2279      19       20]      	0.49
#
any50                       	[4          97         52       82        2          2130      24.6     40]      	0.17
any50                       	[4          90         48       80        2          2335      23.7     40]      	0.19
any50                       	[4          90         48       80        2          2335      23.7     40]      	0.19
any50                       	[4          90         48       78        2          1985      21.5     40]      	0.19
any50                       	[4          90         48       78        2          1985      21.5     40]      	0.19
any50                       	[4          90         48       78        2          1985      21.5     40]      	0.19
any50                       	[4          90         48       80        2          2085      21.7     40]      	0.2
any50                       	[4          90         48       80        2          2085      21.7     40]      	0.2
any50                       	[4          90         48       80        2          2085      21.7     40]      	0.2
any50                       	[4          86         65       80        3          2110      17.9     50]      	0.25
any50                       	[4          89         60       80        3          1968      18.8     40]      	0.26
any50                       	[4          85         70       78        3          2070      18.6     40]      	0.27
any50                       	[4          85         70       78        3          2070      18.6     40]      	0.27
any50                       	[4          72         69       71        3          1613      18       40]      	0.27
any50                       	[4          91         68       82        3          2025      18.2     40]      	0.28
any50                       	[4          98         70       82        1          2125      17.3     40]      	0.31
any50                       	[4          85         52       76        1          2035      22.2     30]      	0.31
any50                       	[5          121        67       80        2          2950      19.9     40]      	0.31
any50                       	[4          97         46       73        2          1950      21       30]      	0.32
any50                       	[4          68         49       73        2          1867      19.5     30]      	0.34
#
100%                        	[4          97         52       82        2          2130      24.6     40]      	0.17

Part Two (stats)

Write a more succinct report that summarizes 20 runs of anything that uses stochastic choice.

Here, we only report distance to heaven (d2h).
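For reference, here is a hedged sketch of one common way to compute d2h: normalize each goal column to 0..1, treat 1 as heaven for goals whose name ends in "+" and 0 as heaven for goals ending in "-", then take the Euclidean distance to heaven divided by the square root of the number of goals. The signature below (with explicit `lo` and `hi` lists) is an assumption made for illustration; your own tool may store those ranges elsewhere.

```python
import math

def d2h(row, names, lo, hi):
    """Distance to heaven for one row (0 = ideal, 1 = worst possible).
    names: column names; a trailing '+' means maximize, '-' means minimize.
    lo, hi: per-column minimum and maximum, used for normalization."""
    total, n = 0.0, 0
    for i, name in enumerate(names):
        if name[-1] in "+-":                      # only goal columns count
            heaven = 1.0 if name[-1] == "+" else 0.0
            norm = (row[i] - lo[i]) / (hi[i] - lo[i] + 1e-30)
            total += (heaven - norm) ** 2
            n += 1
    return math.sqrt(total / n) if n else 0.0

if __name__ == "__main__":
    names = ["HpX", "Lbs-", "Acc+", "Mpg+"]
    lo, hi = [0, 0, 0, 0], [10, 10, 10, 10]
    print(d2h([3, 5, 5, 5], names, lo, hi))
```

Columns such as `HpX` are ignored because their names do not end in a goal marker.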

The following report can get slow for larger data sets, so the line starting with #base prints each word as the loop reaches that treatment.

In all the following, we make 4 initial guesses to initialize best:rest, then we run on for BUDGET-4 more evaluations.

e.g. bonr20 means 4 initial guesses then 16 subsequent ones.

bonr means using the acquire function you've been using all along ((b+r)/(b-r))
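As a reminder of what that formula does, here is a small sketch. It assumes `b` and `r` are the likelihoods that a candidate row belongs to best and rest. The absolute values and the tiny `eps` guard against division by zero are additions made for this example, and `pick_next`, `like_best` and `like_rest` are hypothetical names.

```python
def bonr(b, r, eps=1e-30):
    """Acquisition score (b+r)/(b-r), with abs() and eps added for safety."""
    return abs(b + r) / (abs(b - r) + eps)

def pick_next(candidates, like_best, like_rest):
    """Return the unlabeled candidate with the highest acquisition score."""
    return max(candidates, key=lambda row: bonr(like_best(row), like_rest(row)))

if __name__ == "__main__":
    print(bonr(0.6, 0.4))
```

Note that the score grows as `b` and `r` get closer together, so this function prefers rows that best and rest find hard to tell apart.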

randN means: 20 times, pull N rows at random, sort by d2h, then report the top one (rand358 pulls 90% of the data).
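The random treatment can be sketched in a few lines. The function name `rand_n` and the `score` callback (anything that maps a row to its d2h) are assumptions for this example; since lower d2h is better, "the top one" after sorting is simply the minimum.

```python
import random

def rand_n(rows, n, score, repeats=20, seed=31210):
    """Repeat `repeats` times: sample n rows without replacement,
    then keep the best (lowest) score found in that sample."""
    rng = random.Random(seed)  # private generator, so results are repeatable
    return [min(score(row) for row in rng.sample(rows, n))
            for _ in range(repeats)]

if __name__ == "__main__":
    rows = list(range(100))
    print(rand_n(rows, 9, score=lambda x: x / 100))
```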

base shows the d2h distribution within the untreated data set.

best reports the best d2h in the data (this is our ceiling)

tiny shows 0.35 * standard deviation. Any difference smaller than this is getting a little pedantic.
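A minimal sketch of that threshold follows, assuming the standard deviation is taken over the d2h values of the untreated data (that choice, and the names `tiny` and `same`, are assumptions). As a sanity check against the sample output: the baseline div for D2h is 0.16, and 0.35 * 0.16 = 0.056, which rounds to the reported tiny of 0.06.

```python
import statistics

def tiny(values, cohen=0.35):
    """Smallest difference worth caring about: cohen * sample standard deviation."""
    return cohen * statistics.stdev(values)

def same(a, b, eps):
    """True if two scores differ by no more than the tiny threshold."""
    return abs(a - b) <= eps

if __name__ == "__main__":
    print(round(0.35 * 0.16, 2))
```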

date : February/02/2024 08:19:54,
file : ../data/auto93.csv,
repeats  : 20,
seed : 31210,
rows : 398,
cols : 8,
best : 0.17,
tiny : 0.06
#base #bonr9 #rand9 #bonr15 #rand15 #bonr20 #rand20 #rand358 
#report8
#
 0,             #rand358,  0.17,  0.00, *                   |                   ,  0.17,  0.93
#
 1,              #rand20,  0.26,  0.07,     *---            |                   ,  0.17,  0.93
 1,              #bonr20,  0.30,  0.14,  -----*-            |                   ,  0.17,  0.93
#
 2,              #bonr15,  0.32,  0.16,  -------*           |                   ,  0.17,  0.93
 2,               #bonr9,  0.34,  0.11,        --*--        |                   ,  0.17,  0.93
 2,              #rand15,  0.34,  0.06,       ---*          |                   ,  0.17,  0.93
 2,               #rand9,  0.36,  0.12,      ----*--        |                   ,  0.17,  0.93
#
 3,                 base,  0.55,  0.20,              -------*---                ,  0.17,  0.93

In the above, the left-hand-side number shows the statistical ranking. All statistically similar treatments have the same rank.

In the above we note that:

- All treatments significantly improve on the baseline.
  - Hooray.
- Evaluating a lot of things (rand358) does better than evaluating just a few things (e.g. bonr9).
  - No surprises there.
- Evaluating just a few things is surprisingly effective (e.g. the bonr9 results are very similar to bonr20).
  - 15 is no better than 9, so not recommended.
  - 20 is recommended.
- Random is as good as SMO.
  - For X in (bonr, rand)
    - and N in (9, 15, 20):
      - #rand(N) == #bonr(N)
  - Is this result repeatable in many data sets (your class project)?

How do we score these things?

- by predictive prowess?
- by predictive certainty (the variance)?
- by simplicity of explanation (not done above):
  - what is the smallest number of attribute value settings...
  - ... that most influence the outcomes?
  - welcome to explanation (homeworks 7, 8, 9)
