<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Category: machinelearning | Coding For Fun]]></title>
<link href="http://utopiazh.github.com/blog/categories/machinelearning/atom.xml" rel="self"/>
<link href="http://utopiazh.github.com/"/>
<updated>2013-03-17T10:57:36+08:00</updated>
<id>http://utopiazh.github.com/</id>
<author>
<name><![CDATA[Hang Zhou]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Notes:Practical Machine Learning in Python]]></title>
<link href="http://utopiazh.github.com/blog/2013/03/10/notes-practical-machine-learning-in-python/"/>
<updated>2013-03-10T18:54:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/03/10/notes-practical-machine-learning-in-python</id>
<content type="html"><![CDATA[<p>Notes for video <a href="http://www.youtube.com/watch?v=__s45TTXxps">Practical Machine Learning in
Python</a>.</p>
<p><strong>Example: Predicting home runs and strikeouts</strong></p>
<p>Questions:</p>
<ul>
<li>What features are strong predictors of home runs and strikeouts?</li>
<li>Given a particular situation, with what probability will the batter hit a
home run or strike out?</li>
</ul>
<p><strong>Gathering Data</strong></p>
<ul>
<li>Get the original data</li>
<li>Coalescing</li>
<li>Scrubbing (ensure consistency)</li>
<li>Select the training data</li>
</ul>
<p><strong>Select a Toolkit</strong></p>
<p><em>Trade-offs</em></p>
<ul>
<li>Speed (offline or realtime)</li>
<li>Transparency (internal visibility, customizability)</li>
<li>Support (community, etc)</li>
</ul>
<p><em>Available Options:</em></p>
<ul>
<li>External bindings of popular packages</li>
<li><p>Python implementations</p>
<ul>
<li>NLTK (focused on NLP; see <em>Natural Language Processing with Python</em>)</li>
<li>mlpy</li>
<li>PyML (SVM)</li>
<li>PyBrain</li>
<li>mdp-toolkit (abstraction over workflow)</li>
<li>scikit-learn (many algorithms, active community)</li>
</ul></li>
<li><p>DIY with basic building blocks</p>
<ul>
<li>Python implementations: NumPy, SciPy</li>
<li>C/C++ implementations: OpenCV, LIBSVM, LIBLINEAR</li>
</ul></li>
</ul>
<p><strong>Feature Selection</strong></p>
<ul>
<li>scikit-learn: chi-square feature selection</li>
<li>visualize significance</li>
</ul>
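<p>To make the chi-square idea concrete, here is a minimal pure-Python sketch of the statistic for one binary feature against one binary outcome. The feature names and sample data are made up for illustration and are not from the talk; scikit-learn offers the same kind of scoring through <code>sklearn.feature_selection.chi2</code> and <code>SelectKBest</code>, which this sketch does not use.</p>

```python
def chi_square_2x2(feature, label):
    """Chi-square statistic for two equal-length sequences of 0/1 values."""
    n = len(feature)
    # Observed counts for each (feature value, label value) combination.
    observed = {(f, l): 0 for f in (0, 1) for l in (0, 1)}
    for f, l in zip(feature, label):
        observed[(f, l)] += 1

    stat = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for l in (0, 1):
            col_total = observed[(0, l)] + observed[(1, l)]
            # Count expected if the feature and the label were independent.
            expected = row_total * col_total / n
            if expected:
                stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical data: 1 = home run, 0 = no home run, for eight at-bats.
home_run = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical binary features recorded for the same at-bats.
features = {
    "fastball": [1, 1, 1, 1, 0, 0, 0, 0],
    "day_game": [1, 0, 1, 0, 1, 0, 1, 0],
}

# Score every feature and rank them, highest statistic first.
scores = {name: chi_square_2x2(values, home_run) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

<p>A higher score means the feature's distribution differs more between the two outcomes, so it is a better candidate to keep. The resulting scores can then be plotted to visualize significance.</p>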
<p><strong>Tips and Tricks</strong></p>
<ul>
<li>Persist classifier internals (save the trained classifier and reuse it)</li>
<li>Use generators where possible</li>
<li>Multicore text processing (use multiple Python processes)</li>
</ul>
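<p>The first two tips can be sketched with the standard library alone. This is an illustration under assumptions, not code from the talk: the record format, the function names, and the toy count-based "model" are hypothetical stand-ins for a real data source and classifier.</p>

```python
import io
import pickle


def read_records(lines):
    """Yield one parsed record at a time instead of building a full list."""
    for line in lines:
        line = line.strip()
        if line:
            batter, outcome = line.split(",")
            yield batter, outcome


def train_counts(records):
    """Toy model: outcome counts per batter (stands in for a real classifier)."""
    model = {}
    for batter, outcome in records:
        counts = model.setdefault(batter, {})
        counts[outcome] = counts.get(outcome, 0) + 1
    return model


# Hypothetical input; in practice this would be an open file object.
raw = io.StringIO("smith,home_run\nsmith,strikeout\njones,strikeout\n")

# The generator feeds records to training one by one, keeping memory use flat.
model = train_counts(read_records(raw))

# Persist the trained state, then restore it without retraining.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

<p>Writing to disk works the same way with <code>pickle.dump</code> and <code>pickle.load</code> on a file opened in binary mode. For the multicore tip, the standard library's <code>multiprocessing</code> module is one way to spread text processing over several Python processes.</p>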
<h2>TODO</h2>
<p><em>chi-square</em></p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Chi-squared_distribution" title="Chi squared distribution">Chi-square</a></li>
</ul>
<p><em>scikit-learn</em></p>
<ul>
<li><a href="http://scikit-learn.org/stable/" title="scikit-learn website">scikit-learn</a></li>
<li><a href="http://www.youtube.com/watch?v=cHZONQ2-x7I" title="Tutorial: scikit-learn - Machine Learning in Python with Contributor Jake
VanderPlas">scikit-learn intro video</a></li>
<li><a href="http://scipy-lectures.github.com/index.html" title="SciPy lecture">SciPy-lecture</a></li>
</ul>
<p><em>mdp-toolkit</em></p>
<ul>
<li><a href="http://mdp-toolkit.sourceforge.net/" title="http://mdp-toolkit.sourceforge.net/">Modular toolkit for Data Processing</a></li>
</ul>
<h2>Reference</h2>
<ul>
<li><a href="http://ml-class.org" title="ml-class.org">ml-class.org</a></li>
<li><a href="http://mloss.org" title="Machine learning open source software">mloss.org</a></li>
</ul>
]]></content>
</entry>
</feed>