Permalink
Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
471 lines (323 sloc) 15 KB
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Coding For Fun]]></title>
<link href="http://utopiazh.github.com/atom.xml" rel="self"/>
<link href="http://utopiazh.github.com/"/>
<updated>2013-03-17T10:57:36+08:00</updated>
<id>http://utopiazh.github.com/</id>
<author>
<name><![CDATA[Hang Zhou]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Configure Sharded MongoDB on single host]]></title>
<link href="http://utopiazh.github.com/blog/2013/03/17/configure-sharded-mongodb-on-single-host/"/>
<updated>2013-03-17T10:54:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/03/17/configure-sharded-mongodb-on-single-host</id>
<content type="html"><![CDATA[<p>The default configuration of mongodb only use a single thread, which will be
bottleneck.</p>
<h2>Config and Start mongodb</h2>
<p>Start 3 config server</p>
<pre><code>mongod --configsvr --dbpath /var/lib/mongodb/configdb/27019 --port 27019 --logpath /var/log/mongodb/mongod-27019.log --logappend &amp;
mongod --configsvr --dbpath /var/lib/mongodb/configdb/27119 --port 27119 --logpath /var/log/mongodb/mongod-27119.log --logappend &amp;
mongod --configsvr --dbpath /var/lib/mongodb/configdb/27219 --port 27219 --logpath /var/log/mongodb/mongod-27219.log --logappend &amp;
</code></pre>
<p>Start 3 data server</p>
<pre><code>mongod --dbpath /var/lib/mongodb/datadb/27029 --port 27029 --logpath /var/log/mongodb/mongod-27029.log --logappend &amp;
mongod --dbpath /var/lib/mongodb/datadb/27129 --port 27129 --logpath /var/log/mongodb/mongod-27129.log --logappend &amp;
mongod --dbpath /var/lib/mongodb/datadb/27229 --port 27229 --logpath /var/log/mongodb/mongod-27229.log --logappend &amp;
</code></pre>
<p>Start mongos server</p>
<pre><code>mongos --configdb 127.0.0.1:27019,127.0.0.1:27119,127.0.0.1:27219 --port 27017 --logpath /var/log/mongodb/mongos-27017.log --logappend &amp;
</code></pre>
<h2>Add shard for standalone mongod</h2>
<p>Connect to mongos</p>
<pre><code># mongo
MongoDB shell version: 2.0.6
connecting to: test
mongos&gt; sh.addShard("127.0.0.1:27029")
mongos&gt; sh.addShard("127.0.0.1:27129")
mongos&gt; sh.addShard("127.0.0.1:27229")
</code></pre>
<p>Enable shard for DB</p>
<pre><code>mongos&gt; sh.enableSharding("github")
</code></pre>
<p>Create Index on the to-be-sharded key</p>
<pre><code>mongos&gt; use github
mongos&gt; db.repos.ensureIndex({"id" :1})
</code></pre>
<p>Enable shard for collection</p>
<pre><code>mongos&gt; sh.shardCollection("github.repos", {"id": 1})
</code></pre>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[CodeReading: SolrCloud - Sharding]]></title>
<link href="http://utopiazh.github.com/blog/2013/03/10/codereading-solrcloud-sharding/"/>
<updated>2013-03-10T21:20:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/03/10/codereading-solrcloud-sharding</id>
<content type="html"><![CDATA[<h1>Working in progress</h1>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Notes:Practical Machine Learning in Python]]></title>
<link href="http://utopiazh.github.com/blog/2013/03/10/notes-practical-machine-learning-in-python/"/>
<updated>2013-03-10T18:54:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/03/10/notes-practical-machine-learning-in-python</id>
<content type="html"><![CDATA[<p>Notes for video <a href="http://www.youtube.com/watch?v=__s45TTXxps">Practical Machine Learning in
Python</a>.</p>
<p><strong>Example: Home-runs and strikeouts predicting</strong></p>
<p>Questions:</p>
<ul>
<li>What features are strong predicators for home runs and strikeouts?</li>
<li>Given a particular situation, with what probability will the batter hit a
home run or strike out?</li>
</ul>
<p><strong>Gathering Data</strong></p>
<ul>
<li>Get the original data</li>
<li>Coalescing</li>
<li>Scrubbing (ensure consistency)</li>
<li>Select the training data</li>
</ul>
<p><strong>Select a ToolKit</strong></p>
<p><em>Trade off</em></p>
<ul>
<li>Speed (offline or realtime)</li>
<li>Transparency (internal visibility, customizability)</li>
<li>Support (community, etc)</li>
</ul>
<p><em>Available Options:</em></p>
<ul>
<li>External bindings of popular packages</li>
<li><p>Python Implementation</p>
<p> NLTK focus on NLP (Natural Lanaguage Processing with Python)
mlpy
PyML (SVM)
PyBrain
mdp-toolkit (abstraction over workflow)
scikit-learn (manager algorithms, active community)</p></li>
<li><p>DIY with Bascic building blocks</p>
<p> Python impl: NumPy, SciPy
C/C++ impl: OpenCV, LIBSVM, LIBLinear</p></li>
</ul>
<p><strong>Feature Selection</strong></p>
<ul>
<li>scikit-learn: chi-square feature selection</li>
<li>visualize significance</li>
</ul>
<p><strong>Tips and Tricks</strong></p>
<ul>
<li>Persistent classifier internals (save trained and reuse)</li>
<li>Using generators where possible</li>
<li>Multicore text processing (use multiple python processes)</li>
</ul>
<h2>TODO</h2>
<p><em>chi-square</em></p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Chi-squared_distribution" title="Chi squared distribution">Chi-square</a></li>
</ul>
<p><em>scikit-learn</em></p>
<ul>
<li><a href="http://scikit-learn.org/stable/" title="scikit-learn website">scikit-learn</a></li>
<li><a href="http://www.youtube.com/watch?v=cHZONQ2-x7I" title="Tutorial: scikit-learn - Machine Learning in Python with Contributor Jake
VanderPlas">scikit-learn intro video</a></li>
<li><a href="http://scipy-lectures.github.com/index.html" title="SciPy lecture">SciPy-lecture</a></li>
</ul>
<p><em>mdp-toolkit</em></p>
<ul>
<li><a href="http://mdp-toolkit.sourceforge.net/" title="http://mdp-toolkit.sourceforge.net/">Modular toolkit for Data Processing</a></li>
</ul>
<h2>Reference</h2>
<ul>
<li><a href="http://ml-class.org" title="ml-class.org">ml-class.org</a></li>
<li><a href="http://mloss.org" title="Machine learning open source software">mloss.org</a></li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Multi-tenant of HBase in FaceBook]]></title>
<link href="http://utopiazh.github.com/blog/2013/03/10/multi-tenancies-of-hbase-in-facebook/"/>
<updated>2013-03-10T09:24:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/03/10/multi-tenancies-of-hbase-in-facebook</id>
<content type="html"><![CDATA[<h1>Overview</h1>
<p>Watched the video for 2 times to get the main point, the highlights learned from this video is:</p>
<ul>
<li>Use unique message id as ts to store into hbase</li>
<li>Use monitoring/reporting tools to know what exactly the access pattern is, while the app developers don&#8217;t know either</li>
<li>Use underlying hashout solution to isolate traffic, instead of an optimistic solution of using table names.</li>
</ul>
<h1>HBase adoption</h1>
<h2>Messages:</h2>
<p><strong>Requirements:</strong></p>
<ul>
<li>Very high volume (9B+ msg, 90B r+w ops per day), 4.5 PB no compressed.</li>
<li>Elasticity &amp; auto-failover</li>
<li>Strong consistency within a data centeer</li>
<li>import large data set (old data)</li>
</ul>
<p><strong>Schema:</strong></p>
<ul>
<li>3 CF (actions, snapshot (cache), keywords (search))</li>
<li>Use message id as timestamp, which leverage the 3rd dimension.</li>
<li>Fast iteration on schema 
  - Action logs are single source of truth
  - Most data during schema iteration from snapshot  and keywords
  - Custom backup solution (action logs), which could be replayed on schema evolving.</li>
</ul>
<h2>ODS (Operational Data Store)</h2>
<ul>
<li>Schema: 3CFs, raw, hour, days</li>
<li>MR job to move data to buckets</li>
<li>Compaction tricks (leverage the time series to pick the right HFile to achieve more spatial locality)</li>
<li>Separating HBase clusters and its backing HDFS clusters proved not feasible</li>
<li>HA is essential (Master/Master cluster replication with MR jobs)</li>
</ul>
<h1>HBase best practices</h1>
<p><strong>Process:</strong></p>
<p>Select the right apps: no five nines, etc.</p>
<p>Shadow testing before going production (real traffic goes in and out without noticed by users).</p>
<p><strong>Schema Design</strong></p>
<p>One CF or Multiple CF: common issue is putting all data into one CF, so get poor spatial locality </p>
<p>New table or New CF: New CF provides better locality, e.g. users message and search, the best user experience is that all users data is complete available or down</p>
<p><strong>Lessons:</strong></p>
<ul>
<li>Need graph/reporting monitoring to provide insights</li>
<li>Constant upgrade (facebook is running HBase 0.89 with back-ported patches)</li>
</ul>
<h1>Multi-Tenants</h1>
<p><strong>optimistic multi-tenancies</strong></p>
<ul>
<li>let users do the right things</li>
<li>lack of table naming rules</li>
</ul>
<p><strong>Hashout:</strong></p>
<ul>
<li>Was for Mysql, ported to support HBase</li>
<li>review every app</li>
<li>block abusive users, promote heave users</li>
<li>users must provide app id and key, system app id to specific shard</li>
<li>backed reading ops with memcached</li>
<li>intensive MR job on separate cluster</li>
</ul>
<h1>Reference</h1>
<ul>
<li><a href="vimeo.com/44715953" title="Multi-tenant HBase Solutions at Facebook">Nicolas Spiegelberg - Multi-tenant HBase Solutions at Facebook</a></li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Configure Ubuntu with Dual Monitor]]></title>
<link href="http://utopiazh.github.com/blog/2013/03/10/configure-ubuntu-with-dual-monitor/"/>
<updated>2013-03-10T09:11:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/03/10/configure-ubuntu-with-dual-monitor</id>
<content type="html"><![CDATA[<h2>Goals:</h2>
<p>Setting ubuntu monitor like this:</p>
<pre><code>LCD with HDMI --&gt; Laptop Monitor
</code></pre>
<h2>Solutions:</h2>
<p>Create a xrandr script, named as <code>dual_monitor</code>:</p>
<pre><code>#!/bin/sh
if [ "$1" != "off" ]; then
# External Screen as HDMI, Laptop below
echo "Dual Screen on"
xrandr --output VGA1 --off
xrandr --output HDMI1 --primary --mode 1920x1080
xrandr --output LVDS1 --off
xrandr --output LVDS1 --mode 1600x900 --right-of HDMI1
else
# External Screen off, Laptop primary
echo "Dual Screen off"
xrandr --output HDMI1 --off
xrandr --output LVDS1 --primary --mode 1600x900
fi
</code></pre>
<p>To use it conveniently, add the directory into $PATH.</p>
<p>Start dual monitor mode with:</p>
<pre><code>dual_monitor
</code></pre>
<p>Stop dual monitor mode with:</p>
<pre><code>dual_monitor off
</code></pre>
<h2>Reference</h2>
<ul>
<li><a href="http://ubuntuforums.org/showthread.php?t=1801549" title="Setting primary monitor on dual display">AskUbuntu:&#8221;Setting primary monitor on dual display&#8221;</a></li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[How to use octopress and github to write blog]]></title>
<link href="http://utopiazh.github.com/blog/2013/01/13/how-to-use-octopress-and-github-to-write-blog/"/>
<updated>2013-01-13T15:30:00+08:00</updated>
<id>http://utopiazh.github.com/blog/2013/01/13/how-to-use-octopress-and-github-to-write-blog</id>
<content type="html"><![CDATA[<p>Today spent about one hours to setup blogs on github, the main motivation is
to it&#8217;s code friendly.</p>
<p>The setup steps are:</p>
<h2> 1. Install Ruby With RVM </h2>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>curl -L https://get.rvm.io | bash -s stable --ruby
</span><span class='line'># Update .zshrc and add one more line:
</span><span class='line'>source /etc/profile.d/rvm.sh
</span><span class='line'>rvm install 1.9.3
</span><span class='line'>rvm use 1.9.3
</span><span class='line'>rvm rubygems latest</span></code></pre></td></tr></table></div></figure>
<h2> 2. Setup Octopress </h2>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>cd ~/github/
</span><span class='line'>git clone git://github.com/imathis/octopress.git utopiazh.github.com
</span><span class='line'>cd ~/github/utopiazh.github.com
</span><span class='line'>bundle update
</span><span class='line'>rake install</span></code></pre></td></tr></table></div></figure>
<h2> 3. Deploy as utopiazh.github.com </h2>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git remote add utopiazh git@github.com:utopiazh/utopiazh.github.com.git
</span><span class='line'>rake setup_github_pages
</span><span class='line'># creaete a test post
</span><span class='line'>rake new_post["post title"]
</span><span class='line'>rake generate
</span><span class='line'>rake deploy</span></code></pre></td></tr></table></div></figure>
<h2> 4. Commit the source </h2>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git push -u utopiazh master
</span><span class='line'>git add .
</span><span class='line'>git commit -m 'your message'
</span><span class='line'>git push origin source</span></code></pre></td></tr></table></div></figure>
<p>After setup, the publishing steps will be:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>rake new_post["post title"]
</span><span class='line'>rake generate
</span><span class='line'>rake deploy
</span><span class='line'>
</span><span class='line'>git commit -a -m "new post"
</span><span class='line'>git push origin source</span></code></pre></td></tr></table></div></figure>
<p><strong>References:</strong><br/></p>
<ol>
<li>http://octopress.org/docs/setup/ </li>
<li>http://blog.devtang.com/blog/2012/02/10/setup-blog-based-on-github</li>
<li>http://www.yangzhiping.com/tech/octopress.html</li>
</ol>
]]></content>
</entry>
</feed>