boss mashup files - initialization

Commit ac9be987ff5845fbccd4d454dc51287dda117651 (1 parent: 0d4022f), committed by @jcleblanc on Jun 28, 2011
@@ -0,0 +1,34 @@
+Software Copyright License Agreement (BSD License)
+
+Copyright (c) 2011, Yahoo! Inc.
+All rights reserved.
+
+Redistribution and use of this software in source and binary forms,
+with or without modification, are permitted provided that the following
+conditions are met:
+
+* Redistributions of source code must retain the above
+ copyright notice, this list of conditions and the
+ following disclaimer.
+
+* Redistributions in binary form must reproduce the above
+ copyright notice, this list of conditions and the
+ following disclaimer in the documentation and/or other
+ materials provided with the distribution.
+
+* Neither the name of Yahoo! Inc. nor the names of its
+ contributors may be used to endorse or promote products
+ derived from this software without specific prior
+ written permission of Yahoo! Inc.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
101 README 100644 → 100755
@@ -0,0 +1,101 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+Welcome to the BOSS Mashup Framework - an experimental, proof-of-concept library
+for operating on BOSS search results and web data sources in a SQL-like fashion.
+
+I. Installation
+===============
+
+[1] First, download and install python 2.5
+
+http://www.python.org/download/
+
+[2] Then download and install simplejson
+
+simplejson
+http://pypi.python.org/pypi/simplejson/
+
+Save this anywhere you can cd to via the terminal
+
+Decompress the file
+
+> tar -xzvf simplejson-1.9.2.tar.gz
+> cd <new_folder>
+
+Then install
+
+> sudo python setup.py build
+> sudo python setup.py install
+
+[3] Then download and install oauth
+
+oauth
+https://github.com/simplegeo/python-oauth2, click Downloads, click "Download .tar.gz"
+
+Save this anywhere you can cd to via the terminal
+
+Decompress the file
+
+> tar -xzvf <downloaded_file>.tar.gz
+> cd <new_folder>
+
+Then install
+
+> sudo python setup.py build
+> sudo python setup.py install
+
+If setup.py complains that setuptools is missing, install setuptools for Python 2.5.
+
+[4] Create a folder named deps within this boss_mashup_framework-0.1 directory:
+
+boss_mashup_framework-0.1> mkdir deps
+
+[5] Download the following 2 items into the deps folder:
+
+dict2xml
+http://dict2xml.googlecode.com/files/dict2xml.tar.gz
+
+xml2dict
+http://xml2dict.googlecode.com/files/xml2dict.tgz
+
+[6] Decompress the two files from the last step from inside the deps folder:
+
+boss_mashup_framework-0.1/deps/> tar -xzvf dict2xml.tar.gz
+boss_mashup_framework-0.1/deps/> tar -xzvf xml2dict.tgz
+
+[7] Within the boss_mashup_framework-0.1 directory, execute the following 2 commands:
+
+sudo python setup.py build
+sudo python setup.py install
+
+[8] Set your user information (e.g. appid for V1; cc_key, cc_secret, and source_tag for V2) in the config.json file inside the boss_mashup_framework-0.1 directory
+
+[9] To test, run the following command from the same directory as above; it should complete with no errors:
+
+python examples_v2/ex1.py
+
+
+II. Usage
+=========
+
+Check out the source files in the "examples" directory for usage syntax
+
+Also, take a look in the library sources for code documentation
+
+Here's a quick library organization description:
+
+yos.yql.db provides classes and functions for creating and remixing tables out of XML/JSON responses
+yos.boss.ysearch provides a single function for fetching BOSS search results
+yos.yql.udfs provides some handy user defined functions for yos.yql.db.select calls
+yos.util.text provides some handy functions for processing and comparing text (strings)
+yos.util.console provides a write function that prints messages to stdout despite encoding errors
+
+When using yos.yql.db, keep in mind that for join calls such as join (inner_join)
+and outer_join (left_outer_join), the first parameter (the predicate function)
+should operate on row keys without the namespace (the text before the '$' in a field name).
+For example, row['yn$title'] should be written as row['title'] inside the predicate function.
+This is because the predicate function is applied like a map function,
+so the order of the input tables (the second parameter) does not matter.
+Namespaced keys also would not make sense when the number of tables exceeds 2,
+as a predicate function only operates on records from two tables at a time.
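The namespacing rule for join predicates can be illustrated with a minimal sketch. This is not the yos.yql.db implementation: the `inner_join` helper and the `(name, rows)` tuples standing in for tables are hypothetical, and only two tables are handled. It shows a predicate that reads plain keys such as `row["title"]` while the joined rows come out with `name$field` keys.

```python
# Hypothetical stand-in for yos.yql.db tables: a name plus a list of dict rows.
# This illustrates the namespacing rule only; it is not the library's code.

def inner_join(predicate, tables):
    """Join two (name, rows) tables; the predicate sees un-namespaced rows."""
    (left_name, left_rows), (right_name, right_rows) = tables
    joined = []
    for left in left_rows:
        for right in right_rows:
            # The predicate receives plain keys such as row["title"] ...
            if predicate(left, right):
                merged = {}
                # ... while the output row carries "name$field" keys.
                for name, row in ((left_name, left), (right_name, right)):
                    for field, value in row.items():
                        merged["%s$%s" % (name, field)] = value
                joined.append(merged)
    return joined


def same_title(r1, r2):
    # Inside the predicate: r1["title"], not r1["yn$title"].
    return r1["title"].lower() == r2["title"].lower()


yn = ("yn", [{"title": "iPhone SDK released", "abstract": "news text"}])
dl = ("dl", [{"title": "iphone sdk released", "link": "http://example.com/a"}])

rows = inner_join(same_title, [dl, yn])
print(sorted(rows[0].keys()))
```

Because both rows expose the same un-namespaced `title` key, the same predicate works whichever table is listed first.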
@@ -0,0 +1,4 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+__all__ = ["examples", "templates", "util", "yos"]
@@ -0,0 +1,13 @@
+{"appid":"Add your V1 ID here if you are using that service",
+ "email": "boss-feedback@yahoo-inc.com",
+ "org": "Yahoo! Inc.",
+ "agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
+ "commercial": false,
+ "purpose": "To create the best search experience ever",
+ "version": "1.0",
+ "uri_v1": "http://boss.yahooapis.com/ysearch/",
+ "uri_v2": "http://yboss.yahooapis.com/ysearch/",
+ "cc_key": "ADD YOUR KEY HERE",
+ "cc_secret": "ADD YOUR CONSUMER SECRET HERE",
+ "source_tag": "ADD YOUR SOURCE TAG HERE"
+}
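Before running the examples, it can help to confirm that the placeholder values in this config have been replaced. The following is a rough sketch of such a check, not part of the framework: the `unset_fields` helper and the placeholder prefixes are assumptions based only on the values shown in this file, and it is written for a current Python with the standard `json` module rather than the Python 2.5 and simplejson setup the README describes.

```python
import json

# Prefixes taken from the placeholder strings in the config.json above.
# The helper itself is hypothetical and not part of the framework.
PLACEHOLDER_PREFIXES = ("ADD YOUR", "Add your")


def unset_fields(config):
    """Return the sorted names of fields that still hold placeholder text."""
    return sorted(
        key for key, value in config.items()
        # Only string-like values can be placeholders; skip booleans etc.
        if hasattr(value, "startswith") and value.startswith(PLACEHOLDER_PREFIXES)
    )


sample = json.loads("""
{"appid": "Add your V1 ID here",
 "cc_key": "my-consumer-key",
 "cc_secret": "ADD YOUR CONSUMER SECRET HERE",
 "source_tag": "ADD YOUR SOURCE TAG HERE",
 "commercial": false}
""")

print(unset_fields(sample))
```

To check a real file, the sample could be replaced with `json.load(open("config.json"))`.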
@@ -0,0 +1,4 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+__all__ = ["ex1", "ex2", "ex3", "ex4", "ex5", "ex6"]
@@ -0,0 +1,33 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+
+"""
+Inner join popular delicious results and yahoo news results for the query 'iphone'
+Combine results which have at least 2 terms in common in their titles
+Then publish as a search results html page using the provided california template
+"""
+
+__author__ = "BOSS Team"
+
+from templates import publisher
+from util import text, console
+from yos.boss.ysearch import search_v1
+from yos.yql import db, udfs
+
+dl = db.select(name="dl", udf=udfs.unnest_value, url="http://feeds.delicious.com/rss/popular/iphone")
+dl.describe()
+yn = db.create(name="yn", data=search_v1("iphone", vertical="news", count=50))
+
+def overlap_predicate(r1, r2):
+ return text.overlap(r1["title"], r2["title"]) > 1
+
+serp = publisher.Serp(template_dir="templates/california", title="boss 'iphone'", endpoint="http://yahoo/search")
+
+tb = db.join(overlap_predicate, [dl, yn])
+tb = db.group(by=["yn$title"], key=None, reducer=lambda x,y: None, as=None, table=tb, norm=text.norm)
+
+for row in tb.rows:
+ serp.add(url=row["dl$link"], title=row["yn$title"], abstract=row["yn$abstract"], dispurl=row["yn$sourceurl"], source=row["dl$creator"])
+
+serp.dump("iphone.html")
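The example above relies on `text.overlap` to count the terms two titles share. The source of yos.util.text is not part of this commit view, so the following is only a guess at the idea: a hypothetical `overlap` that counts distinct lowercase word terms common to both strings. The real function may tokenize differently or drop stop words.

```python
import re


def overlap(a, b):
    """Count distinct lowercase word terms shared by two strings (assumed behavior)."""
    terms_a = set(re.findall(r"\w+", a.lower()))
    terms_b = set(re.findall(r"\w+", b.lower()))
    return len(terms_a & terms_b)


def overlap_predicate(r1, r2):
    # Same shape as the predicate in the example: more than one shared term.
    return overlap(r1["title"], r2["title"]) > 1


a = {"title": "Apple ships new iPhone update"}
b = {"title": "New iPhone update breaks apps"}
c = {"title": "Android tablet rumors"}

print(overlap(a["title"], b["title"]))  # shared terms: new, iphone, update
print(overlap_predicate(a, b), overlap_predicate(a, c))
```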
@@ -0,0 +1,32 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+
+"""
+Search yahoo news and twitter for 'facebook'
+Combine results with techmeme feeds based on titles sharing at least 2 terms
+Print results to stdout
+"""
+
+__author__ = "BOSS Team"
+
+from util import console, text
+from yos.yql import db, udfs
+from yos.boss import ysearch
+
+gn = db.create(name="gn", data=ysearch.search_v1("facebook", vertical="news", count=40))
+gn.rename("headline", "title")
+
+sm = db.create(name="sm", url="http://search.twitter.com/search.json?q=facebook&rpp=40")
+sm.rename("text", "title")
+
+tm = db.select(name="tm", udf=udfs.unnest_value, url="http://techmeme.com/firehose.xml")
+
+def overlap(r1, r2):
+ return text.overlap(r1["title"], r2["title"]) > 1
+
+j = db.join(overlap, [gn, sm, tm])
+j = db.sort(key="sm$id", table=j)
+
+for r in j.rows:
+ console.write( "\n%s\n[yahoo] %s\n[twitter] %s\n[techmeme] %s\n" % (r["sm$created_at"], r["gn$title"], r["sm$title"], r["tm$title"]) )
@@ -0,0 +1,38 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+
+"""
+Search 'google android' on yahoo news, summize, and digg
+Join results based on titles having an overlap of 3 terms or more
+Group duplicates based on yahoo news title
+In the group by sum by diggs, save as field rank
+Then sort by rank and print to stdout
+"""
+
+__author__ = "BOSS Team"
+
+from util import console, text
+from yos.yql import db
+from yos.boss import ysearch
+
+ynews_data = ysearch.search_v1("google android", vertical="news", count=60)
+ynews = db.create(name="ynews", data=ynews_data)
+ynews.rename(before="headline", after="title")
+
+sm = db.create(name="sm", url="http://summize.com/search.json?q=google+android&rpp=60&lang=en")
+sm.rename(before="text", after="title")
+
+titlef = lambda r: {"title": r["title"]["value"], "diggs": int(r["diggCount"]["value"])}
+digg = db.select(name="dg", udf=titlef, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")
+
+def overlap_predicate(r1, r2):
+ return text.overlap(r1["title"], r2["title"]) > 2
+
+tb = db.join(overlap_predicate, [ynews, sm, digg])
+tb = db.group(by=["ynews$title"], key="dg$diggs", reducer=lambda d1, d2: d1 + d2, as="rank", table=tb, norm=text.norm)
+tb = db.sort(key="rank", table=tb)
+
+for r in tb.rows:
+ console.write( "\n%s\n[y] %s\n[t] %s\n[s] %d\n" % (r["sm$created_at"], r["ynews$title"], r["sm$title"], r["rank"]) )
+
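The group step in this example collapses near-duplicate rows and reduces one field into a rank. A minimal sketch of that idea follows, under stated assumptions: the `group` and `norm` functions here are hypothetical stand-ins rather than the yos.yql.db and yos.util.text code, `by` is a single field name instead of a list, and the final sort is descending, which may differ from `db.sort`. The sketch names its output parameter `out_field` because `as` is a reserved word in Python 2.6 and later.

```python
def group(rows, by, key, reducer, out_field, norm=lambda s: s):
    """Collapse rows whose normalized `by` value matches, reducing `key`."""
    groups = {}
    order = []  # remember first-seen order of groups
    for row in rows:
        group_id = norm(row[by])
        if group_id not in groups:
            kept = dict(row)            # keep the first row of each group
            kept[out_field] = row[key]
            groups[group_id] = kept
            order.append(group_id)
        else:
            kept = groups[group_id]
            kept[out_field] = reducer(kept[out_field], row[key])
    return [groups[g] for g in order]


def norm(text):
    # Hypothetical normalizer: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


rows = [
    {"ynews$title": "Google Android launch", "dg$diggs": 10},
    {"ynews$title": "google  android launch", "dg$diggs": 5},
    {"ynews$title": "Other story", "dg$diggs": 2},
]

ranked = group(rows, by="ynews$title", key="dg$diggs",
               reducer=lambda d1, d2: d1 + d2, out_field="rank", norm=norm)
ranked.sort(key=lambda r: r["rank"], reverse=True)
print([(r["ynews$title"], r["rank"]) for r in ranked])
```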
@@ -0,0 +1,46 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+
+"""
+Four-way join of 'google android' on yahoo news, summize, youtube, and digg
+Combine results based on titles having an overlap of 3 terms or more
+Group results based on yahoo news title (remove duplicates)
+Redefine the group by equality operator to use text.norm to do near duplicate text removal
+Within each group, sum the digg and youtube favorite counts as the rank for each joined result
+Sort by rank, print to stdout
+"""
+
+__author__ = "BOSS Team"
+
+from util import console, text
+from yos.yql import db
+from yos.boss import ysearch
+
+ynews_data = ysearch.search_v1("google android", vertical="news", count=100, more={"news.ranking": "date"})
+ynews = db.create(name="ynews", data=ynews_data)
+ynews.rename(before="headline", after="title")
+
+sm = db.create(name="sm", url="http://summize.com/search.json?q=google+android&rpp=60&lang=en")
+sm.rename(before="text", after="title")
+
+ytf = lambda r: {"title": r["title"]["value"], "favorites": int(r["statistics"]["favoriteCount"])}
+yt = db.select(name="yt", udf=ytf, url="http://gdata.youtube.com/feeds/api/videos?vq=google+android&lr=en&orderby=published")
+
+diggf = lambda r: {"title": r["title"]["value"], "diggs": int(r["diggCount"]["value"])}
+digg = db.select(name="dg", udf=diggf, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")
+
+def overlap_predicate(r1, r2):
+ return text.overlap(r1["title"], r2["title"]) > 2
+
+tb = db.join(overlap_predicate, [ynews, sm, digg, yt])
+
+def socialf(row):
+ row.update({"social": row["dg$diggs"] + row["yt$favorites"]}) ; return row
+
+tb = db.select(udf=socialf, table=tb)
+tb = db.group(by=["ynews$title"], key="social", reducer=lambda d1,d2: d1+d2, as="rank", table=tb, norm=text.norm)
+tb = db.sort(key="rank", table=tb)
+
+for r in tb.rows:
+ console.write( "\n%s\n[y] %s\n[t] %s\n[sr] %d\n" % (r["sm$created_at"], r["ynews$title"], r["sm$title"], r["rank"]) )
@@ -0,0 +1,38 @@
+#Copyright (c) 2011 Yahoo! Inc. All rights reserved. Licensed under the BSD License.
+# See accompanying LICENSE file or http://www.opensource.org/licenses/BSD-3-Clause for the specific language governing permissions and limitations under the License.
+
+
+"""
+Search 'iphone' on yahoo news and sort by date
+Get the wikipedia edits for the iphone page
+Rank the news results based on their title/text overlap with the wikipedia entries
+Sort by the overlap sizes
+This could potentially be a new freshness model, based on the idea that wikipedia is updated for recent significance
+"""
+
+__author__ = "BOSS Team"
+
+from util import console, text
+from yos.boss import ysearch
+from yos.yql import db
+
+yn = db.create(name="yn", data=ysearch.search_v1("iphone sdk", vertical="news", count=50, more={"news.ranking": "date"}))
+wiki = db.create(name="wiki", url="http://en.wikipedia.org/w/index.php?title=IPhone_OS&feed=atom&action=history")
+
+tb = db.cross([yn, wiki])
+
+def rankf(row):
+ row.update( {"rank": text.overlap(row["yn$abstract"], row["wiki$summary"]["value"])} ) ; return row
+
+tb = db.select(udf=rankf, table=tb)
+tb = db.group(by=["yn$title"], key="rank", reducer=lambda d1,d2: d1+d2, as="total", table=tb, norm=text.norm)
+tb = db.sort(key="total", table=tb)
+
+print "Before\n"
+for r in yn.rows:
+ console.write( "[news] %s\n" % r["yn$title"] )
+
+print "After\n"
+for r in tb.rows:
+ console.write( "[news] %s\n[source] %s\t[rank] %d\n" % (r["yn$title"], r["yn$source"], r["total"]) )
+