Implemented categorization for text

commit 1b61014c8908a8485794f836464253b5531138f8 1 parent 6514f83
David Balatero dbalatero authored
2  lib/alchemy_api.rb
@@ -12,11 +12,13 @@ module AlchemyApi
   @api_key = nil
   @base_uri = "http://access.alchemyapi.com/calls/url"
   @base_html_uri = "http://access.alchemyapi.com/calls/html"
+  @base_text_uri = "http://access.alchemyapi.com/calls/text"
 
   class << self
     attr_accessor :api_key
     attr_accessor :base_uri
     attr_accessor :base_html_uri
+    attr_accessor :base_text_uri
   end
 
   class UnknownError < StandardError; end
21 lib/alchemy_api/base.rb
@@ -4,5 +4,26 @@ class Base < MonsterMash::Base
       cache_timeout 999999
       user_agent 'Ruby AlchemyApi'
     end
+
+    def self.check_json_for_errors_and_raise!(json)
+      if json['status'] == 'ERROR'
+        case json['statusInfo']
+        when 'invalid-api-key'
+          raise InvalidApiKeyError, "The API key you sent (#{AlchemyApi.api_key.inspect}) is invalid! Please set AlchemyApi.api_key!"
+        when 'cannot-retrieve'
+          raise CannotRetrieveUrlError, "The URL (#{json['url']}) could not be retrieved."
+        when 'cannot-retrieve:http-redirect-limit'
+          raise RedirectionLimitError, "The URL (#{json['url']}) could not be retrieved, as it reached a redirect limit."
+        when 'page-is-not-html'
+          raise PageIsNotValidHtmlError, "The page at #{json['url']} is not valid HTML!"
+        when 'content-exceeds-size-limit'
+          raise ContentExceedsMaxLimitError, "The page at #{json['url']} is larger than 600KB!"
+        when 'invalid-html'
+          raise InvalidHtmlError, "The HTML sent was invalid!"
+        else
+          raise UnknownError, "Got an unknown error: #{json['statusInfo']}"
+        end
+      end
+    end
   end
 end
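
Note on the hunk above: the check now lives on Base, so every API class shares one mapping from AlchemyAPI `statusInfo` strings to exception classes. The following is a minimal standalone sketch of that idea, not the commit's code. It assumes a lookup table in place of the `case` statement; the `STATUS_ERRORS` constant and the shortened message are illustrative assumptions, and the error classes are redefined locally only so the snippet runs without the gem.

```ruby
# Stand-in error classes so the sketch runs on its own; in the gem these are
# defined under the AlchemyApi module.
class UnknownError < StandardError; end
class InvalidApiKeyError < StandardError; end
class CannotRetrieveUrlError < StandardError; end

# Hypothetical lookup table (an assumption, not part of the commit): maps a
# statusInfo string to the exception class that should be raised for it.
STATUS_ERRORS = {
  'invalid-api-key' => InvalidApiKeyError,
  'cannot-retrieve' => CannotRetrieveUrlError
}.freeze

# Raises the mapped error when the parsed response reports status ERROR,
# falling back to UnknownError for unrecognized statusInfo values.
# Returns nil for any other status.
def check_json_for_errors_and_raise!(json)
  return unless json['status'] == 'ERROR'

  error_class = STATUS_ERRORS.fetch(json['statusInfo'], UnknownError)
  raise error_class, "AlchemyAPI returned an error: #{json['statusInfo']}"
end
```

A table keeps the mapping in one place as more status codes are added, at the cost of the per-error messages that the `case` version in the commit provides.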
22 lib/alchemy_api/categorization.rb
@@ -1,4 +1,24 @@
 module AlchemyApi
-  class Categorization
+  Category = Struct.new(:url, :name, :score)
+
+  class Categorization < Base
+    post(:get_categorization_from_text) do |text, *args|
+      options = args.first || {}
+      uri "#{AlchemyApi.base_text_uri}/TextGetCategory"
+      params :apikey => AlchemyApi.api_key,
+             :text => text,
+             :url => options[:url] || '',
+             :outputMode => 'json'
+      handler do |response|
+        AlchemyApi::Categorization.get_categorization_handler(response)
+      end
+    end
+
+    def self.get_categorization_handler(response)
+      json = JSON.parse(response.body)
+      check_json_for_errors_and_raise!(json)
+      Category.new(json['url'], json['category'],
+                   json['score'].to_f)
+    end
   end
 end
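
Note on the hunk above: the handler turns the JSON body into a `Category` struct and converts the string score to a float. The following is a minimal standalone sketch of just that parsing step. It assumes a hypothetical helper, `category_from_body`, that takes the raw body string instead of a Typhoeus response and skips the error check; the sample body is a trimmed copy of the cached response recorded in this commit.

```ruby
require 'json'

# Same struct shape as the one introduced in the commit.
Category = Struct.new(:url, :name, :score)

# Hypothetical helper (not in the commit): parses a raw JSON body string and
# builds a Category, converting the string score to a Float.
def category_from_body(body)
  json = JSON.parse(body)
  Category.new(json['url'], json['category'], json['score'].to_f)
end

# Trimmed copy of the cached AlchemyAPI response from this commit.
body = '{"status": "OK", "url": "", "category": "arts_entertainment", "score": "0.841536"}'

category = category_from_body(body)
puts category.name
puts category.score
```

Since the API returns the score as a string, the `to_f` call is what lets callers compare scores numerically.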
23 lib/alchemy_api/text_extraction.rb
@@ -83,7 +83,6 @@ class TextExtraction < Base
       end
     end
 
-
     def self.get_title_from_url_handler(response)
       json = JSON.parse(response.body)
       check_json_for_errors_and_raise!(json)
@@ -95,27 +94,5 @@ def self.get_text_from_url_handler(response)
       check_json_for_errors_and_raise!(json)
       ExtractedText.new(json['url'], json['text'])
     end
-
-    private
-    def self.check_json_for_errors_and_raise!(json)
-      if json['status'] == 'ERROR'
-        case json['statusInfo']
-        when 'invalid-api-key'
-          raise InvalidApiKeyError, "The API key you sent (#{AlchemyApi.api_key.inspect}) is invalid! Please set AlchemyApi.api_key!"
-        when 'cannot-retrieve'
-          raise CannotRetrieveUrlError, "The URL (#{json['url']}) could not be retrieved."
-        when 'cannot-retrieve:http-redirect-limit'
-          raise RedirectionLimitError, "The URL (#{json['url']}) could not be retrieved, as it reached a redirect limit."
-        when 'page-is-not-html'
-          raise PageIsNotValidHtmlError, "The page at #{json['url']} is not valid HTML!"
-        when 'content-exceeds-size-limit'
-          raise ContentExceedsMaxLimitError, "The page at #{json['url']} is larger than 600KB!"
-        when 'invalid-html'
-          raise InvalidHtmlError, "The HTML sent was invalid!"
-        else
-          raise UnknownError, "Got an unknown error: #{json['statusInfo']}"
-        end
-      end
-    end
   end
 end
15 spec/alchemy_api/categorization_spec.rb
@@ -1,4 +1,19 @@
 require File.dirname(__FILE__) + "/../spec_helper"
 
 describe AlchemyApi::Categorization do
+  typhoeus_spec_cache('spec/cache/categorization/get_categorization_from_text') do |hydra|
+    describe "#get_categorization_from_text" do
+      before(:each) do
+        @url = "http://test.com"
+        text = fixture_for('article.txt')
+
+        @category = AlchemyApi::Categorization.
+          get_categorization_from_text(text)
+      end
+
+      it "should return a category name" do
+        @category.name.should_not be_nil
+      end
+    end
+  end
 end
29 spec/cache/categorization/get_categorization_from_text/8b476a3b532afd2da646b145e9dde07570c27352.cache
@@ -0,0 +1,29 @@
+u:Typhoeus::Response�---
+:headers: |
+  HTTP/1.1 100 Continue
+
+  HTTP/1.1 200 OK
+  Server: apgrid
+  Date: Fri, 30 Apr 2010 00:06:04 GMT
+  Content-Type: application/json
+  Connection: keep-alive
+  Content-Length: 328
+  Cache-Control: max-age=600
+  Expires: Fri, 30 Apr 2010 00:16:04 GMT
+
+
+:code: 200
+:requested_http_method:
+:time: 0.302073
+:body: |
+  {
+  "status": "OK",
+  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
+  "url": "",
+  "language": "english",
+  "category": "arts_entertainment",
+  "score": "0.841536"
+  }
+
+:start_time:
+:requested_url:
9 spec/fixtures/article.txt
@@ -0,0 +1,9 @@
+Bed and Breakfast locations are trade marked by their small size, antique furniture and homey feel. If this is the kind of B&B you are looking for, then The Custer House is where you should go. It is a five-minute walk from the beach, but in this turn of the century modified Queen Anne residence you'll be begging to stay longer.
+
+She may not look like much on the outside, but visiting 10th Avenue Inn Bed and Breakfast will be worth the visit inside. A stunning panoramic view of the ocean that few can rival cheers guests from this quiet little Inn. Not only is the view pleasant, but the meals and tea are something to look forward to.
+
+The Guest House Bed and Breakfast is the coziest and homiest of the B&B's in Seaside, Oregon. The warm wood paneling of the house is cheerful and clean-looking. A unique feature is the front facing balcony that invites you to enjoy the views day and night.
+
+One cannot mention Seaside B&B's without mentioning The Gilbert Inn Bed and Breakfast. This is the most recommended location for romance and history. This Victorian style B&B right on the Promenade is quaint and scenic in all the best ways. The Turret Room offers a special way to spend your night and morning with a beautiful view of the ocean. This B&B is a must for those visiting Historical sites like the Butterfield Cottage and The Saltworks.
+
+Seaside B&B's are wonderful little getaways for the romantic couple, the history buff and those people who simply enjoy a good old-fashioned vacation.
