Permalink
Browse files

Made Engine as a singleton

  • Loading branch information...
1 parent fad69bc commit a50ef208ba1133f83a02071e87895ed4872207bc @Jeweller-Tsai Jeweller-Tsai committed Apr 18, 2012
@@ -13,13 +13,13 @@ def initialize(attrs = {})
end
def save(dir = nil)
- Search::Agent.download(dir)
+ Search::Engine.instance.download(self, dir)
end
class << self
def search_by_title(title)
begin
- result = Search::Agent.search(title: title) unless title.empty?
+ result = Search::Engine.instance.search(title: title) unless title.empty?
rescue Search::NotFound
return []
end
View
@@ -0,0 +1,140 @@
+# encoding: UTF-8
+require 'gmusic/search/result'
+require 'gmusic/search/errors'
+require 'fileutils'
+require 'mechanize'
+
+module Gmusic
+ module Search
+ class Engine < Mechanize
+ include Singleton
+
+ #NOTE not finish
+ def search(query)
+ raise InvalidParameter unless query_valid?(query)
+
+ query_url = format_url(SEARCH_URL, query)
+
+ #TODO should wrapped in begin rescue pair
+ # retry or raise, make it more robust
+ page = agent.get(query_url)
+
+ info = extract_info_from page
+ details = collect_details_from page
+
+ Result.new(info, details)
+ end
+
+ #NOTE not finish
+ def download(song, dir=nil)
+ #TODO
+ #`get` should be rescue too
+ #maybe pass a song list and then can retry 3 times
+ agent.pluggable_parser.default = Mechanize::Download
+ agent.get(song.link) do |page|
+ times = 0
+ begin
+ file = page.links.last.click
+ rescue Errno::ETIMEDOUT => e
+ return false if times > 2
+ times += 1
+ retry
+ end
+ file.save("#{mkdir dir}/#{song.title}.mp3")
+ end
+
+ true
+ end
+
+ private
+
+ def mkdir(dirname)
+ return FileUtils.mkdir_p("#{Dir.home}/Downloads/gmusic").first unless dirname
+ return dirname if Dir.exists? dirname
+
+ dir = sanitize_dirname dirname
+ return FileUtils.mkdir_p(dir).first if /\/.*/.match dir
+
+ FileUtils.mkdir_p(File.join(Dir.pwd, dir)).first
+ end
+
+ def sanitize_dirname(dirname)
+ dirname.strip.tap do |dir|
+ dir.sub!(/^\~/, Dir.home)
+ dir.gsub!(/[^\.|^\/|^[:word:]]/, '_')
+ end
+ end
+
+ def collect_details_from(page)
+ page.search('#song_list tbody').map do |tbody|
+ id = tbody.attributes['id'].text
+ title = extract_text_from(tbody, '.Title b')
+ artist = extract_text_from(tbody, '.Artist a')
+
+ #TODO
+ #make link as an URI object may be better, more OO anyway
+ link = DOWNLOAD_URL % id
+
+ { title: title, artist: artist, link: link }
+ end
+ end
+
+ #def collect_links_from(page)
+ #page.search('#song_list tbody').map do |tbody|
+ #id = tbody.attributes['id'].text
+ #title = extract_text_from(tbody, '.Title b')
+ #url = DOWNLOAD_URL % id
+
+ ##{ title: title, link: link }
+ #Link.new(title, url)
+ #end
+ #end
+
+ def extract_info_from(page)
+ text = extract_text_from(page, '.topheadline')
+ pattern = /\((\d+)\)/
+ figures = text.scan(pattern).flatten.map { |item| item.to_i }
+ mappings = %w{歌曲 专辑 歌手}.zip(figures)
+ info = Hash[mappings]
+ raise NotFound if not_found?(info)
+
+ info
+ end
+
+ def extract_text_from(scope, element)
+ scope.search(element).text
+ end
+
+ #TODO
+ #set a logger for the agent
+ #http://mechanize.rubyforge.org/Mechanize.html
+ #def agent
+ #@agent ||= Mechanize.new
+ #end
+ def agent
+ self.class.instance
+ end
+
+ def query_valid?(hash)
+ return false if hash.empty?
+ hash.each_key { |key| return false unless SEARCH_OPTSTIONS.include?(key.to_sym) }
+
+ true
+ end
+
+ def not_found?(hash)
+ hash.each_value { |v| return false if v != 0 }
+ true
+ end
+
+ def format_url(base_url, hash)
+ base_url + encode_www_form(hash.values)
+ end
+
+ def encode_www_form(ary)
+ str = ary * ' '
+ str.downcase.strip.squeeze(' ').gsub(/\s+/, '+')
+ end
+ end
+ end
+end
@@ -0,0 +1,207 @@
+# encoding: UTF-8
+describe Gmusic::Search::Agent do
+ let(:base_url) { %Q{http://www.google.cn/music/search?q} }
+ let(:found_url) { base_url + 'bad+romance+lady+gaga' }
+ let(:not_found_url) { base_url + 'not+found' }
+
+ describe '.search' do
+ before(:each) do
+ prepare_fake_web('search_results.html', found_url)
+ end
+
+ let(:query) { { title: 'Bad Romance', artist: 'Lady Gaga'} }
+ subject { Gmusic::Search::Agent.search(query) }
+
+ it 'returns a result object' do
+ should be_a Gmusic::Search::Result
+ end
+
+ its(:info) { should eq({"歌曲"=>12, "专辑"=>7, "歌手"=>0}) }
+ end
+
+ #NOTE not finish
+ describe '.download' do
+ before(:each) do
+ @song = Gmusic::Song.new(title: 'for_test', artist: 'nobody', link: 'http://fakelink/')
+ prepare_fake_web('download.html', 'http://fakelink/')
+ prepare_fake_web('for_test.mp3', "file:///home/jeweller/workspaces/gmusic/spec/web_pages/for_test.mp3")
+ end
+ after(:all) { clean_up_files_generated_by_fakeweb }
+
+ context "without specifying a directory" do
+ let(:default_path) { File.join(Dir.home, 'Downloads', 'gmusic') }
+
+ before(:each) { Gmusic::Search::Agent.download(@song) }
+ after(:each) { FileUtils.rm_rf default_path }
+
+ it "downloads the song and stored it in ~/Downloads/gmusic" do
+ Dir.exists?(default_path).should be
+ end
+
+ it { File.exists?(File.join(default_path, 'for_test.mp3')).should be }
+ end
+
+ context "specified a directory" do
+ let(:store_path) { File.join(Dir.home, 'Desktop', 'gmusic') }
+
+ before(:each) { Gmusic::Search::Agent.download(@song, store_path) }
+ after(:all) { FileUtils.rm_rf store_path }
+
+ it 'downloads ths song and stored it the given directory' do
+ Dir.exists?(store_path).should be
+ end
+
+ it { File.exists?(File.join(store_path, 'for_test.mp3')).should be }
+ end
+ end
+
+ describe 'private class methods' do
+ let(:agent) { Gmusic::Search::Agent.send(:agent) }
+
+ describe '.extract_info_from' do
+ context 'when not found' do
+ before(:each) do
+ prepare_fake_web('not_found.html', not_found_url)
+ @page = agent.get not_found_url
+ end
+
+ it 'raises NotFound' do
+ expect do
+ Gmusic::Search::Agent.send(:extract_info_from, @page)
+ end.to raise_error('Gmusic::Search::NotFound')
+ end
+ end
+
+ context 'when found' do
+ before(:each) do
+ prepare_fake_web('search_results.html', found_url)
+ @page = agent.get found_url
+ end
+
+ subject { Gmusic::Search::Agent.send(:extract_info_from, @page) }
+
+ it { should be_a Hash }
+ its(:keys) { should include '歌曲' }
+ its(:keys) { should include '专辑' }
+ its(:keys) { should include '歌手' }
+ end
+ end
+
+ #describe '.collect_links_from' do
+ #before(:each) do
+ #prepare_fake_web('search_results.html', found_url)
+ #@page = agent.get found_url
+ #end
+
+ #subject { Gmusic::Search::Agent.send(:collect_links_from, @page) }
+
+ #it 'returns an array of links' do
+ #should be_an Array
+ #end
+ #its(:first) { should be_a Gmusic::Link }
+ #it { should have_at_least(1).item }
+ #end
+
+ describe '.collect_details_from' do
+ before(:each) do
+ prepare_fake_web('search_results.html', found_url)
+ @page = agent.get found_url
+ end
+
+ subject { Gmusic::Search::Agent.send(:collect_details_from, @page) }
+
+ it 'returns an array of links' do
+ should be_an Array
+ end
+ its(:first) { should be_a Hash }
+ its(:first) { should have_key :title }
+ its(:first) { should have_key :artist }
+ its(:first) { should have_key :link }
+ end
+
+ describe '.format' do
+ it 'replaces blank space to +' do
+ Gmusic::Search::Agent.send(:encode_www_form, [' word1 ',' word2']).should eq 'word1+word2'
+ end
+ end
+
+ describe '.format_url' do
+ it 'formats the base url with the given hash' do
+ hash = {
+ title: 'bad romance',
+ artist: 'Lady Gaga',
+ album: 'The Fame Monster',
+ lyric: 'GaGa oh la la'
+ }
+ url = "#{base_url}bad+romance+lady+gaga+the+fame+monster+gaga+oh+la+la"
+
+ Gmusic::Search::Agent.send(:format_url, base_url, hash).should eq url
+ end
+ end
+
+ describe '.sanitize_dirname' do
+ let(:shorten_dir) { '~/Desktop' }
+ let(:invalid_dir) { ' (invalid)*/&dir@ ' }
+
+ it "replace '~' to user's home directory" do
+ Gmusic::Search::Agent.send(:sanitize_dirname, shorten_dir).should eq "#{Dir.home}/Desktop"
+ end
+
+ it "sanitizes directory name by replacing invalid charactor to '_'" do
+ Gmusic::Search::Agent.send(:sanitize_dirname, invalid_dir).should eq '_invalid__/_dir_'
+ end
+ end
+
+ describe '.mkdir' do
+ context 'when the given arg is nil' do
+ let(:path) { nil }
+ subject { Gmusic::Search::Agent.send(:mkdir, path) }
+
+ it 'makes ~/Downloads/gmusic' do
+ subject
+ dir_exist?("#{Dir.home}/Downloads/gmusic").should be
+ end
+
+ it "returns dirname" do
+ subject.should eq "#{Dir.home}/Downloads/gmusic"
+ end
+ end
+
+ context 'when dir exists' do
+ let(:path) { path = Dir.pwd }
+ subject { Gmusic::Search::Agent.send(:mkdir, path) }
+
+ it 'returns dirname' do
+ subject.should eq path
+ end
+ end
+
+ context 'when given absolute path' do
+ let(:path) { "#{Dir.pwd}/for_test" }
+ subject { Gmusic::Search::Agent.send(:mkdir, path) }
+
+ it 'makes dir' do
+ dir_exist?(path).should be
+ end
+
+ it 'returns dirname' do
+ subject.should eq path
+ end
+ end
+
+ context 'when given relative path' do
+ let(:path) { './for_test' }
+ subject { Gmusic::Search::Agent.send(:mkdir, path) }
+
+ it 'makes dir' do
+ dir_exist?(path).should be
+ end
+
+ it 'returns dirname' do
+ subject.should eq path
+ end
+ end
+ end
+
+ end
+end
Oops, something went wrong.

9 comments on commit a50ef20

Owner

towerhe replied Apr 19, 2012

IMO, The engine should not be singleton. You should create a method under Gmusic to create the search engine.

module Gmusic
  def self.search_engine
    @search_engine ||= Search::Engine.new
  end
end

Now you can create the search engine in a strategy way. For instance, you have two or more search engines naming Gmusic::Search::Engine::Mechanize, Whatever::It::Is and ...

You can config the search engine to whatever you like:

Gmusic.configure do |config|
  config.search_engine Whatever::It::Is
end

For supporting configurable engine, you need to modify your Gmusic.search_engine to something like:

def search_engine
  @search_engine ||= begin
    engine = config.search_engine || Gmusic::Search::Engine::Mechanize
    engine.new
  end
end
Owner

towerhe replied Apr 19, 2012

This configuration also can map to the CLI easily.

Collaborator

Jeweller-Tsai replied Apr 19, 2012

The main reason I made the search engine as singleton is that it inherited from Mechanize and Mechanize is basically acts like a browser, it also save cookies and so on. You only need one browser at a duration. Considering you search for some musics and save those links in the DB, if you want to download one of them, you're still using the same browser.

class Song
  def self.search_by_titile(title)
     engine = Search::Engine.instance.search(title)
     #...
   end

   def download
      Search::Engine.instance.download(self)  # the same engine(browser)
   end
end

What if you're using another browser, you'll probably get redirected, for the google CDN sake. But passing the instance around can be tedious, that's why I make it as singleton

Owner

towerhe replied Apr 19, 2012

I mean that Singleton is not needed here. You just need to ensure only one instance of the engine is created. And

@search_engine ||= ...

is for it.

For your use case

class Song
  def self.search_by_title(title)
    Gmusic.search_engine.search(title)
  end

  def self.download
    Gmusic.search_engine.download(self)
  end
end
Collaborator

Jeweller-Tsai replied Apr 19, 2012

Then Search::Engine is the infrastructure of Gmusic, if it's repalceable, like AR in Rails, that would be flexible. But I bet no one would replace it because he have rewrite the whole engine, if he really have to, he may not want use this gem. :-(

Owner

towerhe replied Apr 19, 2012

I do not think so. This flexibility is for the architecture of Gmusic. You are the maintainer not the others. You should worry about the gems you used. If Mechanize is gone or anther new advanced replacement is comming, what should you do?

Collaborator

Jeweller-Tsai replied Apr 19, 2012

Nice. Fix it later

Collaborator

Jeweller-Tsai replied Apr 20, 2012

@towerhe

module Gmusic
  def self.search_engine
    @search_engine ||= Search::Engine.new
  end
end

Is @search_engine, an instance variable suitable here, in a class method? Shouldn't it a class variable ? @@search_engine
If it turns to be a class variable, it in fact is a singleton! Using the singleton module to do the heavy work will be more convenient.

Owner

towerhe replied Apr 20, 2012

@search_engine here acts as a module attribute. It's OK.

You SHOULD NOT restrict the engine to a singleton object. For example, one day you try to introduce a concurrent feature, you need more than one engine to work for you. Each engine will be assigned a task, searching a song or album and downloading it. If the engine is a singleton, you will lose the possibility of using it in a concurrent env.

So how to keep the engine a single object is out of the engine's logic. You should take care of it by yourself.

Please sign in to comment.