[WIP] Adding the price - issue#5 #7

Closed
wants to merge 7 commits into from
Bolzano-Weierstrass

Addresses ticket #5.

This commit adds the price to the retrieved features, alongside the book title, average rating, number of ratings and URL.

It shouldn't introduce any new bugs; if it does, don't hesitate to contact me.

@Bolzano-Weierstrass Bolzano-Weierstrass changed the title Adding the price - issues#5 Adding the price - issue#5 Aug 4, 2018
@coveralls

Pull Request Test Coverage Report for Build 59

  • 7 of 7 (100.0%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+2.3%) to 93.671%

Files with Coverage Reduction | New Missed Lines | %
amazonscraper/client.py | 1 | 91.8%

Totals:
Change from base Build 43: 2.3%
Covered Lines: 148
Relevant Lines: 158

💛 - Coveralls

@tducret
Owner

tducret commented Aug 4, 2018

Very good job Thomas @Bolzano-Weierstrass.
I like your integration 👍

Unfortunately, I get different kinds of prices :(
I tested it with amazon2csv.py -k "python" -m 10 and compared the results with the web page https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dstripbooks&field-keywords=python

For example:
Learning Python, 5th Edition : $21.21 <= but this is the price for renting
A Smarter Way to Learn Python: Learn it faster. Remember it longer. : $7.84 <= but this is the price for Kindle edition

Also, when I test it with an amazon.fr search URL, I only get N/A for prices.
Example:
amazon2csv.py -m 20 -u "https://www.amazon.fr/s/ref=nb_sb_noss_2?__mk_fr_FR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=python&rh=i%3Aaps%2Ck%3Apython"

"Apprendre à programmer avec Python 3",4.2,36,N/A,https://www.amazon.fr/Apprendre-%C3%A0-programmer-avec-Python/dp/2212134347

I knew this was going to be difficult with all these kinds of prices...

@Bolzano-Weierstrass
Author

Hi @tducret,

Thanks for the feedback.

I noticed it... but since I did not know what people using this software want to do (buy, rent, get the Kindle edition...), I thought a single price would give an indication. On second thought, giving a price range (min - max) over all available prices might be more interesting. What do you think?

Regarding the problem with using the search URL directly: I've just noticed that Amazon uses "EUR" instead of "€", while it uses "$" rather than "USD". I had assumed it only used single-character currency symbols, so this can be fixed.
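A minimal sketch of how the currency issue described above could be handled, assuming the price text looks like "$21.21", "EUR 20,50" or "20.50€". The `PRICE_RE` pattern and the `parse_price` helper are hypothetical names for illustration and are not taken from the amazonscraper code; thousands separators are not handled.

```python
import re

# Hypothetical pattern (not from amazonscraper): accept a currency marker
# before or after the amount, either a symbol ($, €, £) or a 3-letter code.
PRICE_RE = re.compile(
    r"(?P<prefix>[$€£]|[A-Z]{3})?\s*"
    r"(?P<amount>\d+(?:[.,]\d{1,2})?)"
    r"\s*(?P<suffix>[$€£]|[A-Z]{3})?"
)


def parse_price(text):
    """Return (amount, currency), or None when no price is found."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    currency = match.group("prefix") or match.group("suffix")
    if currency is None:
        # A bare number without a currency marker is not treated as a price.
        return None
    # Accept both "20.50" and "20,50" as decimal notations.
    amount = float(match.group("amount").replace(",", "."))
    return amount, currency


if __name__ == "__main__":
    for sample in ("$21.21", "EUR 20,50", "20.50€", "N/A"):
        print(sample, "->", parse_price(sample))
```

Matching the currency as either a symbol or a three-letter code avoids the single-character assumption that made the amazon.fr prices come back as N/A.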

Thanks

@tducret
Owner

tducret commented Aug 4, 2018

"Min price" and "Max price" seem like a pretty good idea, yes.
A more ambitious idea would be to extract every price with the right category: "Kindle Edition", "Paperback" (and even new, used...). Or perhaps scrape only one kind of price (Paperback for books, for instance).
What do you think?

@Bolzano-Weierstrass
Author

The issue is that Amazon translates everything, even the HTML/CSS tags, so it is very difficult to know what we are scraping: one time it is 'paperback' and the next time it is 'broché'...

@tducret
Owner

tducret commented Aug 5, 2018

What if we put all the prices in a dict, keyed by category (without translation at first)?
Like:

{
  "paperback":"21.16$",
  "kindle edition":"9.99$"
}

or

{
  "broché":"20.50€",
  "format kindle":"9.99€"
}

That would let us compute the min/max, and even translate the different categories in the future.
You could then run amazon2csv --filter="paperback" to get only the paperback prices.
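The dict idea above can be sketched as follows. This is only an illustration under stated assumptions: `price_value`, `price_range` and `filter_prices` are hypothetical helpers, not part of amazonscraper, and the `--filter` option mentioned above is a proposal, not an existing flag.

```python
def price_value(price):
    """Convert a price string such as '21.16$' or '20,50€' to a float."""
    digits = "".join(ch for ch in price if ch.isdigit() or ch in ".,")
    return float(digits.replace(",", "."))


def price_range(prices):
    """Return (min, max) over all categories, or None for an empty dict."""
    values = [price_value(p) for p in prices.values()]
    if not values:
        return None
    return min(values), max(values)


def filter_prices(prices, category):
    """Keep only the entries whose label matches, ignoring case."""
    wanted = category.strip().lower()
    return {
        label: price
        for label, price in prices.items()
        if label.lower() == wanted
    }


if __name__ == "__main__":
    prices = {"paperback": "21.16$", "kindle edition": "9.99$"}
    print(price_range(prices))
    print(filter_prices(prices, "Paperback"))
```

Keeping the labels untranslated means the same helpers work on a French result such as {"broché": "20.50€", "format kindle": "9.99€"}, at the cost of the filter value having to match the local label.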

@Bolzano-Weierstrass Bolzano-Weierstrass changed the title Adding the price - issue#5 [WIP] Adding the price - issue#5 Aug 5, 2018
@Bolzano-Weierstrass
Author

I am not convinced by what I've done. I managed to run it locally, but I can retrieve only one price and its label from the product HTML. That is not enough to be interesting...

@tducret
Owner

tducret commented Aug 6, 2018

Why do you say so, @Bolzano-Weierstrass?
When I run amazon2csv.py -k "python" -m 10, I get:

Product title,Rating,Number of customer reviews,Product URL,Paperback-to rent,Paperback-to buy,Kindle Edition-to rent,Kindle Edition-to buy
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,318,"https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036",N/A,$27.16,N/A,$24.28
"Learning Python, 5th Edition",4,300,"https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730",$21.87,$31.24,$15.58,$34.10
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,218,"https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B",N/A,$17.96,N/A,$7.75
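A column layout like the one above could be produced from per-product price dicts with the standard `csv` module. This is a rough sketch, assuming each product is a dict with a `prices` sub-dict; `products_to_csv` and the data shape are hypothetical and do not reflect how amazon2csv.py is actually written.

```python
import csv
import io

BASE_COLUMNS = [
    "Product title",
    "Rating",
    "Number of customer reviews",
    "Product URL",
]


def products_to_csv(products):
    """Write one row per product, with one column per price label seen."""
    # Collect the price labels in first-seen order so columns are stable.
    price_columns = []
    for product in products:
        for label in product["prices"]:
            if label not in price_columns:
                price_columns.append(label)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=BASE_COLUMNS + price_columns,
        restval="N/A",  # written for any price label a product lacks
        lineterminator="\n",
    )
    writer.writeheader()
    for product in products:
        row = {column: product[column] for column in BASE_COLUMNS}
        row.update(product["prices"])
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {
            "Product title": "Book A",
            "Rating": 4.5,
            "Number of customer reviews": 318,
            "Product URL": "https://example.com/a",
            "prices": {"Paperback-to buy": "$27.16"},
        },
    ]
    print(products_to_csv(sample))
```

Using `restval="N/A"` keeps the rows aligned when a product lacks one of the price categories.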

@Bolzano-Weierstrass
Author

Oh, nice :)
When I run the same command as you, I get:

Product title,Rating,Number of customer reviews,Product URL,Paperback-to rent,Paperback-to buy,Kindle Edition-to rent,Kindle Edition-to buy
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,318,https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036,N/A,$27.16,N/A,N/A
"Learning Python, 5th Edition",4,300,https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730,$21.87,$31.24,N/A,N/A
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,218,https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B,N/A,N/A,N/A,$7.75

I tested multiple commands and never got more than 2 non-N/A prices per product (which is not your case), so I was not convinced. Moreover, it makes the code quite a lot more complex. Your call :)

@tducret
Owner

tducret commented Aug 6, 2018

Weird... That may be Amazon's anti-scraping protections :S
I have to review your code in detail, but it's true that it seems complicated.

3 participants