yuyuhan0306/Influenster_haircare

Influenster_haircare

I used two-level web scraping to collect the top shampoo, conditioner, and hair oil products from Influenster.com. All of the Scrapy code is in the "influenster" folder. Using more than 77,000 reviews of 162 products, I built a comparison word cloud to show which keywords characterize each product category. I also built a search engine that uses TF-IDF and cosine similarity to measure how closely a user's query matches each product name. The dataset yielded several interesting insights:

(1) Most of the products belong to household brands.

(2) California, Florida, Texas and New York have more active Influenster users than other states.

(3) There is a negative relationship between the number of reviews and rating score.

(4) Functions and the scent of hair care products are of great importance.

(5) The search engine, which I built with TF-IDF and cosine similarity, would work even better if it also indexed product descriptions. With descriptions included, a user's query could match description text as well as product names, so users could retrieve more related products and discover new product features.
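To illustrate insight (5), here is a minimal sketch of TF-IDF ranking with cosine similarity in plain Python. It is not the code used in this project: the function names (`tokenize`, `build_index`, `to_vector`, `cosine`, `search`), the smoothed IDF formula, and the three sample products with their descriptions are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Build a sparse TF-IDF vector (term -> weight), ignoring unknown terms."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def build_index(documents):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [to_vector(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_n=3):
    """Return (document index, score) pairs for the best non-zero matches."""
    idf, vectors = build_index(documents)
    query_vec = to_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_n] if score > 0.0]


# Hypothetical sample data, not taken from the scraped dataset.
names = [
    "Moroccan Argan Oil Shampoo",
    "Coconut Milk Conditioner",
    "Nourishing Hair Oil",
]
descriptions = [
    "Repairs dry damaged hair with a warm scent",
    "Hydrates curls and tames frizz",
    "Lightweight serum that adds shine and tames frizz",
]
# Indexing name plus description gives a query more text to match against.
combined = [f"{n} {d}" for n, d in zip(names, descriptions)]

print(search("frizz", names))     # no product name contains "frizz"
print(search("frizz", combined))  # descriptions make two products findable
```

In this toy data, a query for "frizz" finds nothing when only names are indexed, but it retrieves the conditioner and the hair oil once descriptions are included, which is the effect described in (5).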

I made a Shiny app to visualize all the results. If you would like to know more about the workflow, concepts, and insights, please check out my blog post: https://blog.nycdatascience.com/student-works/web-scraping-influenster-find-popular-hair-care-product/
