
Nielsen Scanner #8

Open
wbinzhe opened this issue Jul 11, 2021 · 13 comments

Comments

@wbinzhe
Owner

wbinzhe commented Jul 11, 2021

Movement datasets

  1. Why are there zero-unit observations?
  2. What is the proper way to aggregate? Concern: products are measured in different units (counts, pounds).
  3. Price change = f(retailer strategy, supplier/brand strategy). In response to climate risk, retailers can reset prices or change suppliers.
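On points 1 and 2, a minimal sketch, assuming rows with zero units carry no usable price and using hypothetical field names rather than the actual Nielsen column names: unit prices are computed only for rows with positive units, and each product's price is compared only with its own earlier price. Because such a price relative is a ratio for the same product, it is the same whether the product is sold by count or by the pound, which is one reason price indexes sidestep the mixed-units problem.

```python
# Hypothetical weekly movement rows for one store: (upc, week, units, revenue).
# Field names are illustrative only, not the actual Nielsen column names.
rows = [
    ("yogurt_6oz", 1, 10, 9.0),
    ("yogurt_6oz", 2, 0, 0.0),  # zero-unit row: unit price is undefined
    ("flour_5lb", 1, 4, 12.0),
    ("flour_5lb", 2, 5, 16.0),
]


def unit_prices(rows):
    """Map (upc, week) -> revenue / units, skipping rows with zero units."""
    return {(upc, week): rev / units for upc, week, units, rev in rows if units > 0}


def price_relative(prices, upc, week, base_week):
    """Price of a product relative to its own base-week price (unit-free)."""
    return prices[(upc, week)] / prices[(upc, base_week)]


prices = unit_prices(rows)
per_bag = price_relative(prices, "flour_5lb", 2, 1)
# Re-expressing flour prices per pound (divide by 5) leaves the relative unchanged.
per_lb = (prices[("flour_5lb", 2)] / 5) / (prices[("flour_5lb", 1)] / 5)
print(per_bag, per_lb)
```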
@shoonlee
Collaborator

shoonlee commented Aug 4, 2021

@wbinzhe

  • I've talked to my Nielsen friends, and the SUL server seems good enough to try the Nielsen data cleaning. Please start data cleaning right away so that we can show something to Siqi next Thursday (Aug 12). I think we can start with the last 2-3 years (which are closer in time to SafeGraph), show them to Siqi, and extend the analysis to earlier years later.
  • Aggregation is a good question. People construct a price index from Nielsen data; Beraja et al. (2019) is widely cited for index construction. A forthcoming paper by Leung (at ReStat) has replication code for price index construction following Beraja et al. (and also for overall data cleaning) if you need guidance.
  • Again on price index construction: we could start with the overall price index (putting every repeated product into the index basket) and later build indexes for each category within a given store (e.g., medicine, food, general merchandise, etc. in a CVS). For the Aug 12 meeting, the overall price index is enough.
  • The third point is also a good one. Consult other papers to see whether they mention it (and, if so, how they handle it).
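A toy sketch of an overall store-level, fixed-basket index, assuming a Laspeyres-type formula (base-period quantities as weights) restricted to products sold in both periods. This is an illustration only, not the exact formula in Beraja et al. (2019) or in Leung's replication code; product names and numbers are made up.

```python
def laspeyres_index(base, current):
    """Fixed-basket (Laspeyres-type) price index for one store.

    base, current: dicts mapping product -> (price, quantity) for the base
    and comparison periods. Only products sold in both periods (the
    "repeated" products) enter the basket, weighted by base-period quantity.
    """
    basket = base.keys() & current.keys()
    if not basket:
        raise ValueError("no repeated products between the two periods")
    cost_now = sum(current[i][0] * base[i][1] for i in basket)
    cost_base = sum(base[i][0] * base[i][1] for i in basket)
    return cost_now / cost_base


# Toy store: C disappears and D is new, so the basket is {A, B}.
base = {"A": (2.0, 10), "B": (5.0, 4), "C": (1.0, 7)}
current = {"A": (2.2, 8), "B": (5.0, 5), "D": (3.0, 2)}
print(laspeyres_index(base, current))  # (2.2*10 + 5.0*4) / (2.0*10 + 5.0*4) = 42 / 40
```

Weighting by base-period quantities keeps the basket fixed, so the index moves only with prices; the cost is that products entering or leaving the store are ignored until the basket is rebuilt.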

@shoonlee
Collaborator

shoonlee commented Aug 5, 2021

@wbinzhe

I think it might be helpful for you to create a few slides and talk through them in our Aug 12 meeting. I want you to cover (at least) the following:

  • How to aggregate price at store level (namely, how to construct price indexes)
    • To make it concrete, a toy example would be very helpful here
  • Some initial results (in a similar specification as before, regressing prices and revenues on temperature) with a sample of data
    • Depending on the processing time, you could use a sample of categories for the last few years
  • Overall plan (including timeline) with the Nielsen data cleaning and analysis
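For the initial-results bullet, a minimal sketch assuming a specification with store fixed effects and a single regressor (temperature); the outcome could be a log price index or log revenue. Demeaning within each store and then running OLS gives the same slope as adding a dummy for every store. The function name and data are hypothetical, and standard errors are omitted.

```python
from collections import defaultdict


def within_ols_slope(obs):
    """Slope of y on x with store fixed effects (the "within" estimator).

    obs: list of (store, x, y), e.g. x = temperature, y = log price index.
    Demeaning x and y within each store and regressing without an intercept
    gives the same slope as including a dummy for every store.
    """
    by_store = defaultdict(list)
    for store, x, y in obs:
        by_store[store].append((x, y))
    num = den = 0.0
    for pairs in by_store.values():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    if den == 0:
        raise ValueError("x does not vary within any store")
    return num / den


# Made-up data: y = store effect + 0.01 * x exactly, so the slope is ~0.01.
obs = [("s1", 10, 1.1), ("s1", 20, 1.2), ("s1", 30, 1.3),
       ("s2", 5, 3.05), ("s2", 15, 3.15)]
print(within_ols_slope(obs))
```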

@wbinzhe
Owner Author

wbinzhe commented Aug 5, 2021

@shoonlee Sounds good, thanks Seunghoon. I'll draft these slides for our meeting next Monday, and by then I should have a better sense of what I can show to Siqi on Thursday.

@shoonlee
Collaborator

shoonlee commented Aug 5, 2021 via email

@wbinzhe
Owner Author

wbinzhe commented Aug 11, 2021

@shoonlee Hi Seunghoon, I added the illustration of the price index construction in slides #50-58 in G-slides. It currently covers only 450 stores (~1% random sample); I will keep the program running until this evening to add more stores and merge the price index with the temperature data.

@shoonlee
Collaborator

shoonlee commented Aug 11, 2021 via email

@shoonlee
Collaborator

shoonlee commented Aug 11, 2021 via email

@wbinzhe
Owner Author

wbinzhe commented Aug 11, 2021

@shoonlee Sure Seunghoon. Actually, all the numbers in the slides are real observations from one specific store, so let me present the calculations directly there.

@wbinzhe
Owner Author

wbinzhe commented Aug 11, 2021

Hi Binzhe, Thanks for putting this together. I think it would be helpful to add a more concrete example. Pick a product group (e.g., yogurt or dairy products, depending on the actual level) and clearly show how the construction works. One thing that was a bit confusing for me was q_{i,y-1} (average quantity sold in each quarter in the previous year). Do you take the average over the entire year or by each quarter? In other words, is q_{i,y-1} different for each quarter, or is it the same as long as the quarters are in the same year? Showing a toy example would clarify these kinds of questions.

On Wed, Aug 11, 2021 at 1:29 PM wbinzhe wrote: @shoonlee Hi Seunghoon, I added the illustration of the Price Index Construction in slides #50-58 in G-slides https://docs.google.com/presentation/d/14_aDxt2O_Le4mCJj4lBfuK-rG9gI6WA8U69lhPJajis/edit#slide=id.ge48c9d8e4f_0_0. Now it only has 450 stores (~1% random sample), I will keep the program running till this evening to have more store samples and merge price index with temperature data.


@shoonlee q_{i,y-1} in equation 1 is the average over the previous year (i.e., the same for all quarters in the same year). Both Leung (2020) and Beraja et al. (2019) use this weight without variation across quarters.
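A toy example of the weight described above, assuming quarterly unit sales per product and treating a quarter with no recorded sales as zero units: q_{i,y-1} is the previous year's total divided by 4, so every quarter of year y uses the same weight. The function name, product, and quantities are made up for illustration.

```python
from collections import defaultdict


def prev_year_weights(quarterly_qty):
    """q_{i,y-1}: average quarterly units of product i over year y-1.

    quarterly_qty: dict mapping (product, year, quarter) -> units sold.
    Returns dict mapping (product, y) -> weight used for every quarter of y.
    """
    totals = defaultdict(float)
    for (product, year, _quarter), qty in quarterly_qty.items():
        totals[(product, year + 1)] += qty
    # Dividing by 4 treats a quarter with no recorded sales as zero units.
    return {key: total / 4 for key, total in totals.items()}


# Toy yogurt sales in 2018 (made-up numbers).
qty_2018 = {("yogurt", 2018, 1): 100, ("yogurt", 2018, 2): 120,
            ("yogurt", 2018, 3): 110, ("yogurt", 2018, 4): 90}
weights = prev_year_weights(qty_2018)
print(weights[("yogurt", 2019)])  # 105.0, used for Q1 through Q4 of 2019
```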

@shoonlee
Collaborator

@wbinzhe

Can you look into the following two things? These are what we've already discussed in the meeting with Siqi but please let me know if further clarification is needed. Please give me a brief update on Friday.

  • Update the revenue plot and fix potential errors as necessary (the one in #77-79)
    • Create a similar plot using quantity sold as the outcome variable for the three selected groups
  • Update price index figure at the store level (the one in #67)

@shoonlee
Copy link
Collaborator

shoonlee commented Aug 28, 2021

@wbinzhe

Following up on our conversation today, can you try making graphs of the attrition rate as described below? By attrition rate, I mean the percentage of goods that are not in the base basket (e.g., if only 80% of goods in the year t+1 basket overlap with the base basket goods, the attrition rate is 20%).

It will be a nice summary of the data as well as a useful sanity check of what we're doing. I think we can create these before running the time-consuming parts of code #4 and #5.

  • For code 4 (fixed basket), can you create a plot of the attrition rate over time for each product group? You can pick 5 product groups (including the three you've used before) for this exercise. Suppose we start from 2006 (or the earliest year you have already cleaned). As we add more years, the attrition rate should be weakly increasing over time.
  • For code 5 (chain basket), create a plot of the year-to-year attrition rate for each of the five product groups (i.e., calculate the attrition rate between 2006-2007, 2007-2008, etc.). If there are any outliers, either within or across product groups, investigate them. I think the rate should be roughly the same across years for each product group, although there might be substantial level differences across product groups.
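A sketch of the two attrition measures, assuming each year's basket is a set of product IDs (e.g., UPCs) for one product group. The denominators are an assumption: the fixed-basket rate uses the base-year basket, and a good drops out once it is missing in any year, which makes the rate weakly increasing as described in the first bullet; the chain rate uses the earlier year of each consecutive pair. Function names and data are hypothetical.

```python
def fixed_basket_attrition(goods_by_year, base_year):
    """Share of base-year goods no longer in the fixed basket, by year.

    A good stays in the basket for year t only if it is observed in every
    year from base_year through t, so the rate is weakly increasing in t.
    """
    base = set(goods_by_year[base_year])
    surviving = set(base)
    rates = {}
    for year in sorted(y for y in goods_by_year if y >= base_year):
        surviving &= set(goods_by_year[year])
        rates[year] = 1 - len(surviving) / len(base)
    return rates


def chain_attrition(goods_by_year):
    """Share of year-t goods not observed in the next year, per year pair."""
    years = sorted(goods_by_year)
    rates = {}
    for prev, nxt in zip(years, years[1:]):
        g_prev, g_next = set(goods_by_year[prev]), set(goods_by_year[nxt])
        rates[(prev, nxt)] = 1 - len(g_prev & g_next) / len(g_prev)
    return rates


# Toy product group (made-up IDs).
goods = {2006: {"a", "b", "c", "d", "e"},
         2007: {"a", "b", "c", "d", "f"},
         2008: {"a", "b", "c", "e", "f", "g"}}
print(fixed_basket_attrition(goods, 2006))  # roughly {2006: 0.0, 2007: 0.2, 2008: 0.4}
print(chain_attrition(goods))  # roughly {(2006, 2007): 0.2, (2007, 2008): 0.2}
```

In the toy data, good "e" reappears in 2008 but stays out of the fixed basket because it was missing in 2007.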

Let me know if any clarification is needed. Thanks!!

@wbinzhe
Owner Author

wbinzhe commented Aug 28, 2021

@shoonlee Sure, will also do it this Saturday!

@wbinzhe changed the title from Nielson Scanner to Nielsen Scanner on Aug 29, 2021
@wbinzhe
Owner Author

wbinzhe commented Aug 29, 2021

@shoonlee I fixed the problem in sales (and also price): in a parallelized step, the default ordering of elements in the input lists was not identical, so data for store i in 2018/2019 were being combined with data for store j in 2016/2017. The group-level plots look good now. I will continue working on the rest of the tasks today and let you know when they are done.
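A minimal sketch of this kind of fix, assuming each parallel job returns (store_id, records) pairs (the thread does not say which language or parallel framework was used): join the two batches on store ID instead of by list position, and fail loudly if the batches cover different stores. Pairing results positionally, e.g. with zip(early, late), silently mismatches stores whenever the two input lists were ordered differently. The function name and data are hypothetical.

```python
def combine_by_store(early, late):
    """Join per-store results from two batches on store ID, not list position.

    early, late: iterables of (store_id, records) pairs, e.g. outputs of two
    parallel jobs whose input lists may have been ordered differently.
    Returns {store_id: early_records + late_records}.
    """
    early_map, late_map = dict(early), dict(late)
    if early_map.keys() != late_map.keys():
        missing = early_map.keys() ^ late_map.keys()
        raise ValueError(f"stores not in both batches: {sorted(missing)}")
    return {s: early_map[s] + late_map[s] for s in early_map}


# The two batches list stores in different orders; keying by ID still aligns them.
early = [("s1", [2016, 2017]), ("s2", [2016, 2017])]
late = [("s2", [2018, 2019]), ("s1", [2018, 2019])]
print(combine_by_store(early, late))
```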
