
Technology categories are missing in 2024-12-01 crawl #31

Open
@max-ostapenko

Description


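Checked with this query, which counts distinct root pages per crawl date, client and category for WordPress among the top 10k ranked sites: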

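-- Distinct root pages per date, client and category for WordPress
-- detections among the top 10k ranked sites.
SELECT
  date,
  client,
  category,
  COUNT(DISTINCT root_page)
FROM crawl.pages
LEFT JOIN pages.technologies AS tech   -- one row per detected technology
LEFT JOIN tech.categories AS category  -- NULL when the categories array is empty
WHERE
  date >= '2024-11-01'
  AND rank <= 10000
  AND tech.technology = 'WordPress'
GROUP BY 1,2,3
ORDER BY 1,2,3;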
date        client   category  f0_
2024-11-01  desktop  Blogs     545
2024-11-01  desktop  CMS       545
2024-11-01  mobile   Blogs     832
2024-11-01  mobile   CMS       832
2024-12-01  desktop            537
2024-12-01  desktop  Blogs      47
2024-12-01  desktop  CMS        47
2024-12-01  mobile             815
2024-12-01  mobile   Blogs      50
2024-12-01  mobile   CMS        50
2025-01-01  desktop  Blogs     534
2025-01-01  desktop  CMS       534
2025-01-01  mobile   Blogs     809
2025-01-01  mobile   CMS       809
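To put a number on the gap, a variant of the query above could count, per date and client, how many WordPress root pages have no categories at all. This is a sketch that assumes the blank rows come from an empty (or NULL) `categories` array, or possibly from an empty-string entry; the output column names are made up for illustration, and it uses explicit `UNNEST` instead of the implicit array joins.

```sql
-- Sketch: WordPress root pages (top 10k ranks) with and without categories,
-- per crawl date and client. Assumption: "missing" means an empty/NULL
-- categories array or an empty-string entry in it.
SELECT
  date,
  client,
  COUNT(DISTINCT root_page) AS wordpress_root_pages,
  -- COUNT(DISTINCT ...) skips the NULLs produced for pages that do have categories
  COUNT(DISTINCT IF(
    IFNULL(ARRAY_LENGTH(tech.categories), 0) = 0
      OR '' IN UNNEST(tech.categories),
    root_page,
    NULL)) AS root_pages_without_categories
FROM crawl.pages AS pages
CROSS JOIN UNNEST(pages.technologies) AS tech
WHERE
  date >= '2024-11-01'
  AND rank <= 10000
  AND tech.technology = 'WordPress'
GROUP BY date, client
ORDER BY date, client;
```

If the assumption holds, `root_pages_without_categories` would be expected to stay near zero for 2024-11-01 and 2025-01-01 and be large only for 2024-12-01.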

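@pmeenan, do you have an idea what happened? In the 2024-12-01 crawl most WordPress root pages have no category (537 on desktop, 815 on mobile), and only 47 / 50 still have Blogs and CMS, while 2024-11-01 and 2025-01-01 look normal.
Is there any way to restore the categories?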
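One possible way to restore the data, assuming each technology's category assignments did not change between the December and January crawls, is to take the categories recorded for each technology in the 2025-01-01 crawl and use them wherever a 2024-12-01 row has an empty array. The `category_map` CTE name is hypothetical, and this sketch only selects repaired rows; it does not rewrite the table.

```sql
-- Hypothetical backfill sketch. Assumption: the 2025-01-01 crawl has correct
-- technology -> categories assignments that also applied on 2024-12-01.
WITH category_map AS (
  SELECT
    tech.technology,
    ARRAY_AGG(DISTINCT category IGNORE NULLS ORDER BY category) AS categories
  FROM crawl.pages AS pages
  CROSS JOIN UNNEST(pages.technologies) AS tech
  CROSS JOIN UNNEST(tech.categories) AS category
  WHERE date = '2025-01-01'
  GROUP BY tech.technology
)
SELECT
  pages.date,
  pages.client,
  pages.root_page,
  tech.technology,
  -- keep the original categories when present, otherwise fall back to the map
  IF(IFNULL(ARRAY_LENGTH(tech.categories), 0) = 0,
     m.categories,
     tech.categories) AS categories
FROM crawl.pages AS pages
CROSS JOIN UNNEST(pages.technologies) AS tech
LEFT JOIN category_map AS m
  ON m.technology = tech.technology
WHERE pages.date = '2024-12-01';
```

Scanning a full crawl month is expensive. Restricting the CTE (for example by `client` or `rank`) reduces the bytes scanned but may miss technologies that only appear outside that subset, leaving their categories NULL.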

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests