Can xword-dl download NYT variety puzzles? #59

Closed
eigenfoo opened this issue Oct 11, 2022 · 5 comments

Comments

@eigenfoo

(I've manually edited my xword-dl.yaml to circumvent #58)

# python xword-dl/xword_dl.py nyt --latest
Puzzle downloaded and saved as NY Times - 20221011.puz.

# python xword-dl/xword_dl.py https://www.nytimes.com/crosswords/game/variety/2022/10/02
Unable to find a puzzle at https://www.nytimes.com/crosswords/game/variety/2022/10/02.

I've determined that this is likely because NewYorkTimesDownloader isn't in supported_sites:

xword-dl/xword_dl.py

Lines 76 to 78 in bbb4877

supported_sites = [('wsj.com', WSJDownloader),
                   ('newyorker.com', NewYorkerDownloader),
                   ('amuselabs.com', AmuseLabsDownloader)]

However, adding ('nytimes.com', NewYorkTimesDownloader) to the list produces a JSON error, which I don't think I'm well-equipped to make sense of:

# python xword-dl/xword_dl.py https://www.nytimes.com/crosswords/game/variety/2022/10/02
Traceback (most recent call last):
  File "/home/george/pandas/xword-dl/xword_dl.py", line 1162, in <module>
    main()
  File "/home/george/pandas/xword-dl/xword_dl.py", line 1145, in main
    puzzle, filename = by_url(args.source,
  File "/home/george/pandas/xword-dl/xword_dl.py", line 104, in by_url
    puzzle = dl.download(puzzle_url)
  File "/home/george/pandas/xword-dl/xword_dl.py", line 267, in download
    xword_data = self.fetch_data(solver_url)
  File "/home/george/pandas/xword-dl/xword_dl.py", line 971, in fetch_data
    return res.json()['results'][0]
  File "/home/george/miniconda3/lib/python3.9/site-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/home/george/miniconda3/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/george/miniconda3/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/george/miniconda3/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
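
For context, this particular message usually means the response body was not JSON at all, for example an empty body or an HTML page. The sketch below reproduces it with only the standard library. The `parse_body` helper and the sample payloads are hypothetical and not part of xword-dl; what the NYT endpoint actually returned here is not shown in the thread.

```python
import json


def parse_body(text):
    """Return (data, error_message); exactly one of the two is None."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, str(err)


# An HTML page or an empty body both fail at the very first character.
html_data, html_error = parse_body("<!DOCTYPE html><html></html>")
empty_data, empty_error = parse_body("")

# A made-up JSON body with a 'results' list parses without error.
ok_data, ok_error = parse_body('{"results": ["placeholder"]}')
```

Checking the error text (or the response status and content type) before indexing into `['results'][0]` would make this kind of failure easier to diagnose.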

@thisisparker
Owner

It is possible in theory (for the subset of puzzles that can be represented in .puz files) but would require a little more code — namely, some glue to map from the URL to the underlying puzzle data. As it stands, when xword-dl is downloading an NYT puzzle it goes through a mostly undocumented oracle and downloads a JSON file directly.

That oracle step is necessary because there is a behind-the-scenes mapping of dates to puzzle IDs for the daily crossword, and the puzzle ID is needed to request the right JSON. Fortunately, for Variety it looks like there is a more direct transformation of the URL to the underlying data (e.g., the JSON data for the puzzle you were seeking appears to be at https://www.nytimes.com/svc/crosswords/v6/puzzle/variety/2022-10-02.json). I haven't looked to see if that file is structured the same as the daily crossword JSON, which would determine whether you could use the existing parsing function.
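
A minimal sketch of that URL-to-data transformation, assuming the pattern generalizes from the single example in this thread. The function name `variety_json_url` and the regular expression are hypothetical and not taken from xword-dl.

```python
import re
from urllib.parse import urlparse

# Path pattern inferred from one example URL in this thread; an assumption.
VARIETY_PATH = re.compile(
    r"^/crosswords/game/variety/(\d{4})/(\d{2})/(\d{2})/?$")


def variety_json_url(puzzle_url):
    """Map a Variety puzzle page URL to its JSON URL, or return None."""
    parsed = urlparse(puzzle_url)
    if parsed.hostname not in ("nytimes.com", "www.nytimes.com"):
        return None
    match = VARIETY_PATH.match(parsed.path)
    if match is None:
        return None
    year, month, day = match.groups()
    return ("https://www.nytimes.com/svc/crosswords/v6/puzzle/variety/"
            f"{year}-{month}-{day}.json")
```

Returning `None` for anything that does not match keeps the helper easy to combine with other URL handlers; fetching the JSON would presumably still need the same authenticated session as the daily puzzle.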

So! It would require a little more code and I think I'd probably want to structure it as a subclass of the existing NewYorkTimesDownloader instead of expanding that class much more, but that seems doable. If you're interested I could add that, and you'd probably get by-url downloading of daily NYT puzzles "for free," which would be nice.

@eigenfoo
Author

eigenfoo commented Oct 12, 2022

Thank you for explaining!

> I haven't looked to see if that file is structured the same as the daily crossword JSON, which would determine whether you could use the existing parsing function.

Looking further, I see that this particular JSON (for a puns and anagrams puzzle, which theoretically could be represented as a .puz file) doesn't have the same structure that NewYorkTimesDownloader.parse_xword is expecting: for example, it has neither puzzle_meta nor puzzle_data fields. So, a new parsing function would be necessary.
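
The check described above can be sketched roughly as follows. The two field names come from this comment; their placement at the top level of the JSON object, the helper name `missing_daily_fields`, and the sample dictionaries are assumptions made for illustration.

```python
# Field names as mentioned in this comment; top-level placement is assumed.
DAILY_FIELDS = ("puzzle_meta", "puzzle_data")


def missing_daily_fields(xword_data):
    """Return the expected daily-format fields that xword_data lacks."""
    return [field for field in DAILY_FIELDS if field not in xword_data]


# Made-up stand-ins for the two JSON shapes, not real NYT data.
daily_like = {"puzzle_meta": {}, "puzzle_data": {}}
variety_like = {"some_other_field": {}}

daily_missing = missing_daily_fields(daily_like)
variety_missing = missing_daily_fields(variety_like)
```

A non-empty result would signal that the existing parser cannot be reused as-is for that file.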

Would you be willing to take on this additional work, beyond the simple glue to translate the date to the JSON URL? I don't want to burden you with a much larger feature request.

For more background: I am interested in puns and anagrams and cryptic crossword downloads for https://github.com/eigenfoo/cryptics (both puzzle types can be represented as .puz files). Now that I know how to look up the JSONs though, I'm happy to just curl and sit on them until I get a chance to write parsing code (which, to be transparent, won't be anytime soon, since the NYT constitutes a very low volume of cryptic crosswords). Completely your call!

@thisisparker
Owner

I think the answer is "possibly," although if you wrote the parsing code first I'd probably be just as happy to incorporate it. The longer answer is that there's a big refactor I've been meaning to do on xword-dl for, like, many months now. It will make working with additional Downloaders easier, so I hesitate to do anything around those before then. At some point soon it's going to happen, and then it will be an easy call to say yes.

@thisisparker
Owner

Just a heads up: I recently completed the refactor described above and I think I will get a chance to add NYT Variety support this week.

@thisisparker
Owner

Closed in v2022.11.16 🎉
