More automatic preprocessing, add General and Extra decks. #1
I noticed that you posted this project, but it doesn't look like you ever did the General or Extra questions.
I upgraded your script a little -- now it automatically gets rid of the weird characters, so you don't have to use Office or whatever for that, and it also handles deleted questions so you don't have to delete them yourself.
I ran it on Technician and got the same deck as the one you already posted, and then I also ran it on General and Extra and put those in the repository too.
Let me know what you think!
The problem with Extra is that we're not getting the category labels.
In the process of debugging that, I discovered that the category marker in the General and Technician files is not actually always an ASCII hyphen. About half the time it's an en-dash instead (one of the messy characters that need to be stripped out). So instead of turning the Latin-1 en-dash into a UTF-8 en-dash, I now just make it a hyphen. Suddenly a bunch of questions are in the correct categories where they were in the wrong ones before.
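For anyone following along, here is a rough Python sketch of that dash normalization. It is not the script from this pull request; the function name is made up, and it assumes the en-dash arrives as the single byte 0x96, which is where Windows-1252 puts it (strictly speaking, ISO-8859-1 has no en-dash at all).

```python
def normalize_dashes(raw: bytes) -> str:
    """Decode question-pool bytes and turn en-dashes into ASCII hyphens.

    Assumption: the input is Windows-1252 encoded, so an en-dash is the
    byte 0x96. Bytes with no mapping become U+FFFD instead of raising.
    """
    text = raw.decode("cp1252", errors="replace")
    # U+2013 is the en-dash; replace it with a plain hyphen so that
    # "XXX - Description" markers all look the same afterwards.
    return text.replace("\u2013", "-")
```

With this, a made-up marker line such as `b"T1A \x96 Purpose"` comes out as `"T1A - Purpose"`, the same as a line that used a real hyphen to begin with.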
This fix is about to end up being superfluous in light of the next one, though...
And the final fix.
Where General and Technician mark categories with "XXX - Description", the Extra file -- for no reason at all -- just uses "XXX Description".
Fortunately, that XXX is always letter-number-letter, followed by a space, at the start of a line; that never appears anywhere else in the file. So we switch to just looking for that.
(You see why I said the fix in finding hyphens was going to become superfluous.)
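A minimal Python sketch of that matching rule, in case it helps a reviewer. The names and the sample lines are hypothetical, not taken from the actual script or question pools; the only thing it relies on is the letter-number-letter-space pattern described above.

```python
import re

# Letter, digit, letter, then a space, anchored at the start of the line.
CATEGORY_LINE = re.compile(r"^([A-Z][0-9][A-Z]) (.*)$")


def parse_category(line):
    """Return (code, description) for a category line, else None."""
    match = CATEGORY_LINE.match(line)
    if match is None:
        return None
    code, rest = match.groups()
    # Drop the optional "- " separator used in Technician and General,
    # so both "XXX - Description" and "XXX Description" give the same result.
    description = rest.lstrip("- ").strip()
    return code, description
```

A line like `E1A01 (D)` is not matched, because the fourth character is a digit rather than a space, so question IDs are not mistaken for category markers.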
So now we can successfully import Tech, General, and Extra!
That was exciting. I kind of see why you gave up before. :-)
I haven't thought about this project in two years, so you can imagine my excitement when I saw this pull request show up in my inbox!
I'm sad that this didn't work on my mac (I don't blame you for not checking that), but super excited it worked on my Linux boxen. Not a blocker. I'll open an issue on the tracker with complete details if you want to look at it later.
What is a blocker is my fault: licensing. This project has no license/usage/copyright file presently. Are you ok with using a simple BSD/MIT style license for this? Akin to this one (with both of our names present): https://github.com/tbielawa/PAD-XMPP/blob/master/COPYING ?
If you're OK with that license then I'll pull in your branch immediately. If not I'm open to suggestions.
Thanks for doing all this! Like I said, I was completely taken by surprise to see someone else interested in this :-) Thanks!
I actually wrote the change on a Mac, so I'm excited that it works on your Linux boxes. I'm a straggler on OS X 10.6; I imagine you're on something newer?
The licensing issue is actually interesting, because the decks themselves are derived from material that's presumably copyrighted by the ARRL; and currently the repository contains material downloaded directly from the ARRL as well. (This is partly my fault, as I added more to what was already there.)
As for the code itself, I grant permission to license my code under anything you like as long as it's Free and/or Open. BSD or MIT is fine.
Do let me know what sort of failure you see on your OS X machine, and what version it is.
And thanks for starting this project! I never would have gotten around to doing this myself, as I don't know the Mnemosyne deck format and probably couldn't have been arsed to learn it. I originally found the same old decks you mention in your README, was disappointed the files were missing, and was excited to discover that someone had replaced them. I'm only surprised nobody's gotten around to doing General or Extra before now. (I already have my Technician license, so the latter two decks were really what I was after!)
The right way to handle the licensing issue, incidentally, is to include a script for downloading the files (which will break next time they update them, doubtless) or instruct the user to download the files themselves. But first I'd have to fix the last vestiges of the manual stuff I had to do to the file; and ALSO we'd have to special-case, or give up on, the one question where they forgot a linebreak in their own file.
Merged your pull request in!
ACK to the 'script for downloading the files' idea.
I've emailed the NCVEC in issue #2 and am hoping we'll have a proper solution to this issue by means of them including the necessary usage and redistribution information on their website.
Thanks for your patches! Feel free to check out #4 (the sed on mac issue) if you're still interested in contributing.
If you want to contact me to talk about this more I'm on freenode IRC from 9-5 EST as tbielawa.