More automatic preprocessing, add General and Extra decks. #1

Merged
merged 4 commits into tbielawa:master on Sep 23, 2013

Collaborator

gwillen commented Sep 22, 2013

Hello!

I noticed that you posted this project, but it doesn't look like you ever did the General or Extra questions.

I upgraded your script a little -- now it automatically gets rid of the weird characters, so you don't have to use Office or whatever for that, and it also handles deleted questions so you don't have to delete them yourself.

I ran it on Technician and got the same deck as the one you already posted, and then I also ran it on General and Extra and put those in the repository too.

Let me know what you think!

Collaborator Author

gwillen commented Sep 22, 2013

Oops, added a second commit -- General and Extra each had one bogus character in them that was screwing things up. General now works; Extra hangs Mnemosyne when I import it, and I'm not yet sure why. Investigating.

Collaborator Author

gwillen commented Sep 22, 2013

The problem with Extra is that we're not getting the category labels.

In the process of debugging that, I discovered that the category marker in the General and Technician files is not actually always an ASCII hyphen. About half the time it's an en-dash instead (one of the messy characters that needs to be stripped out). So instead of turning the Latin-1 en-dash into a UTF-8 en-dash, I now just turn it into a plain hyphen. Suddenly a bunch of questions are in the correct categories where they were in the wrong ones before.
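
The dash normalization described here could look roughly like the following Python sketch. It is not the script from this pull request. It assumes the pool file uses a Windows-1252 style encoding, where byte 0x96 decodes to an en-dash; the function name `normalize_dashes` and the sample category text are made up for illustration.

```python
# Illustrative sketch only; not the script from this pull request.
# Assumption: the pool file is a Windows-1252 style export, where byte 0x96
# decodes to an en-dash (U+2013) and byte 0x97 to an em-dash (U+2014).

# Map both dash variants to a plain ASCII hyphen.
DASH_TABLE = str.maketrans({"\u2013": "-", "\u2014": "-"})


def normalize_dashes(raw: bytes) -> str:
    """Decode raw pool bytes and replace en/em dashes with '-'."""
    # errors="replace" keeps going if the file contains undefined bytes.
    text = raw.decode("cp1252", errors="replace")
    return text.translate(DASH_TABLE)


# Made-up sample: one marker uses byte 0x96, the other a real hyphen.
sample = b"T1A \x96 Example topic\nT1B - Another topic\n"
print(normalize_dashes(sample))
```

After this step both category lines use the same `XXX - Description` form, so a single hyphen-based match can find them.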

This fix is about to end up being superfluous in light of the next one, though...

Collaborator Author

gwillen commented Sep 22, 2013

And the final fix.

Where General and Technician mark categories with "XXX - Description", the Extra file -- for no reason at all -- just uses "XXX Description".

Fortunately, that XXX is always letter-number-letter, followed by a space, at the start of a line; that never appears anywhere else in the file. So we switch to just looking for that.
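
As a rough sketch of that rule in Python (not the actual script from this pull request), a single regular expression can accept both the `XXX - Description` and the `XXX Description` forms. The name `parse_category` and the sample lines are hypothetical.

```python
import re

# Hypothetical pattern based on the description above: one letter, one digit,
# one letter, then a space, anchored at the start of the line. The "- " after
# the label is optional so both header styles are accepted.
CATEGORY_RE = re.compile(r"^([A-Z][0-9][A-Z]) (?:- )?(.*)$")


def parse_category(line: str):
    """Return (label, description) for a category line, else None."""
    match = CATEGORY_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


# Made-up lines: two category headers and one question ID.
for line in ["G1A - Example topic", "E1A Example topic", "E1A01 (D) Example question"]:
    print(repr(line), "->", parse_category(line))
```

A question ID such as `E1A01` has a digit where the category label has a space, so it does not match and is left for the question parser.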

(You see why I said the fix in finding hyphens was going to become superfluous.)

So now we can successfully import Tech, General, and Extra!

That was exciting. I kind of see why you gave up before. :-)

Owner

tbielawa commented Sep 22, 2013

I haven't thought about this project in two years, so you can imagine my excitement when I saw this pull request show up in my inbox!

I'm sad that this didn't work on my mac (I don't blame you for not checking that), but super excited it worked on my Linux boxen. Not a blocker. I'll open an issue on the tracker with complete details if you want to look at it later.

What is a blocker is my fault: licensing. This project has no license/usage/copyright file presently. Are you ok with using a simple BSD/MIT style license for this? Akin to this one (with both of our names present): https://github.com/tbielawa/PAD-XMPP/blob/master/COPYING ?

If you're OK with that license then I'll pull in your branch immediately. If not I'm open to suggestions.

Thanks for doing all this! Like I said, I was completely taken by surprise to see someone else interested in this :-) Thanks!

Collaborator Author

gwillen commented Sep 22, 2013

Hi!

I actually wrote the change on a Mac, so I'm excited that it works on your Linux boxes. I'm a straggler on OS X 10.6; I imagine you're on something newer?

The licensing issue is actually interesting, because the decks themselves are derived from material that's presumably copyrighted by the ARRL, and currently the repository contains material downloaded directly from the ARRL as well. (This is partly my fault, as I added more to what was already there.)

As for the code itself, I grant permission to license my code under anything you like as long as it's Free and/or Open. BSD or MIT is fine.

Do let me know what sort of failure you see on your OS X machine, and what version it is.

And thanks for starting this project! I never would have gotten around to doing this myself, as I don't know the Mnemosyne deck format and probably couldn't have been arsed to learn it. I originally found the same old decks you mention in your README, was disappointed the files were missing, and was excited to discover that someone had replaced them. I'm only surprised nobody's gotten around to doing General or Extra before now. (I already have my Technician license, so the latter two decks were really what I was after!)

Collaborator Author

gwillen commented Sep 22, 2013

The right way to handle the licensing issue, incidentally, is to include a script for downloading the files (which will break next time they update them, doubtless) or instruct the user to download the files themselves. But first I'd have to fix the last vestiges of the manual stuff I had to do to the file; and ALSO we'd have to special-case, or give up on, the one question where they forgot a linebreak in their own file.
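
A download helper along those lines might look like the Python sketch below. Nothing here comes from the repository: the URL is a placeholder (the real location of the pool files is not given in this thread), and `fetch_pool` is a hypothetical name.

```python
import urllib.request
from pathlib import Path

# Placeholder only: the real download location is not stated in this thread
# and would need updating whenever the published files move.
POOL_URLS = {
    "technician": "https://example.org/technician-pool.txt",
}


def fetch_pool(url: str, dest: Path) -> Path:
    """Download one question pool file to dest, unless it is already there."""
    if dest.exists():
        # Reuse the local copy so repeated builds do not hit the server.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())
    return dest


# Example call (not executed here because the URL is a placeholder):
# fetch_pool(POOL_URLS["technician"], Path("pools/technician.txt"))
```

Keeping the URLs in one table means only that table changes when the upstream files move, and the repository itself no longer has to carry the downloaded material.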

@tbielawa tbielawa merged commit 43741df into tbielawa:master Sep 23, 2013

Owner

tbielawa commented Sep 23, 2013

Merged your pull request in!

ACK to the 'script for downloading the files' idea.

I've emailed the NCVEC in issue #2 and am hoping we'll have a proper solution to this issue by means of them including the necessary usage and redistribution information on their website.

Thanks for your patches! Feel free to check out #4 (the sed on mac issue) if you're still interested in contributing.

If you want to contact me to talk about this more I'm on freenode IRC from 9-5 EST as tbielawa.
