New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MusicXML Parser for Magenta #214
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
@cghawthorne or @iansimon: can one of your take this review? |
Yup, I can definitely look at it. Wouldn't mind a second pair of eyes, though. @jsawruk we'll start the review once the CLA is signed. Thanks again for your work on this! |
I'll take a look also. -Ian On Thu, Sep 15, 2016 at 2:02 PM cghawthorne notifications@github.com
|
I signed it! On Thu, Sep 15, 2016 at 3:56 PM, googlebot notifications@github.com wrote:
|
@jsawruk the cla bot still doesn't seem happy. Can you try replying using the github web interface? |
I updated the contact info email address on the CLA to match the email address on my Github account. Hopefully this resolves the issue. |
Might need to re-reply with "I signed it!" |
I signed it! |
@jsawruk: sorry about the confusion here. It looks like we're not able to map your GitHub username to the email address used in your commits. Could you add your *@jwpepper.com email address to your GitHub account? That should take care of this. |
Ok, I have added that email address to my Github account and it is now verified. |
CLAs look good, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of tests are failing because a nonexistent field called "bpm" is being accessed in the Tempo object in the NoteSequence protobuf. This field should be called "qpm" and is defined as quarter notes per minute. Here's the message on the Magenta mailing list describing the change from bpm to qpm: https://groups.google.com/a/tensorflow.org/d/msg/magenta-discuss/b33XYjXjj-U/ozfaXAvwDAAJ
It looks as though MusicXML also defines tempo as quarter notes per minute, so you probably want to call the field "qpm" in your Tempo class in musicxml_parser.py as well.
In process of updating my code. I just rebased against master, and am now getting this error when running the unit tests:
What changed? |
I think you're running into a merge conflict with #223. We recently moved a bunch of our modules that used to be in /lib to /music. You'll probably also want your musicxml code to be in /music now. |
I'm still getting errors. I checked out the master branch at commit 0056294, which includes #223. When I try to import using python, I get this error:
This does include the new /music directory, and the BUILD file there looks correct. |
Are you having issues when running bazel test, or only in interactive python? When working on libraries for core magenta, we recommend testing via bazel. |
I wasn't running the bazel test correctly, so I tried it in interactive Python. Once I ran the bazel test correctly, I was able to resolve the issue; the bazel tests now pass on my system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work here! I've mostly looked at style/structure issues thus far and will get to more specifics in a later pass.
@@ -171,7 +171,7 @@ py_library( | |||
py_test( | |||
name = "midi_io_test", | |||
srcs = ["midi_io_test.py"], | |||
data = ["//magenta/testdata"], | |||
data = ["//magenta/testdata:testdata"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for this change? the :testdata target is implicit for the /testdata package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. Maybe I was having issues initially, but the tests pass now, so it should not have :testdata. Thanks.
@@ -180,6 +180,26 @@ py_test( | |||
) | |||
|
|||
py_library( | |||
name = "musicxml_reader", | |||
srcs = ["musicxml_reader.py", "musicxml_parser.py"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prefer for each python file to have its own py_library target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Will be in next commit.
# Constants | ||
|
||
# Number of MIDI ticks per beat. Required by NoteSequence structure | ||
TICKS_PER_BEAT = 960 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've standardized on magenta.music.constants.STANDARD_PPQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next commit.
@@ -34,5 +34,9 @@ filegroup( | |||
"example_is_drum.mid", | |||
"notesequences.tfrecord", | |||
"tfrecord_iterator_test.tfrecord", | |||
'clarinet_scale.xml', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd recommend putting these in music/testdata because they'll only be used to verify your importer. Also, when you do that, please update the README in that folder to describe the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. This is fixed and will be in the next commit.
|
||
# Global variables | ||
# Default to one division per measure | ||
current_divisions = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package-level constants should be named in uppercase: https://google.github.io/styleguide/pyguide.html#Naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed after using pylint.
current_tempo = 120 | ||
|
||
# Duration of a single beat in seconds | ||
current_seconds_per_beat = 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like some of these package-level variables actually get modified in functions below. I would prefer that variables related to parsing a file live in the function or instance for parsing that file. For example, right now this class isn't threadsafe because every invocation modifies the same variables, but there's no reason it couldn't be threadsafe because this shared state isn't actually needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good solution for this at the moment. I know the code isn't threadsafe because I am mutating global variables, but parsing MusicXML files can be tricky. MusicXML files (like most/all? xml files) do not guarantee order of elements, only that the elements exist. For example, let's say there is a tempo marking over beat 1 in the first measure of quarter = 96. Ideally, I'd like to know the tempo before parsing the note on beat one, but MusicXML does not guarantee this. That example is trivial, but it gets more complicated when keeping track of time and you encounter the <forward>
and <backup>
elements.
I'd like to keep track of this state somehow, but my only two potential solutions were to pass around a state object to every class, or to use globals. Since I wanted to get things working quickly, I resorted to globals. I can now refactor to something more elegant and threadsafe. Do you have any alternative solutions to this problem? If not, I'll need to pass around some object that maintains these state variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you could use an instance variable rather than a global variable. So you could create an instance of the parser for every MusicXML file you're parsing and have instance variables to keep track of parsing state while you're going through the file.
return score_str | ||
|
||
|
||
class Part: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these classes should have a docstring describing what information they contain.
midi_pitch = (12 + pitch_class) + (int(octave) * 12) | ||
return midi_pitch | ||
|
||
def __str__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method outputting a particular format, or just something to make debugging easier to read? Would probably be good to document in the docstring for this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're asking about the str function, it was just to make my debugging easier as I was developing. Would you like me to remove the str method altogether, or do you have a proposed format that would be helpful to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, just wanted to make sure I understood the intention.
from musicxml_reader import * | ||
import tensorflow as tf | ||
|
||
# self.flute_scale_filename contains an F-major scale of 8 quarter notes each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might fit better in the docstring for the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to docstring. Will be in the next commit.
note.pitch = musicxml_note.pitch[1] # Index 1 = MIDI pitch number | ||
note.velocity = musicxml_note.velocity | ||
|
||
# TODO(@douglaseck): Estimate note type (e.g. quarter note) and populate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I feel this is extremely relevant based on conversations I had with you and @douglaseck. With MusicXML, we can get the exact note length. With MIDI, we just get a ticks duration. @douglaseck was suggesting a quantization algorithm when reading MIDI data to feed into your model. Since MusicXML specifies the exact note type, I'd like to also supply that to your model. However, I didn't implement it because I wasn't sure what format you were expecting.
My guess is:
- Whole note = 1
- Quarter note = 1/4
- Dotted Quarter = 3/8
- Triplet Eighth = 1/12
etc.
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks correct to me. Feel free to start filling out those fields.
I recommend running pylint on your code. You can even get something close to Google's linter configuration, described in this StackOverflow answer: http://stackoverflow.com/questions/29597618/is-there-a-tool-to-lint-python-based-on-the-google-style-guide Basically, install pylint, download that googlecl-pylint.rc file, then run:
|
@iansimon: Thanks. I'm linting the code right now using that pylint configuration. Not done yet, but soon. Then I will address remaining issues raised by @cghawthorne. |
@iansimon: I am getting a pylint error that I don't know how to resolve. Bad option value 'g-import-not-at-top' (bad-option-value) Please compare with the code here: |
There seems to be a missing file el_capitan.xml being referred to in your tests. Did you forget to check this in? |
@iansimon: Thanks, I missed that when I changed where the test files are located. |
|
||
xml_duration = xml_forward.find('duration') | ||
forward_duration = int(xml_duration.text) | ||
midi_ticks = forward_duration * (constants.STANDARD_PPQ / CURRENT_DIVISIONS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have CURRENT_DIVISIONS commented out above.
I fixed many of the lint errors, mainly just ones that required no understanding of the code. There are still several that remain, largely missing/malformed docstring sections. There's also a CURRENT_DIVISIONS variable in musicxml_parser.py that is commented out in some places but not others. |
@iansimon: Thanks for fixing the lint errors. I have been extremely busy at work and was hoping to work more on this code at the Classical Music Hack Day at MIT this coming weekend. |
I believe this latest commit addresses all issues raised so far. I have implemented the note.numerator and note.denominator support (@douglaseck's TODO) |
…Fix tempo parsing errors
… MusicXMLParserState object
… by reading the symbolic note durations from the MusicXML. This completes Douglas Eck's TODO.
…Magenta. Also raise exception for unknown pitch steps
… handle parse errors. Fix MIDI program and channel mapping
…a single compressed MXL file
Hey Jeremy, I think this is looking pretty good! We might need some followup changes, but I think it's in a good shape to check in now! |
Lightweight MusicXML parser for Magenta with no dependencies. Handles uncompressed (.xml) and compressed (.mxl) files. Translates MusicXML file into NoteSequences similar to how MIDI is imported.
Because MusicXML differs from MIDI, different information is available to Magenta. For example, MusicXML does not contain MIDI CC data. However, MusicXML does contain other information (such as dynamic markings) which may be more useful in certain scenarios, such as using Magenta to generate sheet music (MusicXML files) in addition to MIDI files.
This parser currently only supports input and does not support output to MusicXML files.
Four public domain MusicXML files are included for unit testing.