Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MusicXML Parser for Magenta #214

Merged
merged 30 commits into from Nov 2, 2016
Merged

MusicXML Parser for Magenta #214

merged 30 commits into from Nov 2, 2016

Conversation

jsawruk
Copy link
Contributor

@jsawruk jsawruk commented Sep 15, 2016

Lightweight MusicXML parser for Magenta with no dependencies. Handles uncompressed (.xml) and compressed (.mxl) files. Translates MusicXML file into NoteSequences similar to how MIDI is imported.

Because MusicXML differs from MIDI, different information is available to Magenta. For example, MusicXML does not contain MIDI CC data. However, MusicXML does contain other information (such as dynamic markings) which may be more useful in certain scenarios, such as using Magenta to generate sheet music (MusicXML files) in addition to MIDI files.

This parser currently only supports input and does not support output to MusicXML files.

Four public domain MusicXML files are included for unit testing.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@adarob
Copy link
Contributor

adarob commented Sep 15, 2016

@cghawthorne or @iansimon: can one of your take this review?

@cghawthorne
Copy link
Contributor

Yup, I can definitely look at it. Wouldn't mind a second pair of eyes, though.

@jsawruk we'll start the review once the CLA is signed. Thanks again for your work on this!

@cghawthorne cghawthorne self-assigned this Sep 15, 2016
@iansimon
Copy link
Collaborator

I'll take a look also.

-Ian

On Thu, Sep 15, 2016 at 2:02 PM cghawthorne notifications@github.com
wrote:

Yup, I can definitely look at it. Wouldn't mind a second pair of eyes,
though.

@jsawruk https://github.com/jsawruk we'll start the review once the CLA
is signed. Thanks again for your work on this!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#214 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABTHb6_coRvnUz7DefMLSn9WPol1tVzlks5qqbJwgaJpZM4J-R_e
.

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 23, 2016

I signed it!

On Thu, Sep 15, 2016 at 3:56 PM, googlebot notifications@github.com wrote:

Thanks for your pull request. It looks like this may be your first
contribution to a Google open source project. Before we can look at your
pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/
https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll

verify. Thanks.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#214 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAZxWQclQSKQz5a6tlNA4FwAtj84Zpu2ks5qqaLdgaJpZM4J-R_e
.

@cghawthorne
Copy link
Contributor

@jsawruk the cla bot still doesn't seem happy. Can you try replying using the github web interface?

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 23, 2016

I updated the contact info email address on the CLA to match the email address on my Github account. Hopefully this resolves the issue.

@cghawthorne
Copy link
Contributor

Might need to re-reply with "I signed it!"

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 27, 2016

I signed it!

@willnorris
Copy link
Contributor

@jsawruk: sorry about the confusion here. It looks like we're not able to map your GitHub username to the email address used in your commits. Could you add your *@jwpepper.com email address to your GitHub account? That should take care of this.

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 27, 2016

Ok, I have added that email address to my Github account and it is now verified.

@googlebot
Copy link

CLAs look good, thanks!

Copy link
Collaborator

@iansimon iansimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of tests are failing because a nonexistent field called "bpm" is being accessed in the Tempo object in the NoteSequence protobuf. This field should be called "qpm" and is defined as quarter notes per minute. Here's the message on the Magenta mailing list describing the change from bpm to qpm: https://groups.google.com/a/tensorflow.org/d/msg/magenta-discuss/b33XYjXjj-U/ozfaXAvwDAAJ

It looks as though MusicXML also defines tempo as quarter notes per minute, so you probably want to call the field "qpm" in your Tempo class in musicxml_parser.py as well.

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 27, 2016

In process of updating my code. I just rebased against master, and am now getting this error when running the unit tests:

Traceback (most recent call last):
  File "musicxml_parser_test.py", line 20, in <module>
    from musicxml_reader import *
  File "/Users/jsawruk/magenta/magenta/music/musicxml_reader.py", line 29, in <module>
    from magenta.protobuf import music_pb2
ImportError: No module named magenta.protobuf

What changed?

@cghawthorne
Copy link
Contributor

I think you're running into a merge conflict with #223. We recently moved a bunch of our modules that used to be in /lib to /music. You'll probably also want your musicxml code to be in /music now.

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 28, 2016

I'm still getting errors. I checked out the master branch at commit 0056294, which includes #223. When I try to import using python, I get this error:

>>> from magenta.protobuf import music_pb2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "magenta/__init__.py", line 26, in <module>
    import magenta.music.melodies_lib
ImportError: No module named music.melodies_lib

This does include the new /music directory, and the BUILD file there looks correct.

@iansimon
Copy link
Collaborator

Are you having issues when running bazel test, or only in interactive python? When working on libraries for core magenta, we recommend testing via bazel.

@jsawruk
Copy link
Contributor Author

jsawruk commented Sep 28, 2016

I wasn't running the bazel test correctly, so I tried it in interactive Python. Once I ran the bazel test correctly, I was able to resolve the issue; the bazel tests now pass on my system.

Copy link
Contributor

@cghawthorne cghawthorne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work here! I've mostly looked at style/structure issues thus far and will get to more specifics in a later pass.

@@ -171,7 +171,7 @@ py_library(
py_test(
name = "midi_io_test",
srcs = ["midi_io_test.py"],
data = ["//magenta/testdata"],
data = ["//magenta/testdata:testdata"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for this change? the :testdata target is implicit for the /testdata package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. Maybe I was having issues initially, but the tests pass now, so it should not have :testdata. Thanks.

@@ -180,6 +180,26 @@ py_test(
)

py_library(
name = "musicxml_reader",
srcs = ["musicxml_reader.py", "musicxml_parser.py"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we prefer for each python file to have its own py_library target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Will be in next commit.

# Constants

# Number of MIDI ticks per beat. Required by NoteSequence structure
TICKS_PER_BEAT = 960
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've standardized on magenta.music.constants.STANDARD_PPQ

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next commit.

@@ -34,5 +34,9 @@ filegroup(
"example_is_drum.mid",
"notesequences.tfrecord",
"tfrecord_iterator_test.tfrecord",
'clarinet_scale.xml',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd recommend putting these in music/testdata because they'll only be used to verify your importer. Also, when you do that, please update the README in that folder to describe the files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. This is fixed and will be in the next commit.


# Global variables
# Default to one division per measure
current_divisions = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package-level constants should be named in uppercase: https://google.github.io/styleguide/pyguide.html#Naming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed after using pylint.

current_tempo = 120

# Duration of a single beat in seconds
current_seconds_per_beat = 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like some of these package-level variables actually get modified in functions below. I would prefer that variables related to parsing a file live in the function or instance for parsing that file. For example, right now this class isn't threadsafe because every invocation modifies the same variables, but there's no reason it couldn't be threadsafe because this shared state isn't actually needed.

Copy link
Contributor Author

@jsawruk jsawruk Oct 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good solution for this at the moment. I know the code isn't threadsafe because I am mutating global variables, but parsing MusicXML files can be tricky. MusicXML files (like most/all? xml files) do not guarantee order of elements, only that the elements exist. For example, let's say there is a tempo marking over beat 1 in the first measure of quarter = 96. Ideally, I'd like to know the tempo before parsing the note on beat one, but MusicXML does not guarantee this. That example is trivial, but it gets more complicated when keeping track of time and you encounter the <forward> and <backup> elements.

I'd like to keep track of this state somehow, but my only two potential solutions were to pass around a state object to every class, or to use globals. Since I wanted to get things working quickly, I resorted to globals. I can now refactor to something more elegant and threadsafe. Do you have any alternative solutions to this problem? If not, I'll need to pass around some object that maintains these state variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you could use an instance variable rather than a global variable. So you could create an instance of the parser for every MusicXML file you're parsing and have instance variables to keep track of parsing state while you're going through the file.

return score_str


class Part:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these classes should have a docstring describing what information they contain.

midi_pitch = (12 + pitch_class) + (int(octave) * 12)
return midi_pitch

def __str__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method outputting a particular format, or just something to make debugging easier to read? Would probably be good to document in the docstring for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're asking about the str function, it was just to make my debugging easier as I was developing. Would you like me to remove the str method altogether, or do you have a proposed format that would be helpful to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, just wanted to make sure I understood the intention.

from musicxml_reader import *
import tensorflow as tf

# self.flute_scale_filename contains an F-major scale of 8 quarter notes each
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might fit better in the docstring for the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to docstring. Will be in the next commit.

note.pitch = musicxml_note.pitch[1] # Index 1 = MIDI pitch number
note.velocity = musicxml_note.velocity

# TODO(@douglaseck): Estimate note type (e.g. quarter note) and populate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO still relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I feel this is extremely relevant based on conversations I had with you and @douglaseck. With MusicXML, we can get the exact note length. With MIDI, we just get a ticks duration. @douglaseck was suggesting a quantization algorithm when reading MIDI data to feed into your model. Since MusicXML specifies the exact note type, I'd like to also supply that to your model. However, I didn't implement it because I wasn't sure what format you were expecting.

My guess is:

  • Whole note = 1
  • Quarter note = 1/4
  • Dotted Quarter = 3/8
  • Triplet Eighth = 1/12
    etc.
    Is this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that looks correct to me. Feel free to start filling out those fields.

@iansimon
Copy link
Collaborator

I recommend running pylint on your code. You can even get something close to Google's linter configuration, described in this StackOverflow answer: http://stackoverflow.com/questions/29597618/is-there-a-tool-to-lint-python-based-on-the-google-style-guide

Basically, install pylint, download that googlecl-pylint.rc file, then run:

pylint --rcfile=/path/to/googlecl-pylint.rc <your files>

@jsawruk
Copy link
Contributor Author

jsawruk commented Oct 4, 2016

@iansimon: Thanks. I'm linting the code right now using that pylint configuration. Not done yet, but soon. Then I will address remaining issues raised by @cghawthorne.

@jsawruk
Copy link
Contributor Author

jsawruk commented Oct 5, 2016

@iansimon: I am getting a pylint error that I don't know how to resolve.

Bad option value 'g-import-not-at-top' (bad-option-value)

Please compare with the code here:
https://github.com/tensorflow/magenta/blob/master/magenta/music/midi_io.py#L21

@iansimon
Copy link
Collaborator

There seems to be a missing file el_capitan.xml being referred to in your tests. Did you forget to check this in?

@jsawruk
Copy link
Contributor Author

jsawruk commented Oct 17, 2016

@iansimon: Thanks, I missed that when I changed where the test files are located.


xml_duration = xml_forward.find('duration')
forward_duration = int(xml_duration.text)
midi_ticks = forward_duration * (constants.STANDARD_PPQ / CURRENT_DIVISIONS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have CURRENT_DIVISIONS commented out above.

@iansimon
Copy link
Collaborator

I fixed many of the lint errors, mainly just ones that required no understanding of the code. There are still several that remain, largely missing/malformed docstring sections. There's also a CURRENT_DIVISIONS variable in musicxml_parser.py that is commented out in some places but not others.

@jsawruk
Copy link
Contributor Author

jsawruk commented Oct 19, 2016

@iansimon: Thanks for fixing the lint errors. I have been extremely busy at work and was hoping to work more on this code at the Classical Music Hack Day at MIT this coming weekend.

@jsawruk
Copy link
Contributor Author

jsawruk commented Oct 26, 2016

I believe this latest commit addresses all issues raised so far. I have implemented the note.numerator and note.denominator support (@douglaseck's TODO)

Jeremy Sawruk added 25 commits November 2, 2016 15:40
… by reading the symbolic note durations from the MusicXML. This completes Douglas Eck's TODO.
…Magenta. Also raise exception for unknown pitch steps
… handle parse errors. Fix MIDI program and channel mapping
@cghawthorne
Copy link
Contributor

Hey Jeremy, I think this is looking pretty good! We might need some followup changes, but I think it's in a good shape to check in now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants