implement microphone input & pitch detection #12

Closed
7 tasks done
basisbit opened this issue Jun 25, 2018 · 21 comments

Comments

@basisbit
Member

basisbit commented Jun 25, 2018

Detect which pitch(es) were "sung" by the player(s) since the last check (polling a pitch detection service per microphone).

  • support multiple input devices
  • support multiple channels per device -> Support multiple channels in recording device (e.g. left and right microphone) using PortAudio #85
  • support getting the best pitches guess for certain time lengths (depending on beats per minute of the current song file)
  • detect and properly handle sudden input / output devices being unplugged or added or locked
  • check if unity actually detects mics being changed
  • support linking mic to player by having the player make noise in front of the desired mic
  • auto-detect system total audio latency by playing 3 different pitches and checking for each mic how much latency there is until reception of these tones -> Audio latency detection #86
  • audio input -> pitch -> sung note between two beats has to be framerate independent. Thus, don't just rely on the Update() method
  • use Application.RequestUserAuthorization before accessing webcam or microphone
@basisbit
Member Author

basisbit commented Jun 26, 2018

regarding pitch detection:

@daniel-j
Member

daniel-j commented Aug 8, 2018

I find aubio's YIN implementation good for pitch detection. C library.

@achimmihca
Collaborator

The Unity API seems too slow for real-time microphone input (i.e. around 33 ms at 30 FPS).

The bottleneck seems to be
Microphone.Start(string deviceName, bool loop, int lengthSec, int frequency);. It starts recording from a mic into a buffer that is at least 1 second long. The AudioClip that is returned seems to be updated only after lengthSec has elapsed. In the MicrophoneDemoScene, I toyed with the parameters, and when setting lengthSec to 3, the AudioClip data did not change for three seconds.
It's a pity that lengthSec is an integer, so less than 1 second is not possible.

Thus, I think we need an external lib (probably C/C++) to capture microphone input fast enough.
I tried NAudio, but it is even slower than Unity's API.

@achimmihca
Collaborator

The bottleneck seems to be
Microphone.Start(string deviceName, bool loop, int lengthSec, int frequency);

This is wrong. Unity's microphone API is not the bottleneck. I was using the API incorrectly.
The pitch detection lib that I had used so far was not giving good results (I might have used it with non-optimal parameters).

Anyway, as daniel-j pointed out, aubio implements some nice algorithms. I found a C# binding of its API called Aubio.NET.
Using the YIN algorithm of aubio and the C# binding of Aubio.NET, the pitch detection is good enough in UltraStar Play to actually be playable.

Yay!

@basisbit
Member Author

basisbit commented Nov 25, 2019

As far as I understand copyleft software licensing, you may not use any GPLv3 licensed code or library in anything that is not GPLv3 licensed, not even when just dynamically linking to it at runtime. Thus, to use Aubio, this project would have to change its license to GPLv3. Personally, I do not think that this is necessary, and we should be able to (for example) implement CAMDF ourselves, or something similar such as an autocorrelation-based algorithm. There are plenty of public domain algorithms that do pitch detection from audio samples just fine.

Regarding the GPLv3 only allowing use in GPLv3 licensed software when publishing it, see https://softwareengineering.stackexchange.com/questions/204410/how-are-gpl-compatible-licenses-like-mit-usable-in-gpl-programs-without-being-su

@basisbit
Member Author

@achimmihca I did some more research on this topic, and it turns out there is no legal way to integrate Aubio into UltraStar Play and then continue using the Unity framework.
Integrating Aubio would require UltraStar Play to be licensed under GPLv3 or newer as soon as a binary build of the game is shared. Unfortunately, because such a binary build of UltraStar Play is deeply integrated with the Unity framework libraries, the Unity framework would also have to be licensed under GPLv3+. Regarding this, please see https://www.gnu.org/licenses/gpl-faq.en.html#GPLAndPlugins

We could try contacting the authors of aubio and aubio.net and ask them for an MIT licensed version.

@achimmihca
Collaborator

Uhhh, what a bummer!
I think in this case implementing pitch detection ourselves is the way to go.
At least we now know that the mic input from Unity is fast enough and that YIN is one possible algorithm of which the results are good enough.
I assume it would also be fast enough when implementing YIN in pure C#, but I am not sure.

I will do some refactoring such that the algorithm for the pitch detection can be swapped more easily.

However, I am not that much into signal processing. I have no idea how complex YIN or autocorrelation actually are. Maybe it is not that difficult to implement? I will leave this task to someone else ;)
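To give a rough sense of the complexity involved, here is a minimal sketch of a plain autocorrelation pitch detector in pure Python. It is not YIN and not what aubio does; the function name, the `min_hz` and `threshold` parameters, and the "first strong peak" rule are assumptions made for illustration. It correlates the window with delayed copies of itself and returns the frequency of the first delay at which the signal strongly repeats.

```python
import math


def detect_pitch_autocorrelation(samples, sample_rate, min_hz=80.0, threshold=0.9):
    """Estimate the fundamental frequency in Hz, or return None if no pitch is found.

    Hypothetical helper: plain normalized autocorrelation, not YIN.
    """
    n = len(samples)
    max_lag = min(n - 1, int(sample_rate / min_hz))
    if max_lag < 2 or sum(s * s for s in samples) < 1e-9:
        return None  # window too short or (near) silence

    def score(lag):
        # Normalized correlation between the signal and its copy delayed by `lag`.
        num = e1 = e2 = 0.0
        for i in range(n - lag):
            a, b = samples[i], samples[i + lag]
            num += a * b
            e1 += a * a
            e2 += b * b
        return num / math.sqrt(e1 * e2) if e1 > 0 and e2 > 0 else 0.0

    scores = [1.0] + [score(lag) for lag in range(1, max_lag + 1)]

    lag = 1
    # Skip the lobe around lag 0, where the signal trivially matches itself.
    while lag <= max_lag and scores[lag] > threshold:
        lag += 1
    # Find the first lag where the signal repeats strongly.
    while lag <= max_lag and scores[lag] <= threshold:
        lag += 1
    if lag > max_lag:
        return None
    # Climb to the top of that peak. Taking the first peak instead of the
    # global maximum avoids reporting a multiple of the period (octave error).
    while lag < max_lag and scores[lag + 1] > scores[lag]:
        lag += 1
    return sample_rate / lag
```

The cost is one pass over the window per candidate delay, so a 1024-sample window with a few hundred delays is a few hundred thousand multiplications. The resolution is limited to whole-sample delays, which is why a 220 Hz tone at 16 kHz comes out within a few Hz rather than exactly.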

@psarkozy

psarkozy commented Nov 28, 2019

Long-time USDX fan here. Google sent me here as I was wondering how USDX did pitch detection. I'm writing a tool to automate the conversion of MP3s to UltraStar files.

I have tried enhanced autocorrelation by Tolonen et al. (A Computationally Efficient Multipitch Analysis Model - Tero Tolonen, Matti Karjalainen) in both Python and C, and found it to be very fast and reliable for test vocal tracks. Here is a simple Python implementation: https://gist.github.com/anjiro/e148efe17c1e994981638b1a0c6d0954

And one based more closely on Audacity's implementation:
https://bitbucket.org/yeisoneng/python-eac/src/default/EAC/__init__.py

It works fine with a sliding window size N of 512 samples, and the most expensive term in the calculation is just O(2·N·log(N)), as it needs only two N-sized FFTs per window.

If you wish to quickly take a look at the output of the algorithm, then load up a vocals track in Audacity, select spectrogram view and choose the algorithm Pitch (EAC) in the spectrogram settings dialog box.

A sample of the EAC of the vocals track from a famous pop hit song:
image

@basisbit
Member Author

basisbit commented Nov 28, 2019

@psarkozy current USDX uses the CAMDF algorithm (which is public domain licensed). It was implemented as part of this pull request: UltraStar-Deluxe/USDX#461
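For readers unfamiliar with CAMDF (circular average magnitude difference function), the idea can be sketched in a few lines of Python. This is an illustration only and not the USDX implementation from that pull request; the function names, the frequency range, and the rule for choosing among several minima are assumptions. For each candidate delay, the window is compared with a circularly shifted copy of itself, and the delay with the smallest average difference is taken as the period.

```python
def camdf(samples, lag):
    """Average absolute difference between the window and itself rotated by `lag`."""
    n = len(samples)
    return sum(abs(samples[(i + lag) % n] - samples[i]) for i in range(n)) / n


def detect_pitch_camdf(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental frequency in Hz, or None for silence.

    Hypothetical helper; parameter names and defaults are illustrative.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(n // 2, int(sample_rate / min_hz))
    if min_lag >= max_lag or max(abs(s) for s in samples) < 1e-6:
        return None

    values = {lag: camdf(samples, lag) for lag in range(min_lag, max_lag + 1)}
    lowest = min(values.values())
    mean = sum(values.values()) / len(values)

    # Multiples of the period are (almost) as deep as the period itself, so
    # accept the first dip that gets close to the global minimum ...
    limit = lowest + 0.1 * (mean - lowest)
    lag = min_lag
    while lag <= max_lag and values[lag] > limit:
        lag += 1
    if lag > max_lag:
        return None
    # ... and walk down to the bottom of that dip.
    while lag < max_lag and values[lag + 1] < values[lag]:
        lag += 1
    return sample_rate / lag
```

Unlike autocorrelation, this needs only subtractions and absolute values. The circular shift works best when the window holds a whole number of periods; otherwise the wrap-around adds some error to every candidate delay.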

@achimmihca
Collaborator

I will do some refactoring such that the algorithm for the pitch detection can be swapped more easily.

Done. To change the pitch detection algorithm, one has to provide a different implementation of IAudioSamplesAnalyzer in the MicrophonePitchTracker.

@basisbit
Member Author

basisbit commented Dec 3, 2019

With that pull request merged, I did some comparison tests. A friend sang a song in UltraStar Deluxe, scoring above 9500 points on difficulty medium, and recorded his singing with Audacity. Then I used the vbcable driver pack on Windows to connect a virtual audio output device directly to a virtual audio input device that acts as the karaoke microphone. I swapped the song.ogg file for the recording, corrected the GAP in the song.txt, and then had UltraStar Deluxe and UltraStar Play play the song. Thus, both games graded the same audio file against the same UltraStar txt song file.
The result:
UltraStar Deluxe:
image

UltraStar Play:
image

Tests with other songs showed mostly similar results.
Deactivating the median calculation in CamdAudioSamplesAnalyzer.OnPitchDetected or lowering pitchRecordHistoryLength below 3 or above 10 significantly lowered the total points reached for the song. The median calculation (which introduces additional lag of 3 screen frames on average) filters out sudden low or high pitch detections and thus improves perceived quality of the pitch detection. As a result, we should probably keep this median calculation for now.
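The effect of such a median over recent detections can be sketched as follows. This is a minimal Python illustration, not the code in CamdAudioSamplesAnalyzer; the class name and the `history_length` parameter are hypothetical stand-ins for the idea behind pitchRecordHistoryLength.

```python
from collections import deque
from statistics import median_low


class PitchMedianFilter:
    """Smooths detected notes by reporting the median of the last few detections."""

    def __init__(self, history_length=5):
        # Old detections fall out automatically once the history is full.
        self._history = deque(maxlen=history_length)

    def add(self, midi_note):
        """Record a new detection and return the smoothed note."""
        self._history.append(midi_note)
        # median_low always returns a value that was actually detected.
        return median_low(self._history)


# A single outlier (72) among stable detections (60) never reaches the output.
smoother = PitchMedianFilter(history_length=5)
smoothed = [smoother.add(note) for note in [60, 60, 72, 60, 60, 60]]
print(smoothed)
```

A longer history suppresses more outliers, but a real note change only shows up once more than half of the history has changed, which is where the added lag comes from.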

Anyway, the pitch evaluation and grading should still be improved. Increasing the audio sample window size to roughly the length of one beat of the UltraStar song txt file (or one third of it, and then averaging the pitches of the three thirds) will probably help filter out noise and thus improve pitch detection.

@achimmihca
Collaborator

First, I must say this is a very smart comparison!

Second, with the better pitch detection I noticed that the PlayerNoteRecorder is very buggy. It records notes where none should be. Furthermore, notes sometimes disappear and pop up at a different position. I will have to look into this again.

Scoring is also still buggy. The PerfectSinger script, which simulates singing the expected notes of the song, receives a total score near 10000, but not exactly 10000. This could be a rounding issue / floating point inaccuracy in the current approach. It would be better to calculate the player score with integers and a different formula. I will have to look into this again.
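One way to avoid accumulated floating point error is to count beats as integers and derive the score from the counts only once. The sketch below assumes, purely for illustration, that the score is proportional to the number of correctly sung beats; the actual USPlay formula is not shown in this thread, and the names are hypothetical.

```python
MAX_SCORE = 10000


def total_score(correct_beats, total_beats):
    """Score derived from integer beat counts with a single integer division."""
    if total_beats <= 0:
        return 0
    return (MAX_SCORE * correct_beats) // total_beats


# A perfect performance yields exactly the maximum score,
# regardless of how many beats the song has.
print(total_score(137, 137))
```

Because the division happens once at the end, a perfect singer cannot lose points to rounding, whereas adding a fractional per-beat value many times can drift away from the maximum.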

@achimmihca
Collaborator

@basisbit have you redone the comparison with the USDX recording and playback in Play? I would be interested in how the results changed with the latest changes.

Anyway, I suggest closing this ticket because basic mic input and pitch detection are working so far.
The other issues can be handled in dedicated tickets, such as #85 for multiple mic channels.


check if unity actually detects mics being changed

I just tested this. Unity detects the changes. For example, in the recording options scene the label "Hardware not connected" goes away when plugging in the mic and selecting the mic again. The Unity API does not seem to have an event that we can hook into to detect the changes. Still, I think it is sufficient because there are no crashes when plugging in a new mic and the user will most likely try a manual refresh of the recording device.

When removing a connected mic, the MicData buffer is not receiving new samples from the mic. Thus, in the sing scene for example, the last sung note will be repeated until the end of the song. Again, I think this is sufficient because there are no crashes and the behaviour is somewhat reasonable.

detect and properly handle sudden input / output devices being unplugged or added or locked

Is this also related to mics or do you mean other input / output devices? It feels like the same question as above.

@basisbit
Member Author

basisbit commented Dec 20, 2019

We are currently at ~ 6400 points with UltraStar Play at difficulty easy compared to ~ 9530 with UltraStar Deluxe and difficulty medium.
When watching the game grade itself, it looks like UltraStar Play does not consider / factor in the audio output + microphone input latency difference. UltraStar Deluxe by default assumes 140ms of input device latency.
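To illustrate what factoring in the latency means, here is a small Python sketch. It assumes the usual mapping from playback position to beat via the song's GAP and an effective beats-per-minute value (the BPM value stored in an UltraStar txt file may need converting first); the function names are hypothetical and the 140 ms figure is the USDX default mentioned above.

```python
import math


def beat_at_time(position_ms, gap_ms, beats_per_minute):
    """Fractional beat of the song at the given playback position."""
    return (position_ms - gap_ms) * beats_per_minute / 60000.0


def sung_beat(playback_position_ms, mic_latency_ms, gap_ms, beats_per_minute):
    """Beat that the samples arriving now were actually sung on.

    Samples that arrive now left the singer `mic_latency_ms` earlier,
    so they are attributed to the earlier playback position.
    """
    compensated_ms = playback_position_ms - mic_latency_ms
    return math.floor(beat_at_time(compensated_ms, gap_ms, beats_per_minute))


# With 500 ms per beat and a GAP of 1000 ms, audio arriving at 2100 ms
# belongs to beat 1 once 140 ms of latency is taken into account,
# although the playback position itself is already in beat 2.
print(sung_beat(2100, 140, 1000, 120), sung_beat(2100, 0, 1000, 120))
```

Without this shift, every note is graded against a slightly later point in the song, which costs the most points at note boundaries and on short notes.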

@achimmihca
Collaborator

@basisbit Please redo your benchmark. I am curious how it performs with the latest changes.
Personally, I already enjoy singing in USPlay.

Anyway, I suggest closing this issue now, as its main feature has been implemented (mic input and pitch detection).

@daggeg

daggeg commented Apr 9, 2020

Have you guys had a look at Vocaluxe? The guys developing Vocaluxe were first developing UltraStar Deluxe until they reached a point where they wanted to start over fresh and make a much better game engine. So about 10 years ago they left Deluxe to wither and started developing Vocaluxe instead. And they put a lot of emphasis on note detection and latency, making it superior to the UltraStar clones.

Perhaps you're past this stage, but since I heard of your project to do I thought I'd just say this. And if we ever were to switch from Vocaluxe, the game engine has to be as good as Vocaluxe's. And as we have 14.000+ entries in our highscore database, it would be nice if the scores were comparable to Vocaluxe's, 'cause we're not starting over from scratch again :)

@basisbit
Member Author

basisbit commented Apr 9, 2020

@daggeg what highscore database?
Also, we can't reuse any code from Vocaluxe because the project's code is license-incompatible with UltraStar Play or any other game which uses the Unity framework.

@daggeg

daggeg commented Apr 9, 2020

@daggeg what highscore database?
Also, we can't reuse any code from Vocaluxe because the project's code is license-incompatible with UltraStar Play or any other game which uses the Unity framework.

Ah, that's too bad. Did not know that.
I meant our family's highscore database.

@achimmihca
Collaborator

Have you guys had a look at Vocaluxe?

Thanks for the hint. I took a look at their code. I did not find out exactly what they do, as there are multiple pitch tracker implementations. However, this way I found dywapitchtrack. It is MIT licensed and implements a wavelet algorithm which seems to be very fast and accurate. I migrated the code (just a few functions) to pure C# for USPlay, and it works great so far! Even without the tricks and the rounding hack that I implemented to make the CAMD pitch detection feel more reliable.

So, I will create a new PR soon with this and an option to choose the pitch detection algorithm. Then everyone can evaluate it by singing some songs.

@achimmihca
Collaborator

Another important change that I have planned is about the way notes are recorded from incoming mic samples.
At the moment, the samples of only the current frame are analyzed and then thrown away. This does not fit the beats of the song (frames and beats can mismatch, beats are skipped etc.).

Instead, it might be better to buffer the samples of several frames and analyze the samples that make up a beat, or even multiple beats of the same note. Buffering makes it possible to analyze with some surrounding sample context (in both directions). Furthermore, it should make the pitch detection less frame-rate dependent.
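The buffering idea could look roughly like the following Python sketch. It is not the USPlay implementation; the class and method names are hypothetical, and it assumes that recording starts exactly at beat 0 and that `beats_per_minute` is the effective tempo of the song. Frames of any size are appended, and complete beats are handed out as soon as all of their samples have arrived.

```python
class BeatSampleBuffer:
    """Collects mic samples frame by frame and returns them grouped by beat."""

    def __init__(self, sample_rate, beats_per_minute):
        self.samples_per_beat = sample_rate * 60.0 / beats_per_minute
        self._samples = []     # samples not yet handed out
        self._first_index = 0  # absolute index of self._samples[0]
        self._next_beat = 0    # next beat to hand out

    def add_frame(self, frame_samples):
        """Append the samples recorded during one rendered frame."""
        self._samples.extend(frame_samples)

    def pop_complete_beats(self):
        """Return (beat_index, samples) for every beat that is fully buffered."""
        beats = []
        while True:
            start = round(self._next_beat * self.samples_per_beat)
            end = round((self._next_beat + 1) * self.samples_per_beat)
            if end > self._first_index + len(self._samples):
                break  # this beat is not complete yet
            lo = start - self._first_index
            hi = end - self._first_index
            beats.append((self._next_beat, self._samples[lo:hi]))
            self._next_beat += 1
        # Discard everything before the start of the next pending beat.
        consumed = round(self._next_beat * self.samples_per_beat) - self._first_index
        del self._samples[:consumed]
        self._first_index += consumed
        return beats
```

The number of beats returned depends only on how many samples have arrived, not on how they were split into frames, so a slow frame simply yields several beats at once instead of skipping them. Keeping a few samples before and after each beat for context would be a straightforward extension.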

@achimmihca
Collaborator

The new stuff works like a charm. Finally :) !
