implement microphone input & pitch detection #12

basisbit · 2018-06-25T15:25:53Z

detect what pitch(es) were "sung" by the player(s) since the last check (polling from some pitch detection service per microphone).

support multiple input devices
~~support multiple channels per device~~ -> Support multiple channels in recording device (e.g. left and right microphone) using PortAudio #85
support getting the best pitches guess for certain time lengths (depending on beats per minute of the current song file)
detect and properly handle sudden input / output devices being unplugged or added or locked
check if unity actually detects mics being changed
support linking mic to player by having the player make noise in front of the desired mic
~~auto-detect system total audio latency by playing 3 different pitches and checking for each mic how much latency there is until reception of these tones~~ -> Audio latency detection #86
audio input -> pitch -> sung note between two beats has to be framerate independent. Thus, don't just rely on the Update() method
use Application.RequestUserAuthorization before accessing webcam or microphone

basisbit · 2018-06-26T16:33:59Z

regarding pitch detection:

vocaluxe uses the solution from performous.
performous uses a deep recursive FFT algorithm, which might be too slow and/or might not work when implemented in C# - should be tested and measured eventually. See https://www.eetimes.com/document.asp?doc_id=1275415
performous code is GPLv2+
autocorrelation seems promising because it is easy to implement and also fast
more advanced, based on autocorrel.: https://asa.scitation.org/doi/10.1121/1.1458024 or http://www.cs.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf
windowing is important for the pitch detection quality but rather difficult to do. Maybe just using the song's beats and calculating the pitch for the first third, second third and last third, then adding them together but using a 1:3:1 or 1:2:1 weight ratio and using trigonometric functions / modulo % (360°/11) would result in a usable "fair" pitch result for such a karaoke game. This static window size might be a problem for very low pitch singers, but I doubt a human can sing that low.
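To give a feel for how small a plain autocorrelation detector is, here is a minimal Python sketch. It is not code from performous, vocaluxe or this project. The function names, the 80 to 1000 Hz search range and the test tone are assumptions made for illustration, and the unnormalized sum is a deliberate simplification that favours shorter lags.

```python
import math

def sine_wave(freq_hz, sample_rate, count):
    """Hypothetical helper: generate a pure test tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

def autocorrelation_pitch(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Return the estimated fundamental frequency in Hz, or None for silence."""
    n = len(samples)
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(n // 2, int(sample_rate / min_hz))
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    # The best lag is the period in samples.
    return sample_rate / best_lag

tone = sine_wave(200.0, 8000, 1024)
estimate = autocorrelation_pitch(tone, 8000)
```

A real singing voice has strong harmonics, so a detector this naive will sometimes pick the wrong octave. That matters less if grading only compares the note within the octave.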

daniel-j · 2018-08-08T06:41:42Z

I find aubio's YIN implementation good for pitch detection. C library.

achimmihca · 2019-10-05T12:29:47Z

The Unity API seems too slow for real-time (i.e. around 33ms at 30 FPS) microphone input.

The bottleneck seems to be Microphone.Start(string deviceName, bool loop, int lengthSec, int frequency). It starts recording from a mic into a buffer that is at least 1 second long. The AudioClip that is returned seems to be updated only after lengthSec has elapsed. In the MicrophoneDemoScene, I toyed with the parameters, and when setting lengthSec to 3, the AudioClip data did not change for three seconds.
It's a pity that lengthSec is an integer, so less than 1 second is not possible.

Thus, I think we need an external lib (probably C/C++) to capture microphone input fast enough.
I tried NAudio, but it is even slower than Unity's API.

achimmihca · 2019-11-25T21:13:15Z

> The bottleneck seems to be
> Microphone.Start(string deviceName, bool loop, int lengthSec, int frequency);

This is wrong. Unity API of the mic is not the bottleneck. I was using the API wrong.
The pitch detection lib that I used so far was not giving good results (I might have used it with non-optimal parameters).
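The thread does not say what the incorrect usage was. One common way to get low-latency samples out of a looping recording buffer is to poll the recorder's write position every frame and copy only the samples written since the previous poll. In Unity the write position is available through Microphone.GetPosition. The Python sketch below only models that idea with a plain list as the ring buffer. The function name and the parameters are hypothetical, and it assumes the writer has not lapped the reader between two polls.

```python
def read_new_samples(ring, last_read_pos, write_pos):
    """Return the samples written since the previous poll of a looping buffer."""
    if write_pos >= last_read_pos:
        # No wrap-around since the last poll.
        return ring[last_read_pos:write_pos]
    # The writer wrapped past the end of the buffer: take the tail, then the head.
    return ring[last_read_pos:] + ring[:write_pos]

ring = list(range(10))  # stand-in for one second of recorded samples
first = read_new_samples(ring, 2, 5)
wrapped = read_new_samples(ring, 8, 3)
```

With this pattern the buffer length (lengthSec) only limits how long the game may go without polling. It does not determine the latency.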

Anyway, as daniel-j pointed out, aubio implements some nice algorithms. I found a C# binding of its API called Aubio.NET.
Using the YIN algorithm of aubio and the C# binding of Aubio.NET, the pitch detection is good enough in UltraStar Play to actually be playable.

Yay!

basisbit · 2019-11-25T22:12:59Z

As far as I understand copyleft software licensing, you may not use any GPLv3 licensed code or library in anything that is not GPLv3 licensed - not even when just dynamically linking to it at runtime. Thus, to use Aubio, this project would have to change its license to GPLv3. Personally, I do not think that this is necessary, and we should be able to (for example) implement CAMDF ourselves, or anything similar such as an autocorrelation-based algorithm. There exist plenty of public domain licensed algorithms that do pitch detection from audio samples just fine.

Regarding the GPLv3 only allowing use in GPLv3 licensed software when publishing it, see https://softwareengineering.stackexchange.com/questions/204410/how-are-gpl-compatible-licenses-like-mit-usable-in-gpl-programs-without-being-su

basisbit · 2019-11-26T19:47:11Z

@achimmihca I did some more research on this topic and it turns out, there is no legal way to integrate Aubio into UltraStar Play and then continue using the Unity framework.
Integrating Aubio would require UltraStar Play to be licensed under GPLv3 or newer as soon as a binary build of the game is shared. Unfortunately, because such a binary build of UltraStar Play is deeply integrated with the Unity framework libraries, the Unity framework would also have to be licensed under GPLv3+. Regarding this, please see https://www.gnu.org/licenses/gpl-faq.en.html#GPLAndPlugins

We could try contacting the authors of aubio and aubio.net and ask them for an MIT licensed version.

achimmihca · 2019-11-26T21:19:52Z

Uhhh, what a bummer!
I think in this case implementing pitch detection ourselves is the way to go.
At least we now know that the mic input from Unity is fast enough and that YIN is one possible algorithm of which the results are good enough.
I assume it would also be fast enough when implementing YIN in pure C#, but I am not sure.

I will do some refactoring such that the algorithm for the pitch detection can be swapped more easily.

However, I am not that much into signal processing. I have no idea how complex YIN or autocorrelation actually are. Maybe it is not that difficult to implement? I will leave this task to someone else ;)
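For a sense of scale, the core steps of YIN (difference function, cumulative mean normalized difference, absolute threshold) fit in a few dozen lines. The Python sketch below is a simplified illustration, not aubio's implementation and not code from this project. It omits the parabolic interpolation step of the original algorithm, and the function names, the 0.15 threshold and the 80 Hz lower bound are assumed values.

```python
import math

def sine_wave(freq_hz, sample_rate, count):
    """Hypothetical helper: generate a pure test tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

def yin_pitch(samples, sample_rate, threshold=0.15, min_hz=80.0):
    """Return the estimated pitch in Hz, or None if no clear period is found."""
    half = len(samples) // 2
    max_lag = min(half - 1, int(sample_rate / min_hz))
    # Step 1: squared difference between the signal and a shifted copy.
    diff = [0.0] * (max_lag + 1)
    for lag in range(1, max_lag + 1):
        diff[lag] = sum((samples[i] - samples[i + lag]) ** 2 for i in range(half))
    # Step 2: divide each value by the running mean of all values up to that lag.
    cmnd = [1.0] * (max_lag + 1)
    running_sum = 0.0
    for lag in range(1, max_lag + 1):
        running_sum += diff[lag]
        cmnd[lag] = diff[lag] * lag / running_sum if running_sum > 0 else 1.0
    # Step 3: take the first dip below the threshold and follow it to its minimum.
    for lag in range(2, max_lag):
        if cmnd[lag] < threshold:
            while lag + 1 <= max_lag and cmnd[lag + 1] < cmnd[lag]:
                lag += 1
            return sample_rate / lag
    return None

estimate = yin_pitch(sine_wave(200.0, 8000, 1024), 8000)
```

The nested loops make this O(window size x maximum lag) per analysis window. Whether that is fast enough in pure C# would still need to be measured.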

psarkozy · 2019-11-28T09:59:55Z

Long time USDX fan here, Google sent me here as I was wondering how USDX did pitch detection. I'm writing a tool to automate the conversion of mp3s to ultrastar files.

I have tried enhanced autocorrelation by Tolonen et al. (A Computationally Efficient Multipitch Analysis Model - Tero Tolonen, Matti Karjalainen) in both Python and C, and found it to be very fast and reliable for test vocal tracks. Here is a simple Python implementation: https://gist.github.com/anjiro/e148efe17c1e994981638b1a0c6d0954

And one based more closely on Audacity's implementation:
https://bitbucket.org/yeisoneng/python-eac/src/default/EAC/__init__.py

It works fine with a sliding window size N of 512 samples, and the most expensive term in the calculation is just O(2*N*log(N)), as it needs only two N-sized FFTs per window.

If you wish to quickly take a look at the output of the algorithm, then load up a vocals track in Audacity, select spectrogram view and choose the algorithm Pitch (EAC) in the spectrogram settings dialog box.

A sample of the EAC of the vocals track from a famous pop hit song:

basisbit · 2019-11-28T12:49:16Z

@psarkozy current USDX uses the CAMDF algorithm (which is public domain licensed). It was implemented as part of this pull request: UltraStar-Deluxe/USDX#461
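As a rough illustration of the idea behind CAMDF (circular average magnitude difference function), the Python sketch below compares the window against a circularly shifted copy of itself, once per candidate half tone, and picks the half tone with the smallest difference. It is a simplified sketch, not the USDX code from that pull request. The function names, the MIDI note range and the choice to return only the note within the octave (0 = C, 9 = A) are assumptions.

```python
import math

def sine_wave(freq_hz, sample_rate, count):
    """Hypothetical helper: generate a pure test tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

def camdf(samples, lag):
    """Sum of absolute differences between the window and its circular shift."""
    n = len(samples)
    return sum(abs(samples[i] - samples[(i + lag) % n]) for i in range(n))

def camdf_pitch_class(samples, sample_rate, lowest_midi=36, highest_midi=84):
    """Return the best matching note within the octave (0..11)."""
    best_note, best_value = None, float("inf")
    for note in range(lowest_midi, highest_midi + 1):
        freq_hz = 440.0 * 2 ** ((note - 69) / 12)
        lag = round(sample_rate / freq_hz)  # period of this half tone in samples
        value = camdf(samples, lag)
        if value < best_value:
            best_note, best_value = note, value
    return best_note % 12

detected = camdf_pitch_class(sine_wave(440.0, 44100, 2048), 44100)
```

Only one lag per half tone is evaluated, so the cost per window stays small. The result is also already quantized to the notes that the song file uses.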

achimmihca · 2019-11-29T17:25:12Z

> I will do some refactoring such that the algorithm for the pitch detection can be swapped more easily.

Done. To change the pitch detection algorithm, one has to provide a different implementation of IAudioSamplesAnalyzer in the MicrophonePitchTracker.

basisbit · 2019-12-03T20:49:29Z

With that pull request merged, I did some comparison tests. A friend sang a song in UltraStar Deluxe, scoring above 9500 points on difficulty medium, and recorded it with Audacity. Then I used the vbcable driver pack on Windows to connect a virtual audio output device directly to a virtual audio input device acting as the karaoke microphone. I swapped the song.ogg file, corrected the GAP in the song.txt, and then had UltraStar Deluxe and UltraStar Play play the song. Thus, both games graded the same audio file against the same UltraStar txt song file.
The result:
UltraStar Deluxe:

UltraStar Play:

Tests with other songs showed mostly similar results.
Deactivating the median calculation in CamdAudioSamplesAnalyzer.OnPitchDetected or lowering pitchRecordHistoryLength below 3 or above 10 significantly lowered the total points reached for the song. The median calculation (which introduces additional lag of 3 screen frames on average) filters out sudden low or high pitch detections and thus improves perceived quality of the pitch detection. As a result, we should probably keep this median calculation for now.
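The effect of such a median over the recent pitch history can be sketched in a few lines of Python. This is an illustration of the technique, not the code in CamdAudioSamplesAnalyzer. The class name and the default history length of 5 are assumptions, with 5 chosen from inside the 3 to 10 range that worked in the tests above.

```python
from collections import deque
from statistics import median_low

class MedianPitchFilter:
    """Keep the last few detected notes and report their median."""

    def __init__(self, history_length=5):
        self.history = deque(maxlen=history_length)

    def push(self, midi_note):
        self.history.append(midi_note)
        # median_low always returns a value that was actually detected.
        return median_low(self.history)

pitch_filter = MedianPitchFilter(history_length=5)
# A single octave-jump outlier (72) in a steady stream of 60s is suppressed.
smoothed = [pitch_filter.push(note) for note in [60, 60, 72, 60, 60]]
```

The price is the delay mentioned above: a real change of note only shows up once it makes up the majority of the history.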

Anyways, the pitch evaluation + grading should still be improved. Increasing the audio sample-window-size to roughly (one third of?) the length of one beat of the ultrastar song txt file (and then averaging the thirds' pitches) will probably help filter out noise and thus improve pitch detection.
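One way to average the pitches of the three thirds, using the 1:2:1 weighting suggested earlier in this thread, is a weighted circular mean over the notes of an octave. The circular part matters because B (11) and C (0) are neighbours, which a plain arithmetic mean gets wrong. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, it uses 12 half tones per octave, and a third without a detected pitch is passed as None.

```python
import math

def weighted_pitch_class(third_pitches, weights=(1, 2, 1)):
    """Combine the MIDI notes of three thirds of a beat into one note (0..11)."""
    x = y = 0.0
    for pitch, weight in zip(third_pitches, weights):
        if pitch is None:
            continue  # nothing detected in this third
        angle = 2 * math.pi * (pitch % 12) / 12
        x += weight * math.cos(angle)
        y += weight * math.sin(angle)
    if math.hypot(x, y) < 1e-9:
        return None  # no input, or the inputs cancel each other out
    mean_angle = math.atan2(y, x) % (2 * math.pi)
    return round(mean_angle * 12 / (2 * math.pi)) % 12

# B in the first third, C in the other two: the circular mean stays at C (0),
# while a plain weighted average of 11, 0 and 0 would land near D.
combined = weighted_pitch_class([11, 0, 0])
```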

achimmihca · 2019-12-03T21:56:33Z

First, I must say this is a very smart comparison!

Second, with the better pitch detection I noticed that the PlayerNoteRecorder is very buggy. It records notes where none should be. Furthermore, sometimes notes disappear and pop up at a different position. I will have to look into this again.

Scoring is also still buggy. The PerfectSinger script, which simulates singing the expected notes of the song, receives a total score near 10000, but not exactly 10000. This could be due to rounding issues / floating point inaccuracy in the current approach. It would be better if the player score were calculated with integers and a different formula. I will have to look into this again.
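A minimal sketch of the integer idea, in Python: count the correctly sung beats as integers and scale to the maximum score only once at the end, instead of adding up a fractional per-beat value. The function names and the flat "every beat is worth the same" formula are assumptions for illustration and not the game's actual scoring rules.

```python
MAX_SCORE = 10000

def float_score(correct_flags):
    """Accumulate a fractional per-beat value; rounding errors can add up."""
    points_per_beat = MAX_SCORE / len(correct_flags)
    return sum(points_per_beat for ok in correct_flags if ok)

def integer_score(correct_flags):
    """Count beats as integers and scale once, so a perfect run is exactly MAX_SCORE."""
    correct = sum(1 for ok in correct_flags if ok)
    return MAX_SCORE * correct // len(correct_flags)

perfect_run = [True] * 7
exact = integer_score(perfect_run)
approximate = float_score(perfect_run)  # close to 10000, not guaranteed to be equal
```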

achimmihca · 2019-12-20T08:53:51Z

@basisbit have you re-done the comparison with USDX recording and playback in Play? I would be interested in how the results changed with the latest changes.

Anyway, I suggest closing this ticket because basic mic input and pitch detection are working so far.
The other issues can be done in dedicated tickets, such as #85 for multiple mic channels.

> check if unity actually detects mics being changed

I just tested this. Unity detects the changes. For example, in the recording options scene the label "Hardware not connected" goes away when plugging in the mic and selecting the mic again. The Unity API does not seem to have an event that we can hook into to detect the changes. Still, I think it is sufficient because there are no crashes when plugging in a new mic and the user will most likely try a manual refresh of the recording device.

When removing a connected mic, the MicData buffer is not receiving new samples from the mic. Thus, in the sing scene for example, the last sung note will be repeated until the end of the song. Again, I think this is sufficient because there are no crashes and the behaviour is somewhat reasonable.

> detect and properly handle sudden input / output devices being unplugged or added or locked

Is this also related to mics or do you mean other input / output devices? It feels like the same question as above.

basisbit · 2019-12-20T10:39:49Z

We are currently at ~ 6400 points with UltraStar Play at difficulty easy compared to ~ 9530 with UltraStar Deluxe and difficulty medium.
When watching the game grade itself, it looks like UltraStar Play does not consider / factor in the audio output + microphone input latency difference. UltraStar Deluxe by default assumes 140ms of input device latency.
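Factoring in the latency could be as simple as shifting the playback position by the assumed microphone delay before converting it to a beat. The Python sketch below illustrates this. The function and parameter names are hypothetical, and it uses a plain beats-per-minute value, so any scaling that the song file format applies to its BPM field would have to happen before the call.

```python
def beat_at(position_ms, gap_ms, beats_per_minute, mic_delay_ms=0):
    """Beat that was being sung when the audio analyzed at position_ms was produced."""
    # Samples arriving now were sung mic_delay_ms earlier, so shift the position back.
    effective_ms = position_ms - gap_ms - mic_delay_ms
    return effective_ms * beats_per_minute / 60000.0

# With a GAP of 10 s and 300 beats per minute (200 ms per beat):
uncompensated = beat_at(10200, 10000, 300)
compensated = beat_at(10200, 10000, 300, mic_delay_ms=140)
```

In this example the uncompensated variant credits the sample to the start of beat 1, although it was sung 30% into beat 0.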

achimmihca · 2020-03-12T19:55:02Z

@basisbit Please redo your benchmark. I am curious how it performs with the latest changes.
Personally, I already enjoy singing in USPlay.

Anyway, I suggest closing this issue now as its main feature has been implemented (mic input and pitch detection).

daggeg · 2020-04-09T15:09:51Z

Have you guys had a look at Vocaluxe? The guys developing Vocaluxe were first developing Ultrastar Deluxe 'til they reached a point where they wanted to start over fresh and make a much better game engine. So about 10 years ago they left Deluxe to wither and started developing Vocaluxe instead. And they put a lot of emphasis on note detection and latency, making it superior to the Ultrastar clones.

Perhaps you're past this stage, but since I heard of your project, I thought I'd just say this. And if we ever were to switch from Vocaluxe, the game engine has to be as good as Vocaluxe's. And as we have 14.000+ entries in our highscore database, it would be nice if the scores were comparable to Vocaluxe's 'cause we're not starting over from scratch again :)

basisbit · 2020-04-09T15:21:00Z

@daggeg what highscore database?
Also, we can't reuse any code from Vocaluxe because the project's code is license-incompatible with UltraStar Play or any other game which uses the Unity framework.

daggeg · 2020-04-09T15:41:32Z

> @daggeg what highscore database?
> Also, we can't reuse any code from Vocaluxe because the project's code is license-incompatible with UltraStar Play or any other game which uses the Unity framework.

Ah, that's too bad. Did not know that.
I meant our family's highscore database.

achimmihca · 2020-04-26T20:12:59Z

> Have you guys had a look at Vocaluxe?

Thanks for the hint. I took a look at their code. I did not find out exactly what they do, as there are multiple pitch tracker implementations. However, this way I found dywapitchtrack. It is MIT licensed and implements a wavelet algorithm which seems to be very fast and accurate. I migrated the code (just a few functions) to pure C# for USPlay and it works great so far! Even without the tricks and the rounding hack that I implemented to make the CAMD pitch detection feel more reliable.

So, I will create a new PR soon with this and an option to choose the pitch detection algorithm. Then everyone can evaluate it by singing some songs.

achimmihca · 2020-04-26T20:24:46Z

Another important change that I have planned is about the way notes are recorded from incoming mic samples.
At the moment, the samples of only the current frame are analyzed and then thrown away. This does not fit the beats of the song (frames and beats can mismatch, beats are skipped etc.).

Instead, it might be better to buffer the samples of some frames and analyze the samples that make up a beat, or even multiple beats of the same note. Buffering makes it possible to analyze with some surrounding sample context (in both directions). Furthermore, it should make the pitch detection less frame-rate dependent.
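The buffering idea can be sketched as follows, in Python for brevity: every frame appends whatever samples arrived, and analysis only happens on chunks that are exactly one beat long, regardless of how many frames that took. The class and method names are hypothetical and this is not the implementation that was later merged. It also rounds the beat length to whole samples, which would slowly drift over a long song unless it is re-synchronized against the playback position.

```python
class BeatSampleBuffer:
    """Collect mic samples across frames and hand them out one beat at a time."""

    def __init__(self, sample_rate, beats_per_minute):
        self.samples_per_beat = round(sample_rate * 60 / beats_per_minute)
        self.pending = []

    def push(self, frame_samples):
        """Add the samples of one frame; return a list of complete beat chunks."""
        self.pending.extend(frame_samples)
        beats = []
        while len(self.pending) >= self.samples_per_beat:
            beats.append(self.pending[:self.samples_per_beat])
            del self.pending[:self.samples_per_beat]
        return beats

buffer = BeatSampleBuffer(sample_rate=1000, beats_per_minute=600)  # 100 samples per beat
after_short_frame = buffer.push([0.0] * 30)    # not enough for a beat yet
after_second_frame = buffer.push([0.0] * 30)   # still only 60 samples
after_long_frame = buffer.push([0.0] * 250)    # 310 samples: three beats, 10 left over
```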

achimmihca · 2020-06-04T20:26:47Z

The new stuff works like a charm. Finally :) !

basisbit added enhancement labels Jun 25, 2018

basisbit mentioned this issue Nov 26, 2019

pitch detection with aubio #64

Closed

basisbit mentioned this issue Dec 2, 2019

basic CAMDiff based pitch detection and threshhold loudness checking #70

Merged

achimmihca mentioned this issue Jan 5, 2020

requirements for first release version #104

Closed

achimmihca mentioned this issue May 2, 2020

player note recorder on buffered samples #169

Merged

achimmihca closed this as completed Jun 4, 2020
