-
-
Notifications
You must be signed in to change notification settings - Fork 262
Conversation
Improves performance of project loading substantively. Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org> Helped-by: Alex Disibio <alexdisibio@gmail.com>
What are the substantive changes here? I mostly see whitespace fixes? |
Lines 26-31 and Lines 36-39. I just changed how we allocate the floats so it causes less cache misses. |
Akleja tested it on GCC and it doesn't improve speed, but I think MSVC didn't make the same optimizer decisions. For me it seems much faster to load on Windows. If the optimizer doesn't come into play this saves |
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
Stop! Think! I beg you! Was #412 really a good idea to start with? |
I was actually wasting your time with the memory leak fix, though it might have been good C++ education. THE “FIX” OF ALLOCA WAS THE BIGGER MISTAKE. It was not broken to begin with! You must learn not to let the static analyzer substitute for real understanding of the program. |
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
src/AudioIO.cpp
Outdated
@@ -3979,6 +3979,7 @@ bool AudioIoCallback::FillOutputBuffers( | |||
// wxASSERT( maxLen == toGet ); | |||
|
|||
em.RealtimeProcessEnd(); | |||
delete bufHelper.release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a double delete? unique_ptr::release
should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.P.S. No, not a double delete. But an unnecessarily verbose single delete that could be left completely implicit. Whereas, “release” alone (distinct from “reset”) would be wrong, a memory leak.
Learn RAII. Learn “deterministic destruction.” Learn what makes C++ not like C on the one hand and not like Java on the other hand. You will like it, love it. I know I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know that it removes the unique_ptr
at the end of scope automatically. It was a bug testing change made for someone reporting issues on gcc
that I couldn't repro on my system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/cpp/memory/unique_ptr
Release vs. reset and much else explained there.
P. S. RealtimeEffectManager.cpp had the OTHER memory leak, obvious to me on inspection, that you did not figure out deductively. But the real fix you need is reverting #412 altogether. Cordially. |
The real fix would be moving memory allocation completely out of the real-time callback entirely. I haven't gotten to that yet, so I'm doing a half-measure and making sure that we don't get UB on a user's machine because the stack wasn't as big as it is on the developer's machine. Just as an example, MSVC gives 1MB stack by default and GCC gives 8MB. It's asking a lot to be 100% sure you aren't going to smash the stack when you use |
There's memory allocation in the audio thread? 😬 |
If it were a real problem in practice, affecting playback itself, an essential function— wouldn’t it have been well known and corrected long before now? |
Not necessarily. Glitches in audio playback can be difficult to reproduce. The allocation may be fast enough in the vast majority of cases but on rare occasions the OS may need more time to allocate and cause an audible glitch. |
Yes, friend, if “alloca” is considered “allocation,” but it is adjustment of some registers for variable sized stack allocation, in this real time critical thread, which always has a shallow stack. I still think this is fixin what ain’t broke, just because a static analyzer flags it, but that is not wisdom. |
Such hypotheses will be devilishly hard to demonstrate. I don’t say you are wrong. There is a certain dance of three threads, the main, which updates the screen and polls the stop button; the real time critical portAudio callback, in which these alloca calls happen (but would new and delete really improve on them?), and another thread that reads from (play) and/or writes to (record) the drive, with a lesser frequency and bigger grain. And RingBuffer mediating between the last two. I heard rumors of unusual bugs that I could not reproduce at will, I was given data, I saw odd patterns suggesting “time travel” of recorded data by numbers of samples that were powers of 2. I made speculative fixes to RingBuffer using std::atomic more rigorously. I didn’t hear further rumors of the recording glitches. Did I really fix it? I still don’t know, but I made my best educated guess. |
Could this be related to https://github.com/tenacityteam/tenacity/issues/288 ? |
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
Is this new commit strictly whitespace changes? What are you using to format it? I doubt we should be doing such mass changes before automating coding standards. |
You're right. I was just trying to standardize the layout to something consistent. I'll avoid doing that again until we set a consistent standard. |
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
AudioIOBufferHelper.h
cache friendly
Emily, you are letting a static analyzer or a checklist of alleged "anti-patterns" substitute for thinking about the code you are reading. What if I told you the bound of the loops in question was always a very small number? That bound is the number of playback channels, which is not more than two for most users of Audacity, and for more complicated setups is still not a large number. Did you pause to figure this out? Did you put breakpoints in the code, figure out how to get there, and observe the loop bound? Did you also observe in your breakpoint that this isn't the main thread, and did you wonder, is this a realtime-critical thread perhaps, in which breaking the usual recommended rules might be important and specially justified? No, you just took the output of an analyzer as unquestionable advice and wasted time fixing things not really broken. Please just revert #412 and move on, or prove me wrong by demonstrating a real bug traceable to these allocas and fixed by other allocation. This is painful to watch. |
Note, you can use |
Now I'm wasting MY time too, but it's Sunday morning and this is kinda fun. Added to FillOutputBuffers (non-portable macOS code):
That tells me stacksize is 524288 -- even less than that Windows default. But avail is 122390, and needed is 7818 or just about 6.4% of what's available. Of course all this can vary a lot. I can influence "needed" by changing Buffer Length in Device preferences, but it won't go above 8192 no matter how big I make that value. But by all means, try this for yourself with other OS equivalents. Try to prove that you can come anywhere close to the alloca stack overflow and smash. Or instead, just stop think about it. These alloca-s have been there several years. The Audacity team left them there. Don't you think the stack smashes, if they happened -- "Audacity crashes every time I play!" -- would have been a notorious, high priority bug that would have been fixed by now? |
Using
You have made a bunch of incorrect assumptions. I knew it was in a real time thread. That's why there shouldn't even be memory allocation happening there at all. I know you said you don't think modifying the stack segment register is a memory allocation, but you aren't really considering the the SSR points to memory and it still has to store those bits somewhere. On x86 it decrements the SP register and copies the new allocation to the memory area whose size is the difference between the two SP values, thus expanding the stack frame. It's still a memory allocation, but it is faster of course (if the memory area is something like L1 cache that is). There is just no portable standards compliant way to do this. I don't want to have to worry about stack overflows where it's not even needed. You also made a bunch of incorrect statements about Also, it just occurred to me: what happens if any of those values are zero? |
Here is the
|
Is the ancient history of gcc versions from 2001 actually relevant? Audacity/Tenacity presupposes a compiler that can do C++17. Those versions, predating 2017, surely won't. alloca is non-standard, it's true. But the pertinent question is, what versions of gcc support C++17, and how is alloca implemented in them? Don't hate me, I'm just trying to help you. |
I don't personally know the implementations of every c++17 compiler, across every version and across every machine. If you do then I applaud you. My point is that it's basically relying on a kludge to get a little bit of extra speed. I want to move memory allocation outside of the real-time thread, and because of the way that |
And neither to I. But if Tenacity really wants to support a compiler minimum of GCC 2.95, well then, be my guest update your BUILDING.md, and also change a whole lot of source code using C++11 syntax or later. Say goodbye to lambdas. If you want to support some other variety of compilers, not quite that wide, well still, document it there. Meanwhile, I'll work on Audacity, where we are content to drop support for really old compilers and stipulate which small few are guaranteed to work.
You do? Then pray tell, why did you add calls to operator new (!) within that thread, which were much more expensive than alloca, and they were not paired with delete (!), and therefore were leaky, and you pushed it to your master without review from your teammates, and you wouldn't even have known it was leaky if I hadn't spoken up. (Well maybe @nyanpasu64 at least would have figured it out soon enough.) I honestly didn't know how horribly alloca was sometimes implemented circa 2001. Thank you for that bit of education. But even thick-witted I have long known about std::auto_ptr superseding explicit delete, which was in the standard library since 1998 (deprecated now, please use unique_ptr). And, about the general princple of RAII, which existed in C++ since 1979, before Bjarne had even renamed it C++. https://youtu.be/u0aU0NTJ0hQ?t=858
Um, no? I mean, yeah? That was not, ever, the intended lifetime of those arrays in question.
Now THAT makes sense as a possible improvement of AudioIO.cpp. Honestly. |
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
I don't appreciate you speaking on my behalf about how I'd respond to code I haven't reviewed. |
🤷 I was complimenting you. |
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
Signed-off-by: Emily Mabrey <emabrey@tenacityaudio.org>
Improves performance of project loading substantively.
Signed-off-by: Emily Mabrey emabrey@tenacityaudio.org
Helped-by: Alex Disibio alexdisibio@gmail.com
Checklist
-s
orSigned-off-by
* (See: Contributing § DCO)* indicates required