Output scaling factor #15

Open
KlepD-SAL opened this issue Jan 12, 2023 · 2 comments

Comments

@KlepD-SAL

KlepD-SAL commented Jan 12, 2023

Hello!

In the spec, it is mentioned that the core calculates the regular DFT "to within a scale factor".

After trying out the FFT core with several different configurations, it is not quite clear to me what this scaling factor is or where it comes from.

As a reference, I compared the FFT core's results to a Python FFT (NumPy, as well as a manual DFT loop) and observed the following:

| Input bitwidth (N) | Output bitwidth | FFT length (F) | Scale factor |
|---|---|---|---|
| 8 | 10 | 8 | 2 |
| 8 | 11 | 16 | 1 |
| 8 | 11 | 32 | 4 |
| 8 | 12 | 64 | 2 |
| 8 | 12 | 128 | 8 |
| 8 | 13 | 256 | 4 |
| 8 | 13 | 512 | 16 |

(Command for core generation: `fft-gen -f F -n N -p 1024 -x 4`)

The output bitwidth was not manually capped, so the generator decided how many bits are required. The same test was performed with an input bitwidth of 16, leading to the same scale factors. So I assume that these factors are deterministic, and since they are all powers of two, I also assume they are the result of certain implementation details (shifting, truncating, etc.).
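For context, the comparison looked roughly like this (a simplified sketch, not my actual test script; `reference_dft`, `estimate_scale`, and the test tone are just illustrative, and `core_output` would have to come from a simulation of the generated core):

```python
import numpy as np

def reference_dft(x):
    """Plain O(N^2) DFT loop, used as a cross-check against np.fft.fft."""
    n = len(x)
    k = np.arange(n)
    w = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return w @ x

def estimate_scale(core_output, x):
    """Ratio between the ideal DFT and the core's output, taken at the
    strongest bin to avoid dividing near-zero bins."""
    ref = np.fft.fft(x)
    b = np.argmax(np.abs(ref))
    return np.abs(ref[b]) / np.abs(core_output[b])

# 8-bit test tone, as in the runs above; the core output itself comes
# from simulating the generated Verilog and is not reproduced here.
n = 64
x = np.round(100 * np.cos(2 * np.pi * 5 * np.arange(n) / n)).astype(int)
assert np.allclose(reference_dft(x), np.fft.fft(x), atol=1e-6)
```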

Unfortunately, I didn't find more information on that topic in other issues, the blog post, spec, or README.

Would it be possible to elaborate on that?

I'm also a little surprised that no one has mentioned this before, since knowing the correct scaling factor seems essential for getting "correct" results (i.e. the same results as in a co-simulation, for example).

Thanks a lot!

@ZipCPU
Owner

ZipCPU commented Jan 12, 2023

The output scales at one additional bit for every two stages. This is based upon two principles:

  1. Any incoming noise (zero mean, any standard deviation, any distribution) grows at half a bit per stage. That is, the noise variance doubles with each stage, so the standard deviation of the output noise grows with the square root of the FFT length.
  2. Any tones (i.e. non-zero-mean content) grow at one bit per stage. That is, the mean output will be at most N times the mean input, where N is the FFT length.

Therefore, the FFT drops one bit every other stage. This minimizes the computational power required, at no loss of relevant precision.
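To illustrate those two principles numerically (just a NumPy sketch, not the core's arithmetic): adding samples in pairs, as a transform stage effectively does, grows zero-mean noise by roughly sqrt(2) (half a bit), while a coherent tone doubles (a full bit).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1 << 16
noise = rng.normal(size=n)      # zero-mean noise
tone  = np.ones(n)              # a "tone": fully coherent content

def stage(x):
    # One butterfly-like step: add adjacent pairs of samples.
    return x[0::2] + x[1::2]

print(np.std(stage(noise)) / np.std(noise))   # ~1.414: half a bit of growth
print(stage(tone)[0] / tone[0])               # 2.0: one full bit of growth
```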

Dan

@KlepD-SAL
Author

Hello Dan,
thanks for your explanation!

Following that logic, the output scale should be `2 ** (log2(FFT_LEN) / 2)`, right?
`log2(FFT_LEN)` gives the number of stages, and the division by 2 accounts for the bit dropped in every other stage.

Now, if I understand the source code correctly, the first stage never drops a bit, so only the 3rd, 5th, etc. do.
This would lead to `2 ** ((log2(FFT_LEN) - 1) / 2)` for the scale factor.
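To make that concrete, here is that formula checked against the scale factors from the table above (just a quick sketch):

```python
import math

# Scale factors measured earlier (FFT length -> factor)
measured = {8: 2, 16: 1, 32: 4, 64: 2, 128: 8, 256: 4, 512: 16}

for fft_len, observed in measured.items():
    predicted = 2 ** ((math.log2(fft_len) - 1) / 2)
    print(f"F={fft_len:4d}  predicted={predicted:6.2f}  measured={observed}")
```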

What is still unclear to me is that, according to the scale factors I found, the core sometimes "un-drops" a bit.
F=16, for example, has a scale factor of 1 (2^0), so it apparently never dropped a bit. But the generated Verilog actually shows that the qtrstage does drop a bit (its input width equals its output width).

So I guess there is some other condition that affects the scale factor. Do you know what I am missing here?

Thanks and best regards,
Daniel
