# COMP28512 Laboratory Task1: Sound Sampling

## Part 1.1. Generating and listening to sine-waves
### Python function playSin(F, Fs):
Generate an "int16" numpy array containing a sine-wave of amplitude 30000 and frequency F(Hz).
The sine-wave must be sampled at Fs Hz and last for 3 seconds.

In [None]:
import numpy as np
from numpy import int16
from comp28512_utils import Audio


def playSin(F, Fs):
    T = 1.0/Fs # sampling period (seconds)
    A = 30000 # amplitude
    last = int(3*Fs) # number of samples in 3 seconds
    n = np.arange(0, last)
    y = int16(A*np.sin(2*np.pi*F*n*T)) # y = A*sin(2*pi*F*n*T)
    Audio(y, rate=Fs)

### Call playSin(F, Fs) twice with Fs = 44100Hz, F = 220Hz and 440Hz

In [None]:
print("sine-wave audio with amplitude 30000, frequency 220Hz, sampling frequency 44100Hz.")
playSin(220, 44100)
print("sine-wave audio with amplitude 30000, frequency 440Hz, sampling frequency 44100Hz.")
playSin(440, 44100)

### Call playSin(F, Fs) for F: 1000Hz, 2000Hz, 4000Hz, 8000Hz, 16000Hz, 18000Hz, 20000Hz

In [None]:
print("sine-wave audio with amplitude 30000, frequency 1000Hz, sampling frequency 44100Hz.")
playSin(1000, 44100)
print("sine-wave audio with amplitude 30000, frequency 2000Hz, sampling frequency 44100Hz.")
playSin(2000, 44100)
print("sine-wave audio with amplitude 30000, frequency 4000Hz, sampling frequency 44100Hz.")
playSin(4000, 44100)
print("sine-wave audio with amplitude 30000, frequency 8000Hz, sampling frequency 44100Hz.")
playSin(8000, 44100)
print("sine-wave audio with amplitude 30000, frequency 16000Hz, sampling frequency 44100Hz.")
playSin(16000, 44100)
print("sine-wave audio with amplitude 30000, frequency 18000Hz, sampling frequency 44100Hz.")
playSin(18000, 44100)
print("sine-wave audio with amplitude 30000, frequency 20000Hz, sampling frequency 44100Hz.")
playSin(20000, 44100)

### Part 1.1 Q&A:
1. Describe the sound produced by a sine-wave of frequency 220 Hz.
   * The sound is a low, steady tone corresponding to the musical note "A" (A3), the A just below middle C.
2. How does the sound differ from that produced by musical instruments and the human voice?
   * The timbre is different. A sine-wave contains a single frequency with no harmonics, so it does not sound naturally produced, and it does not fade over time.
3. Describe any differences and similarities between the sound at F=220 to F=440.
   * Differences: F=440 sounds higher in pitch ("sharper") than F=220; it is one octave above.
   * Similarities: Both are the musical note "A", and both are steady tones, which sounds somewhat "artificial".
4. What is the highest frequency you could hear?
   * At very high volume, 20000 Hz; at lower volume, 16000 Hz.
5. Could there be other factors that affect your answer to question 4?
   * Different devices and headphones may have different results.
6. Why is it best not to use ‘for’ loops for this software?
   * Because there are a lot of samples to process, and 'for' loops in Python are much slower than vectorised numpy operations, so we save a lot of time by using numpy instead of 'for' loops.
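
The point in answer 6 can be checked with a small sketch. It builds the same 3-second sine-wave twice, once sample by sample in a 'for' loop and once with a single numpy expression, and compares the results. The names `sine_loop` and `sine_vectorised` are illustrative only and are not part of the lab code.

```python
import math
import numpy as np

A = 30000        # amplitude, as in playSin
DURATION = 3     # seconds


def sine_loop(F, Fs):
    """Build the sine-wave one sample at a time (slow)."""
    count = int(DURATION * Fs)
    y = np.zeros(count, dtype=np.int16)
    for n in range(count):
        y[n] = int(A * math.sin(2 * math.pi * F * n / Fs))
    return y


def sine_vectorised(F, Fs):
    """Build the same sine-wave with one numpy expression (fast)."""
    n = np.arange(int(DURATION * Fs))
    return (A * np.sin(2 * np.pi * F * n / Fs)).astype(np.int16)


loop_y = sine_loop(220, 44100)
fast_y = sine_vectorised(220, 44100)
difference = np.max(np.abs(loop_y.astype(np.int32) - fast_y.astype(np.int32)))
print("largest difference between the two versions: %d" % difference)
```

Both versions should agree to within one quantisation step. Timing them (for example with `%timeit` in the notebook) is expected to show the loop version running far slower.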

## Part 1.2. Demonstration of aliasing for sine-waves
### call playSin(F, Fs) with Fs=4000Hz, and F from 500-3.5kHz

In [None]:
print("sine-wave audio with amplitude 30000, frequency 500Hz, sampling frequency 4000Hz.")
playSin(500, 4000)
print("sine-wave audio with amplitude 30000, frequency 1kHz, sampling frequency 4000Hz.")
playSin(1000, 4000)
print("sine-wave audio with amplitude 30000, frequency 1.5kHz, sampling frequency 4000Hz.")
playSin(1500, 4000)
print("sine-wave audio with amplitude 30000, frequency 2kHz, sampling frequency 4000Hz.")
playSin(2000, 4000)
print("The following audio has sampling frequency Fs < 2*F, so there should be aliasing distortion:")
print
print("sine-wave audio with amplitude 30000, frequency 2.5kHz, sampling frequency 4000Hz.")
playSin(2500, 4000)
print("sine-wave audio with amplitude 30000, frequency 3kHz, sampling frequency 4000Hz.")
playSin(3000, 4000)
print("sine-wave audio with amplitude 30000, frequency 3.5kHz, sampling frequency 4000Hz.")
playSin(3500, 4000)

### Part 1.2 Q&A:
1. Why does aliasing distortion occur in this experiment?
   * Because the sampling frequency is not high enough: once we increase the sine-wave frequency above half the sampling frequency (Fs < 2*F), aliasing occurs.
2. What is the effect of aliasing on each of the six sine-waves?
   * The first 3 sine-waves (500 Hz, 1 kHz, 1.5 kHz) are below half the sampling frequency, so we can still reconstruct the sound from our samples. For the last 3 sine-waves (2.5 kHz, 3 kHz, 3.5 kHz) we cannot: the sound we hear is not the real sine-wave at those frequencies but an alias at Fs - F, so they sound like 1.5 kHz, 1 kHz and 500 Hz sine-waves respectively.
3. Explain what happens when F = 2000 Hz.
   * No sound comes out, since every sample falls on an "amplitude=0" point of the sine-wave (sin(pi*n) = 0 for every integer n).
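
The frequencies actually heard in this experiment can be predicted by folding F into the range 0 to Fs/2. The sketch below does that for the seven test tones and also checks the F = 2000 Hz case numerically. `alias_frequency` is a hypothetical helper written for this illustration, not part of the lab code.

```python
import numpy as np


def alias_frequency(F, Fs):
    """Apparent frequency (Hz) of a sine-wave of frequency F sampled at Fs."""
    folded = F % Fs                    # sampling cannot tell F from F + k*Fs
    return min(folded, Fs - folded)    # fold into the range 0 .. Fs/2


for F in (500, 1000, 1500, 2000, 2500, 3000, 3500):
    print("F = %d Hz is heard as %d Hz" % (F, alias_frequency(F, 4000)))

# At F = Fs/2 the sample instants coincide with the zero crossings,
# so every int16 sample of the sine-wave is 0.
n = np.arange(3 * 4000)
samples = (30000 * np.sin(2 * np.pi * 2000 * n / 4000.0)).astype(np.int16)
print("largest sample at F = 2000 Hz: %d" % np.max(np.abs(samples)))
```

Note that the folding formula returns 2000 Hz for F = 2000 Hz; the silence comes from the phase of the sine-wave, not from the folding itself.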

## Part 1.3. Processing a music file to demonstrate aliasing
### Read violin music, reduce the sampling frequency without a filter, and then use "decimate" or "resample"

In [None]:
from numpy import int16
from scipy.io import wavfile
from comp28512_utils import Audio
from scipy.signal import decimate, resample
# read music from wavfile
(Fs, violin_music) = wavfile.read("SVivaldi44.1mono.wav")
print "Sampling frequency Fs as read from wav-file: ",Fs," Hz"
print("Unmodified violin music:")
Audio(violin_music, rate=Fs)
# reduce sampling frequency by a factor of 11 without antialiasing filter
violin_r = violin_music[::11] # keep every 11th sample
print("Violin music with reducing sampling frequency by 11:")
Audio(violin_r, rate=Fs/11)
# decimate and resample
violin_r = decimate(violin_music, 11, zero_phase=True).astype(int16)
print("Decimated violin music:")
Audio(violin_r, rate=Fs/11)
# why is resample slower than decimate?
violin_r = resample(violin_music, (violin_music.size)/11).astype(int16)
print("Resampled violin music:")
Audio(violin_r, rate=Fs/11)

### Part 1.3 Q&A:
1. How does your program know what is the original sampling frequency?
   * The sampling frequency is the first value returned by the wavfile.read function.
2. Could you hear any distortion in the original wav file?
   * No. There may be some distortion, but it is definitely not obvious.
3. Describe the two effects you heard with the sampling frequency reduced to about 4 kHz without antialiasing filtering.
   * The pitch is distorted: some notes sound lower than they should and the tune is inaccurate.
   * Some musical notes become muffled.
4. What was the effect of the antialiasing filtering when you used 'resample' or 'decimate'?
   * The aliasing distortion is removed and the sound becomes decent, but it is not as good as the original because the high frequencies have been filtered out.
5. How does the aliasing distortion affect musical notes?
   * The musical notes can sound off-pitch and become muffled.
6. Is an antialiasing filter always necessary before sampling music?
   * Yes. A filter is needed whenever the sound contains frequencies above half of our sampling frequency. Since we cannot know the maximum frequency of the recorded music (including any noise) in advance, we need a low-pass filter before sampling to ensure that no high-frequency sounds remain that could cause aliasing distortion.
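
The effect of the antialiasing filter can be illustrated on a synthetic tone instead of the violin recording. In the sketch below a 3 kHz sine-wave sampled at 44.1 kHz is reduced by a factor of 11, first by simply keeping every 11th sample and then after low-pass filtering. The windowed-sinc filter in `lowpass_fir` is a simple stand-in chosen for this illustration; it is not the filter that `decimate` or `resample` use internally, and the helper names are hypothetical.

```python
import numpy as np

Fs = 44100          # original sampling frequency (Hz)
q = 11              # reduction factor, as in Part 1.3
F = 3000            # test tone above the new limit Fs/(2*q), about 2 kHz

n = np.arange(Fs)                       # one second of samples
tone = np.sin(2 * np.pi * F * n / Fs)


def peak_frequency(x, rate):
    """Frequency (Hz) of the largest FFT component of x."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.argmax(spectrum) * float(rate) / len(x)


def lowpass_fir(cutoff, rate, taps=501):
    """Windowed-sinc low-pass filter coefficients (illustrative design)."""
    k = np.arange(taps) - (taps - 1) / 2.0
    h = np.sinc(2.0 * cutoff / rate * k) * np.hamming(taps)
    return h / np.sum(h)                # unit gain at 0 Hz


new_rate = Fs / float(q)

# 1) Keep every q-th sample with no filtering: the tone aliases.
naive = tone[::q]
alias = peak_frequency(naive, new_rate)

# 2) Low-pass filter below the new Nyquist limit first, then keep every q-th sample.
#    mode="valid" drops the edge samples where the filter only partly overlaps.
h = lowpass_fir(0.8 * new_rate / 2, Fs)
filtered = np.convolve(tone, h, mode="valid")[::q]

naive_rms = np.sqrt(np.mean(naive ** 2))
filtered_rms = np.sqrt(np.mean(filtered ** 2))
print("alias heard at about %.0f Hz" % alias)
print("RMS without filter: %.3f, with filter: %.5f" % (naive_rms, filtered_rms))
```

Without the filter the tone reappears at about new_rate - F, which is a frequency that was never in the original signal. With the filter the tone is almost completely removed instead, which matches the "muffled but not distorted" impression.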

## Part 1.4. Reducing the bit-rate of a speech or music file by reducing the sampling frequency
### Use the 'decimate' function with different sampling rates for the high quality speech file and the high quality music file to avoid aliasing distortion

In [None]:
from numpy import int16
from scipy.io import wavfile
from comp28512_utils import Audio
from scipy.signal import decimate, resample
# read speech from wavfile
(Fs, speech) = wavfile.read("HQ-speech44100-mono.wav")
print "Sampling frequency Fs as read from wav-file: ",Fs," Hz"
# print("Unmodified HQ-speech:")
# Audio(speech, rate=Fs)
s_factor = 8
# decimate by factor
speech_r = decimate(speech, s_factor, zero_phase=True).astype(int16)
print "Decimated HQ-speech by factor", s_factor, " :"
Audio(speech_r, rate=Fs/s_factor)
# read music from wavfile
(Fs, music) = wavfile.read("HQ-music44100-mono.wav")
print "Sampling frequency Fs as read from wav-file: ",Fs," Hz"
# print("Unmodified HQ-music:")
# Audio(music, rate=Fs)
m_factor = 3
# decimate by factor
music_r = decimate(music, m_factor, zero_phase=True).astype(int16)
print "Decimated HQ-music by factor", m_factor, " :"
Audio(music_r, rate=Fs/m_factor)

### Part 1.4 Q&A:
1. What is the effect on speech of reducing the sampling rate?
   * The speech becomes muffled, and if the reduction factor is high it can become indecipherable and have some noise in the background.
2. What do you consider to be the minimum acceptable sampling rate for speech that you would like to hear from the built in speaker of your mobile phone?
   * 5512.5Hz(Factor 8).
3. What is the effect on music of reducing the sampling rate?
   * The sound gets muffled, and if the reduction factor is high, noise can be heard in the background.
4. What do you consider to be the minimum acceptable sampling rate for music that you would like to hear from your mobile phone when using headphones or a good quality speaker?
   * 14700Hz(Factor 3) is acceptable, but I would like the HQ soundtrack if possible.
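
The sampling rates quoted above follow directly from the decimation factors. As a rough sketch, the snippet below works out the new sampling rate, the highest frequency that can still be represented, and the resulting bit-rate at 16 bits per sample. The function names are made up for this illustration.

```python
def decimated_rate(Fs, factor):
    """Sampling rate (Hz) after keeping one sample in every `factor`."""
    return Fs / float(factor)


def nyquist_bandwidth(rate):
    """Highest frequency (Hz) that can be represented at this sampling rate."""
    return rate / 2.0


for label, factor in (("speech", 8), ("music", 3)):
    rate = decimated_rate(44100, factor)
    print("%s: factor %d -> %.1f Hz sampling, %.2f Hz bandwidth, %.0f bit/s at 16 bits"
          % (label, factor, rate, nyquist_bandwidth(rate), rate * 16))
```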

## Part 1.5. Reducing the bit-rate for music by reducing number of bits per sample
### Compare the original HQ-music with versions quantised to a specific 'NB'

In [None]:
import numpy as np
from numpy import int16
from scipy.io import wavfile
from comp28512_utils import Audio
# read music from wavfile
(Fs, music) = wavfile.read("HQ-music44100-mono.wav")
print "Sampling frequency Fs as read from wav-file: ",Fs," Hz"
print("Original(NB=16) HQ-music:")
Audio(music, rate=Fs)
# quantised version
SM = max(abs(music))           # Get maximum amplitude
music = music/float(SM)    # Scale maximum to 1 (note the float)


def HQ_quant(NB, music): 
    quant_music = np.round(music*(2**(NB-1)-0.5)-0.5)
    iquant_music = np.int16(quant_music)
    iquant_music = int16((iquant_music+0.5) * (2**(16-NB)))
    print "Quantised music using ", NB, " bits per sample:"
    Audio(iquant_music, rate=Fs)


for i in range(15, 2, -1):
    if i == 5:
        print "Minimum acceptable, some noise can be heard but not much:"
    HQ_quant(i, music)

### Part 1.5 Q&A:
1. What do you consider to be the minimum acceptable value of NB for music sampled at 44.1 kHz?
   * NB=5, because the sound is still clear at NB=5 but not at NB=4.
2. Describe the distortion that occurs as NB is decreased from 16 towards 3.
   * The sound changes very little while NB is 6 or above, some noise can be heard in the NB=5 version, and the noise becomes much more severe when NB=4 or 3.
3. Does the nature of the distortion change when the number of bits per sample becomes three or less?
   * The nature of the distortion does not change, but it is more severe when NB=3 or less.
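
The growth of the noise as NB decreases can also be measured. The sketch below quantises a full-scale test tone (a stand-in for the music file) with the same rounding as `HQ_quant` and reports the signal-to-noise ratio for each NB. One assumption differs from the notebook code: the quantised values are mapped back to [-1, 1] by dividing by the same scale factor, so that the result can be compared directly with the input.

```python
import numpy as np


def quantise(x, NB):
    """Quantise x (scaled to [-1, 1]) to NB bits and map it back to [-1, 1]."""
    scale = 2 ** (NB - 1) - 0.5
    q = np.round(x * scale - 0.5)      # integer codes, same rounding as HQ_quant
    return (q + 0.5) / scale           # assumed reconstruction at the centre of each step


def snr_db(x, NB):
    """Signal-to-quantisation-noise ratio in dB."""
    noise = quantise(x, NB) - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))


t = np.arange(44100) / 44100.0
x = np.sin(2 * np.pi * 441.7 * t)      # full-scale test tone, one second long

for NB in range(16, 2, -1):
    print("NB = %2d: SNR = %.1f dB" % (NB, snr_db(x, NB)))
```

Each bit removed should cost roughly 6 dB of signal-to-noise ratio, which is consistent with the noise being hard to hear at high NB and severe at NB=4 or 3.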

## Part 1.6. Telephone quality speech
### Evaluate uniform quantisation effects

In [None]:
import numpy as np
from numpy import int16
from scipy.io import wavfile
from comp28512_utils import Audio, get_pesq_scores, audio_from_file 
# read speech from wavfile
(Fs, nbSpeech) = wavfile.read("NarrobandSpeech8k.wav")
print "Sampling frequency Fs as read from wav-file: ",Fs," Hz"
print("Original narrowband speech:")
Audio(nbSpeech, rate=Fs)


def uniform_quant(NB, nbSpeech):
    # scale to [-1, 1]
    SM = max(abs(nbSpeech))           # Get maximum amplitude
    nbSpeech = nbSpeech/float(SM)    # Scale maximum to 1 (note the float)
    # number of bits per sample = NB
    quant_speech = np.round(nbSpeech*(2**(NB-1)-0.5)-0.5)
    iquant_speech = int16(quant_speech)
    iquant_speech = int16((iquant_speech+0.5) * (2**(16-NB)))
    print "Quantised narrowband speech using ", NB, " bits per sample:"
    nb_speech_file = "speechNB"+str(NB)+".wav"
    wavfile.write(nb_speech_file, Fs, iquant_speech)
    audio_from_file(nb_speech_file)
    # score PESQ
    ! rm pesq_results.txt
    # Running PESQMain (in working directory with ‘chmod a+x’ set ) to obtain PESQ-MOS score:
    ! ./linux_pesqmain +8000 NarrobandSpeech8k.wav $nb_speech_file > /dev/null
    # comp28512_utils must be in working directory
    pesq_results = get_pesq_scores()
    score = pesq_results["NarrobandSpeech8k.wav"] [nb_speech_file] 
    print "PESQMain score for NarrobandSpeech8k.wav against "+nb_speech_file+" = ", score
    print
    

uniform_quant(10, nbSpeech) # NB=10
uniform_quant(8, nbSpeech) # NB=8
uniform_quant(6, nbSpeech) # NB=6
uniform_quant(4, nbSpeech) # NB=4
uniform_quant(3, nbSpeech) # NB=3

### Part 1.6 Q&A:
1. Can you hear any difference between the original 16 bit per sample version and your 8 bit version? 
   * There seems to be some slight noise in the background of the 8 bit version.
2. Taking ‘Narrobandspeech8k.wav’ as the reference, what are the PESQ scores for 
   * (a) your 8 bit per sample version: 3.834
   * (b) a 4-bit per sample version: 2.338
   * (c) any others you tested? Yes. NB=10: 4.238; NB=6: 3.005; NB=3: 2.037
3. Compare your own assessments with the PESQ scores obtained for several values of NB.
   * They basically agree: the versions with higher PESQ scores also sound higher in quality to me.
4. Decide what you consider to be a reasonable number of bits (NB) per sample for telephone speech when the sampling rate is 8 kHz with uniform quantisation. Summarise your reasons in one sentence, and note whether your decision is significantly different from the PESQ assessment. 
   * I think NB=6 is appropriate, since the sound quality is decent and clear; this is not significantly different from the PESQ assessment, where the score is just a little under "good" according to the PESQ MOS score table.
5. You have heard that land-line telephone calls use 64000 bits/second links. Based on your experiments today, do you consider that 8 bits per sample with uniform quantisation may be acceptable for telephone quality speech sampled at 8 kHz? 
   * Yes, as long as the speech is decipherable and does not have too much noise. I think 6 bits per sample is acceptable, and 8 bits is even better.
6. Mobile telephony cannot afford 64000 bits/second, and must use considerably less than 16,000 bits/second. 
   * How many bits per sample would be possible using 16000 bits/second with uniform quantisation of speech sampled at 8 kHz?
     * NB = 16000/8000 = 2 bits
   * Based on your experiments, do you believe that reasonable quality speech can be encoded in this way for mobile telephony?
     * No, I do not think the quality can reach a reasonable level; the speech would be too unclear.
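
The arithmetic in questions 5 and 6 can be written out as a small sketch: the number of whole bits available per sample is the link bit-rate divided by the sampling rate, and NB bits give 2^NB quantisation levels. The function names are hypothetical.

```python
def bits_per_sample(bit_rate, Fs):
    """Whole bits available per sample on a link of bit_rate bits/second."""
    return bit_rate // Fs


def quantisation_levels(NB):
    """Number of distinct amplitude levels with NB bits per sample."""
    return 2 ** NB


for bit_rate in (64000, 16000):
    NB = bits_per_sample(bit_rate, 8000)
    print("%d bit/s at 8 kHz -> %d bits per sample, %d levels"
          % (bit_rate, NB, quantisation_levels(NB)))
```

With only 4 levels at 16000 bits/second, uniform quantisation is coarser than the NB=3 version tested above.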

## Part 1.7. Log-PCM encoding speech

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from numpy import int16
from scipy.io import wavfile
from comp28512_utils import Audio, audio_from_file, get_pesq_scores 
%matplotlib inline

# mulaw function and inverse
def mu_law(x):
    mu = 255
    y = np.sign(x)*np.log(1+mu*abs(x))/np.log(1+mu)
    return y


def inverse_mu_law(y): 
    mu = 255
    u = np.sign(y)*(np.exp(np.log(1+mu)*abs(y))-1)/mu
    return u


# plot compressor and expander
fig, axs = plt.subplots(1)
xc = np.arange(-1, 1, 0.01)
yc = mu_law(xc)
axs.plot(xc, yc)
axs.grid(True)
axs.set_xlabel("Before companding: x") 
axs.set_ylabel("After companding(mu-law(x)): y")
axs.set_title("Compressor Graph(output y against input x)")

fig, axs = plt.subplots(1)
xe = np.arange(-1, 1, 0.01)
ye = inverse_mu_law(xe)
axs.plot(xe, ye)
axs.grid(True)
axs.set_xlabel("Before expanding: y") 
axs.set_ylabel("After expanding(inverse-mu-law(y)): u")
axs.set_title("Expander Graph(output u against input y)")

# read speech from wavfile
(Fs, speech) = wavfile.read("NarrobandSpeech8k.wav")
print "Sampling frequency Fs as read from wav-file: ",Fs," Hz"


def quant_cmp(NB, speech):
    # scale to [-1, 1]
    SM = max(abs(speech))  # Get maximum amplitude
#     print where(abs(speech)==max(abs(speech)))
#     print abs(speech)[108630]
#     print abs(speech)[101515]
    speech = speech/float(SM)    # Scale maximum to 1 (note the float)
    
    
    # mu-law version with NB bits per sample
    mu_speech = mu_law(speech)
    SM = max(abs(mu_speech))           # Get maximum amplitude
    mu_speech = mu_speech/float(SM)    # Scale maximum to 1 (note the float)
    quant_speech = np.round(mu_speech*(2**(NB-1)-0.5)-0.5)
    iquant_speech = int16(quant_speech)
    iquant_speech = int16((iquant_speech+0.5) * (2**(16-NB)))
    
    SM = max(abs(iquant_speech))           # Get maximum amplitude
    iquant_speech = iquant_speech/float(SM)    # Scale maximum to 1 (note the float)
    iquant_speech = inverse_mu_law(iquant_speech)
    # rescale to 16 bit (2**15-1 keeps the positive peak inside the int16 range)
    mu_speech = int16(iquant_speech * (2**15-1))
    print("NB="+str(NB)+" mu-law speech:")
    mu_file_name = "mu-law"+str(NB)+".wav"
    wavfile.write(mu_file_name, Fs, mu_speech)
    audio_from_file(mu_file_name)
    
    # uniform version with NB bits per sample
    quant_speech = np.round(speech*(2**(NB-1)-0.5)-0.5)
    iquant_speech = int16(quant_speech)
    iquant_speech = int16((iquant_speech+0.5) * (2**(16-NB)))
    print("NB="+str(NB)+" uniform quantisation speech:")
    file_name = "speechNB"+str(NB)+".wav"
    wavfile.write(file_name, Fs, iquant_speech)
    audio_from_file(file_name)

    # score PESQ
    ! rm pesq_results.txt
    # Running PESQMain (in working directory with ‘chmod a+x’ set ) to obtain PESQ-MOS score:
    ! ./linux_pesqmain +8000 NarrobandSpeech8k.wav $mu_file_name > /dev/null
    pesq_results = get_pesq_scores()
    score = pesq_results["NarrobandSpeech8k.wav"] [mu_file_name] 
    print "PESQMain score for NarrobandSpeech8k.wav against "+mu_file_name+" = ", score
    
    ! rm pesq_results.txt
    # Running PESQMain (in working directory with ‘chmod a+x’ set ) to obtain PESQ-MOS score:
    ! ./linux_pesqmain +8000 NarrobandSpeech8k.wav $file_name > /dev/null
    pesq_results = get_pesq_scores()
    score = pesq_results["NarrobandSpeech8k.wav"] [file_name] 
    print "PESQMain score for NarrobandSpeech8k.wav against "+file_name+" = ", score
    
    print

    
quant_cmp(8, speech)
quant_cmp(7, speech)
quant_cmp(6, speech)

### Part 1.7 Q&A:
1. What do we learn from the mu-law companding and expansion graphs?
   * Expansion is the inverse of compression.
   * Mu-law gives more quantisation levels to low values (close to 0) instead of spacing the levels uniformly.
2. Compare mu-Law PCM speech at 64000 bit/s with the result of uniform quantisation at the same bit-rate. Give PESQ scores and your own assessments for both.
   * According to the PESQ scores, the mu-law version is much better than the uniform quantisation version. I can also hear the difference between them, and the mu-law version is clearly better.
3. If you have time, compare mu-Law PCM speech with NB=7, 6, etc. with result of uniform quantisation at the same bit-rate (56,000 bit/s, 48,000 bit/s, etc).
   * For both NB=7 and NB=6, the mu-law versions are better (clearer and less noisy) than the uniform quantisation versions.
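
The advantage of mu-law for low-level signals can be illustrated without the speech file or PESQ. The sketch below quantises a quiet test tone (1% of full scale, a stand-in for quiet speech) to 8 bits, once uniformly and once with mu-law companding, and compares the signal-to-noise ratios. It reuses the companding formulas from the notebook, but the quantiser maps its output straight back to [-1, 1], which is a simplification of what `quant_cmp` does.

```python
import numpy as np

MU = 255.0


def mu_law(x):
    """Mu-law compressor, same formula as in the notebook."""
    return np.sign(x) * np.log(1 + MU * np.abs(x)) / np.log(1 + MU)


def inverse_mu_law(y):
    """Mu-law expander, the inverse of mu_law."""
    return np.sign(y) * (np.exp(np.log(1 + MU) * np.abs(y)) - 1) / MU


def quantise(x, NB):
    """Uniform NB-bit quantiser on [-1, 1] (same rounding as quant_cmp)."""
    scale = 2 ** (NB - 1) - 0.5
    return (np.round(x * scale - 0.5) + 0.5) / scale


def snr_db(x, y):
    """Signal-to-noise ratio in dB of y as an approximation of x."""
    return 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))


t = np.arange(8000) / 8000.0
quiet = 0.01 * np.sin(2 * np.pi * 300.3 * t)     # low-level tone sampled at 8 kHz

uniform = quantise(quiet, 8)
companded = inverse_mu_law(quantise(mu_law(quiet), 8))

print("uniform 8-bit SNR: %.1f dB" % snr_db(quiet, uniform))
print("mu-law 8-bit SNR:  %.1f dB" % snr_db(quiet, companded))
```

For a tone this quiet the uniform quantiser only uses a handful of its levels, while the companded version spreads the same tone over many more levels, so its signal-to-noise ratio should come out considerably higher.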