Writing features as 'ark,scp' by pipeline with 'copy-feats' #45

Open
faber6911 opened this issue May 3, 2020 · 2 comments

Comments

@faber6911

Hi,
I can't get the example you provided for writing the .ark and .scp files at the same time to run.
[Screenshot: error_kaldi_io]
If I instead create the ark file first and then use copy-feats to produce a copy together with its .scp file, I don't encounter any problems.

@jensen199105

I want to ask the same thing. Here is my code snippet and the errors:

`import numpy as np
from kaldiio import ReadHelper
import kaldi_io

ark_scp_output = 'ark:| copy-feats ark:- ark,scp:ark:/home/jensen/Document/feats.ark,scp:/home/jensen/Document/feats.scp'

with kaldi_io.open_or_fd(ark_scp_output, 'w') as f:
    dic = {}
    for i in range(10):
        arr = np.random.randn(200, 10)
        dic[str(i)] = arr
    for k,v in dic.items():
        kaldi_io.write_mat(f, v, k)`
copy-feats ark:- ark,scp:ark:/home/jensen/Document/feats.ark,scp:/home/jensen/Document/feats.scp
WARNING (copy-feats[5.5.839~1-0c6a]:Open():util/kaldi-table-inl.h:1311) When writing to both archive and script, the script file will generally not be interpreted correctly unless the archive is an actual file: wspecifier = ark,scp:ark:/home/jensen/Document/feats.ark,scp:/home/jensen/Document/feats.scp
WARNING (copy-feats[5.5.839~1-0c6a]:Open():kaldi-io.cc:729) Invalid output filename format ark:/home/jensen/Document/feats.ark
ERROR (copy-feats[5.5.839~1-0c6a]:TableWriter():util/kaldi-table-inl.h:1469) Failed to open table for writing with wspecifier: ark,scp:ark:/home/jensen/Document/feats.ark,scp:/home/jensen/Document/feats.scp: errno (in case it's relevant) is: Success

[ Stack-Trace: ]
copy-feats(kaldi::MessageLogger::LogMessage() const+0x77b) [0x561ab18e1275]
copy-feats(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x561ab1852001]
copy-feats(kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::TableWriter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xee) [0x561ab1861ba6]
copy-feats(main+0x4c9) [0x561ab184ff92]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f82d5ae10b3]
copy-feats(_start+0x2e) [0x561ab184fa0e]

kaldi::KaldiFatalErrorException in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jensen/.local/lib/python3.8/site-packages/kaldi_io/kaldi_io.py", line 97, in cleanup
    raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
kaldi_io.kaldi_io.SubprocessFailed: cmd copy-feats ark:- ark,scp:ark:/home/jensen/Document/feats.ark,scp:/home/jensen/Document/feats.scp returned 255 !
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    kaldi_io.write_mat(f, v, k)
  File "/home/jensen/.local/lib/python3.8/site-packages/kaldi_io/kaldi_io.py", line 554, in write_mat
    fd.write(m.tobytes())
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    kaldi_io.write_mat(f, v, k)
BrokenPipeError: [Errno 32] Broken pipe
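
One possible reading of the log above: the second warning rejects `ark:/home/jensen/Document/feats.ark` as a filename, which suggests the `ark:` and `scp:` prefixes were repeated inside the output specifier. A combined Kaldi write specifier normally has the form `ark,scp:<ark path>,<scp path>`, with the type prefix given only once. Below is a minimal sketch of building the pipe string that way. The helper name `make_ark_scp_pipe` is hypothetical (it is not part of kaldi_io), and the sketch assumes `copy-feats` is on `PATH` when the string is eventually used.

```python
def make_ark_scp_pipe(ark_path, scp_path):
    """Build a pipe specifier that writes an ark file and its scp file.

    Hypothetical helper for illustration, not part of kaldi_io.
    The 'ark,scp:' prefix appears once, followed by the two plain paths.
    """
    wspecifier = 'ark,scp:{},{}'.format(ark_path, scp_path)
    return 'ark:| copy-feats ark:- {}'.format(wspecifier)


ark_scp_output = make_ark_scp_pipe('/home/jensen/Document/feats.ark',
                                   '/home/jensen/Document/feats.scp')
print(ark_scp_output)
```

The resulting string would then be passed to `kaldi_io.open_or_fd(ark_scp_output, 'w')` in place of the one used in the snippet.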

@faber6911

@jensen199105
I solved it by switching to WriteHelper from kaldiio ("from kaldiio import WriteHelper") and using a script like this:

import os
import random
import time

import librosa
import numpy as np
from kaldiio import WriteHelper

# args, sample_rate, compression_method, train_augmentation and add_noise
# are assumed to be defined elsewhere in the script
abs_path = args.abs_path
ark_path_train = os.path.join(abs_path, 'data/train/train.ark')
scp_path_train = os.path.join(abs_path, 'data/train/train.scp')
ark_path_test = os.path.join(abs_path, 'data/test/test.ark')
scp_path_test = os.path.join(abs_path, 'data/test/test.scp')


start = time.time()

if not os.path.isfile(ark_path_train):

    # upper bound of the random line index drawn from each musan_*.scp file
    noise_choice = {'music':659, 'noise':929, 'speech':425}

    # the with block closes the ark and scp files once everything is written
    with WriteHelper('ark,scp:{},{}'.format(ark_path_train, scp_path_train), compression_method=compression_method) as writer, \
            open('../data/train/wav.scp') as wav_scp:
        for line in wav_scp:
            # clean audio path
            utt, path = line.rstrip().split()
            # clean audio file
            clean_audio, _ = librosa.load(path, sr = sample_rate)
            # now for every noise type we augment n times the clean audio file using random noise audio files
            for noise_type in noise_choice:
                for aug in range(train_augmentation):
                    noise_track = np.random.randint(0, noise_choice[noise_type])
                    with open('../data/musan_{}.scp'.format(noise_type)) as noise_scp:
                        _, noise_path = noise_scp.readlines()[noise_track].rstrip().split()
                    noise_audio, _ = librosa.load(noise_path, sr = sample_rate)
                    noisy_audio = add_noise(clean_audio, noise_audio, snr=random.choice([2.5, 7.5, 12.5, 17.5]))
                    # write ark and associated scp file in train directory
                    writer(utt, np.concatenate((clean_audio.reshape(1, -1), noisy_audio.reshape(1, -1))))

Using this strategy you will be able to create the ark file and the associated scp file.
