
While running run.sh in gop_speechocean762, visualize_feats.py fails with AttributeError: 'tuple' object has no attribute 'shape' #20

Open
amandeepbaberwal opened this issue Mar 18, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@amandeepbaberwal
```
(env) amandeep@vitubuntu:~/Desktop/kaldi-master/egs/gop_speechocean762/s5$ ./run.sh
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
local/data_prep.sh: successfully prepared data in data/train
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
local/data_prep.sh: successfully prepared data in data/test
steps/make_mfcc.sh --nj 1 --mfcc-config conf/mfcc_hires.conf --cmd run.pl data/train
steps/make_mfcc.sh: moving data/train/feats.scp to data/train/.backup
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for train
steps/compute_cmvn_stats.sh data/train
Succeeded creating CMVN stats for train
fix_data_dir.sh: kept all 1 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_mfcc.sh --nj 1 --mfcc-config conf/mfcc_hires.conf --cmd run.pl data/test
steps/make_mfcc.sh: moving data/test/feats.scp to data/test/.backup
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test
steps/compute_cmvn_stats.sh data/test
Succeeded creating CMVN stats for test
fix_data_dir.sh: kept all 1 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 1 data/train ../../librispeech/s5/exp/nnet3_cleaned/extractor data/train/ivectors
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to data/train/ivectors using the extractor in ../../librispeech/s5/exp/nnet3_cleaned/extractor.
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 1 data/test ../../librispeech/s5/exp/nnet3_cleaned/extractor data/test/ivectors
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to data/test/ivectors using the extractor in ../../librispeech/s5/exp/nnet3_cleaned/extractor.
steps/nnet3/compute_output.sh --cmd run.pl --nj 1 --online-ivector-dir data/train/ivectors data/train ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/probs_train
steps/nnet3/compute_output.sh: WARNING: no such file ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.raw. Trying ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.mdl instead.
steps/nnet3/compute_output.sh --cmd run.pl --nj 1 --online-ivector-dir data/test/ivectors data/test ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/probs_test
steps/nnet3/compute_output.sh: WARNING: no such file ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.raw. Trying ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.mdl instead.
Preparing phone lists
2 silence phones saved to: data/local/dict_nosp/silence_phones.txt
1 optional silence saved to: data/local/dict_nosp/optional_silence.txt
39 non-silence phones saved to: data/local/dict_nosp/nonsilence_phones.txt
5 extra triphone clustering-related questions saved to: data/local/dict_nosp/extra_questions.txt
Lexicon text file saved as: data/local/dict_nosp/lexicon.txt
utils/prepare_lang.sh --phone-symbol-table ../../librispeech/s5/data/lang_test_tgsmall/phones.txt data/local/dict_nosp data/local/lang_tmp_nosp data/lang_nosp
Checking data/local/dict_nosp/silence_phones.txt ...
--> reading data/local/dict_nosp/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/silence_phones.txt is OK

Checking data/local/dict_nosp/optional_silence.txt ...
--> reading data/local/dict_nosp/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/optional_silence.txt is OK

Checking data/local/dict_nosp/nonsilence_phones.txt ...
--> reading data/local/dict_nosp/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking data/local/dict_nosp/lexicon.txt
--> reading data/local/dict_nosp/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/lexicon.txt is OK

Checking data/local/dict_nosp/lexiconp.txt
--> reading data/local/dict_nosp/lexiconp.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/lexiconp.txt is OK

Checking lexicon pair data/local/dict_nosp/lexicon.txt and data/local/dict_nosp/lexiconp.txt
--> lexicon pair data/local/dict_nosp/lexicon.txt and data/local/dict_nosp/lexiconp.txt match

Checking data/local/dict_nosp/extra_questions.txt ...
--> reading data/local/dict_nosp/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/local/dict_nosp]

fstaddselfloops data/lang_nosp/phones/wdisambig_phones.int data/lang_nosp/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang_nosp
Checking existence of separator file
separator file data/lang_nosp/subword_separator.txt is empty or does not exist, deal in word case.
Checking data/lang_nosp/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp/phones.txt is OK

Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt

Checking data/lang_nosp/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang_nosp/phones/context_indep.txt
--> data/lang_nosp/phones/context_indep.int corresponds to data/lang_nosp/phones/context_indep.txt
--> data/lang_nosp/phones/context_indep.csl corresponds to data/lang_nosp/phones/context_indep.txt
--> data/lang_nosp/phones/context_indep.{txt, int, csl} are OK

Checking data/lang_nosp/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 320 entry/entries in data/lang_nosp/phones/nonsilence.txt
--> data/lang_nosp/phones/nonsilence.int corresponds to data/lang_nosp/phones/nonsilence.txt
--> data/lang_nosp/phones/nonsilence.csl corresponds to data/lang_nosp/phones/nonsilence.txt
--> data/lang_nosp/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang_nosp/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang_nosp/phones/silence.txt
--> data/lang_nosp/phones/silence.int corresponds to data/lang_nosp/phones/silence.txt
--> data/lang_nosp/phones/silence.csl corresponds to data/lang_nosp/phones/silence.txt
--> data/lang_nosp/phones/silence.{txt, int, csl} are OK

Checking data/lang_nosp/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.int corresponds to data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.csl corresponds to data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang_nosp/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 6 entry/entries in data/lang_nosp/phones/disambig.txt
--> data/lang_nosp/phones/disambig.int corresponds to data/lang_nosp/phones/disambig.txt
--> data/lang_nosp/phones/disambig.csl corresponds to data/lang_nosp/phones/disambig.txt
--> data/lang_nosp/phones/disambig.{txt, int, csl} are OK

Checking data/lang_nosp/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 41 entry/entries in data/lang_nosp/phones/roots.txt
--> data/lang_nosp/phones/roots.int corresponds to data/lang_nosp/phones/roots.txt
--> data/lang_nosp/phones/roots.{txt, int} are OK

Checking data/lang_nosp/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 41 entry/entries in data/lang_nosp/phones/sets.txt
--> data/lang_nosp/phones/sets.int corresponds to data/lang_nosp/phones/sets.txt
--> data/lang_nosp/phones/sets.{txt, int} are OK

Checking data/lang_nosp/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 14 entry/entries in data/lang_nosp/phones/extra_questions.txt
--> data/lang_nosp/phones/extra_questions.int corresponds to data/lang_nosp/phones/extra_questions.txt
--> data/lang_nosp/phones/extra_questions.{txt, int} are OK

Checking data/lang_nosp/phones/word_boundary.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 330 entry/entries in data/lang_nosp/phones/word_boundary.txt
--> data/lang_nosp/phones/word_boundary.int corresponds to data/lang_nosp/phones/word_boundary.txt
--> data/lang_nosp/phones/word_boundary.{txt, int} are OK

Checking optional_silence.txt ...
--> reading data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1
--> data/lang_nosp/phones/disambig.txt has "#0" and "#1"
--> data/lang_nosp/phones/disambig.txt is OK

Checking topo ...

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang_nosp/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang_nosp/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang_nosp/phones/word_boundary.txt is OK

Checking word-level disambiguation symbols...
--> data/lang_nosp/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking word_boundary.int and disambig.int
--> generating a 20 word/subword sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 10 word/subword sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK

Checking data/lang_nosp/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_nosp/oov.txt
--> data/lang_nosp/oov.int corresponds to data/lang_nosp/oov.txt
--> data/lang_nosp/oov.{txt, int} are OK

--> data/lang_nosp/L.fst is olabel sorted
--> data/lang_nosp/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang_nosp]
steps/align_mapped.sh --cmd run.pl --nj 1 --graphs exp/ali_train data/train exp/probs_train ../../librispeech/s5/data/lang_test_tgsmall ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/ali_train
steps/align_mapped.sh: aligning data in data/train using model from ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp, putting alignments in exp/ali_train
steps/diagnostic/analyze_alignments.sh --cmd run.pl ../../librispeech/s5/data/lang_test_tgsmall exp/ali_train
steps/diagnostic/analyze_alignments.sh: see stats in exp/ali_train/log/analyze_alignments.log
steps/align_mapped.sh: done aligning data.
steps/align_mapped.sh --cmd run.pl --nj 1 --graphs exp/ali_test data/test exp/probs_test ../../librispeech/s5/data/lang_test_tgsmall ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/ali_test
steps/align_mapped.sh: aligning data in data/test using model from ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp, putting alignments in exp/ali_test
steps/diagnostic/analyze_alignments.sh --cmd run.pl ../../librispeech/s5/data/lang_test_tgsmall exp/ali_test
steps/diagnostic/analyze_alignments.sh: see stats in exp/ali_test/log/analyze_alignments.log
steps/align_mapped.sh: done aligning data.
local/visualize_feats.py --phone-symbol-table data/lang_nosp/phones-pure.txt exp/gop_train/feat.scp data/local/scores.json exp/gop_train/feats.png
Traceback (most recent call last):
File "local/visualize_feats.py", line 75, in <module>
main()
File "local/visualize_feats.py", line 68, in main
features = TSNE(n_components=2).fit_transform(features)
File "/home/amandeep/Desktop/kaldi-master/egs/gop_speechocean762/s5/env/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py", line 1118, in fit_transform
self._check_params_vs_input(X)
File "/home/amandeep/Desktop/kaldi-master/egs/gop_speechocean762/s5/env/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py", line 828, in _check_params_vs_input
if self.perplexity >= X.shape[0]:
AttributeError: 'tuple' object has no attribute 'shape'
```

@YuanGongND (Owner)

If it occurs in `gop_speechocean762/s5$ ./run.sh`, then it should be a question for the Kaldi maintainers. Are you testing with your own data?

@YuanGongND added the "bug" (Something isn't working) label on Mar 18, 2023
@amandeepbaberwal (Author)

amandeepbaberwal commented Mar 18, 2023 via email

@YuanGongND (Owner)

See the first warning line, `utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.`; it might be because you are only using part of the data. Using the entire dataset might be a better idea, since it is not large.

But again, this should be a question for https://github.com/kaldi-asr/kaldi/issues?q=is%3Aissue+gop ; they may have updated their code, and I am not one of the recipe developers.

@infinite-darkness108
infinite-darkness108 commented Jun 9, 2024

Solved! In local/visualize_feats.py, add `features = np.array(features)` before the t-SNE call. You should then be able to see the results in:

  1. exp/gop_train/feats.png: a t-SNE plot for two phones, where the high-quality instances (as scored by five human experts) are well separated and the low-scored ones lie in the mix between the two clusters.
  2. exp/gop_test/result_gop.txt, which is in the format

```
<utt_id>.0 2.0 1.0
<utt_id>.1 2.0 2.0
...
<utt_id>.n 1.0 0.0
...
```

meaning the utterance's zeroth phone was scored 2.0 by the human experts (average of five) while the GoP-based approach predicted 1.0, and the same utterance's next canonical phone was given a score of 2.0 by both the human experts and the model.
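For context on why this works: the traceback shows that scikit-learn's `TSNE._check_params_vs_input` reads `X.shape` before the input is converted to an array, so passing the features as a plain Python sequence (a tuple or list) fails. Converting to an ndarray first gives the input a `.shape` attribute. A minimal sketch of the fix, using synthetic stand-in data (the real script reads its features from exp/gop_train/feat.scp, and the perplexity value here is illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the per-phone feature vectors collected by visualize_feats.py
# (synthetic data, not the recipe's actual features).
rng = np.random.default_rng(0)
features = [tuple(row) for row in rng.normal(size=(40, 8))]

# The fix: convert the Python sequence to an ndarray so it has a .shape
# attribute before TSNE inspects it.
features = np.array(features)

# perplexity must be smaller than the number of samples (5 < 40 here).
embedded = TSNE(n_components=2, perplexity=5, init="random",
                random_state=0).fit_transform(features)
print(embedded.shape)  # (40, 2)
```

The conversion is cheap and harmless when the input is already an array, so placing it unconditionally just before the `fit_transform` call is a reasonable one-line patch.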
