List of publications using Lingvo.

Translation

[1] M. X. Chen, O. Firat, A. Bapna, M. Johnson, W. Macherey, G. Foster, L. Jones, M. Schuster, N. Shazeer, N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, Z. Chen, Y. Wu, and M. Hughes, “The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation,” in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), 2018. [ pdf ]
[2] C. Cherry, G. Foster, A. Bapna, O. Firat, and W. Macherey, “Revisiting character-based neural machine translation with capacity and compression,” in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ pdf ]
[3] A. Bapna, M. X. Chen, O. Firat, Y. Cao, and Y. Wu, “Training deeper neural machine translation models with transparent attention,” in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ pdf ]
[4] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, “Google's neural machine translation system: Bridging the gap between human and machine translation,” tech. rep., 2016. [ pdf ]

Speech recognition

[1] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, K. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani, “State-of-the-art speech recognition with sequence-to-sequence models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[2] S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, and K. Rao, “Multilingual speech recognition with a single end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[3] B. Li, T. N. Sainath, K. Sim, M. Bacchiani, E. Weinstein, P. Nguyen, Z. Chen, Y. Wu, and K. Rao, “Multi-Dialect Speech Recognition With a Single Sequence-to-Sequence Model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[4] T. N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen, and C. C. Chiu, “No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[5] D. Lawson, C. C. Chiu, G. Tucker, C. Raffel, K. Swersky, and N. Jaitly, “Learning hard alignments with variational inference,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[6] A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, “An analysis of incorporating an external language model into a sequence-to-sequence model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[7] R. Prabhavalkar, T. N. Sainath, Y. Wu, P. Nguyen, Z. Chen, C. C. Chiu, and A. Kannan, “Minimum Word Error Rate Training for Attention-based Sequence-to-sequence Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[8] T. N. Sainath, C. C. Chiu, R. Prabhavalkar, A. Kannan, Y. Wu, P. Nguyen, and Z. Chen, “Improving the Performance of Online Neural Transducer Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[9] C. C. Chiu and C. Raffel, “Monotonic Chunkwise Attention,” in Proc. International Conference on Learning Representations (ICLR), 2018. [ pdf ]
[10] I. Williams, A. Kannan, P. Aleksic, D. Rybach, and T. N. Sainath, “Contextual Speech Recognition in End-to-End Neural Network Systems using Beam Search,” in Proc. Interspeech, 2018. [ pdf ]
[11] C. C. Chiu, A. Tripathi, K. Chou, C. Co, N. Jaitly, D. Jaunzeikare, A. Kannan, P. Nguyen, H. Sak, A. Sankar, J. Tansuwan, N. Wan, Y. Wu, and X. Zhang, “Speech recognition for medical conversations,” in Proc. Interspeech, 2018. [ pdf ]
[12] R. Pang, T. N. Sainath, R. Prabhavalkar, S. Gupta, Y. Wu, S. Zhang, and C. C. Chiu, “Compression of End-to-End Models,” in Proc. Interspeech, 2018. [ pdf ]
[13] S. Toshniwal, A. Kannan, C. C. Chiu, Y. Wu, T. N. Sainath, and K. Livescu, “A comparison of techniques for language model integration in encoder-decoder speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018. [ pdf ]
[14] G. Pundak, T. N. Sainath, R. Prabhavalkar, A. Kannan, and D. Zhao, “Deep context: End-to-end contextual speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018. [ pdf ]
[15] B. Li, Y. Zhang, T. N. Sainath, Y. Wu, and W. Chan, “Bytes are all you need: End-to-end multilingual speech recognition and synthesis with bytes,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[16] J. Guo, T. N. Sainath, and R. J. Weiss, “A spelling correction model for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]
[17] U. Alon, G. Pundak, and T. N. Sainath, “Contextual speech recognition with difficult negative training examples,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]

Language understanding

[1] A. Kannan, K. Chen, D. Jaunzeikare, and A. Rajkomar, “Semi-Supervised Learning for Information Extraction from Dialogue,” in Proc. Interspeech, 2018. [ pdf ]
[2] S. Yavuz, C. C. Chiu, P. Nguyen, and Y. Wu, “CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization,” in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ pdf ]
[3] P. Haghani, A. Narayanan, M. Bacchiani, G. Chuang, N. Gaur, P. Moreno, R. Prabhavalkar, Z. Qu, and A. Waters, “From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018. [ pdf ]

Speech synthesis

[1] J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, R. A. Saurous, Y. Agiomyrgiannakis, and Y. Wu, “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[2] J. Chorowski, R. J. Weiss, R. A. Saurous, and S. Bengio, “On using backpropagation for speech texture generation and voice conversion,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. [ pdf ]
[3] Y. Jia, Y. Zhang, R. J. Weiss, Q. Wang, J. Shen, F. Ren, Z. Chen, P. Nguyen, R. Pang, I. Lopez-Moreno, and Y. Wu, “Transfer learning from speaker verification to multispeaker text-to-speech synthesis,” in Advances in Neural Information Processing Systems, 2018. [ pdf ]
[4] W. N. Hsu, Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Y. Wang, Y. Cao, Y. Jia, Z. Chen, J. Shen, P. Nguyen, and R. Pang, “Hierarchical generative modeling for controllable speech synthesis,” in Proc. International Conference on Learning Representations (ICLR), 2019. [ pdf ]
[5] W. N. Hsu, Y. Zhang, R. J. Weiss, Y. A. Chung, Y. Wang, Y. Wu, and J. Glass, “Disentangling correlated speaker and noise for speech synthesis via data augmentation and adversarial factorization,” in NeurIPS 2018 Workshop on Interpretability and Robustness in Audio, Speech, and Language, 2018. [ pdf ]

Speech-to-text translation

[1] R. J. Weiss, J. Chorowski, N. Jaitly, Y. Wu, and Z. Chen, “Sequence-to-sequence models can directly translate foreign speech,” in Proc. Interspeech, 2017. [ pdf ]
[2] Y. Jia, M. Johnson, W. Macherey, R. J. Weiss, Y. Cao, C. C. Chiu, N. Ari, S. Laurenzo, and Y. Wu, “Leveraging weakly supervised data to improve end-to-end speech-to-text translation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [ pdf ]