xieh97/README.md

Hello World! 👋 I'm Huang Xie (谢晃)

"Science is an error-correcting process."

– Charles S. Peirce

✨ About Me

I am a PhD student specializing in Machine Learning and Signal Processing, with a research focus on Audio-Language Learning, Audio Information Retrieval, and Audio Content Analysis. My work explores topics such as contrastive learning, zero-shot learning, multimodal learning, language-based audio retrieval, and audio classification. My full resume can be found here.

🔥 Research Interests

  • Audio-Language Learning focuses on developing systems that integrate audio signals with natural language, enabling seamless interpretation and interaction between these modalities. It employs deep learning, transformer models, and multimodal alignment strategies to map audio features to textual representations. Common applications include audio captioning, spoken language understanding, language-based audio retrieval, and audio question answering.
  • Audio Information Retrieval involves analyzing and retrieving unstructured information from large-scale audio datasets. It leverages signal processing, feature extraction, machine learning, and indexing methods to organize and search audio content efficiently. Key applications include music recommendation, sound classification, similarity-based retrieval, and audio fingerprinting.
  • Audio Content Analysis focuses on extracting meaningful patterns and insights from audio signals. It utilizes signal decomposition, feature extraction, deep learning, and statistical modeling to analyze different sound components. It enables tasks like speech recognition, sound event detection, audio sentiment analysis, and music genre classification.
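
As a minimal sketch of the similarity-based, language-based audio retrieval described above: once audio clips and a text query are mapped into a shared embedding space, retrieval reduces to ranking clips by similarity to the query. The embeddings, file names, and function names below are hypothetical toy values for illustration only; in practice the vectors would come from trained audio and text encoders, and this is not the method used in any specific paper listed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_audio_clips(query_embedding, clip_embeddings):
    """Return clip names sorted from most to least similar to the query."""
    scores = {
        name: cosine_similarity(query_embedding, embedding)
        for name, embedding in clip_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional embeddings standing in for audio encoder outputs.
clip_embeddings = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.0, 0.2, 0.9],
    "car_engine.wav": [0.1, 0.9, 0.2],
}

# Hypothetical text embedding standing in for the query "a dog is barking".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_audio_clips(query_embedding, clip_embeddings)
print(ranking)
```

Cosine similarity is used here because it ignores vector magnitude, which is a common choice when embeddings are compared across modalities; other similarity functions would fit the same structure.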

🎯 Tech and Interests

📚 Publications

  • 📃 H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Text-Based Audio Retrieval by Learning From Similarities Between Audio Captions," in IEEE Signal Processing Letters, vol. 32, pp. 221-225, 2025, doi: 10.1109/LSP.2024.3511414. 🔥🔥🔥
  • 📃 H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2024, pp. 201-205. arXiv
  • 📃 H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2023, pp. 226-230. arXiv
  • 📃 H. Xie, O. Räsänen, and T. Virtanen, "On Negative Sampling for Contrastive Audio-Text Retrieval," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2023, pp. 1-5. arXiv
  • 📃 H. Xie, S. Lipping, and T. Virtanen, "Language-based Audio Retrieval Task in DCASE 2022 Challenge," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2022, pp. 216-220. arXiv
  • 📃 H. Xie, O. Räsänen, K. Drossos, and T. Virtanen, "Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2022, pp. 8867-8871. arXiv
  • 📃 H. Xie, O. Räsänen, and T. Virtanen, "Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2021, pp. 326-330. arXiv
  • 📃 H. Xie and T. Virtanen, "Zero-Shot Audio Classification via Semantic Embeddings," in IEEE/ACM Trans. Audio Speech Lang. Process., vol. 29, pp. 1233-1242, 2021. arXiv
  • 📃 H. Xie and T. Virtanen, "Zero-Shot Audio Classification Based on Class Label Embeddings," in Proc. Work. Appl. Signal Process. Audio and Acoustic. (WASPAA), 2019, pp. 264-267. arXiv

🏆 Activities

  • 👨‍🔬 Active reviewer for journals and conferences, including TASLP, SPL, ICASSP, INTERSPEECH, IJCNN, EUSIPCO, WASPAA, and others.
  • 🧑‍💻 Task coordinator for Language-based Audio Retrieval in DCASE Challenge 2024 (Task 8).
  • 🧑‍💻 Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2023 (Task 6).
  • 🧑‍💻 Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2022 (Task 6).

💬 Connect with Me

Pinned

  1. contrastive-negative-sampling Public

    Source code for negative sampling for contrastive audio-text retrieval (ICASSP 2023)

    Python 3

  2. audio-caption-aligning Public

    Source code for audio-caption aligning (ICASSP 2022)

    Python

  3. dcase2023-audio-retrieval Public

    Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2023 Challenge

    Python 9 3

  4. dcase2022-audio-retrieval Public

    Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2022 Challenge

    Python 7 1

  5. retrieval-relevance-crowdsourcing Public

    Data and instructions for crowdsourcing text-based audio retrieval relevance

    HTML

  6. audiocaps-dl Public

    Python program to download AudioCaps from YouTube.com

    Python 1