"Science is an error-correcting process."
– Charles S. Peirce
I am a PhD student specializing in Machine Learning and Signal Processing, with a research focus on Audio-Language Learning, Audio Information Retrieval, and Audio Content Analysis. My work explores topics such as contrastive learning, zero-shot learning, multimodal learning, language-based audio retrieval, and audio classification. My full resume can be found here.
- Audio-Language Learning focuses on developing systems that integrate audio signals with natural language, enabling seamless interpretation and interaction between these modalities. It employs deep learning, transformer models, and multimodal alignment strategies to map audio features to textual representations. Common applications include audio captioning, spoken language understanding, language-based audio retrieval, and audio question answering.
- Audio Information Retrieval involves analyzing and retrieving unstructured information from large-scale audio datasets. It leverages signal processing, feature extraction, machine learning, and indexing methods to organize and search audio content efficiently. Key applications include music recommendation, sound classification, similarity-based retrieval, and audio fingerprinting.
- Audio Content Analysis focuses on extracting meaningful patterns and insights from audio signals. It utilizes signal decomposition, feature extraction, deep learning, and statistical modeling to analyze different sound components. It enables tasks like speech recognition, sound event detection, audio sentiment analysis, and music genre classification.
- Machine Learning / Deep Learning (PyTorch, MLflow, Ray Tune, scikit-learn, etc.)
- Data Analysis (NumPy, SciPy, Pandas, etc.)
- Audio & Text Analysis (Librosa, NLTK, etc.)
- Visualization (Matplotlib, etc.)
- Software Development (Django, Spring, Hibernate, etc.)
- Programming (Python, Java, JavaScript, SQL, etc.)
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Text-Based Audio Retrieval by Learning From Similarities Between Audio Captions," in IEEE Signal Processing Letters, vol. 32, pp. 221-225, 2025, doi: 10.1109/LSP.2024.3511414.
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2024, pp. 201-205. arXiv
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2023, pp. 226-230. arXiv
- H. Xie, O. Räsänen, and T. Virtanen, "On Negative Sampling for Contrastive Audio-Text Retrieval," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2023, pp. 1-5. arXiv
- H. Xie, S. Lipping, and T. Virtanen, "Language-based Audio Retrieval Task in DCASE 2022 Challenge," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2022, pp. 216-220. arXiv
- H. Xie, O. Räsänen, K. Drossos, and T. Virtanen, "Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2022, pp. 8867-8871. arXiv
- H. Xie, O. Räsänen, and T. Virtanen, "Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2021, pp. 326-330. arXiv
- H. Xie and T. Virtanen, "Zero-Shot Audio Classification via Semantic Embeddings," in IEEE/ACM Trans. Audio Speech Lang. Process., vol. 29, pp. 1233-1242, 2021. arXiv
- H. Xie and T. Virtanen, "Zero-Shot Audio Classification Based on Class Label Embeddings," in Proc. Work. Appl. Signal Process. Audio and Acoustic. (WASPAA), 2019, pp. 264-267. arXiv
- Active reviewer for journals and conferences, including TASLP, SPL, ICASSP, INTERSPEECH, IJCNN, EUSIPCO, WASPAA, and others.
- Task coordinator for Language-based Audio Retrieval in DCASE Challenge 2024 (Task 8).
- Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2023 (Task 6).
- Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2022 (Task 6).
- Drop me an email at huang.xie@outlook.com