NLP & Speech

Papers published at NLP and speech venues such as ACL, EMNLP, NAACL, ICASSP, and Interspeech.


Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho. KLUE: Korean Language Understanding Evaluation. arXiv. 2021.

Jaesong Lee, Jingu Kang, Shinji Watanabe (CMU). Layer Pruning on Demand with Intermediate CTC. Interspeech 2021.

Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung. Adapting Speaker Embeddings for Speaker Diarisation. Interspeech 2021.

Huu-Kim Nguyen (Yonsei Univ.), Kihyuk Jeong (Yonsei Univ.), Seyun Um (Yonsei Univ.), Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang (Yonsei Univ.). LiteTTS: A Decoder-free Lightweight Text-to-wave Synthesis Based on Generative Adversarial Networks. Interspeech 2021.

Eunbi Choi, Hwayeon Kim, Jonghwan Kim, Jae-Min Kim. Label Embedding for Chinese Grapheme-to-Phoneme Conversion. Interspeech 2021.

Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim. High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model. Interspeech 2021.

Hemlata Tak (EURECOM), Jee-weon Jung, Jose Patino (EURECOM), Massimiliano Todisco (EURECOM), Nicholas Evans (EURECOM). Graph Attention Networks for Anti-Spoofing. Interspeech 2021.

Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee. Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network. Interspeech 2021.

Lukas Lee, Youna Ji, Minjae Lee, Min-Seok Choi. DEMUCS-Mobile: On-device lightweight speech enhancement. Interspeech 2021.

You Jin Kim, Hee-Soo Heo, So Yeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung. Look Who’s Talking: Active Speaker Detection in the Wild. Interspeech 2021.

Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park. GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation. arXiv. 2021

Raphael Shu, Kang Min Yoo, Jung-Woo Ha. Reward Optimization for Neural Machine Translation with Learned Metrics. arXiv. 2021.

Seunghyun Seo, Donghyun Kwak, Bowon Lee (Inha Univ.). Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding. arXiv. 2021

Yeon Seonwoo (KAIST), Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh (KAIST). Weakly Supervised Pre-Training for Multi-Hop Retriever. ACL 2021 (Findings).

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang (KAIST), Minjoon Seo (KAIST). Spatial Dependency Parsing for Semi-Structured Document Information Extraction. ACL 2021 (Findings).

Soyoung Yoon, Gyuwan Kim, Gyumin Park (KAIST). SSMix: Saliency-based Span Mixup for Text Classification. ACL 2021 (Findings).

Taeuk Kim, Kang Min Yoo, Sang-goo Lee (SNU). Self-Guided Contrastive Learning for BERT Sentence Representations. ACL 2021.

Sungdong Kim, Minsuk Chang, Sang-Woo Lee. NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-Based Simulation. ACL 2021.

Gyuwan Kim, Kyunghyun Cho (NYU). Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search. ACL 2021.

Sohee Yang, Minjoon Seo. Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering. NAACL 2021.

Seongbin Kim*, Gyuwan Kim*, Seongjin Shin, Sangmin Lee (Inha Univ.). Two-stage Textual Knowledge Distillation to Speech Encoder for Spoken Language Understanding. ICASSP 2021.

Minjeong Kim, Gyuwan Kim, Sang-Woo Lee, Jung-Woo Ha. ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding. ICASSP 2021.

Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim. TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis. ICASSP 2021.

Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim. Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators. ICASSP 2021.

Hwayeon Kim, Jonghwan Kim, Jae Min Kim. NN-KOG2P: A Novel Grapheme-Phoneme model for Korean language. ICASSP 2021.

Yoohwan Kwon, Hee-Soo Heo, Bong-Jin Lee, Joon Son Chung. The ins and outs of speaker recognition: lessons from VoxSRC 2020. ICASSP 2021.

Andrew Brown (U. of Oxford), Jaesung Huh (U. of Oxford), Arsha Nagrani (U. of Oxford), Joon Son Chung, Andrew Zisserman (U. of Oxford). Playing a Part: Speaker Verification at the Movies. ICASSP 2021.

Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu (UOS), Joon Son Chung. Graph Attention Networks for Speaker Verification. ICASSP 2021.

Jaesong Lee, Shinji Watanabe (CMU). Intermediate Loss Regularization for CTC-based Speech Recognition. ICASSP 2021.


Sungrae Park, Geewook Kim, Junyeop Lee, Junbum Cha, Ji-Hoon Kim, Hwalsuk Lee. Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model. COLING 2020.

Gyuwan Kim, Tae-Hwan Jung. Large Product Key Memory for Pretrained Language Models. EMNLP 2020 (Findings).

Kang Min Yoo, Hanbit Lee (Seoul National University), Franck Dernoncourt (Adobe), Trung Bui (Adobe), Walter Chang (Adobe), Sang-goo Lee (Seoul National University). Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation. EMNLP 2020.

Yeon Seonwoo (KAIST), Ji-Hoon Kim, Jung-Woo Ha, Alice Oh (KAIST). Context-Aware Answer Extraction in Question Answering. EMNLP 2020.

Jisung Wang, Jihwan Kim (VUNO), Sangki Kim (VUNO), Yeha Lee (VUNO). Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition. Interspeech 2020.

Teakgyu Hong, Oh-Woog Kwon (ETRI), Young-Kil Kim (ETRI). End-to-End Task-oriented Dialog System through Template Slot Value Generation. Interspeech 2020.

Won Ik Cho (Seoul National University), Donghyun Kwak, Jiwon Yoon (Seoul National University), Nam Soo Kim (Seoul National University). Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation. Interspeech 2020.

Eunwoo Song, Min-Jae Hwang, Ryuichi Yamamoto, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim. Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder. Interspeech 2020.

Triantafyllos Afouras (University of Oxford), Joon Son Chung, Andrew Zisserman (University of Oxford). Now you're speaking my language: Visual language identification. Interspeech 2020.

Soo-Whan Chung, Hong-Goo Kang (Yonsei University), Joon Son Chung. Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision. Interspeech 2020.

Joon Son Chung, Jaesung Huh, Arsha Nagrani (University of Oxford), Triantafyllos Afouras (University of Oxford), Andrew Zisserman (University of Oxford). Spot the conversation: speaker diarisation in the wild. Interspeech 2020.

Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang (Yonsei University). FaceFilter: Audio-visual speech separation using still images. Interspeech 2020.

Hye-jin Shim (University of Seoul), Hee-Soo Heo, Jee-weon Jung (University of Seoul), Ha-Jin Yu (University of Seoul). Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection. Interspeech 2020.

Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim. ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers. Interspeech 2020.

Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han. In defence of metric learning for speaker recognition. Interspeech 2020.

Sang-Woo Lee, Hyunhoon Jung, SukHyun Ko, Sunyoung Kim, Hyewon Kim, Kyoungtae Doh, Hyunjung Park, Joseph Yeo, Sang-Houn Ok, Joonhaeng Lee, Sungsoon Lim, Minyoung Jeong, Seongjae Choi, SeungTae Hwang, Eun-Young Park (Seongnam city), Gwang-Ja Ma (Seongnam city), Seok-Joo Han (Seongnam city), Kwang-Seung Cha (Seongnam city), Nako Sung, Jung-Woo Ha. CareCall: a Call-Based Active Monitoring Dialog Agent for Managing COVID-19 Pandemic. arXiv. 2020

Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha. Efficient Active Learning for Automatic Speech Recognition via Augmented Consistency Regularization. arXiv. 2020

Sungdong Kim, Sohee Yang, Gyuwan Kim, Sang-Woo Lee. Efficient Dialogue State Tracking by Selectively Overwriting Memory. ACL 2020.

Jinhyuk Lee, Minjoon Seo, Hanna Hajishirzi (Univ. of Washington), Jaewoo Kang (Korea Univ.). Contextualized Sparse Representations for Real-Time Open-Domain Question Answering. ACL 2020.

Minz Won (Univ. of Pompeu Fabra), Sanghyuk Chun, Oriol Nieto (Pandora), Xavier Serra (Univ. of Pompeu Fabra). Data-Driven Harmonic Filters for Audio Representation Learning. ICASSP 2020.

Seongkyu Mun, Soyeon Choe, Jaesung Huh, Joon Son Chung. The Sound of My Voice: Speaker Representation Loss for Target Voice Separation. ICASSP 2020.

Arsha Nagrani* (Univ. of Oxford), Joon Son Chung*, Samuel Albanie (Univ. of Oxford), Andrew Zisserman (Univ. of Oxford). Disentangled Speech Embeddings Using Cross-modal Self-supervision. ICASSP 2020.

Triantafyllos Afouras (Univ. of Oxford), Joon Son Chung, Andrew Zisserman (Univ. of Oxford). ASR is All You Need: Cross-modal Distillation for Lip Reading. ICASSP 2020.

Hsuan-I Ho, Minho Shim, Dongyoon Wee. Learning From Dances: Pose-invariant Re-identification for Multi-person Tracking. ICASSP 2020.

Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim. Parallel WaveGAN: A Fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. ICASSP 2020.

Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong (MSRA), Hong-Goo Kang (Yonsei Univ). Improving LPCNet-based text-to-speech with linear prediction-structured mixture density network. ICASSP 2020.


Daesik Kim, Seonhoon Kim, Nojun Kwak. Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension. ACL 2019.

Kyungjae Lee, Sunghyun Park, Hojae Han, Jinyoung Yeo, Seung-won Hwang, Juho Lee. Learning with Limited Data for Multilingual Reading Comprehension. EMNLP 2019.

Seonhoon Kim, Inho Kang, Nojun Kwak. Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information. AAAI 2019.

Fuxiang Chen, Jaegul Choo, Jung-Woo Ha, Sunghun Kim, Jinyeong Yim. Paraphrase Diversification using Counterfactual Debiasing. AAAI 2019.

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Minjoon Seo. A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization. arXiv. 2019

Fuxiang Chen, Seung-won Hwang (Yonsei Univ.), Jaegul Choo (Korea Univ.), Jung-Woo Ha, Sung Kim. NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions. EMNLP-IJCNLP 2019.

Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi (Univ. of Washington). Mixture Content Selection for Diverse Sequence Generation. EMNLP-IJCNLP 2019.

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2019

Eunwoo Song, Kyungguen Byun, Hong-Goo Kang. ExcitNet Vocoder: A Neural Excitation Model for Parametric Speech Synthesis Systems. EUSIPCO 2019.

Ohsung Kwon, Eunwoo Song, Jae-Min Kim, Hong-Goo Kang. Effective Parameter Estimation Methods for an ExcitNet Model in Generative Text-to-Speech Systems. arXiv. 2019

Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang. Excitation-by-SampleRNN Model for Text-to-Speech. ITC-CSCC 2019.

Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim. Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation. Interspeech 2019.

Min-Jae Hwang, Hong-Goo Kang. Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment. Interspeech 2019.

Joon Son Chung, Bong-Jin Lee, Icksang Han. Who Said that: Audio-Visual Speaker Diarisation of Real-World Meetings. Interspeech 2019.

Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman. My Lips are Concealed: Audio-Visual Speech Enhancement through Obstructions. Interspeech 2019.

Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi. Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index. ACL 2019.

Sung Kyu Moon, Suwon Shon. Domain Mismatch Robust Acoustic Scene Classification Using Channel Information Conversion. ICASSP 2019.

Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang. Perfect Match: Improved Cross-Modal Embeddings for Audio-Visual Synchronisation. ICASSP 2019.

Ohsung Kwon, Inseon Jang (ETRI), ChungHyun Ahn (ETRI), Hong-Goo Kang (Yonsei Univ.). An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis. IEEE Signal Processing Letters (presented @ ICASSP 2020). 2019

~ 2018

Jang-Hyun Kim, Jaejun Yoo, Sanghyuk Chun, Adrian Kim, Jung-Woo Ha. Multi-Domain Processing via Hybrid Denoising Networks for Speech Enhancement. arXiv. 2018

Eunwoo Song, Jinseob Kim, Kyungguen Byun, Hong-Goo Kang. Speaker-Adaptive Neural Vocoders for Statistical Parametric Speech Synthesis Systems. arXiv. 2018

Minjoon Seo, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi. Phrase-Indexed Question Answering: A New Challenge towards Scalable Document Comprehension. EMNLP 2018.

Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang. A Unified Framework for the Generation of Glottal Signals in Deep Learning-Based Parametric Speech Synthesis Systems. Interspeech 2018.

Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim, Eunwoo Song. Acoustic Modeling Using Adversarially Trained Variational Recurrent Neural Network for Speech Synthesis. Interspeech 2018.

Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman. Deep Lip Reading: a Comparison of Models and an Online Application. Interspeech 2018.

Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman. The Conversation: Deep Audio Visual Speech Enhancement. Interspeech 2018.

Joon Son Chung, Arsha Nagrani, Andrew Zisserman. VoxCeleb2: Deep Speaker Recognition. Interspeech 2018.

Eunwoo Song, Frank K. Soong, Hong-Goo Kang. Perceptual Quality and Modeling Accuracy of Excitation Parameters in DLSTM-Based Speech Synthesis Systems. ASRU 2017.

Chanyoung Park, Kyungduk Kim, Songkuk Kim. Attention-based Dialog Embedding for Dialog Breakdown Detection. DSTC 2017