Technical Publications

2026

WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions. Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y. Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, Peng Qi. Accepted to ICLR 2026.
PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction. Simon Yu, Gang Li, Weiyan Shi, Peng Qi. Accepted to ICLR 2026.
AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning. Mengzhao Jia, Zhihan Zhang, Ignacio Cases, Zheyuan Liu, Meng Jiang, Peng Qi. Accepted to ACL 2026 Findings.
Towards Policy-Compliant Agents: Learning Efficient Guardrails for Policy Violation Detection. Xiaofei Wen, Wenjie Jacky Mo, Yanan Xie, Peng Qi, Muhao Chen. Accepted to ICML 2026.
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations. Chengzhi Liu, Yuzhe Yang, Kaiwen Zhou, Zhen Zhang, Yue Fan, Yanan Xie, Peng Qi, Xin Eric Wang. Accepted to ICLR 2026.
Slot Filling as a Reasoning Task for SpeechLLMs. Kadri Hacioglu, Manjunath K E, Andreas Stolcke. Accepted to IEEE ICASSP 2026.
Text-only Adaptation in LLM-based ASR Through Text Denoising. Andrés Carofilis, Sergio Burdisso, Esaú Villatoro-Tello, Shashi Kumar, Kadri Hacioglu, Srikanth Madikeri, Pradeep Rangappa, Manjunath K. E., Petr Motlicek, Shankar Venkatesan, Andreas Stolcke. Accepted to IEEE ICASSP 2026.
Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection. Sergio Burdisso, Esaú Villatoro-Tello, Shashi Kumar, Srikanth Madikeri, Pradeep Rangappa, Andrés Carofilis, Manjunath K. E., Kadri Hacioğlu, Petr Motlicek, Andreas Stolcke. Accepted to IEEE ICASSP 2026.

2025

Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings. Aaron Zheng, Mansi Rana, Andreas Stolcke. Accepted to COLING 2025.
Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems. Mansi Rana, Kadri Hacioglu, Sindhuja Gopalan, Maragathamani Boothalingam. Accepted to COLING 2025.
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models. Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Iuliia Nigmatulina, Petr Motlicek, Manjunath K E, Aravind Ganapathiraju. Accepted to ICASSP 2025.
Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering. Pradeep Rangappa, Juan Zuluaga-Gomez, Srikanth Madikeri, Andrés Carofilis, Jeena Prakash, Sergio Burdisso, Shashi Kumar, Esaú Villatoro-Tello, Iuliia Nigmatulina, Petr Motlicek, Karthik Pandia and Aravind Ganapathiraju. Accepted to ICASSP 2025.
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward. Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioglu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju and Andreas Stolcke. Accepted to ICASSP 2025 SALMA Workshop (Best Paper Award).
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents. Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su. Accepted to ICLR 2025.
Unifying Streaming and Non-streaming Zipformer-based ASR. Bidisha Sharma, Karthik Pandia Durai, Shankar Venkatesan, Jeena J Prakash, Shashi Kumar, Malolan Chetlur, Andreas Stolcke. Accepted to ACL 2025 Industry Track.
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents. Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini.
Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM. Jeena J Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke. Accepted to Interspeech 2025.
Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering. Pradeep Rangappa, Andrés Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esaú Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke. Accepted to Interspeech 2025.
Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering. Andrés Carofilis, Pradeep Rangappa, Srikanth Madikeri, Shashi Kumar, Sergio Burdisso, Jeena Prakash, Esaú Villatoro-Tello, Petr Motlicek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke. Accepted to Interspeech 2025.
Unifying Global and Near-Context Biasing in a Single Trie Pass. Iuliia Thorbecke, Esaú Villatoro-Tello, Juan Pablo Zuluaga, Shashi Kumar, Sergio Burdisso, Pradeep Rangappa, Andrés Carofilis, Srikanth Madikeri, Petr Motlicek, Karthik Pandia, Kadri Hacioglu, and Andreas Stolcke. Accepted to Text, Speech and Dialogue (TSD) conference 2025.
SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling. Kadri Hacioglu, Manjunath K E, Andreas Stolcke. Accepted to EMNLP 2025 Industry Track.
TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation. Shashi Kumar et al. Accepted to IEEE ASRU Workshop 2025.
When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents. Mrinal Rawat, Arkajyoti Chakraborty, Neha Gupta, Roberto Pieraccini. (arXiv preprint)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su. Accepted to TMLR.

2024

Probability-Aware Word-Confusion-Network-To-Text Alignment Approach for Intent Classification. Esaú Villatoro-Tello, Srikanth Madikeri, Bidisha Sharma, Driss Khalil, Shashi Kumar, Iuliia Nigmatulina, Petr Motlicek and Aravind Ganapathiraju. Accepted to ICASSP 2024.
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers. Shashi Kumar, Srikanth Madikeri, Iuliia Nigmatulina, Esaú Villatoro-Tello, Petr Motlicek, Karthik Pandia, S. Pavankumar Dubagunta and Aravind Ganapathiraju. Accepted to ICASSP 2024.
REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models. Ambuje Gupta, Mrinal Rawat, Andreas Stolcke and Roberto Pieraccini. Accepted to AJCAI 2024.
TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR. Shashi Kumar et al. Accepted to EMNLP 2024.
Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output. Hithesh Sankararaman, M. Nasheed Yasin, Tanner Sorensen, Alessandro di Bari and Andreas Stolcke. Accepted to EMNLP 2024.

2023

Controllable Discovery of Intents: Incremental Deep Clustering Using Semi-Supervised Contrastive Learning. Mrinal Rawat, Hithesh Sankararaman, Victor Barres. Accepted to IJCNLP-AACL 2023.
On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition. Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju. Accepted to Interspeech 2023.
Implementing contextual biasing in GPU decoder for online ASR. Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motlicek, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju. Accepted to Interspeech 2023.
Towards Learning Emotion Information from Short Segments of Speech. Tilak Purohit, Sarthak Yadav, Bogdan Vlasenko, S. Pavankumar Dubagunta, Mathew Magimai. Accepted to ICASSP 2023.
Effectiveness of text, acoustic, and lattice-based representations in spoken language understanding tasks. Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V Ivanov, Aravind Ganapathiraju. Accepted to ICASSP 2023.
Pre-annotation Based Approach for Development of a Sanskrit Named Entity Recognition Dataset. Sarkar Sujoy, Amrith Krishna, Pawan Goyal. Accepted to the 18th World Sanskrit Conference, Jan. 2023.
Neural Approaches for Data Driven Dependency Parsing in Sanskrit. Amrith Krishna, Ashim Gupta, Deepak Garasangi, Jeevnesh Sandhan, Pavankumar Satuluri, Pawan Goyal. Accepted to the World Sanskrit Conference, Jan. 2023.
Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems. Ashim Gupta, Amrith Krishna. Proceedings of the 8th Workshop on Representation Learning for NLP RepL4NLP 2023 at ACL 2023.

2022

Linguistically Informed Post-processing for ASR Error correction in Sanskrit. Rishabh Kumar, Devaraja Adiga, Rishav Ranjan, Amrith Krishna, Ganesh Ramakrishnan, Pawan Goyal, Preethi Jyothi.
Does Meta-learning Help mBERT for Few-shot Question Generation in a Cross-lingual Transfer Setting for Indic Languages? Aniruddha Roy, Rupak Kumar Thakur, Isha Sharma, Ashim Gupta, Amrith Krishna, Sudeshna Sarkar, Pawan Goyal. Accepted to COLING 2022.
A Benchmark and Dataset for Post-OCR text correction in Sanskrit. Ayush Maheshwari, Nikhil Singh, Amrith Krishna, Ganesh Ramakrishnan. Accepted to EMNLP 2022.
Real-time Caller Intent Detection In Human-Human Customer Support Spoken Conversations. Mrinal Rawat, Victor Barres. Accepted to IJCAI 2022.
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings. Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek, Aravind Ganapathiraju, Alexei V Ivanov. Accepted to SIGIR 2022.