Important Dates
| Event | Date |
|---|---|
| Paper submission due | January 6 (Tue), 2026 |
| Notification of acceptance | January 28 (Wed), 2026 |
| Camera-ready due | February 3 (Tue), 2026 |
| Workshop | March 29 (Sun), 2026, co-located with EACL 2026 |

* These dates are approximate, based on the EACL 2026 schedule, and are subject to change.
Programme
09:30 - 10:30 Poster Session I: Linguistic Analysis and Language Structure
- Session Chairs: TBA
- When Multilingual Evaluation Assumptions Fail: Tokenization Effects Across Scripts
  Manodnya K H and Luc De Nardi
- Learning from Scarcity: Building and Benchmarking Speech Technology for Sukuma
  Macton Mgonzo, Kezia Oketch, Naome A Etori, Winnie Mang'eni, Elizabeth Fabian Nyaki and Michael Samwel Mollel
- Evaluating Large Language Models on Lithuanian Grammatical Cases
  Urtė Jakubauskaitė and Raquel G. Alhama
- Tokenization and Morphological Fidelity in Uralic NLP: A Cross-Lingual Evaluation
  Nuo Xu and Ahrii Kim
- Out-Of-Tune rather than Fine-Tuned: How Pre-training, Fine-tuning and Tokenization Affect Semantic Similarity in a Historical, Non-Standardized Domain
  Stella Verkijk and Piek Vossen
- LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers
  Christophe Friezas Gonçalves, Salima Lamsiyah and Christoph Schommer
10:30 - 11:00 Coffee Break
11:00 - 12:00 Poster Session II: Language Modelling and NLP Applications
- Session Chairs: TBA
- To make someone do something: mining alert-style directives in Bulgarian social media for low-resource language modelling
  Ruslana Margova and Stanislav Penkov
- Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering
  Vlad-Andrei Negoiță, Mihai Masala and Traian Rebedea
- LLM-as-a-Judge for Low-Resource Languages: Adapting Ragas and Comparative Ranking for Romanian
  Claudiu Creanga and Liviu P Dinu
- When LLMs Annotate: Reliability Challenges in Low-Resource NLI
  Solmaz Panahi, John Kelleher and Vasudevan Nedumpozhimana
- Cross-Lingual Emotion Recognition in Balinese Text using Multilingual-LLMs under Peer-Collaborations Settings
  Putu Kussa Laksana Utama, Tsegaye Misikir Tashu and Jilles Steeve Dibangoye
- "We Are (Language) Family": Adapting Transformer models to related minority languages with linguistic data
  Miguel López-Otal and Jorge Gracia
- Tracking the evolution of LLM capabilities for Belarusian with OpenAI Evals
  Vladislav Poritski, Oksana Volchek, Maksim Aparovich, Volha Harytskaya and Pavel Smrz
- Representation-Aware Prompting for Zero-Shot Marathi Text Classification: IPA, Romanization, Repetition
  Van-Hien Tran, Huy Hien Vu, Hideki Tanaka and Masao Utiyama
- KyrText: A Multi-Domain Large-Scale Corpus for Kyrgyz Language
  Tilek Chubakov
- Pretraining and Benchmarking Modern Encoders for Latvian
  Arturs Znotins
- UrHiOdSynth: A Multilingual Synthetic Corpus for Speech-to-Speech Translation in Low-Resource Indic Languages
  Jamaluddin, Subhankar Panda, Aditya Narendra, Kamanksha Prasad Dubey and Mohammad Nadeem
- Parameter-Efficient Quality Estimation via Frozen Recursive Models
  Umar Abubacar, Roman Bauer and Diptesh Kanojia
- Neural Machine Translation for French–Mooré: Adapting Large Language Models to Low-Resource Languages
  Walker Stanislas Rocksane Compaore, Maimouna Ouattara, Rodrique Kafando, Tegawendé F. Bissyandé, Abdoul Kader Kabore and Aminata Sabane
- Contributing to Speech-to-Speech Translation for African Low-Resource Languages: Study of French-Mooré Pair
  Soutongnoma Alex Fayçal Ouedraogo, Maimouna Ouattara, Rodrique Kafando, Abdoul Kader Kabore, Aminata Sabane and Tegawendé F. Bissyandé
09:30 - 12:00 Virtual Poster Session
- Session Chairs: TBA
- NE-BERT: A Multilingual Language Model for Nine Northeast Indian Languages
  Badal Nyalang
- Competence Collapse in Code-Mixed Generation: Spectral Evidence and Mechanistic Recovery via Cross-Lingual Activation Steering
  Tanushree Ravindra Pratap Yadav
- Do Tokenizers Fail on Informal Hindi Expressions? Evidence from Static, Downstream, and Robustness Analyses
  Manikandan Ravikiran, Tanmay Tiwari, Vibhu Gupta, Rakesh Prakash, Rohit Saluja and Shayan Mohanty
- Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language
  Prathamesh Devadiga and Paras Chopra
- BanglaLlama: LLaMA for Bangla Language
  Abdullah Khan Zehady, Shubhashis Roy Dipta, Naymul Islam, Safi Al Mamun and Santu Karmaker
- Evaluating Retrieval-Augmented Generation for Medication Question Answering on Nigerian Drug Labels in Yorùbá
  Zainab Tairu and Aramide Adebesin
- Quantifying Cross-Lingual Interference: Algorithmic Standardization of Kamtapuri in Large Language Models
  Roumak Das
- SinhaLegal: A Benchmark Corpus for Information Extraction and Analysis in Sinhala Legislative Texts
  Minduli Lasandi and Nevidu Jayatilleke
- BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali
  Jakir Hasan, Shrestha Datta, Md Saiful Islam, Shubhashis Roy Dipta and Ameya Debnath
- Tone in Yoruba ASR: Evaluating the Impact of Tone Recognition on Transformer-Based ASR Models
  Joy Olusanya
- QARI: Neural Architecture for Urdu Extractive Machine Reading Comprehension
  Samreen Kazi and Shakeel Ahmed Khoja
- Anchoring the Judge: Curriculum-Based Adaptation and Reference-Anchored MQM for LLM-Based Machine Translation of an Unseen Low-Resource Language - A Case of Nupe
  Umar Baba Umar, Sulaimon Adebayo Bashir and Mohammed Danlami Abdulmalik
- "So, How Much Do LLMs Hallucinate on Low-Resource Languages?" A Quantitative and Qualitative Analysis
  Kushal Trivedi, Murtuza Shaikh and Sriyansh Sharma
- A Comprehensive Evaluation of Chain-of-Thought Faithfulness in Persian Classification Tasks
  Shakib Yazdani, Cristina España-Bonet, Eleftherios Avramidis, Yasser Hamidullah and Josef van Genabith
- Under-resourced studies of under-resourced languages: lemmatization and POS-tagging with LLM annotators for historical Armenian, Georgian, Greek and Syriac
  Chahan Vidal-Gorène, Bastien Kindt and Florian Cafiero
- Large Language Models for Mental Health: A Multilingual Evaluation
  Nishat Raihan, Sadiya Sayara Chowdhury Puspo, Ana-Maria Bucur, Stevie Chancellor and Marcos Zampieri
- Serbian SuperGLUE: Towards an Evaluation Benchmark for South Slavic Language Models
  Mitar Perovic and Teodora Mihajlov
- Less is More: Adapting Text Embeddings for Low-Resource Languages with Small Scale Noisy Synthetic Data
  Zaruhi Navasardyan, Bagratuni Minsayan, Spartak Bughdaryan and Hrant Davtyan
- Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation
  Luis Frentzen Salim, Esteban Carlin, Alexandre Morinvil, Xi Ai and Lun-Wei Ku
- Escaping the Probability Trap: Mitigating Semantic Drift in Cantonese-Mandarin Translation
  Yuzhi Liang and Fangqi Chen
- How multilingual are multilingual LLMs? A case study in Northern Sámi-Finnish Translation
  Jonne Sälevä and Constantine Lignos
- Hebrew Diacritics Restoration using Visual Representation
  Yair Elboher and Yuval Pinter
- MTQE.en-he: Machine Translation Quality Estimation for English-Hebrew
  Andy Rosenbaum, Assaf Siani and Ilan Kernerman
- Tokenization Cost, Retention, and Orthography Robustness for Ladin and Italian Varieties
  Alessio Staffini
- BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization
  Ahmed Rafid, Rumman Adib, Fariya Ahmed, Ajwad Abrar and Mohammed Saidul Islam
- Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios
  Namrata Bhalchandra Patil Gurav, Akashdeep Ranu, Archchana Sindhujan and Diptesh Kanojia
- TeluguEval: A Comprehensive Benchmark for Evaluating LLM Capabilities in Telugu
  Revanth Kumar Gundam and Radhika Mamidi
- MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language
  Sumit Yadav, Raju Kumar Yadav, Utsav Maskey, Gautam Siddharth Kashyap, Ganesh Gautam and Usman Naseem
- The Indonesian Religiolect Corpus: Data Curation for Muslim, Protestant, and Catholic Language Varieties
  Dan Sachs
12:00 - 14:00 Lunch Break
14:00 - 15:00 Keynote Speech - Beyond the standard: NLP for low-resource language varieties
- Speaker: Prof. Barbara Plank
15:00 - 15:30 Oral Session I: Syntactic Analysis in the Era of Language Models
- Session Chairs: TBA
- 15:00 - 15:15: Cross-Lingual and Cross-Domain Transfer Learning for POS Tagging in Historical Germanic Low-Resource Languages
  Irene Miani, Sara Stymne and Gregory R. Darwin
- 15:15 - 15:30: Targeted Syntactic Evaluation of Language Models on Georgian Case Alignment
  Daniel Gallagher and Gerhard Heyer
15:30 - 16:00 Coffee Break
16:00 - 17:00 Oral Session II: Language Modelling and Applications
- Session Chairs: TBA
- 16:00 - 16:15: Qomhrá: A Bilingual Irish and English Large Language Model
  Joseph McInerney, Khanh-Tung Tran, Liam Lonergan, Neasa Ní Chiaráin, Ailbhe Ni Chasaide and Barry Devereux
- 16:15 - 16:30: Bootstrapping Embeddings for Low Resource Languages
  Merve Basoz, Andrew Horne and Mattia Opper
- 16:30 - 16:45: Enabling Structured Reasoning in Sindhi with Culturally Grounded Instruction Tuning
  Mehak Mehak, Kamyar Zeinalipour, Pireh Soomro, Cristiano Chesi, Marco Gori and Marco Maggini
- 16:45 - 17:00: Grammatical Error Correction for Low-Resource Languages: The Case of Zarma
  Mamadou K. Keita, Marcos Zampieri, Christopher M Homan, Adwoa Asantewaa Bremang, Dennis Asamoah Owusu and Huy Le