Important Dates
Paper submission due | |
Notification of acceptance | December 6 (Fri), 2024 |
Camera-ready due | December 12 (Thu), 2024 |
Workshop | January 20 (Mon), 2025 co-located with COLING 2025 |
* These dates are approximate, based on the COLING 2025 schedule, and are subject to change.
Programme
Monday, January 20, 2025
08:45 - 09:00 Opening Remarks
09:00 - 10:00 Keynote Speech
- Title: TBA
- Speaker: Jose Camacho-Collados
10:00 - 10:30 Oral Session 1: Language Modelling
- Session Chair: TBA
- 10:00 - 10:15: Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
Guokan Shang, Hadi Abdine, Yousef Khoubrane, Amr Mohamed, Yassine ABBAHADDOU, Sofiane Ennadir, Imane Momayiz, Xuguang Ren, Eric Moulines, Preslav Nakov, Michalis Vazirgiannis and Eric Xing
- 10:15 - 10:30: Empowering Persian LLMs for Instruction Following: A Novel Dataset and Training Approach
Hojjat Mokhtarabadi, Ziba Zamani, Abbas Maazallahi and Mohammad Hossein Manshaei
10:30 - 11:00 Coffee Break
11:00 - 12:00 Poster Session 1: Language Model Applications/ Sentiment Analysis/ Machine Translation
- Session Chair: TBA
- BnSentMix: A Diverse Bengali-English Code-Mixed Dataset for Sentiment Analysis
Sadia Alam, Md Farhan Ishmam, Navid Hasin Alvee, Md Shahnewaz Siddique, Md Azam Hossain and Abu Raihan Mostofa Kamal
- Using Language Models for Assessment of Users' Satisfaction with Their Partner in Persian
Zahra Habibzadeh and Masoud Asadpour
- Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing
Atharva Mutsaddi and Aditya Prashant Choudhary
- Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa
Sani Abdullahi Sani, Shamsuddeen Hassan Muhammad and Devon Jarvis
- Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language
Anastasia Zhukova, Christian E. Matt and Bela Gipp
- Filipino Benchmarks for Measuring Sexist and Homophobic Bias in Multilingual Language Models from Southeast Asia
Lance Calvin Lim Gamboa and Mark Lee
- Does Machine Translation Impact Offensive Language Identification? The Case of Indo-Aryan Languages
Alphaeus Dmonte, Shrey Satapara, Rehab Alsudais, Tharindu Ranasinghe and Marcos Zampieri
- Exploiting Word Sense Disambiguation in Large Language Models for Machine Translation
Van-Hien Tran, Raj Dabre, Hour Kaing, Haiyue Song, Hideki Tanaka and Masao Utiyama
- Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek
Maciej Rapacz and Aleksander Smywiński-Pohl
- Language verY Rare for All
Ibrahim Merad, Amos Wolf, Ziad Mazzawi and Yannick Léo
- Improving LLM Abilities in Idiomatic Translation
Sundesh Donthi, Maximilian Spencer, Om B. Patel, Joon Young Doh, Eid Rodan, Kevin Zhu and Sean O'Brien
12:00 - 13:00 Oral Session 2: Language Model Applications
- Session Chair: TBA
- 12:00 - 12:15: A Comparative Study of Static and Contextual Embeddings for Analyzing Semantic Changes in Medieval Latin Charters
Yifan Liu, Gelila Tilahun, Xinxiang Gao, Qianfeng Wen and Michael Gervers
- 12:15 - 12:30: From Arabic Text to Puzzles: LLM-Driven Development of Arabic Educational Crosswords
Kamyar Zeinalipour, Moahmmad Saad, Marco Maggini and Marco Gori
- 12:30 - 12:45: Bridging Literacy Gaps in African Informal Business Management with Low-Resource Conversational Agents
Maimouna Ouattara, Abdoul Kader Kaboré, Jacques Klein and Tegawendé F. Bissyandé
- 12:45 - 13:00: Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias
Jayanta Sadhu, Maneesha Rani Saha and Rifat Shahriyar
13:00 - 14:00 Lunch Break
14:00 - 15:00 Poster Session 2: Language Modelling/ Linguistic Insights, Parsing and Semantic Tagging with Language Models
- Session Chair: TBA
- Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Jan Christian Blaise Cruz
- Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models
Sina Bagheri Nezhad, Ameeta Agrawal and Rhitabrat Pokharel
- BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context
Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed and Francois Meyer
- Mapping Cross-Lingual Sentence Representations for Low-Resource Language Pairs Using Pre-trained Language Models
Tsegaye Misikir Tashu and Andreea Ioana Tudor
- How to Age BERT Well: Continuous Training for Historical Language Adaptation
Anika Harju and Rob van der Goot
- Exploiting Task Reversibility of DRS Parsing and Generation: Challenges and Insights from a Multi-lingual Perspective
Muhammad Saad Amin, Luca Anselma and Alessandro Mazzei
- BBPOS: BERT-based Part-of-Speech Tagging for Uzbek
Latofat Bobojonova, Arofat Akhundjanova, Phil Sidney Ostheimer and Sophie Fellenz
- When Every Token Counts: Optimal Segmentation for Low-Resource Language Models
Vikrant Dewangan, Bharath Raj S, Garvit Suri and Raghav Sonavane
- IsiZulu Noun Classification Based on Replicating the Ensemble Approach for Runyankore
Zola Mahlaza, C. Maria Keet, Imaan Sayed and Alexander Van Der Leek
- Recent Advancements and Challenges of Turkic Central Asian Language Processing
Yana Veitsman and Mareike Hartmann
15:00 - 15:30 Oral Session 3: Language Models for Question Answering
- Session Chair: TBA
- 15:00 - 15:15: CaLQuest.PT: Towards the Collection and Evaluation of Natural Causal Ladder Questions in Portuguese for AI Agents
Uriel Anderson Lasheras and Vladia Pinheiro
- 15:15 - 15:30: PersianMCQ-Instruct: A Comprehensive Resource for Generating Multiple-Choice Questions in Persian
Kamyar Zeinalipour, Neda Jamshidi, Fahimeh Akbari, Marco Maggini, Monica Bianchini and Marco Gori
15:30 - 16:00 Coffee Break
16:00 - 17:00 Oral Session 4: Language Modelling and Evaluation
- Session Chair: TBA
- 16:00 - 16:15: Stop Jostling: Adaptive Negative Sampling Reduces the Marginalization of Low-Resource Language Tokens by Cross-Entropy Loss
Galim Turumtaev
- 16:15 - 16:30: Towards Inclusive Arabic LLMs: A Culturally Aligned Benchmark in Arabic Large Language Model Evaluation
Omer Nacar, Serry Taiseer Sibaee, Samar Ahmed, Safa Ben Atitallah, Adel Ammar, Yasser Alhabashi, Abdulrahman S. Al-Batati, Arwa Alsehibani, Nour Qandos, Omar Elshehy, Mohamed Abdelkader and Anis Koubaa
- 16:30 - 16:45: Controlled Evaluation of Syntactic Knowledge in Multilingual Language Models
Daria Kryvosheieva and Roger Levy
- 16:45 - 17:00: Evaluating Large Language Models for In-Context Learning of Linguistic Patterns in Unseen Low Resource Languages
Hongpu Zhu, Yuqi Liang, Wenjing Xu and Hongzhi Xu
17:00 - 17:30 Oral Session 5: Machine Translation with Language Models
- Session Chair: TBA
- 17:00 - 17:15: Next-Level Cantonese-to-Mandarin Translation: Fine-Tuning and Post-Processing with LLMs
Yuqian Dai, Chun Fai Chan, Ying Ki Wong and Tsz Ho Pun
- 17:15 - 17:30: When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan and Shenbin Qian