Natural Questions in Icelandic

Publication: Contribution to book/anthology/report, Conference contribution in proceedings, Research, Peer-reviewed

We present the first extractive question answering (QA) dataset for Icelandic, Natural Questions in Icelandic (NQiI). Such datasets are important for developing and evaluating Icelandic QA systems. They also aid the development of multilingual QA methods that must work across a wide range of morphologically and grammatically different languages. The dataset was created by asking contributors to come up with questions they would like to know the answer to. Later, following a previously published methodology, they were tasked with finding answers to each other's questions. The questions are "natural" in the sense that they are real questions posed out of interest in knowing the answer. The complete dataset contains 18 thousand labeled entries, of which 5,568 are directly suitable for training an extractive QA system for Icelandic. We demonstrate the value of the dataset as a resource for Icelandic by creating and evaluating a system capable of extractive QA in Icelandic.
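In extractive QA, the answer to a question is a contiguous span copied out of a given passage rather than freely generated text. The sketch below illustrates that property under stated assumptions: the record layout (`QAEntry` with `question`, `passage`, `answer`, `answer_start`) is a hypothetical SQuAD-style format, and the helper `is_extractive` is an illustrative check. Neither is NQiI's actual schema or the authors' criterion for selecting the 5,568 training-ready entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAEntry:
    """Hypothetical SQuAD-style record; field names are assumptions, not NQiI's schema."""
    question: str
    passage: str
    answer: Optional[str]        # None when no answer was found in the passage
    answer_start: Optional[int]  # character offset of the answer within the passage


def is_extractive(entry: QAEntry) -> bool:
    """Return True if the answer is a contiguous span of the passage at answer_start."""
    if entry.answer is None or entry.answer_start is None or entry.answer_start < 0:
        return False
    end = entry.answer_start + len(entry.answer)
    # The span taken from the passage must match the labeled answer exactly.
    return entry.passage[entry.answer_start:end] == entry.answer


# Illustrative Icelandic entries (made up for this example, not taken from NQiI).
entries = [
    QAEntry("Hver er höfuðborg Íslands?", "Reykjavík er höfuðborg Íslands.", "Reykjavík", 0),
    QAEntry("Hvað búa margir á Íslandi?", "Reykjavík er höfuðborg Íslands.", None, None),
]

usable = [e for e in entries if is_extractive(e)]
print(len(usable))  # 1
```

Only the first entry passes, because its answer can be recovered by slicing the passage at the stated offset; the second has no answer span at all.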

Original language: English
Title: 2022 Language Resources and Evaluation Conference, LREC 2022
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis
Number of pages: 9
Publisher: European Language Resources Association (ELRA)
Publication date: 2022
Pages: 4488-4496
ISBN (electronic): 9791095546726
Status: Published - 2022
Externally published: Yes
Event: 13th International Conference on Language Resources and Evaluation, LREC 2022 - Marseille, France
Duration: 20 Jun 2022 to 25 Jun 2022

Conference

Conference: 13th International Conference on Language Resources and Evaluation, LREC 2022
Country: France
City: Marseille
Period: 20/06/2022 to 25/06/2022
Sponsors: 3M, Emvista, et al., Google, SADILAR, Vocapia

Bibliographic note

Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.

ID: 371184733