Better, Faster, Stronger Sequence Tagging Constituent Parsers

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Documents

OA-Better, Faster, Stronger Sequence Tagging Constituent Parsers
Publisher's published version, 484 KB, PDF document

David Vilares
Mostafa Abdou
Anders Søgaard

Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.
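The label-decomposition idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the composite tag layout `level|constituent|unary`, the `|` separator, the toy labels, and the helper names `decompose` and `label_set_sizes` are all assumptions made for illustration. The sketch only shows why splitting one composite label into sublabels, each of which could be predicted as a separate task, leaves every task with a smaller label set.

```python
from typing import List, Tuple


def decompose(label: str) -> Tuple[str, str, str]:
    """Split a hypothetical composite tag 'level|constituent|unary' into sublabels.

    The three-field layout and the '|' separator are assumptions for this sketch.
    """
    level, constituent, unary = label.split("|")
    return level, constituent, unary


def label_set_sizes(labels: List[str]) -> Tuple[int, List[int]]:
    """Return (distinct composite labels, distinct values per sublabel task)."""
    composite = set(labels)
    parts = [decompose(label) for label in composite]
    per_task = [len({part[i] for part in parts}) for i in range(3)]
    return len(composite), per_task


# Toy labels, invented for illustration only (not taken from any treebank).
toy_labels = [
    "1|S|-", "1|NP|-", "2|NP|-", "-1|VP|-",
    "-1|S|ADVP", "2|VP|-", "1|VP|ADVP",
]

n_composite, n_per_task = label_set_sizes(toy_labels)
print(f"composite labels: {n_composite}, labels per sublabel task: {n_per_task}")
```

In this toy case a single classifier would have to choose among all distinct composite labels, while three jointly trained classifiers would each choose among a handful of values, which is the sparsity reduction the abstract describes.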

Original language	English
Title	Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Publisher	Association for Computational Linguistics
Publication date	2019
Pages	3372-3383
DOI	https://doi.org/10.18653/v1/N19-1341
Status	Published - 2019
Event	2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - NAACL-HLT 2019 - Minneapolis, USA Duration: 3 Jun 2019 → 7 Jun 2019

Conference

Conference	2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - NAACL-HLT 2019
Country	USA
City	Minneapolis
Period	03/06/2019 → 07/06/2019


ID: 240419283

Department of Computer Science (Datalogisk Institut)