Abstract
We address the challenge of syntactic parsing for Urdu, a morphologically rich language, and present state-of-the-art results for both constituency and dependency parsing. This paper offers four major contributions: 1) the conversion of the CLE-UTB phrase structure treebank into a dependency treebank by developing language-specific head-word and phrase-to-dependency label mapping rules; 2) a novel sequence labeling scheme that transforms the parsing task into a unified representation; 3) the training of contextualized word representations on a 220-million-token Urdu corpus collected from the web; and 4) the development of a parsing framework using two learning paradigms, single-task and multi-task learning. Several post-processing rules are applied to improve the quality of the automatically converted dependency treebank. The proposed sequence labeling scheme enables the use of a shared architecture that learns syntactic structure from the constituency and dependency representations simultaneously and hence improves generalization. Experiments show that the multi-task learning setup significantly enhances parsing performance, achieving an F1 score of 91.39 for constituency parsing (an improvement of 3.29 points) and a labeled attachment score of 85.69 for dependency parsing (an improvement of 1.49 points). These results demonstrate that learning cross-task representations provides measurable benefits and advances the state of syntactic parsing for Urdu.
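The first contribution, converting a phrase structure tree into a dependency tree with head-word rules, can be illustrated with a minimal sketch. The abstract does not list the actual rules, so everything here is an assumption made for illustration: the `HEAD_RULES` table, the phrase and POS labels, the `Node` class, and the transliterated tokens are hypothetical and are not the CLE-UTB rules. Dependency labels and the post-processing steps are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                                   # phrase label, or POS tag on a leaf
    children: List["Node"] = field(default_factory=list)
    word: str = ""                               # set only on leaves
    index: int = 0                               # 1-based token position, leaves only


# Hypothetical head rules (NOT the paper's): phrase label ->
# (direction in which children are scanned, child labels in priority order).
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP", "NP"]),
    "VP": ("right", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),
}


def find_head(node: Node) -> Node:
    """Return the leaf that lexically heads this subtree."""
    if not node.children:
        return node
    direction, priorities = HEAD_RULES.get(node.label, ("right", []))
    kids = node.children if direction == "left" else list(reversed(node.children))
    for wanted in priorities:
        for child in kids:
            if child.label == wanted:
                return find_head(child)
    return find_head(kids[0])  # fallback: first child in the scan direction


def to_dependencies(node: Node, arcs: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Map each token index to its head's index; the root token gets head 0."""
    if arcs is None:
        arcs = {find_head(node).index: 0}
    head = find_head(node)
    for child in node.children:
        child_head = find_head(child)
        if child_head is not head:
            # The head of a non-head child attaches to the head of the parent.
            arcs[child_head.index] = head.index
        to_dependencies(child, arcs)
    return arcs


# Toy verb-final clause with placeholder tokens (subject, object, verb).
tree = Node("S", [
    Node("NP", [Node("N", word="larka", index=1)]),
    Node("VP", [
        Node("NP", [Node("N", word="kitab", index=2)]),
        Node("V", word="parhta", index=3),
    ]),
])
arcs = to_dependencies(tree)
```

In this toy tree the verb heads the clause, so both nouns attach to token 3 and token 3 attaches to the artificial root 0. A real conversion would also need a phrase-to-dependency label mapping, which this sketch leaves out.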
| Original language | English |
|---|---|
| Article number | e0332580 |
| Pages (from-to) | e0332580 |
| Journal | PLoS ONE |
| Volume | 20 |
| Issue number | 9 |
| Publication status | Published - Sept 2025 |
| MoE publication type | A1 Journal article-refereed |
Funding
This research work was funded by Umm Al-Qura University, Saudi Arabia under grant number: 25UQU4320430GSSR02.
Keywords
- Language
- Humans
- Learning
- Linguistics/methods
Title
Multi-task learning by using contextualized word representations for syntactic parsing of a morphologically rich language