Abstract
Phrase chunking is an important task in various natural language processing (NLP) applications. This paper presents a neural phrase chunker for Urdu that is trained with contextualized word representations. This work also produces an annotated corpus. The annotation has been performed using IOB (inside-outside-begin) labels. Comprehensive guidelines have been developed for four phrase types: noun phrase (NP), verb phrase (VP), post-positional phrase (PP) and prepositional phrase (PRP). The annotated text has been automatically evaluated for completeness and correctness. Inter-annotator agreement has been calculated on a ten percent reference corpus. A neural chunker has been developed and trained on the annotated corpus. The chunker is based on long short-term memory networks. Transfer learning has been employed to improve the chunking results. For that purpose, context-free (Word2Vec) and contextualized (ELMo) word representations have been trained. The chunker achieved an F-score of 94.9 when trained using the third layer of ELMo embeddings.
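To illustrate the IOB labelling scheme mentioned in the abstract, the following is a minimal sketch of how per-token IOB tags can be grouped into phrase spans. It is not the authors' code: the function name `iob_to_chunks`, the `B-NP`/`I-NP` tag spelling, and the example tag sequence are assumptions made for illustration, and only the phrase labels (NP, VP) come from the abstract.

```python
from typing import List, Tuple


def iob_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB tags into (phrase_type, start, end) spans; end is exclusive.

    Hypothetical helper for illustration; tag spelling such as "B-NP" is assumed.
    """
    chunks: List[Tuple[str, int, int]] = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # "B-NP" -> ("B", "NP"); the outside tag "O" -> ("O", "")
        prefix, _, label = tag.partition("-")
        # A chunk continues only on an I- tag of the same phrase type
        continues = prefix == "I" and label == current
        if not continues:
            if current is not None:
                chunks.append((current, start, i))  # close the open chunk
            if prefix in ("B", "I"):
                # Lenient choice: a stray I- tag also opens a new chunk
                start, current = i, label
            else:
                start, current = None, None
    if current is not None:
        chunks.append((current, start, len(tags)))  # close a chunk at the end
    return chunks


# Assumed example: two noun phrases around one verb phrase, then an outside token
example_tags = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
example_chunks = iob_to_chunks(example_tags)
print(example_chunks)
```

In this sketch an `I-` tag that does not continue a chunk of the same type is treated as the start of a new chunk; a stricter annotation checker might instead flag it as an error.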
| Original language | English |
|---|---|
| Pages (from-to) | 9781-9799 |
| Number of pages | 19 |
| Journal | Arabian Journal for Science and Engineering |
| Volume | 47 |
| Publication status | Published - Aug 2022 |
| MoE publication type | A1 Journal article-refereed |
Keywords
- BiLSTM
- Chunking
- ELMo
- Shallow Parsing
- Urdu
Fingerprint
Dive into the research topics of 'Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically Rich Language'. Together they form a unique fingerprint.