
Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically Rich Language

  • Toqeer Ehsan*
  • Javairia Khalid
  • Saadia Ambreen
  • Asad Mustafa
  • Sarmad Hussain
  • *Corresponding author for this work
  • University of Gujrat

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Phrase chunking is an important task in various natural language processing (NLP) applications. This paper presents a neural phrase chunker for Urdu built by training contextualized word representations. This work also produces an annotated corpus. The annotation has been performed using IOB (inside-outside-begin) labels. Comprehensive guidelines have been developed for four phrase types: noun phrase (NP), verb phrase (VP), post-positional phrase (PP) and prepositional phrase (PRP). The annotated text has been automatically evaluated for completeness and correctness. Inter-annotator agreement has been calculated on a ten percent reference corpus. A neural chunker has been developed and trained on the annotated corpus. The chunker is based on long short-term memory networks. Transfer learning has been employed to improve the chunking results. For that purpose, context-free (Word2Vec) and contextualized (ELMo) word representations have been trained. The chunker achieved an F-score of 94.9 when trained using the third layer of ELMo embeddings.
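
As an illustration of the IOB labelling scheme mentioned in the abstract, the following minimal Python sketch groups a sequence of IOB tags into phrase spans. It is not the authors' implementation. The `B-NP` / `I-NP` / `O` tag spelling follows the common CoNLL-style convention and is an assumption here, as are the function name `iob_to_chunks` and the example tag sequence, which is invented for illustration and not drawn from the annotated corpus.

```python
from typing import List, Tuple


def iob_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB tags into (label, start, end) spans; `end` is exclusive.

    Assumes CoNLL-style tags such as "B-NP", "I-NP" and "O" (an assumption,
    not a format confirmed by the paper). An "I-" tag that does not continue
    the current chunk is treated leniently as the start of a new chunk.
    """
    chunks: List[Tuple[str, int, int]] = []
    label, start = None, 0
    # A trailing "O" sentinel flushes a chunk that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == label
        if label is not None and not continues:
            chunks.append((label, start, i))
            label = None
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = kind, i
    return chunks


# Hypothetical tag sequence using phrase labels named in the abstract.
example_tags = ["B-NP", "I-NP", "B-PP", "B-NP", "B-VP", "I-VP", "O"]
example_chunks = iob_to_chunks(example_tags)
print(example_chunks)
```

For the hypothetical sequence above, the function yields one two-token NP, a one-token PP, a one-token NP and a two-token VP, with the final `O` token left outside any chunk.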
Original language: English
Pages (from-to): 9781-9799
Number of pages: 19
Journal: Arabian Journal for Science and Engineering
Volume: 47
Publication status: Published - Aug 2022
MoE publication type: A1 Journal article-refereed

Keywords

  • BiLSTM
  • Chunking
  • ELMo
  • Shallow Parsing
  • Urdu
