Paper abstract

A Joint Segmenting and Labeling Approach for Chinese Lexical Analysis

Xinhao Wang - Peking University, China
Jiazhong Nie - Peking University, China
Dingsheng Luo - Peking University, China
Xihong Wu - Peking University, China

Session: Learning and Mining Text and NLP
Springer Link: http://dx.doi.org/10.1007/978-3-540-87481-2_35

This paper introduces an approach that jointly performs a cascade of segmentation and labeling subtasks for Chinese lexical analysis, including word segmentation, named entity recognition, and part-of-speech tagging. Unlike the traditional pipeline approach, the cascaded subtasks are carried out simultaneously in a single step, so that error propagation is avoided and information can be shared across the multi-level subtasks. The approach is built on Weighted Finite-State Transducers (WFSTs): within this unified framework, the model for each subtask is represented as a transducer, and the transducers are then combined into a single one, so that one-pass decoding yields the jointly optimal outputs for all levels of processing. Experimental results demonstrate the effectiveness of the proposed joint approach, which significantly outperforms the traditional pipeline method.
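
To make the compose-then-decode idea concrete, below is a minimal, self-contained Python sketch. It is not the authors' implementation; the transducers, weights, and symbols are invented toy examples. One toy WFST maps characters to word hypotheses and a second maps words to word/tag symbols; composing them over the tropical semiring (path weights add, the cheapest accepting path wins) and running a single shortest-path search makes the segmentation and tagging decisions jointly, which is the essence of one-pass decoding over combined WFSTs.

```python
from heapq import heappush, heappop

EPS = "<eps>"  # epsilon: an arc that produces (or would consume) nothing

def states_of(fst):
    """Collect the state set of a WFST given as (start, finals, arcs)."""
    start, finals, arcs = fst
    states = {start} | set(finals)
    for (q, _, _, _, r) in arcs:
        states.update((q, r))
    return states

def compose(t1, t2):
    """Compose two WFSTs over the tropical semiring (path weights add).
    Simplification for this sketch: t2 has no epsilon input labels."""
    start1, finals1, arcs1 = t1
    start2, finals2, arcs2 = t2
    by_in = {}
    for (q2, i2, o2, w2, r2) in arcs2:
        by_in.setdefault((q2, i2), []).append((o2, w2, r2))
    arcs = []
    for q2 in states_of(t2):
        for (q1, i1, o1, w1, r1) in arcs1:
            if o1 == EPS:                      # t1 advances alone, t2 waits
                arcs.append(((q1, q2), i1, EPS, w1, (r1, q2)))
            else:                              # intermediate symbols must match
                for (o2, w2, r2) in by_in.get((q2, o1), []):
                    arcs.append(((q1, q2), i1, o2, w1 + w2, (r1, r2)))
    finals = {(f1, f2) for f1 in finals1 for f2 in finals2}
    return (start1, start2), finals, arcs

def shortest_path(fst):
    """Dijkstra search for the cheapest accepting path (one-pass decoding)."""
    start, finals, arcs = fst
    out = {}
    for (q, i, o, w, r) in arcs:
        out.setdefault(q, []).append((i, o, w, r))
    heap, seen, tick = [(0.0, 0, start, [])], set(), 1
    while heap:
        cost, _, q, path = heappop(heap)
        if q in seen:
            continue
        seen.add(q)
        if q in finals:
            return cost, path
        for (i, o, w, r) in out.get(q, []):
            heappush(heap, (cost + w, tick, r, path + [(i, o)]))
            tick += 1
    return None

# Toy segmentation transducer: characters -> words, with two competing paths
# for the input 北京大学 ("北京 大学" vs. the single word "北京大学").
seg = (0, {4}, [
    (0, "北", EPS, 0.20, 1), (1, "京", "北京", 0.25, 2),       # two-word path, cost 0.90
    (2, "大", EPS, 0.20, 3), (3, "学", "大学", 0.25, 4),
    (0, "北", EPS, 0.25, 5), (5, "京", EPS, 0.25, 6),           # one-word path, cost 1.00
    (6, "大", EPS, 0.25, 7), (7, "学", "北京大学", 0.25, 4),
])

# Toy tagging transducer: words -> word/tag symbols (single looping state).
tag = ("T", {"T"}, [
    ("T", "北京", "北京/NR", 0.5, "T"),
    ("T", "大学", "大学/NN", 0.3, "T"),
    ("T", "北京大学", "北京大学/NT", 0.4, "T"),
])

cost, path = shortest_path(compose(seg, tag))
print(round(cost, 2), [o for (_, o) in path if o != EPS])
# -> 1.4 ['北京大学/NT']: with these toy weights the segmenter alone would
#    prefer the two-word split (0.90 < 1.00), but the jointly decoded best
#    path keeps the single-word analysis, since its tagging cost is lower.
```

In this toy setting the pipeline and joint decisions differ: decoding the segmenter in isolation would commit to the two-word split before the tagger ever runs, whereas the composed machine lets the tagging scores overturn that choice, illustrating how joint decoding can avoid error propagation.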