Improvement on CRFs-based Chinese word segmentation algorithm

    Abstract:

    Character-based tagging is widely used in Chinese word segmentation; it recasts segmentation as a sequence tagging task, and taggers based on Conditional Random Fields (CRFs) currently achieve the best segmentation results. Segmenting command orders is the basis for generating such orders automatically, yet applying a CRFs model to command-order segmentation incurs very high time and space complexity. To improve efficiency, the model is analyzed and a feature subset is chosen with a feature selection algorithm, which effectively reduces the time and space overhead of segmentation. The CRFs confidence is then used to post-process the segmentation output, further improving segmentation precision. Experimental results show that the feature selection algorithm and the confidence-based post-processing improve Chinese word segmentation performance.
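
    The reformulation of segmentation as character tagging can be illustrated with a minimal sketch. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single-character word); the abstract does not state which tag set the authors used, and the function names words_to_tags and tags_to_words are hypothetical. The CRFs model itself is not shown; the sketch only covers the conversion between a segmentation and the tag sequence a tagger would learn to predict.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence into per-character B/M/E/S tags.

    Assumption: the four-tag scheme; the source does not name its tag set.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens defensively
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their predicted tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at E or S
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by an ill-formed tag sequence
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["条件随机场", "的", "中文", "分词"]
    tags = words_to_tags(segmented)
    print(tags)
    print(tags_to_words("".join(segmented), tags))
```

    A sequence tagger trained on such tag sequences predicts one tag per character, and tags_to_words turns its output back into words.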

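Cite this article

顾佼佼, 杨志宏, 姜文志, 胡文萱. Improvement on CRFs-based Chinese word segmentation algorithm[J]. 太赫兹科学与电子信息学报 (Journal of Terahertz Science and Electronic Information Technology), 2012, 10(2): 184-187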

History
  • Received: 2011-05-24
  • Last revised: 2011-08-23