Abstract: The lexicon is a significant part of an Automatic Speech Recognition (ASR) system. A small lexicon results in high Out-Of-Vocabulary (OOV) rates and degrades the performance of the speech recognition system. A novel method is proposed to expand the lexicon automatically; it recovers OOV words from their pronunciations and does not require a large text corpus to discover new words. First, the complement forms of the Finite State Transducer (FST) expression of the lexicon and phoneme-to-grapheme (P2G) conversion are adopted to obtain new word-pronunciation pairs. Then, a two-stage verification strategy, consisting of pronunciation verification and word verification, is used to filter out errors. Finally, the learned new words are incorporated into the Language Model (LM) by linearly interpolating the base LM with a new LM trained on crawled texts. The proposed method is evaluated on Continuous Speech Recognition (CSR) tasks for English and Czech. OOV rates are significantly reduced after lexicon expansion. The Word Error Rates (WERs) improve with relative gains of about 9% for English and 2.3% for Czech over the baseline systems, and the Actual Term-Weighted Value (ATWV) improves by 9.7% for English and by 10.0% for Czech.
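The final step, linear interpolation of the base LM with an LM trained on crawled texts, can be illustrated with a minimal sketch. It assumes unigram models stored as plain dictionaries and a hypothetical interpolation weight `lam`; the abstract does not state the model order, the weight, or the toolkit used, so the names `interpolate_lm` and `oov_rate` and all numbers below are illustrative only.

```python
from typing import Dict, Iterable, Set


def interpolate_lm(
    base_lm: Dict[str, float],
    new_lm: Dict[str, float],
    lam: float,
) -> Dict[str, float]:
    """Return P(w) = lam * P_base(w) + (1 - lam) * P_new(w) for every word.

    Words missing from one model get probability 0 in that model.
    `lam` is a hypothetical weight, not a value taken from the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(base_lm) | set(new_lm)
    return {
        w: lam * base_lm.get(w, 0.0) + (1.0 - lam) * new_lm.get(w, 0.0)
        for w in vocab
    }


def oov_rate(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of tokens that are not covered by the lexicon."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in lexicon) / len(tokens)


# Toy data (made up for illustration): "lexicon" is a newly learned word
# that appears only in the LM trained on crawled texts.
base_lm = {"speech": 0.6, "model": 0.4}
new_lm = {"speech": 0.5, "lexicon": 0.5}
mixed_lm = interpolate_lm(base_lm, new_lm, lam=0.8)

test_tokens = ["speech", "lexicon", "model", "lexicon"]
oov_before = oov_rate(test_tokens, set(base_lm))
oov_after = oov_rate(test_tokens, set(mixed_lm))
```

In this toy setting the interpolated model still sums to one, because each input distribution does and the two weights add up to one. The new word receives non-zero probability, which is what allows the recognizer to output it.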