Mozilla's open-source TensorFlow implementation of Baidu's DeepSpeech architecture


Baidu's paper: Scaling up end-to-end speech recognition


pip install deepspeech


deepspeech output_model.pb my_audio_file.wav alphabet.txt

Documentation: Welcome to DeepSpeech’s documentation!


Project DeepSpeech is an open source Speech-To-Text engine. It uses a model trained by machine learning techniques, based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow project to make the implementation easier.

Pre-built binaries for performing inference with a trained model can be installed with pip. Setting up a virtual environment first is recommended; the DeepSpeech documentation describes how.
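As a rough illustration of the recommended virtual-environment setup, the sketch below uses Python's standard venv module to create an isolated environment and then builds the pip command that would install the package into it. The function names and the environment directory are hypothetical, chosen only for this example; the official documentation may recommend a different tool such as virtualenv.

```python
import os
import sys
import venv


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return the path to its Python."""
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    # The interpreter lives in a different subdirectory on Windows and POSIX.
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def pip_install_command(env_python, package="deepspeech"):
    """Build (but do not run) the pip command for the given interpreter."""
    return [env_python, "-m", "pip", "install", package]
```

Calling create_environment("deepspeech-venv") and passing the returned path to pip_install_command gives a command list that can be handed to subprocess.run, so the install lands in the new environment rather than in the system Python.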

Once installed, you can use the deepspeech binary to do speech-to-text on an audio file, as in the deepspeech command shown above.
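The same invocation can be scripted. The sketch below assumes the positional argument order used in this post (model file, audio file, alphabet file) and assumes the transcript is written to standard output; both may differ in other releases, so check deepspeech -h for the installed version. The helper names build_command and transcribe are hypothetical.

```python
import shutil
import subprocess


def build_command(model, audio, alphabet, binary="deepspeech"):
    """Return the argument list for one transcription run (assumed argument order)."""
    return [binary, model, audio, alphabet]


def transcribe(model, audio, alphabet):
    """Run the deepspeech binary and return whatever it prints to stdout."""
    if shutil.which("deepspeech") is None:
        raise FileNotFoundError("deepspeech binary not found on PATH")
    result = subprocess.run(
        build_command(model, audio, alphabet),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For example, transcribe("output_model.pb", "my_audio_file.wav", "alphabet.txt") would run the command from this post and return its output as a string.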

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux; the realtime factor on a GeForce GTX 1070 is about 0.44. (See the release notes to find which GPUs are supported.) This is done by installing the GPU-specific package, deepspeech-gpu, instead of deepspeech.

See the output of deepspeech -h for more information on the use of deepspeech. If you experience problems running deepspeech, check the required runtime dependencies.
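To make the realtime factor of about 0.44 quoted for the GeForce GTX 1070 concrete, the sketch below assumes the conventional definition: processing time divided by audio duration, so values below 1 mean faster than real time. The source does not spell out its definition, so treat this as an assumption; the function names are hypothetical.

```python
def realtime_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds, rtf=0.44):
    """Estimate how long inference takes for a clip at a given realtime factor."""
    return audio_seconds * rtf
```

Under this assumption, a 10-second clip at a realtime factor of 0.44 would take roughly 4.4 seconds to transcribe.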

