Today, a large amount of audio data is available on the web in the form of audiobooks, podcasts, video lectures, video blogs, news bulletins, etc. In addition, we can effortlessly record and store audio data such as read, lecture, or impromptu speech on handheld devices. These data are rich in prosody and provide a plethora of voices to choose from. Their availability can significantly reduce the overhead of data preparation and help rapid building of synthetic voices. However, a few problems are associated with readily using these data: (1) the audio files are generally long, and audio-transcription alignment is memory-intensive; (2) precise corresponding transcriptions are unavailable; (3) often, no transcriptions are available at all; (4) the audio may contain disfluencies and non-speech noises, since it was not specifically recorded for building synthetic voices; and (5) if we obtain automatic transcripts, they will not be error-free. Earlier works on long audio alignment addressing the first and second issues generally relied on reasonable transcripts and mainly focused on (1) less manual intervention, (2) mispronunciation detection, and (3) segmentation error recovery. In this work, we use a large-vocabulary public-domain automatic speech recognition (ASR) system to obtain transcripts, followed by confidence measure-based data pruning. Together, these address the five issues with the found data and also ensure the three points above. As a proof of concept, we build English voices using an audiobook (read speech) in a female voice from LibriVox and a lecture (spontaneous speech) in a male voice from Coursera, using both reference and hypothesis transcriptions. We evaluate these voices in terms of intelligibility and naturalness with a perceptual listening test on the Blizzard 2013 corpus.
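The confidence measure-based pruning step can be illustrated with a minimal sketch. The abstract does not specify the confidence measure or the thresholds, so the `Segment` type, the rule (mean and minimum word confidence), and the default thresholds below are hypothetical. Each ASR-transcribed segment is kept for voice building only if its words are, on average and individually, recognized with enough confidence. This filters out segments likely to contain recognition errors, disfluencies, or noise.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical container: one ASR-transcribed audio segment."""
    segment_id: str
    # (word, confidence) pairs from the ASR decoder; confidences assumed in [0, 1]
    words: List[Tuple[str, float]] = field(default_factory=list)


def mean_confidence(seg: Segment) -> float:
    """Average word confidence of a segment (0.0 for an empty segment)."""
    if not seg.words:
        return 0.0
    return sum(conf for _, conf in seg.words) / len(seg.words)


def prune_segments(segments: List[Segment],
                   min_mean: float = 0.8,
                   min_word: float = 0.5) -> List[Segment]:
    """Keep segments whose mean and lowest word confidence both clear the
    (hypothetical) thresholds; all other segments are dropped."""
    kept = []
    for seg in segments:
        if not seg.words:
            continue  # nothing recognized: no usable transcript
        lowest = min(conf for _, conf in seg.words)
        if mean_confidence(seg) >= min_mean and lowest >= min_word:
            kept.append(seg)
    return kept


if __name__ == "__main__":
    segments = [
        Segment("utt1", [("the", 0.95), ("cat", 0.90)]),
        # low average confidence: likely misrecognized or disfluent
        Segment("utt2", [("uh", 0.30), ("dog", 0.70)]),
        # high average, but one unreliable word
        Segment("utt3", [("a", 0.99), ("zebra", 0.45), ("ran", 0.99), ("fast", 0.99)]),
    ]
    print([s.segment_id for s in prune_segments(segments)])  # prints ['utt1']
```

Combining a mean threshold with a per-word floor is one possible design choice. The mean alone would accept `utt3`, even though it contains a single probably wrong word that could corrupt the text-to-audio mapping used for training.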
The e-book industry is starting to flourish due, in part, to the availability of affordable and user-friendly e-book readers. As users increasingly move from traditional paper books to e-books, there is an opportunity to reinvent and enhance their reading experience. For example, the multimedia capabilities of these devices can turn the act of reading into a real multimedia experience. In this paper, we focus on augmenting the written text with its associated audiobook, so that users can listen to the book they are currently reading. We propose an audiobook-to-ebook alignment system that applies a text-to-speech (TTS)-based text-to-audio alignment algorithm. We enhance it with a silence filtering algorithm to cope with the difference in reading style between the TTS output and the speakers of the audiobooks. Experiments on 12 five-minute excerpts of 6 different audiobooks (read by men and women) yield word alignment errors below 120 ms for 90% of the words, which is accurate enough to be usable. Finally, we also show a user interface implementation on the iPad for synchronized e-book reading while listening to the associated audiobook.
In this paper, we present our efforts in building a speech recognizer constrained by the availability of very limited resources. We assume that neither proper training databases nor initial acoustic models are available for the target language. Moreover, for the experiments shown here, we use grapheme-based speech recognizers. Most prior work in the area uses initial acoustic models, trained on the target or a similar language, to force-align new data and then retrain the models with it. In the proposed approach, a speech recognizer is trained from scratch using audio recordings aligned with (sometimes approximate) text transcripts. All training data has been harvested online (e.g. First, the audio is decoded into a phoneme sequence by an off-the-shelf phonetic recognizer for Hungarian. The phoneme sequences are then aligned to the normalized text transcripts through dynamic programming. The correspondence between phonemes and graphemes is given by a matrix of approximate sound-to-grapheme matches. Finally, the aligned data is split into short audio/text segments, and the speech recognizer is trained using the Kaldi toolkit. Alignment experiments performed for Catalan and Spanish show that it is feasible to obtain accurate alignments that can be used to successfully train a speech recognizer.
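The phoneme-to-grapheme alignment step of the low-resource recognizer described above is a dynamic-programming alignment. The sketch below assumes a Needleman-Wunsch-style (weighted edit-distance) formulation. The similarity table `SIM`, the gap cost `GAP_COST`, and the function names are hypothetical, since the abstract does not give the actual sound-to-grapheme matrix or cost scheme. In this sketch, matching a phoneme to a grapheme costs one minus their similarity, and leaving a symbol on either side unmatched costs a fixed gap penalty.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical approximate sound-to-grapheme similarities in [0, 1].
# Identical symbols are treated as perfect matches in sub_cost().
SIM: Dict[Tuple[str, str], float] = {
    ("k", "c"): 0.9,  # /k/ is often spelled "c"
    ("s", "c"): 0.6,  # "c" can also sound like /s/
    ("b", "v"): 0.8,  # Spanish "v" is pronounced /b/
}
GAP_COST = 0.7  # hypothetical cost of leaving a phoneme or grapheme unmatched

Pair = Tuple[Optional[str], Optional[str]]


def sub_cost(p: str, g: str) -> float:
    """Cost of aligning phoneme p with grapheme g."""
    if p == g:
        return 0.0
    return 1.0 - SIM.get((p, g), 0.0)


def align(phones: List[str], graphs: List[str]) -> List[Pair]:
    """Minimum-cost alignment; None marks an unmatched side."""
    n, m = len(phones), len(graphs)
    # cost[i][j] = minimal cost of aligning phones[:i] with graphs[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        cost[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(phones[i - 1], graphs[j - 1]),
                cost[i - 1][j] + GAP_COST,  # phoneme with no grapheme
                cost[i][j - 1] + GAP_COST,  # grapheme with no phoneme
            )
    # Backtrace from the bottom-right cell to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
                cost[i][j], cost[i - 1][j - 1] + sub_cost(phones[i - 1], graphs[j - 1])):
            pairs.append((phones[i - 1], graphs[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(cost[i][j], cost[i - 1][j] + GAP_COST):
            pairs.append((phones[i - 1], None))
            i -= 1
        else:
            pairs.append((None, graphs[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

For example, `align(["k", "a", "s", "a"], list("casa"))` pairs /k/ with "c" and the remaining symbols one-to-one. If the recognizer misses the first vowel, `align(["k", "s", "a"], list("casa"))` leaves that "a" unmatched as `(None, "a")`. In a full system, the matched positions could then be mapped back to time stamps to cut long recordings into short audio/text segments.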