Design and Implementation of a PDF to Audio System

CHAPTER ONE

AIMS AND OBJECTIVES OF THE STUDY

This research aims to design and implement a PDF-to-Audio system that enhances accessibility and facilitates easy text-to-voice conversion of documents in PDF format.

The following are the objectives of the study:

To develop a system that converts PDF text to audio for easy assimilation of the document.
To design a system that readily detects a PDF file and converts it to audio.
To design a system that helps people with reading disabilities convert PDF text to audio files.
To design and implement a system that supports students’ reading comprehension skills.
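
As an illustration of the second objective, detecting a PDF file can be sketched as a check of the file header, since a PDF file begins with the marker "%PDF-" followed by a version number. The function name looks_like_pdf is hypothetical and chosen for this example only. This is a minimal sketch under that assumption, not the detection method of the implemented system.

```python
PDF_SIGNATURE = b"%PDF-"  # header marker found at the start of a PDF file


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header marker.

    Hypothetical helper for illustration: it inspects only the first bytes
    and does not validate the rest of the file.
    """
    try:
        with open(path, "rb") as handle:
            head = handle.read(len(PDF_SIGNATURE))
    except OSError:
        # A missing or unreadable file is treated as "not a PDF".
        return False
    return head == PDF_SIGNATURE
```

Checking the header rather than the ".pdf" extension means a renamed or mislabelled file is still classified by its actual content.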

CHAPTER TWO

LITERATURE REVIEW

OVERVIEW OF PDF

The Portable Document Format (PDF) is a file format developed by Adobe in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF was standardized as ISO 32000 in 2008, and no longer requires any royalties for its implementation.

PDF files may contain a variety of content besides flat text and graphics, including logical structuring elements, interactive elements such as annotations and form fields, layers, rich media (including video content), three-dimensional objects using U3D or PRC, and various other data formats. The PDF specification also provides for encryption and digital signatures, file attachments and metadata to enable workflows requiring these features.

PDF TEXT TO SPEECH

A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether the text was entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. To be clear, there is a fundamental difference between the system discussed here and any other talking machine (a cassette player, for example): we are interested in the automatic production of new sentences. This definition still needs some refinement. Systems that simply concatenate isolated words or parts of sentences, known as Voice Response Systems, are only applicable when a limited vocabulary is required (typically a few hundred words) and when the sentences to be pronounced follow a very restricted structure, as is the case for the announcement of arrivals in train stations. In the context of TTS synthesis, it is impossible (and fortunately unnecessary) to record and store all the words of a language. It is therefore more suitable to define Text-To-Speech as the automatic production of speech through a grapheme-to-phoneme transcription of the sentences to be uttered.
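
The difference between a Voice Response System and grapheme-to-phoneme transcription can be illustrated with a toy sketch. The vocabulary, the letter-to-sound table and the function names below are all hypothetical and invented for this example. The rules are deliberately crude and do not describe real English pronunciation or any particular TTS engine.

```python
# Fixed vocabulary of a hypothetical Voice Response System: whole words only.
RECORDED_WORDS = {"train", "arrives", "at", "platform", "two"}

# Toy letter-to-sound table (illustrative only, not real English phonology).
# Two-letter graphemes are included so that the longest match can win.
LETTER_TO_SOUND = {
    "sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "KS",
    "y": "Y", "z": "Z",
}


def can_announce(sentence):
    """A Voice Response System can only play sentences built from recorded words."""
    return all(word in RECORDED_WORDS for word in sentence.lower().split())


def to_phonemes(word):
    """Transcribe any word with greedy longest-match letter-to-sound rules."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for size in (2, 1):  # try the two-letter grapheme first
            chunk = word[i:i + size]
            if chunk in LETTER_TO_SOUND:
                phonemes.append(LETTER_TO_SOUND[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # skip characters the toy table does not cover
    return phonemes
```

Here can_announce("Train arrives at noon") fails because "noon" was never recorded, while to_phonemes("noon") still produces a transcription. That is the sense in which rule-based transcription can handle new sentences and a fixed set of recordings cannot.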

At first sight, this task does not look too hard to perform. After all, are human beings not able to pronounce an unfamiliar sentence correctly, even from childhood? We all have, mainly unconsciously, a deep knowledge of the reading rules of our mother tongue. They were transmitted to us, in a simplified form, at primary school, and we improved them year after year.

CHAPTER THREE

METHODOLOGY AND ANALYSIS OF THE EXISTING SYSTEM

GENERAL DESCRIPTION OF THE EXISTING SYSTEM

In today’s world, where most information is shared digitally, visually impaired persons often depend on reading glasses to access this information. If they forget their reading glasses, they will not be able to access it. With a text-to-speech system, however, digital information can be read aloud to a visually impaired person.

FACT FINDING METHODS USED

Data for this study were obtained from two main sources:

(a) Primary source and

(b) Secondary source

Primary Source

A primary source is a source of original data, which the researcher collects through an empirical approach such as personal interviews, questionnaires or observation.

In my research, I used observation, paying attention to how contacts are operated and saved using a manual method.

Secondary Source

The need for secondary sources of data in this kind of project cannot be overemphasized. I obtained the secondary data from library sources, and most of the information from the library research is covered in the literature review in the previous chapter of this project.

CHAPTER FOUR

DESIGN AND IMPLEMENTATION OF THE NEW SYSTEM

DESIGN STANDARD

OUTPUT SPECIFICATION AND DESIGN

The output design was based on the inputs, so that the report generated from them is meaningful. The outputs can be produced as soft copy or printed as hard copy.

CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATIONS

SUMMARY

In summary, this academic work gives a broad knowledge of what a Text-To-Speech system is and how it can be operated.

CONCLUSION

From this academic work, I have been able to show the application of Text-To-Speech and how text can be synthesized so that the visually impaired and children can read easily.

RECOMMENDATION

I hereby recommend this academic work for use by the staff and management of ……. and indeed any other institution with a similar structure and organizational framework, for the following reasons:

It has solved the problem of easy access to contact information.
It has helped the visually impaired to read without stress.
It has helped children to learn how to read and pronounce English words.
