Deep Learning Methods for Short Text Analysis in Disease Control.

CHAPTER ONE

Aim and objectives of the study

  1. Implement text analytics on short texts from social media based on deep learning.
  2. Achieve disease event surveillance by leveraging social media.
  3. Develop a recommendation system for disease control decision-making.

CHAPTER TWO

LITERATURE REVIEW

Introduction

Text analytics is a field in natural language processing (NLP). It aims to extract the semantic, syntactic and contextual information of any written language (Farzindar & Inkpen, 2015), so that the desired content can be drawn from a large pool of data for knowledge discovery. According to Vijayarani, Ilamathi and Nithya (2015), information extraction and retrieval are common processes in the research areas of text mining, web mining, data mining, graph mining, multimedia mining and structural mining. In this chapter, current methods in NLP are reviewed together with their potential in disease control through social media resources. Also, to better describe the effect of this work and its relevance to existing problems in the natural sciences, epidemiological compartments for infection transmission are discussed.

Text preprocessing: the rudiments of NLP

Written information comes as a continuous sequence of letters that form words; words then form phrases, and phrases form sentences. These chunks of information are further identified as parts of speech and named entities. Before computer programs can make these distinctions, a number of processes are carried out on the raw text. They include the following:

Tokenization: The process of obtaining discrete words by splitting text at punctuation marks and white space. These words form the vocabulary of the system.
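A minimal Python sketch of tokenization, assuming a simple regular-expression split: runs of letters and digits become word tokens, and each punctuation mark becomes its own token. The function names `tokenize` and `build_vocabulary` are illustrative and are not taken from this work.

```python
import re

def tokenize(text):
    """Split raw text into tokens on white space and punctuation.

    Punctuation marks are kept as separate tokens so that later stages
    (e.g. POS tagging) can still see sentence boundaries.
    """
    # \w+ matches runs of letters/digits/underscore; [^\w\s] matches a single
    # punctuation character. White space is discarded.
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocabulary(texts):
    """Collect the distinct lowercased tokens across all texts (the vocabulary)."""
    vocab = set()
    for text in texts:
        vocab.update(token.lower() for token in tokenize(text))
    return sorted(vocab)
```

For example, `tokenize("Cholera cases rising in Lagos, stay safe!")` returns `['Cholera', 'cases', 'rising', 'in', 'Lagos', ',', 'stay', 'safe', '!']`. A rule this simple splits a contraction such as "don't" into three tokens; tokenizers built for tweets usually add rules for hashtags, mentions and emoticons.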

Stop words elimination: Stop words are not the major content terms in documents; they usually comprise determiners and conjunctions. They can be removed using a compiled list of words that serve no purpose other than grammatical completeness. According to Vijayarani et al. (2015), advanced approaches apply Zipf's law based on these criteria: words that appear only once in a document, words that appear least in the pool of documents (inverse document frequency), and words whose frequency of appearance is excessively high.
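The two approaches above can be sketched as a list-based filter and a frequency-based filter that uses document frequency to drop very rare and very common terms. The stop list, the function names and the thresholds `min_df` and `max_df_ratio` are hypothetical choices for illustration, not values from Vijayarani et al. (2015) or from this work.

```python
from collections import Counter

# A tiny illustrative stop list; real systems use a much larger compiled list.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "is"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """List-based removal: drop tokens found in a compiled stop-word list."""
    return [t for t in tokens if t.lower() not in stop_words]

def frequency_filter(documents, min_df=2, max_df_ratio=0.9):
    """Frequency-based removal in the spirit of Zipf's law.

    documents: list of token lists.
    Drops tokens that occur in fewer than `min_df` documents (too rare) and
    tokens that occur in more than `max_df_ratio` of the documents (too common).
    Both thresholds are hypothetical and would be tuned on real data.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents each token appears in.
    df = Counter(token for doc in documents for token in set(doc))
    keep = {t for t, count in df.items()
            if count >= min_df and count / n_docs <= max_df_ratio}
    return [[t for t in doc if t in keep] for doc in documents]
```

With the three token lists `[["fever", "rt", "lagos"], ["fever", "rt", "cough"], ["cough", "rt"]]`, "rt" (present in every document) and "lagos" (present in only one) are dropped, while "fever" and "cough" are kept.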

Stemming: This maps the different variants of a word to a common base form (stem). These variants could be plural forms, past tense or continuous tense forms; for example, "reports", "reported" and "reporting" all reduce to "report".
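To illustrate the idea, here is a deliberately naive suffix-stripping stemmer in Python. It is not the Porter stemmer or any other published algorithm; the suffix list and the minimum stem length are hypothetical.

```python
# Hypothetical suffix rules, checked in order; NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w
```

Here "reports", "reported" and "reporting" all map to "report", and "gas" is left alone because stripping would leave fewer than three characters. Blind stripping also yields stems such as "cas" from "cases", which is why real stemmers use ordered rule sets with conditions on the remaining stem.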

Normalization: The different inflections that words can take in different contexts are standardized into a consistent form. The targets at this stage are hyphenated words, capitalization and acronyms; in the case of query tasks, spelling errors are also corrected.
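A minimal sketch of token normalization covering the targets listed above, assuming small hand-built lookup tables. The contents of `ACRONYMS` and `VOCABULARY` are hypothetical, and spelling correction uses the standard-library `difflib.get_close_matches`, which is only one simple option.

```python
import difflib

# Hypothetical tables for illustration; a real system would build these from data.
ACRONYMS = {"WHO": "world health organization"}
VOCABULARY = ["cholera", "malaria", "measles", "outbreak", "fever"]

def normalize_token(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Map one token to a canonical form.

    1. Expand known acronyms (checked case-sensitively, before lowercasing,
       so the pronoun "who" is not mistaken for "WHO").
    2. Lowercase to remove capitalization differences.
    3. Join hyphenated words ("lock-down" -> "lockdown").
    4. Correct likely misspellings against a known vocabulary (query tasks).
    """
    if token in ACRONYMS:
        return ACRONYMS[token]
    t = token.lower().replace("-", "")
    if t not in vocabulary:
        matches = difflib.get_close_matches(t, vocabulary, n=1, cutoff=cutoff)
        if matches:
            t = matches[0]
    return t
```

For example, "Cholerra" is corrected to "cholera", "Lock-down" becomes "lockdown", and "WHO" is expanded while "who" is left unchanged. Applying spelling correction to every token risks over-correcting rare but valid words, so in practice it is usually restricted to query terms.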

Part-of-Speech (POS) Tagging: Depending on the sentence, a word can assume different functions; the aim of POS tagging is to determine the part of speech that each word in a sentence takes. The task is semi-automated and draws on stochastic and rule-based methods: models automatically tag the dataset, and human annotators later check the output for consistency (Marcus, Santorini, & Marcinkiewicz, 1993). The Penn Treebank, an annotated corpus of about 4.5 million words, is the most commonly used source of POS tags. The quality of a tagger is judged not only by its annotation accuracy but also by the consistency, syntactic function, efficacy and redundancy of its tags (Marcus et al., 1993).
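The combination of statistics and rules can be sketched as a most-frequent-tag (unigram) tagger with hand-written fallback rules for unseen words, using Penn Treebank tag names. The two-sentence training sample and the suffix rules are hypothetical; a real tagger would be trained on a large annotated corpus and would also use context from neighboring words.

```python
from collections import Counter, defaultdict

# Hypothetical hand-annotated sample using Penn Treebank tags.
TRAIN = [
    [("The", "DT"), ("outbreak", "NN"), ("is", "VBZ"), ("spreading", "VBG")],
    [("Cases", "NNS"), ("are", "VBP"), ("rising", "VBG")],
]

def train_unigram_tagger(tagged_sentences):
    """Learn each word's most frequent tag from annotated (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def fallback_tag(word):
    """Simple hand-written rules for words never seen in training."""
    if word.isdigit():
        return "CD"   # cardinal number
    if word.endswith("ing"):
        return "VBG"  # gerund / present participle
    if word.endswith("ed"):
        return "VBD"  # past-tense verb
    if word[:1].isupper():
        return "NNP"  # proper noun
    if word.endswith("s"):
        return "NNS"  # plural noun
    return "NN"       # default: singular noun

def tag(tokens, model):
    """Statistical lookup first, rule-based fallback second."""
    return [(t, model.get(t.lower(), fallback_tag(t))) for t in tokens]
```

With `model = train_unigram_tagger(TRAIN)`, `tag(["Kano", "reported", "42", "cases"], model)` yields NNP, VBD, CD and NNS: "cases" is found in the learned table, while the other three words fall through to the rules.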

 

CHAPTER THREE

ANALYSIS AND PROPOSED METHODS

Introduction

Good disease control entails efficient monitoring of media sources. Although social media offers encouraging possibilities through real-time information sharing and wide coverage, it is only a viable information source if the structure of the data processed from it can be well represented in analysis.

Data collection

Data is the major requirement to consider when designing a model. For generic applications, relevant annotated corpora are readily available; for task-specific objectives, by contrast, data has to be sourced, annotated and aggregated from the ground up.

Twitter was selected as the target social media platform for this work because of its wide coverage, provision for data streaming and search requests. Different events in different communities elicit different reactions from the populace; these variations and trends in the tweets generated in the locations of interest are extracted and captured to design a disease monitoring system.
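Once tweets have been collected through streaming or search requests, trends in the locations of interest can be captured by counting disease-related tweets per location and day. The sketch below assumes tweets are already stored as dictionaries with hypothetical `text`, `location` and `date` keys and uses a hypothetical keyword list; it does not call the Twitter API and does not reproduce the search terms or locations used in this work.

```python
from collections import Counter

# Hypothetical keyword list for illustration only.
DISEASE_KEYWORDS = {"cholera", "lassa", "measles", "meningitis"}

def mentions_disease(text, keywords=DISEASE_KEYWORDS):
    """True if any keyword appears as a whole word (hashtags included)."""
    words = {w.strip("#@.,!?").lower() for w in text.split()}
    return not keywords.isdisjoint(words)

def daily_counts(tweets, locations):
    """Count disease-related tweets per (location, day) for locations of interest.

    tweets: iterable of dicts with hypothetical keys "text", "location" and
    "date" (e.g. an ISO date string), as stored after collection.
    """
    counts = Counter()
    for tweet in tweets:
        if tweet["location"] in locations and mentions_disease(tweet["text"]):
            counts[(tweet["location"], tweet["date"])] += 1
    return counts
```

A sudden rise in one location's daily count relative to its usual level is the kind of variation a monitoring system would flag for closer analysis.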

CHAPTER FOUR

IMPLEMENTATION AND SIMULATION

Introduction

The Convolutional Neural Network (ConvNet) model for short text analysis, discussed in Section 3.3.2.2, was implemented, and its performance was measured with benchmark evaluations. All the results presented in this chapter are based on the data collated from Twitter using the procedures outlined in Section 3.1.1, and the output classes are as defined in Section 3.1.2.
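To show the general shape of a character-level ConvNet for short text, here is a dependency-free Python forward pass: each character is one-hot encoded over a fixed alphabet, 1-D convolution filters slide over the character sequence, max-over-time pooling keeps each filter's strongest response, and a linear layer with softmax produces class probabilities. The alphabet, the 140-character maximum length, the filter count, the kernel width and the random weights are all hypothetical, and the model is untrained; it does not reproduce the architecture or hyperparameters described in Section 3.3.2.2.

```python
import math
import random

# Illustrative alphabet; unknown characters are encoded as all-zero rows.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 #@.,!?'"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, max_len=140):
    """One-hot encode characters into a max_len x |alphabet| matrix (truncate/pad)."""
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0.0] * len(ALPHABET)
        if ch in CHAR_INDEX:
            row[CHAR_INDEX[ch]] = 1.0
        rows.append(row)
    rows += [[0.0] * len(ALPHABET) for _ in range(max_len - len(rows))]
    return rows

def conv1d_relu(x, filters, biases):
    """Valid 1-D convolution over the character axis followed by ReLU.

    x: L x D input; each filter: k x D weights. Returns one feature map per filter.
    """
    maps = []
    for w, b in zip(filters, biases):
        k = len(w)
        fmap = []
        for start in range(len(x) - k + 1):
            s = b
            for j in range(k):
                s += sum(wi * xi for wi, xi in zip(w[j], x[start + j]))
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps

def max_over_time(maps):
    """Keep the strongest response of each filter anywhere in the text."""
    return [max(fmap) for fmap in maps]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class TinyCharCNN:
    """Untrained, randomly initialized char-CNN forward pass, for illustration only."""

    def __init__(self, n_classes, n_filters=8, kernel=3, seed=0):
        rng = random.Random(seed)
        d = len(ALPHABET)
        self.filters = [[[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(kernel)]
                        for _ in range(n_filters)]
        self.biases = [0.0] * n_filters
        self.out_w = [[rng.gauss(0, 0.1) for _ in range(n_filters)]
                      for _ in range(n_classes)]
        self.out_b = [0.0] * n_classes

    def predict_proba(self, text):
        x = quantize(text)
        features = max_over_time(conv1d_relu(x, self.filters, self.biases))
        logits = [sum(w * f for w, f in zip(ws, features)) + b
                  for ws, b in zip(self.out_w, self.out_b)]
        return softmax(logits)
```

For example, `TinyCharCNN(n_classes=3).predict_proba("Cholera outbreak reported in Kano")` returns three probabilities that sum to 1; because the weights are random, the values carry no meaning until the model is trained. In practice the weights would be learned by backpropagation in a deep learning framework, and several stacked convolution and pooling layers are common. Working on characters rather than words means misspellings, hashtags and slang in tweets still produce usable input without a word-vector vocabulary.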

CHAPTER FIVE

SUMMARY AND RECOMMENDATION

Summary

This work contributes to the emerging field of deep learning for disease control. We applied a character-level approach to text analytics that removes the need for the laborious model augmentation methods used in word vector learning and for rule-based procedures. Even with comparatively little data, an NLP model with comparable performance in short text analysis was developed. The disease prediction model was built to address the weaknesses of some previously failed methods implemented for infectious disease monitoring and control; for example, an increase in information searches on a particular disease may not translate to an outbreak.

Recommendation

It would be worthwhile to invest time and resources in making disease-related short text corpora publicly available to aid research in this area, and in implementing named entity recognition (NER) to track the locations of outbreak reports.

Work to identify and control the factors that directly or indirectly influence the unexplained occurrence and recurrence of disease outbreaks in developing nations would reduce mortality rates. It would also help epidemiology branch out into more fields.

References

  • Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., de Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. Advances in Neural Information Processing Systems 29 (NIPS 2016), 1–17. Retrieved from https://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent.pdf
  • Baars, H., & Kemper, H.-G. (2008). Management support with structured and unstructured data: An integrated business intelligence framework. Information Systems Management, 25(2), 132–148. https://doi.org/10.1080/10580530801941058
  • Bansal, S., Chowell, G., Simonsen, L., Vespignani, A., & Viboud, C. (2016). Big data for infectious disease surveillance and modeling. Journal of Infectious Diseases, 214(Suppl 4), S375–S379. https://doi.org/10.1093/infdis/jiw400
  • Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic language model. The Journal of Machine Learning Research, 3, 1137–1155. https://doi.org/10.1162/153244303322533223
  • Brauer, F., & Castillo-Chavez, C. (2012). Mathematical models in population biology and epidemiology. https://doi.org/10.1007/978-1-4614-1686-9
  • Chan, E. H., Brewer, T. F., Madoff, L. C., Pollack, M. P., Sonricker, A. L., Keller, M., … Brownstein, J. S. (2010). Global capacity for emerging infectious disease detection. Proceedings of the National Academy of Sciences, 107(50), 21701–21706. https://doi.org/10.1073/pnas.1006219107
  • Choi, J., Cho, Y., Shim, E., & Woo, H. (2016). Web-based infectious disease surveillance systems and public health perspectives: a systematic review. BMC Public Health, 16(1), 1238. https://doi.org/10.1186/s12889-016-3893-0
