Enhancing Query Performance and Privacy on Relational Cloud Database
CHAPTER ONE
Aim and Objectives of the Research
This dissertation aims to enhance privacy and query performance on relational cloud databases.
The objectives of this dissertation are to:
- Design an efficient technique that enhances the privacy and performance of SQL queries on cloud databases.
- Implement a model that minimizes the running time of queries between the client and the cloud database.
- Analyze query performance of the conventional querying method versus the proposed system, using the Transaction Processing Performance Council's TPC-H benchmark.
CHAPTER TWO
REVIEW OF LITERATURE
Introduction
This chapter addresses the framework of database outsourcing (database in the cloud), data security, query efficiency and benchmarking, which form the background of the research. It then reviews the literature relating to these topics.
Concept of Cloud Database
A database is an organized collection of data that enables easy access, update and management. It is a structured repository for information, with links within the information that help make the data searchable. A Database Management System (DBMS) is a software package whose programs control the creation, maintenance and use of a database. It allows organizations to conveniently develop databases for various applications (Elmasri, 2008).
A cloud database, also referred to as Database-as-a-Service (DBaaS or DaaS), is a database that is accessible to clients from the cloud and delivered to users on demand via the Internet from a cloud database provider's servers. It typically runs on a shared cloud computing platform such as Windows Azure, Amazon EC2, GoGrid or Google Cloud SQL, all of which comprise hardware and software (Mateljan et al., 2010). The cloud platform is structured to host multiple outsourced databases, either by providing the database as a specialized service or by providing virtual machines on which any database can be deployed.
Database services on the cloud are provided with automated features that enable optimized scaling, high availability, multi-tenancy and effective resource allocation. Their advantages include increased accessibility, automatic failover and fast automated recovery from failures, automated on-the-go scaling, load balancing, minimal investment in and maintenance of in-house hardware, and better performance than the traditional database. Potential drawbacks include security and privacy issues and the potential loss of, or loss of access to, data in the event of a disaster, an Internet outage or the bankruptcy of the cloud database service provider (Phan, 2013).
The cloud database is constructed from a number of sites, also called nodes, which are interlinked by a communication network. Every node is a database class, and each class has its own database, terminals, central processor and local database management system (Donkena and Gannamani, 2012).
Types of Cloud Database
Cloud databases are categorized by their data model: some are SQL-based and others use the NoSQL data model.
SQL Databases
Structured Query Language (SQL) was developed at IBM in the 1970s, after Edgar Codd introduced the relational data model. It is the standard query language for relational database management systems (RDBMS) (Codd, 1970).
In the relational model, data are organized into relations, each represented by a table comprising rows and columns. Each relation also has a key, which is used to map its data to other relations. SQL is used to issue queries that create, read, update and delete data in the database. SQL supports indexing mechanisms to speed up read operations, views that can join data from multiple tables, and other features for database optimization and maintenance (Elmasri, 2008).
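The features described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample values are hypothetical and chosen only for illustration; SQLite stands in here for any relational DBMS.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two relations linked by a key (customer_id).
cur.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
cur.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
           total REAL NOT NULL)"""
)

# Create: insert rows into both relations.
cur.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Bayo")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
)

# Update and delete existing rows.
cur.execute("UPDATE orders SET total = 30.0 WHERE order_id = 10")
cur.execute("DELETE FROM orders WHERE order_id = 12")

# Index to speed up reads that filter or join on customer_id.
cur.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# View that joins data from the two tables.
cur.execute(
    """CREATE VIEW customer_totals AS
           SELECT c.name AS name, SUM(o.total) AS total_spent
           FROM customer c JOIN orders o ON o.customer_id = c.customer_id
           GROUP BY c.customer_id"""
)

# Read: query the view like an ordinary table.
rows = cur.execute(
    "SELECT name, total_spent FROM customer_totals ORDER BY name"
).fetchall()
print(rows)
conn.close()
```

The view hides the join from the caller, while the index only affects how quickly matching rows are found, not which rows are returned.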
One important attribute of SQL databases is that they follow the Atomicity, Consistency, Isolation and Durability (ACID) rules (Codd, 1970).
- Atomicity: A transaction is a logical unit of work that must either be completed with all of its data modifications or have none of them performed; that is, transactions are all-or-nothing.
- Consistency: Before and after a transaction, all data must be left in a stable state. Roll-back occurs in the event of failure.
- Isolation: Modifications of data performed by a transaction must be independent of other transactions (without interference). Otherwise, the outcome of a transaction may be erroneous.
- Durability: When a transaction is completed, the effects of its modifications must be permanent in the system (committed). Committed transactions cannot be lost. Changes are tracked in logs to ensure the reliability of data at any point in time.
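The all-or-nothing behaviour of atomicity and the roll-back mentioned under consistency can be sketched with Python's built-in sqlite3 module. The account table, the names and the amounts are hypothetical; the CHECK constraint is simply one way to provoke a failure halfway through a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # fails; the credit to bob is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to the destination account has already been applied when the debit is rejected, so the unchanged balances show that the roll-back undid the partial work.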
CHAPTER THREE
ARCHITECTURE AND DESIGN OF SECURESQL
Introduction
This chapter starts by introducing the framework of the proposed design, known as SecureSQL. The proposed system model that comprises the design is explained. The architecture of the system, which details the algorithms and the major components involved, is also presented. The tools and platform used in the implementation of the system are also highlighted.
System Model
At the most rudimentary level, a cloud storage system needs just one data server connected to the Internet. A subscriber copies files to the server over the Internet. When a client wants to retrieve the data, the client accesses the data server through a web-based interface, and the server then either sends the files back to the client or allows the client to access and manipulate the data on the server itself.
Figure 3.1 gives an overview of the proposed system model. The model employs the client-server architecture. The client is trusted and the server is not; that is, the server is assumed to be honest but curious. Most of the computations are performed on the client side. Data is encrypted on the client side using selective attribute-based encryption before deployment to the cloud server. The client layer consists of the application, which is accessed through a web browser. It connects through the Internet to the Cloud Service Provider (CSP) and then to the data center, both of which make up the server side. When the owner is uploading or updating data, the application interface allows the owner to select the entity or relation to be encrypted.
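The idea of encrypting only owner-selected attributes on the client before upload can be sketched as follows. This is a minimal illustration under stated assumptions, not the SecureSQL implementation: the function names, the row layout and the choice of sensitive column are hypothetical, and the cipher is a toy SHA-256 keystream used only so the sketch runs with the standard library. It is not AES (the algorithm named in the summary of this work) and must not be used to protect real data.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: str) -> bytes:
    """Stand-in cipher (NOT AES): XOR the data with a nonce-specific keystream."""
    nonce = os.urandom(16)
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def toy_decrypt(key: bytes, blob: bytes) -> str:
    """Reverse toy_encrypt using the nonce stored in the first 16 bytes."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


def encrypt_selected(row: dict, sensitive: set, key: bytes) -> dict:
    """Encrypt only the columns the owner marked as sensitive; leave the rest as is."""
    return {
        col: toy_encrypt(key, str(val)) if col in sensitive else val
        for col, val in row.items()
    }


key = os.urandom(16)                    # 128-bit key kept on the trusted client
row = {"id": 7, "name": "Ada", "salary": "52000"}
sensitive = {"salary"}                  # hypothetical owner selection
stored = encrypt_selected(row, sensitive, key)   # what would be sent to the server
recovered = toy_decrypt(key, stored["salary"])   # done on the client after retrieval
```

Because only the selected columns pass through the cipher, the cost of encryption grows with the amount of sensitive data rather than with the size of the whole record.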
CHAPTER FOUR
IMPLEMENTATION, RESULTS AND DISCUSSION
Introduction
This chapter begins by generating the test data used for the database. It then covers the software and system development by explaining the implementation of the client-side application and its connection to the cloud database. The queries are then tested by analyzing and evaluating their performance using statistical tools. Finally, the experimental results are critically discussed.
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATION
Summary
The efficiency of data retrieval in large outsourced databases, especially as it relates to the privacy of such data, has been an open challenge, primarily because traditional query languages cannot work with encrypted data. The reviewed literature showed that most of the architectural models and encryption techniques proposed to ensure efficient query performance as well as privacy of outsourced data have limitations that range from high computational overhead to restricted query computation.
Based on these limitations, this research proposed an architectural framework focused on enhancing server-side data retrieval and query efficiency through the use of a hash map and the AES 128-bit encryption algorithm. The implementation of this framework, known as SecureSQL, was built on the client side without any alteration to the DBMS structure. Server-side data retrieval was made possible through a simple graphical user interface that encrypts data before uploading it to a cloud server, performs queries, and decrypts encrypted values on the fly. A query processing engine was also developed to process encrypted data. Selective attribute-based encryption, which was incorporated into the encryption algorithm, ensures that computational overhead is minimized.
The results obtained (see Section 4.9) show a significant difference between the execution time of the traditional method, which decrypts entire records before performing queries, and that of the proposed method. Also, as the number of records increases, the execution time of the proposed method remains roughly constant, which supports the O(1) time complexity claimed for the hash map data structure (see Sections 2.13 and 2.6). A tabular and graphical comparison with some related works (see Section 4.10) shows that the proposed method scores higher.
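The contrast between decrypting every record and using a hash map can be sketched as below. How SecureSQL builds its hash map is not described in this section, so the sketch assumes a client-side map from a keyed hash (HMAC) of each plaintext value to the matching row ids. All names, keys and sample values are hypothetical, and the XOR cipher is a placeholder for AES, not a secure algorithm.

```python
import hashlib
import hmac
from collections import defaultdict

ENC_KEY = b"0123456789abcdef"   # hypothetical 128-bit encryption key
MAC_KEY = b"fedcba9876543210"   # hypothetical key for lookup tokens
decrypt_calls = 0               # counts how many values get decrypted


def _xor(data: bytes) -> bytes:
    # Placeholder cipher, NOT AES and not secure: XOR with a fixed keystream.
    stream = hashlib.sha256(ENC_KEY).digest()
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(data))


def encrypt(value: str) -> bytes:
    return _xor(value.encode("utf-8"))


def decrypt(blob: bytes) -> str:
    global decrypt_calls
    decrypt_calls += 1
    return _xor(blob).decode("utf-8")


def token(value: str) -> str:
    """Keyed hash of a plaintext value, used as the hash map key."""
    return hmac.new(MAC_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Encrypted column as it might sit on the server: row id -> ciphertext.
cities = ["Lagos", "Abuja", "Kano", "Lagos", "Ibadan", "Abuja"]
table = {rid: encrypt(city) for rid, city in enumerate(cities)}

# Hash map built on the client at upload time: token -> list of row ids.
index = defaultdict(list)
for rid, city in enumerate(cities):
    index[token(city)].append(rid)


def scan_query(target: str) -> list:
    """Traditional approach: decrypt every record, then filter."""
    matches = []
    for rid, blob in table.items():
        if decrypt(blob) == target:
            matches.append(rid)
    return matches


def indexed_query(target: str) -> list:
    """Hash map approach: one average O(1) lookup, then decrypt only the hits."""
    row_ids = index.get(token(target), [])
    return [rid for rid in row_ids if decrypt(table[rid]) == target]


decrypt_calls = 0
scan_result = scan_query("Lagos")
scan_decrypts = decrypt_calls

decrypt_calls = 0
index_result = indexed_query("Lagos")
index_decrypts = decrypt_calls

print(scan_result, scan_decrypts, index_result, index_decrypts)
```

The scan decrypts all six values while the indexed query decrypts only the two matches, so its cost depends on the number of hits rather than on the table size. One trade-off of deterministic tokens is that equal plaintext values produce equal tokens, which reveals which rows share a value.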
The SecureSQL model executes 20 of the 22 TPC-H benchmark queries efficiently while preserving privacy. This shows that it is not restricted to simple query constructs but can also handle complex queries involving nested subqueries and joins.
In a nutshell, this research has dealt with relational databases that are hosted on the cloud: how they are secured, the effects of that security on data retrieval, and proposed methods to reduce these effects to a minimal level.
Conclusion
The recent explosion of digital content ownership has both increased the popularity of data outsourcing and fueled concerns over data security. The need to facilitate the storage and processing of large amounts and types of sensitive data is of particular importance in the modern enterprise, especially where the server is not trusted and client resources are limited. This research worked at eliminating the tradeoff between security and efficient query processing through its SecureSQL design, as evidenced in the results. The first objective, the design of an efficient technique to enhance the privacy and performance of SQL queries on cloud databases, has been achieved through the combined use of Selective Attribute-Based Encryption, the hash map structure and the AES 128-bit encryption algorithm. The model was implemented using the implementation tools listed in Section 3.7; thus, the second objective has also been achieved.
Finally, the observations in Section 4.9 show a wide margin between the query performance of the conventional method and that of the proposed method. The third objective has therefore also been achieved.
As long as the performance degradation due to encryption is kept under control, users would choose outsourced databases over traditional databases for storing private information.
Recommendation for Future Work
Some aspects related to this research could not be investigated because they were out of scope, while others were investigated but not implemented because of time limitations. These aspects are therefore recommended as future work, as outlined below:
- Explore further techniques for data retrieval security and efficiency that deal with multimedia databases (images, music and videos), given the recent explosion of mobile technology and the rapid expansion of online media.
- Design of an intelligent system that determines which columns are sensitive and automatically encrypts the
- Make the encryption/decryption keys dynamic so that they can be set from the application user interface.
- With advances in web computing and mobile technology, XML is widely being used as a standard to exchange data over inter-networked heterogeneous systems and web-based applications. The hash map method could be extended to develop query execution techniques over the encrypted XML
- Investigate and analyze similar techniques implemented in this work for non-relational
- A data partitioning function that defines the conditions for data sensitivity can be incorporated to determine the sensitivity of data for certain attributes before applying the encryption function.