A Clustering-Based Web Prefetching in High Traffic Environment

CHAPTER ONE

Aim and Objectives

The aim of this research is to improve web prefetching by developing a technique that remains effective in a high traffic environment, where server idle time is very low.

The specific objectives are to:

predict user requests based on user history
determine which pages will be requested by the majority of users in the near future
prioritize prefetching based on the frequency of server idle time
evaluate the algorithm with respect to existing prefetching algorithms

CHAPTER TWO

LITERATURE REVIEW

Introduction

This chapter discusses web prefetching and caching in detail. It also elaborates on related work.

Web Caching

Web caching is one of the most successful solutions for improving the performance of Web-based systems. In Web caching, popular web objects that are likely to be visited in the near future are stored closer to the user, for example on the client machine or a proxy server. Web caching thus helps to reduce Web service bottlenecks, alleviate traffic over the Internet and improve the scalability of the Web system (Waleed, 2011).

Types of Web Cache

Web caching keeps a local copy of Web pages in places close to the end user. Caches are found in browsers and at any point between the user agent and the origin server. Figure 2.1 depicts the various locations of the web cache. Typically, a cache is located in the client (browser cache), the proxy server (proxy cache) and the origin server (cache server).

Client Side Cache

The client side cache is located in the user's browser. Cache settings can be found in any modern Web browser, such as Internet Explorer, Safari, Mozilla Firefox, Netscape and Google Chrome. This cache is useful, especially when users hit the “back” button or click a link to see a page they have just looked at. In addition, if the same navigation images are used across pages, they will be served from the browser’s cache almost immediately.

Proxy Server Cache

The proxy server cache is found in the proxy server, which is located between client machines and origin servers. It works on the same principle as the browser cache, but on a much larger scale. A proxy serves hundreds or thousands of users in the same way. When a request is received, the proxy server checks its cache. If the object is available, it sends the object to the client. If the object is not available, or has expired, the proxy server requests the object from the origin server and sends it to the client. The object is then stored in the proxy’s local cache for future requests. This work uses a proxy server, which is elaborated in Section 2.5.
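The lookup sequence described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the class name `ProxyCache`, the fixed time-to-live `ttl_seconds`, and the `fetch_from_origin` callback are all assumptions introduced for the example.

```python
import time
from typing import Callable, Dict, Tuple


class ProxyCache:
    """Toy proxy cache: serve fresh copies locally, otherwise ask the origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],  # hypothetical origin fetcher
        ttl_seconds: float = 60.0,                  # assumed fixed expiry time
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expiry)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            # Object is cached and has not expired: cache hit.
            self.hits += 1
            return entry[0]
        # Object is missing or expired: fetch from the origin and store it.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (body, now + self._ttl)
        return body
```

The injectable `clock` is a design choice that makes expiry testable without waiting; a real proxy would normally take freshness from the HTTP response headers instead of a fixed lifetime.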

Origin Server Cache

Even at the origin server, web pages can be stored in a server-side cache for reducing the need for redundant computations or database retrievals. Thus, the server load can be reduced if the origin server cache is employed.

Cache Replacement Policy

A cache replacement policy is the rule used to select a web object for removal whenever the cache memory is full. When a cache miss occurs, the cache controller selects a block to be replaced with the desired data. The replacement policy determines which block should be replaced, and this selection can be made in several ways. The conventional cache replacement policies are Least Recently Used (LRU) and Least Frequently Used (LFU). The LRU policy replaces the least recently requested web page, while the LFU policy replaces the least frequently requested web page. The following are the factors (features) of Web objects that influence Web proxy caching:

Recency: the object’s last reference
Frequency: number of requests made to an object
Size: size of the requested Web object
Access latency of the Web object
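To make the difference between the two conventional policies concrete, the sketch below implements both over a fixed number of slots. It is an illustrative sketch only: the class names are hypothetical, capacity is counted in objects rather than bytes, and object size and access latency are ignored.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Evicts the least recently requested object when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()  # oldest access first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        elif len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # drop least recently used
        self._items[key] = value

    def keys(self):
        return list(self._items)


class LFUCache:
    """Evicts the least frequently requested object when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: dict = {}
        self._counts: Counter = Counter()  # request count per key

    def get(self, key):
        if key not in self._items:
            return None
        self._counts[key] += 1
        return self._items[key]

    def put(self, key, value) -> None:
        if key not in self._items and len(self._items) >= self.capacity:
            # Ties are broken in favour of evicting the earliest-inserted key.
            victim = min(self._items, key=lambda k: self._counts[k])
            del self._items[victim]
            del self._counts[victim]
        self._items[key] = value
        self._counts[key] += 1

    def keys(self):
        return list(self._items)
```

With two slots and the request sequence put a, put b, get a, get a, get b, put c, the LRU cache evicts a (b was touched more recently) while the LFU cache evicts b (a was requested more often).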

Proxy Caching

A proxy server processes requests from within a firewall by forwarding them to the remote servers, intercepting the responses, and sending the replies back to the clients. Since the same proxy servers are typically shared by all clients inside the firewall, this raises the question of how effective these proxies are at caching documents. Clients within the same firewall usually belong to the same organization and are likely to share common interests, so they would probably access the same set of documents. A document that was previously requested and cached on the proxy server is therefore likely to produce future hits. Web caching at the proxy server can not only save network bandwidth but also lower access latency for the clients. Caching can be done at the client browser, the server side and the proxy server. There are three types of proxy caching: forward proxy caching, reverse proxy caching and transparent proxy caching (Reddy, 2007).

CHAPTER THREE

HIGH TRAFFIC WEB PREFETCHING MODEL

Introduction

This chapter describes the proposed high traffic web prefetching model. The architecture is presented, along with the corresponding algorithms of its sub-models.

The Prefetching Model Architecture

The Internet is a network environment in which many users request web pages. Connections to web pages are made through the proxy server, which is an intermediary between the web browser and the origin server. When a user requests a web page, the proxy cache is searched for it. If the page is present in the cache, this is a cache hit and the page is fetched from the cache. Otherwise, a cache miss is recorded and the page is fetched from the origin server. The web page is served to the user within a client group, and the request is also saved in a log file. The log files are preprocessed to remove noise. The preprocessed log files are used to update the web navigation graph. Support and confidence threshold values are applied to the navigation graph to remove web pages and edges that have low values. Several clusters are formed within a client group. When the predicted prefetch time is greater than the server idle time, an inter-domain cluster is formed. Figure 3.1 shows the web prefetching and caching environment.
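The pruning step on the navigation graph can be sketched as below. The text does not give formulas for support and confidence, so this example assumes the usual association-rule reading: the support of an edge is the number of times the transition appears in the preprocessed logs, and its confidence is that count divided by the number of visits to the source page. The function names and the session format (a list of page lists) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_navigation_graph(sessions: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count page visits and page-to-page transitions across user sessions."""
    page_counts: Counter = Counter()
    edge_counts: Counter = Counter()
    for session in sessions:
        page_counts.update(session)
        # Consecutive pages in a session form a directed edge.
        edge_counts.update(zip(session, session[1:]))
    return page_counts, edge_counts


def prune_edges(
    page_counts: Counter,
    edge_counts: Counter,
    min_support: int,
    min_confidence: float,
) -> Dict[Edge, float]:
    """Keep only edges meeting both thresholds; values are edge confidences."""
    kept: Dict[Edge, float] = {}
    for (src, dst), count in edge_counts.items():
        # Assumed definition: confidence = transitions(src->dst) / visits(src).
        confidence = count / page_counts[src]
        if count >= min_support and confidence >= min_confidence:
            kept[(src, dst)] = confidence
    return kept
```

In this sketch only edges are filtered explicitly; a page that is left with no surviving edge simply drops out of the pruned graph. Clustering would then operate on the edges that remain.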

CHAPTER FOUR

IMPLEMENTATION AND RESULT

Introduction

This chapter discusses the implementation details of the proposed model as well as the results of its experimental evaluation. Its performance is also compared with the simulated algorithm when using InterClustering and Thulase et al. (2014).

CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATIONS

Summary

Web prefetching predicts the web objects a user will need before the user requests them. Prefetching is only done at server idle time. The prefetching problem is addressed in a Web cache environment using an algorithm for clustering web pages. In this work, a technique to improve prefetching at server idle time is proposed. Users’ log files were collected and preprocessed. The Web objects in the web navigation graph (WNG) are then clustered. When a requested Web object is available in the cache, all other Web objects in its cluster are prefetched during server idle time. Where the predicted server idle time is greater than the actual server idle time, inter-domain clustering is performed. Several clusters are formed from different domains based on users’ requests. The proposed algorithm is evaluated using hit ratio, byte ratio, precision, accuracy of prediction and usefulness of prediction as performance metrics.
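Three of the metrics named above can be illustrated with a short sketch. The formulas are the ones commonly used in the caching literature and are assumptions here, since this summary does not state them: hit ratio as cache hits over total requests, byte ratio as bytes served from the cache over total bytes requested, and precision as the fraction of prefetched objects that were later requested. The `Request` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Request:
    url: str
    size_bytes: int
    served_from_cache: bool


def hit_ratio(trace: List[Request]) -> float:
    """Fraction of requests answered from the cache."""
    if not trace:
        return 0.0
    return sum(r.served_from_cache for r in trace) / len(trace)


def byte_hit_ratio(trace: List[Request]) -> float:
    """Fraction of requested bytes answered from the cache."""
    total = sum(r.size_bytes for r in trace)
    if total == 0:
        return 0.0
    cached = sum(r.size_bytes for r in trace if r.served_from_cache)
    return cached / total


def prefetch_precision(prefetched: Set[str], requested: Set[str]) -> float:
    """Fraction of prefetched objects that users went on to request."""
    if not prefetched:
        return 0.0
    return len(prefetched & requested) / len(prefetched)
```

For a trace with two 100-byte hits and misses of 300 and 500 bytes, the hit ratio is 0.5 while the byte ratio is only 0.2, which is why the two are reported separately.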

Conclusion

Web prefetching is only done at server idle time. This research improves prefetching when server idle time is at its smallest. The experimental results show that the proposed web clustering algorithm addresses the high traffic problem, thereby enabling prefetching at the minimum server idle time.

Recommendations

Although the results obtained show some improvement over the standard prefetch technique, the estimated prefetch time depends only on page size and bandwidth. Future work should consider other parameters that affect the speed of fetching a web page, including processor speed and file location.
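The limitation noted above can be seen in a minimal sketch of such an estimate. The exact formula is not given in this chapter; the example assumes the simplest form, page size divided by bandwidth, and a greedy selection of pages that fit into the available idle time. Both functions and their parameter names are hypothetical.

```python
from typing import List, Tuple


def estimated_prefetch_time(page_size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Assumed estimate: transfer time only, ignoring server and disk effects."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return page_size_bytes / bandwidth_bytes_per_s


def pages_fitting_idle_time(
    pages: List[Tuple[str, int]],  # (url, size in bytes), highest priority first
    bandwidth_bytes_per_s: float,
    idle_time_s: float,
) -> List[str]:
    """Greedily pick pages, in priority order, whose estimates fit the idle time."""
    chosen: List[str] = []
    used = 0.0
    for url, size in pages:
        cost = estimated_prefetch_time(size, bandwidth_bytes_per_s)
        if used + cost <= idle_time_s:
            chosen.append(url)
            used += cost
    return chosen
```

Because the estimate has only two inputs, two pages of equal size always receive the same predicted time, regardless of processor speed or where the file is stored; this is the gap the recommendation points to.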

REFERENCES

Baskaran, K., and Kalaiarasan, C. (2015). Combining Pre-fetching and Intelligent Caching Technique (SVM) to Predict Attractive Tourist Places. Research Journal of Applied Sciences, Engineering and Technology, 9(1), 40-46.
Baskaran, K., Kalaiarasan, C., and Sasi, A. (2013). Study of combined web prefetching with web caching based on machine learning technique. Journal of Theoretical and Applied Information Technology, 55(2), 280-291.
Bhaskaran, V., and Murali, V. (2012). Optimizing the Web Cache Performance by Clustering based Pre-Fetching Technique using Modified ART1. International Journal of Computer Applications, 4(1), 50-57.
Cheng-Zhong, X., and Tamer I. (2000). Semantics-Based Personalized Prefetching to Improve Web Performance. Institute of electrical and electronic engineering, 20(2), 636-643.
González-Cañete, F. (2007). A Content-type Based Evaluation of Web Cache Replacement Policies. International Conference Applied Computing, 2(3), 90-96.
Greeshma, G., and Jayasudha, J. (2012). A Survey on Web Prefetching and Web Caching in a Mobile Environment. International Journal of Computer Science and Information Technology, 2(1), 119-136.
