Prengen: A Prediction System for Web Requests

Bikrant Bikram Pratap Maurya 25

With the spread of the internet, the number of its users has grown significantly, and it has become an essential part of day-to-day life. The internet is a go-to solution for everyday problems, but the growth in users has also turned it into an ocean of information in which it is easy to get lost. With the introduction of Web 2.0, the internet became commonplace, and the result was the generation of an enormous amount of data, most often referred to as big data. Thus the need arose for a system to guide users. Each of us generates large volumes of data in a single day. Most of us now have accounts on services such as Twitter, Facebook and Yahoo, on shopping sites such as Amazon and Flipkart, on food-ordering apps and so on, all of which create huge amounts of data every day [1]. This data can be used to study user behaviour and to predict the next web page a user may visit. Everyone wants to save time, and predicting the next web page can reduce waiting time and give a better, smoother user experience.

The activities we perform on the internet, such as surfing, watching movies on Netflix, reviewing an item, shopping, visiting a site or checking the price of a product, all generate data and contribute to big data. This data is collected and processed so that it yields useful information, and it is then fed to prediction engines that infer our interests and future actions. Services such as Netflix and Spotify, for example, make suggestions according to our interests, mood and the time of day.

A growing number of models for predicting user activity on the web have been proposed. An accurate prediction can reduce user access times and, when prefetching is handled properly, reduce network traffic. Much work has been done on recommendation and prefetching systems [3], and recommendation systems rely on predictive models to make their suggestions. The traditional approach to a prediction-based system is to watch the server logs of users, mine the sequence in which each user visits web pages, and learn these patterns to predict the next page. Recommendation systems were the basic building block of the prediction system. Initially, they only helped to study and predict the user's interests and then to supply the user with information accordingly.

In this paper, we present a model that learns from the choices users make while searching, visiting sites, asking questions, writing reviews, viewing items, making wish lists, checking prices and so on. The algorithm observes the user's actions, finds patterns in them and predicts the user's future actions, and it continues to learn and improve on its own. We therefore set out the concept of predicting the next web page a user will visit.

Based on their development approach and philosophy, recommendation systems are broadly divided into content-based and collaborative systems. While content-based systems use information about the specific user to generate recommendations, collaborative systems use information from a group of similar users. The recommendation system proposed in our work is a collaborative one that uses the combined wisdom of other users to make recommendations.


Algorithm for front-end behaviour

This algorithm shows how the front end handles responses from the back end and how it communicates with the back end to provide the functionality. When a user visits the web app for the first time, the back end sends the response for the request. The front end displays that page and sends a request to predict the next page. When the front end receives the prediction, it checks whether the user has made another request in the meantime. If not, the front end preloads the predicted web page without displaying it. If the user then requests that same page, the front end sends a database change check (to verify whether the data changed after the prediction response). If the flag is false, the preloaded page is displayed as it is; otherwise it is displayed after the changes in the data have been applied.

If the user sends a request other than the predicted one, the preloaded page is discarded and a prediction error flag is sent.
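The front-end flow described above can be sketched as a small state holder. This is only a simulation in Python, not the browser-side code of the actual system. The class `PrefetchClient`, its method names, and the callables `fetch_page` and `data_changed` are hypothetical stand-ins for the real network calls. As a simplification, a changed page is simply fetched again rather than patched.

```python
from typing import Callable, Optional


class PrefetchClient:
    """Simulates the front end: preload the predicted page, then serve or discard it."""

    def __init__(self, fetch_page: Callable[[str], str],
                 data_changed: Callable[[str], bool]) -> None:
        self.fetch_page = fetch_page        # hypothetical: asks the back end for a page
        self.data_changed = data_changed    # hypothetical: database change check
        self.preloaded_url: Optional[str] = None
        self.preloaded_body: Optional[str] = None
        self.prediction_errors = 0          # number of prediction error flags sent

    def on_prediction(self, predicted_url: str) -> None:
        """Preload the predicted page without displaying it."""
        self.preloaded_url = predicted_url
        self.preloaded_body = self.fetch_page(predicted_url)

    def on_user_request(self, url: str) -> str:
        """Return the page to display for the user's next request."""
        predicted, body = self.preloaded_url, self.preloaded_body
        self.preloaded_url = self.preloaded_body = None  # a preload is used once
        if predicted is None or body is None:
            return self.fetch_page(url)                  # nothing was preloaded
        if predicted == url:
            if self.data_changed(url):
                return self.fetch_page(url)              # refresh the stale preload
            return body                                  # show the preload as it is
        self.prediction_errors += 1                      # wrong prediction: discard
        return self.fetch_page(url)
```

A correct prediction is served from the preload with no second fetch, while a wrong one increments `prediction_errors`, which the back end could use as feedback.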

Algorithm for back-end behaviour

This algorithm shows how the back end should behave to provide the functionality of the predictive engine.

When a user makes a request to the back end, the back end sends the requested page and saves the session details. Whenever a user visits any page, a log entry is created to help understand the user's behaviour.

When the front end makes the database check request, the back end searches for changes within the period and sends the changed data; if the database did not change, it sends a false flag.
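A minimal in-memory sketch of this back-end behaviour follows. The class `Backend`, the attributes `visit_log` and `last_modified`, and the explicit `now` and `since` timestamps are assumptions made for illustration; a real deployment would use a database and a web framework.

```python
from typing import Dict, List, Tuple, Union


class Backend:
    """In-memory stand-in for the back end: serves pages, logs visits, reports changes."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = dict(pages)
        self.last_modified: Dict[str, float] = {url: 0.0 for url in pages}
        self.visit_log: List[Tuple[str, str, float]] = []  # (session_id, url, time)

    def handle_request(self, session_id: str, url: str, now: float) -> str:
        """Serve the page and record the visit for later behaviour analysis."""
        self.visit_log.append((session_id, url, now))
        return self.pages[url]

    def update_page(self, url: str, body: str, now: float) -> None:
        """Change the stored data and remember when it changed."""
        self.pages[url] = body
        self.last_modified[url] = now

    def check_changes(self, url: str, since: float) -> Union[str, bool]:
        """Return the fresh data if it changed after `since`, otherwise False."""
        if self.last_modified[url] > since:
            return self.pages[url]
        return False
```

Here `since` would be the time of the prediction response, so the check covers exactly the period in which the preloaded copy could have become stale.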

Algorithm for probability priority:

There are two tables for prediction: one gives the prediction for a single user, and the other gives the prediction for all users. This algorithm specifies which prediction should be used.

Flow chart

Logistic regression for prediction:

Logistic regression is named for the function used at the core of the method, the logistic function.

The logistic function, also called the sigmoid function, was developed by statisticians to describe properties of population growth in ecology, rising quickly and maxing out at the carrying capacity of the environment. It’s an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits.

In logistic regression, we model the probability that an input (X) belongs to the default class (Y = 1), which we can write formally as:

$$P(X) = P(Y = 1 \mid X)$$

Note that the probability estimate must be converted to a binary value (0 or 1), for example by applying a threshold, in order to make a class prediction.

Logistic regression is a linear method, but its predictions are transformed using the logistic function. The effect is that we can no longer interpret a prediction as a linear combination of the inputs, as we can with linear regression. Continuing from above, with coefficients $\beta_0$ and $\beta_1$ learned from the training data, the model can be stated as:

$$P(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Taking the natural logarithm, the above equation can also be written as:

$$\ln\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X$$

This is helpful because the calculation on the right is linear again (as in linear regression), and the input on the left is the logarithm of the odds of the default class.

The ratio on the left is called the odds of the default class (the use of odds is historical; for example, odds rather than probabilities are used in horse racing). Odds are calculated as the probability of the event divided by the probability of no event, e.g. 0.8 / (1 - 0.8), which gives odds of 4. So we can instead write:

$$\ln(\text{odds}) = \beta_0 + \beta_1 X$$

Because the odds are log-transformed, we call this left-hand side the log-odds, or logit. Other types of transform function may be used (these are out of scope here), and it is common to refer to the transform that relates the linear regression equation to the probabilities as the link function, e.g. the logit link function.

We can move the exponent back to the right and write this as:

$$\text{odds} = e^{\beta_0 + \beta_1 X}$$
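The relationship between probability, odds and log-odds can be checked numerically. The sketch below uses only the Python standard library; the function names and any coefficient values passed to them are illustrative assumptions and are not taken from the Prengen implementation.

```python
import math


def logistic(z: float) -> float:
    """Map a real number z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Probability of the event divided by the probability of no event."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Natural log of the odds; this inverts the logistic function."""
    return math.log(odds(p))
```

For instance, `odds(0.8)` evaluates to approximately 4, matching the example in the text. For a linear term `z = b0 + b1 * x` with any hypothetical coefficients, `log_odds(logistic(z))` recovers `z` up to floating-point error, which is the sense in which the log-odds are linear in the input.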


Once the prediction tables are ready, the first question is where the probability should be taken from: the average prediction table or the personalized prediction table. The answer is whichever gives the higher value. Suppose the probability for page A from the average table is P1 and the probability for page B from the personalized table is P2. If P1 is greater than P2, the prediction of page A is preferred, and vice versa.
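The priority rule can be sketched in a few lines. The function name `pick_prediction` and the dictionary layout (page mapped to probability) are assumptions made for illustration. The text does not say how ties are resolved, so this sketch arbitrarily lets the average table win a tie.

```python
from typing import Dict, Optional, Tuple


def pick_prediction(average: Dict[str, float],
                    personalized: Dict[str, float]
                    ) -> Optional[Tuple[str, float, str]]:
    """Return (page, probability, source table) for the most probable next page."""
    candidates = [(prob, page, "average") for page, prob in average.items()]
    candidates += [(prob, page, "personalized") for page, prob in personalized.items()]
    if not candidates:
        return None  # no prediction available from either table
    # max() keeps the first of equal items, so the average table wins a tie.
    prob, page, source = max(candidates, key=lambda c: c[0])
    return page, prob, source
```

With P1 = 0.6 for page A in the average table and P2 = 0.4 for page B in the personalized table, the call `pick_prediction({"A": 0.6}, {"B": 0.4})` selects page A from the average table.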

The given diagram shows navigation according to the prediction. According to this data, the following navigation would take place:

1 -> 2

1 -> 3

3 -> 4

3 -> 3

Since there can be several possible navigations from the current page, the sum of both predictions is used for each path, and the path with the greater weight is taken.
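One possible reading of this rule is that, for each path leaving the current page, the probabilities from the two prediction tables are added and the heaviest path is followed. The sketch below implements that reading. The function name, the `(current, next)` key layout and all probability values in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Transition = Tuple[int, int]  # (current page, next page)


def heaviest_next_page(current: int,
                       average: Dict[Transition, float],
                       personalized: Dict[Transition, float]) -> Optional[int]:
    """Sum both tables' weights per outgoing path and return the heaviest target."""
    weights: Dict[int, float] = defaultdict(float)
    for table in (average, personalized):
        for (src, dst), prob in table.items():
            if src == current:
                weights[dst] += prob
    if not weights:
        return None  # no known path leaves this page
    return max(weights, key=lambda page: weights[page])
```

For the transitions listed above, suppose the average table holds `{(1, 2): 0.6, (1, 3): 0.4, (3, 4): 0.7, (3, 3): 0.3}` and the personalized table holds `{(1, 2): 0.2, (1, 3): 0.8, (3, 4): 0.5, (3, 3): 0.5}`. From page 1 the path to page 3 then carries the larger summed weight, and from page 3 the path to page 4 does.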


The logs were collected over a period of around five months, and the system was tested on a shopping website by students of the university. We implemented this work in JavaScript and Python. For this purpose, we used a server with a Core i5 processor and 4 GB of RAM. The web log files are in comma-separated values (CSV) format [2].

Average prediction table

The given table shows how the prediction table is stored in CSV format. The current page is the page the user is currently visiting, and the average probability is the probability of going to the next page from the current page.
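To make the table layout concrete, the sketch below estimates transition probabilities by simple frequency counting over page sequences and writes them out as CSV. Frequency counting is a stand-in chosen for brevity, not necessarily how Prengen computes its probabilities. The column names `current_page`, `next_page` and `average_probability` are assumptions as well.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def average_table(sessions: Iterable[List[str]]) -> Dict[Tuple[str, str], float]:
    """Estimate P(next page | current page) from the page sequences of all users."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    table: Dict[Tuple[str, str], float] = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        for nxt, n in nexts.items():
            table[(current, nxt)] = n / total
    return table


def to_csv(table: Dict[Tuple[str, str], float]) -> str:
    """Serialise the table as CSV text with hypothetical column names."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["current_page", "next_page", "average_probability"])
    for (current, nxt), prob in sorted(table.items()):
        writer.writerow([current, nxt, f"{prob:.2f}"])
    return out.getvalue()
```

Each CSV row then answers the question the table is meant to answer: given the current page, how likely is each next page across all users.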

Personalized prediction table:

The personalized prediction table holds the same information, clustered by user ID or session ID. The current page is the page that user is currently visiting, and the average probability is the probability of going to the next page from the current page.
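The clustering by user ID can be sketched as follows, again using plain frequency counting as an illustrative stand-in for the actual model. The row layout `(user_id, page)` in visit order and the function name `personalized_table` are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LogRow = Tuple[str, str]  # (user_id, page), listed in visit order


def personalized_table(rows: Iterable[LogRow]
                       ) -> Dict[str, Dict[Tuple[str, str], float]]:
    """Build one transition-probability table per user ID."""
    last_page: Dict[str, str] = {}
    counts: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for user, page in rows:
        if user in last_page:
            # Count the transition from this user's previous page to this one.
            counts[user][last_page[user]][page] += 1
        last_page[user] = page
    tables: Dict[str, Dict[Tuple[str, str], float]] = {}
    for user, per_page in counts.items():
        tables[user] = {}
        for current, nexts in per_page.items():
            total = sum(nexts.values())
            for nxt, n in nexts.items():
                tables[user][(current, nxt)] = n / total
    return tables
```

Because each user's previous page is tracked separately, interleaved log rows from different users do not produce spurious transitions between them.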

The image below shows the implementation and how prefetched data is stored in a buffer and displayed on the console.

The heatmap below compares the predicted values with the real values. As the heatmap shows, Prengen gives good results.


This survey paper will help future researchers in the field of web page prediction to identify the methods available and to conduct their research more effectively.

Our results also show that both the training and the prediction algorithms can be used in real time. Our algorithm has immediate applications in web server caching, prefetching systems and recommendation systems. In future work, we wish to apply the algorithm in these domains.


The internet has made life easier, and nowadays we can find almost anything online. With big data being generated every day, the demand for prediction-based models has increased. As the internet has become an ocean in which it is easy to get lost, we need something to guide users and give them an effective, seamless and positive experience. Our model has shown positive results in predicting the next web page a user may want, making the surfing experience smoother. Because the algorithm keeps learning and improving, we expect its predictions to become more accurate over time. This work may also help others learn about prediction-based algorithms and develop more efficient systems in the future. The algorithm also aims to improve the surfing experience of users in remote areas, since predicting the page they are most likely to want reduces the time they spend waiting.