Classified web pages by bounce rate with SEO features

Published: 10 September 2024 | Version 1 | DOI: 10.17632/h69x79cf4p.1
Contributors:

Description

To build a prediction model of webpage bounce rate and investigate the relationship between SEO features and bounce rate, a set of webpages must be specified and statistical data collected about them. To obtain the bounce rates of these webpages, they must have a Google Analytics property ID and tracking code installed, and the data collector must have access to that Google Analytics account. After the preprocessing stage, the final dataset contained 824 webpages. The attached Excel sheets contain the dataset before and after preprocessing.


Steps to reproduce

1366 webpages from 7 websites (in Arabic and English) were selected for data collection. The Google tool "Looker Studio" was used to collect the datasets. The 7 output datasets included the following attributes: URL, bounce rate, views, average session duration, views per session, views per user, new users, sessions per user, scrolled users, engagement rate, user engagement, engaged sessions, and sessions. Data was extracted for the period Sep 2022 – Sep 2023.

Any of these attributes could serve as the dependent variable in the classification process. Bounce rate was selected as the target class attribute because it is the attribute that most directly expresses user engagement and reflects the measurable effect of improvements to targeted actions on a webpage for economic purposes.

The ScreamingFrog tool was used to extract the SEO features of each webpage. A RapidMiner process was then used to merge the data collected from Google Looker Studio and ScreamingFrog. Finally, RapidMiner was used to preprocess the data for the following goals:
1- To filter out webpages with fewer than 10 page views, since so few views cannot give trustworthy bounce rate values.
2- To select the attributes needed for classification.
3- To discretize the attribute “Bounce Rate”, converting it from numerical values to user-specified classes so that it can serve as the target attribute for classification. Three classes were used: Low for values less than 0.4, Medium for values between 0.4 and 0.6, and High for values over 0.6 (the thresholds were selected based on expert recommendations).
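The merge and preprocessing steps described above were carried out in RapidMiner. As a rough illustration only, the following plain-Python sketch mirrors the same logic: an inner join of analytics rows and SEO rows on the URL, removal of pages with fewer than 10 views, attribute selection, and discretization of bounce rate into Low / Medium / High. It is not the authors' RapidMiner process. The field names (`url`, `bounce_rate`, `views`, `word_count`) and the helper functions are hypothetical, and treating exactly 0.4 and 0.6 as Medium is an assumption, since the description does not say how boundary values are assigned.

```python
from typing import Dict, List

# Thresholds taken from the dataset description. Assigning the boundary
# values 0.4 and 0.6 to "Medium" is an assumption.
LOW_MAX = 0.4
HIGH_MIN = 0.6
MIN_VIEWS = 10  # pages with fewer views are dropped


def bounce_class(rate: float) -> str:
    """Map a bounce rate in [0, 1] to one of the three target classes."""
    if rate < LOW_MAX:
        return "Low"
    if rate <= HIGH_MIN:
        return "Medium"
    return "High"


def merge_by_url(analytics: List[Dict], seo: List[Dict]) -> List[Dict]:
    """Inner-join analytics rows with SEO rows on the hypothetical 'url' key."""
    seo_index = {row["url"]: row for row in seo}
    merged = []
    for row in analytics:
        match = seo_index.get(row["url"])
        if match is not None:
            merged.append({**row, **match})
    return merged


def preprocess(rows: List[Dict], selected: List[str]) -> List[Dict]:
    """Filter low-traffic pages, keep selected attributes, add the class label."""
    out = []
    for row in rows:
        if row["views"] < MIN_VIEWS:
            continue  # too few views to give a trustworthy bounce rate
        record = {name: row[name] for name in selected}
        record["bounce_class"] = bounce_class(row["bounce_rate"])
        out.append(record)
    return out


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    analytics = [
        {"url": "/a", "bounce_rate": 0.25, "views": 120},
        {"url": "/b", "bounce_rate": 0.55, "views": 40},
        {"url": "/c", "bounce_rate": 0.80, "views": 7},
        {"url": "/d", "bounce_rate": 0.70, "views": 15},
    ]
    seo = [
        {"url": "/a", "word_count": 850},
        {"url": "/b", "word_count": 300},
        {"url": "/c", "word_count": 120},
        {"url": "/d", "word_count": 410},
    ]
    for record in preprocess(merge_by_url(analytics, seo), ["url", "word_count"]):
        print(record)
```

With these made-up rows, page `/c` is dropped for having only 7 views, and the remaining pages are labelled Low, Medium, and High respectively. An inner join is used here because a page missing from either source cannot contribute both SEO features and a bounce rate.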

Institutions

University College of Applied Sciences, Universiti Sains Malaysia

Categories

Artificial Intelligence, Data Mining, World Wide Web, Web Application, Web-Based Intelligence, Machine Learning, Search Engine, Intelligent Web, Artificial Intelligence Applications, Search Engine Marketing Technique, Digital Marketing

Licence