Time: 2021-03-01 14:29:27 +0000
The original article is copyright under © Springer Nature Switzerland AG, Part of Springer Nature:
This is the author’s pre-print, that is, the submitted version before the peer review process. The copyright was transferred from Fajar Purnama to Springer Science+Business Media, LLC, part of Springer Nature, and the copyright transfer states that “Author(s) retain the right to make the pre-print available on their own personal, self-maintained website.” Springer not only encourages pre-print sharing but even allows it to be licensed as Creative Commons; therefore, I license this post under a customized CC-BY-SA, where you are also allowed to sell my contents on the condition that you mention that the free and open version is available here. In summary, the mention must contain the keywords “free” and “open” and the location, such as the link to this content.
As I am the author, I can say that there is a significant difference between the versions before and after peer review. Springer is kind enough to allow sharing of the authors’ accepted version (AAM) 12 months after publication, which will be 6 March 2021. However, we can open it sooner if you are interested.
I highly recommend contacting me before donating; you can find my social media links on my donation page. With today’s technology I cannot stop you from donating, so make sure to contact me so that I can list you on the donor list. Beware of scams; my recommendation is to contact me and book or delay your donation. Once enough people are interested, I will contact you back on the same exact window. If there’s a comment section here, you can book your donation in the comment
Mouse tracking serves as an alternative to eye tracking in measuring the learning process in education because of its affordability. Moreover, mouse tracking does not require extra hardware, as in the case of eye tracking, because it is a feature in personal computers by default. Therefore, it is possible to implement mouse tracking on a massive open scale. However, mouse tracking has only been implemented in a laboratory setting to date, ostensibly because of the associated extremely high running costs. Nonetheless, there is no available data to support the claim of high resource costs, which has resulted in much speculation among implementers. In general, the implementation of mouse tracking in a non-laboratory environment is still rare. Therefore, the authors developed an application to investigate real-time mouse tracking online. It was implemented on the Moodle learning management system and tested on an online quiz session accessed from abroad. Additionally, the application can handle tracking on mobile devices. In this work, the main resource costs, which include CPU, network, RAM, and storage, were measured when mouse tracking was used. These results can serve as a reference for network and server administrators during future implementation of this technique. It was determined that the characteristics of mouse activities were dynamic in that occasional surges and lulls were observed. If mouse tracking data are not aggregated and transmitted as a single data package, then mouse tracking can be implemented on a large scale.
With the recent advances in technology, including the Internet, information can be searched and published with few restrictions. With the advancement in information communication technology (ICT), more activities are being conducted online. Individuals no longer spend hours staring at computer screens after work or class; instead, they often use their mobile devices to stay online irrespective of the time or their physical location (Dentzel, 2013). Unlike in the past, when individuals were limited to local newspapers, televisions, and textbooks, people can now easily search and choose the information they want using advanced search engines such as Bing, DuckDuckGo, Google, and Yahoo. In the case of social networking services (SNS) such as Facebook, Instagram, Line, and Twitter, individuals can get the latest news, interact with one another, open discussions, and share information. Users can enjoy entertainment such as viewing photographs, listening to music, and watching videos using services such as Dailymotion, Metacafe, and YouTube. In addition, video games are also available. Shopping is also facilitated by online merchants such as Aliexpress and Amazon, where individuals can order items and have them delivered. All of these online activities can be performed using a computer device connected to the Internet.
Education has also benefitted from the Internet, and courses can be delivered in blended form (Paturusi et al., 2012) or fully online (Wen and Rosé, 2014). A variety of learning and teaching activities can be performed outside the classroom, for example, the reading of learning material, discrete discussion in forums, submission of assignments, and the performance of exercises (Linawati et al., 2017). This greatly reduces the burden on both students and teachers. The number of higher education institutions that provide online education is increasing, and it is only a matter of time before primary and secondary schools (Sopu et al., 2016) follow this model. Implementing an online course is currently much easier than in the past because of the advent of learning management systems (LMS) where most processes are automated (Kakasevski et al., 2008) without the need for advanced knowledge of computers and web programming (Chourishi et al., 2011); only computer literacy is a prerequisite. The next step after implementing online courses is the implementation of massive open online courses (MOOC) (Drake et al., 2015). Unlike regular online courses, anyone can join a MOOC, which is not limited to students of certain institutions.
With the option of numerous online activities, many researchers have become interested in analyzing these activities in an area known as online analytics. Online analytics records the who, where, and when associated with online activities. The most popular metrics are total traffic, source traffic, bounce rate (the rate of people immediately leaving the page after visiting), and conversion rate (whether the page fulfills its purpose) (Bluehost, 2016). Whether it is a public website or an online course, the concept is almost the same. On a public website, the number of page views, comments, ratings, image views, videos played, and items bought are recorded. Based on page views only, a variety of analyses can be performed. Page views can predict a user’s demographic (Hu et al., 2007) and characterize an audience in terms of preference for news, multimedia, games, or adult content (Kumar and Tomkins, 2010). They can also predict whether a user is at risk of visiting malicious websites (Canali et al., 2014). Page views also provide hints on how to improve a website’s design, for example moving popular webpage links to the header (Khan et al., 2018). Many web analytics software packages are already available, such as Google Analytics, Open Web Analytics, Piwik, and Cloudflare (spa, 2019). In online courses, logins, learning materials viewed, discussions, assignment submissions, quiz attempts, and grades are recorded (moo, 2016). They can reveal, for example, a student’s level of activity (Nandi et al., 2011), how difficult the quizzes are for students (Nakano et al., 2005; Usagawa et al., 2006), and even identify failing students (Fungai and Usagawa, 2016). Usually, these data are used to measure the learning performance of students (Wen and Rosé, 2014).
Although those popular metrics can measure learning performance, there is a limit when it comes to measuring the actual learning process (Zushi et al., 2012). As such, the what, when, and where can be measured in detail, but not the how (Purnama et al., 2016a). To obtain more detailed measurements, it is necessary to record the time spent viewing a page (Li and Tsai, 2017). For even finer detail, the time spent viewing each section of a page can be recorded (Koh et al., 2018). One of the most common approaches is to divide a page into subpages (Lee et al., 2009) or to insert tracker codes into sections of the page (Purnama et al., 2016b). More powerful approaches involve eye tracking (Pernice, 2017) and mouse tracking (Henrie et al., 2015; Koh et al., 2018), which can provide more information than just the time spent viewing particular sections.
Eye tracking is arguably one of the most accurate methods for recording the viewing activity of users, but the financial cost is very high, thereby confining the technology to lab environments (Lai et al., 2013). Although mouse tracking is not as accurate, the financial cost is low in comparison. No additional devices are needed to perform mouse tracking, which can be implemented by anyone who has a computer. Most people own computers and with the increasing availability of the Internet, it is possible to perform massive scale mouse tracking implementation (Huang et al., 2011). Therefore, mouse tracking can be implemented outside of the laboratory in places such as classrooms, online learning, and websites. Recently, web mouse tracking software such as Mouseflow, ClickTale, ClickHeat, and Sessioncam has emerged (NT, 2015).
Unfortunately, another obstacle must be overcome before widespread implementation of mouse tracking can be achieved. This is related to the resource cost, especially for personal implementation. It is rumored that the resource cost for maintaining mouse tracking (and eye tracking as well) is notoriously high, and that the data are therefore classified as Big Data (Sin and Muthu, 2015). However, the rumors were not confirmed. Furthermore, mouse tracking resource cost was never discussed in detail. Leiva and Huang (2015) state that a mouse swipe from left to right can generate hundreds of cursor coordinates and that mouse activity over a minute can generate 1 MB (megabyte) of data. That is the only information related to mouse tracking resource cost that was presented in their article. Discussions on these matters are very few, which has been unhelpful to implementers. Therefore, this work investigates the main resource costs of mouse tracking, including central processing unit (CPU), data rate, random access memory (RAM), and storage. A real-time online mouse tracking application was developed that can be implemented on any website. In this case, it was demonstrated on a quiz session on Moodle LMS. The source code is available on Github (Authors, 2019a). The implementation and resource measurement took place for three events: solo measurements in a laboratory, local testing by five people of a quiz session in a laboratory, and an overseas quiz session by students of classroom size in Mongolia, accessing a server in Japan.
This article is divided into six chapters. The first chapter is the introduction that includes background information on online activities and popular analytics, mouse tracking, and the problem of mouse tracking implementation as previously described. The second chapter is the literature review in which the results of several interesting studies in the field of eye tracking and mouse tracking are presented. Mainly, this chapter shows that there are promising results, which have generated interest and excitement, in these fields. The last part of this chapter discusses the state-of-the-art of this work. The third chapter is the system overview that discusses the real-time online mouse tracking application, developed as a part of this work, and mainly considers its operation and features. The fourth chapter is the experiment and implementation that explains the hardware and tools used, subjects, and the procedures of the experiment and their implementation. The fifth chapter presents the results and discussion on the mouse tracking data that was collected, sample analysis, and the resource costs from both calculated and profiled measurements. The sixth chapter is the conclusion, which summarizes the main findings.
Rayner (1998) reviewed many articles regarding eye movements spanning from 1971–1998 and claimed that eye tracking technologies have existed since then. The most fundamental aspects of eye movements are fixation and saccade, where fixation is the process of fixing the gaze on a certain region of interest (ROI), and saccade is the process of moving the gaze to another ROI. However, it is up to the examiner to interpret eye movements; for example, eye movements can provide information about the user’s attention, interest, and state of mind. Eye tracking has been researched in the field of pattern recognition whether in a non-digital environment (Holmqvist and Wartenberg, 2005; Holsanova et al., 2006) or digital environment (Hyönä et al., 2002; Liu, 2005; Duggan and Payne, 2011; Jarodzka et al., 2017), search engine result page (SERP) (Rodden and Fu, 2007; Rodden et al., 2008; Huang et al., 2011, 2012), web evaluation and usability (Ehmke and Wilson, 2007; Buscher et al., 2009; Tzafilkou and Protogeros, 2017; Hsu et al., 2018), and visual search (Rayner, 2009; Dragunova et al., 2017).
In the category of learning, Lai et al. (2013) reviewed eye movement research in seven themes including pattern of information processing, effects of instructional design, reexamination of existing theories, individual differences, effects of learning strategies, patterns of decision making, and conceptual development. They concluded that the eye-tracking method provides a promising channel for educational researchers to connect learning outcomes to cognitive processes. Many educational researchers gained interest in the application of eye tracking in the process of learning and teaching, especially in online learning, where it can make up for the lack of emotional connection between students and teachers (Cantoni et al., 2012). For example, eye tracking can capture signs of comprehension difficulties, cognitive stress, or tiredness of students during online learning, which a good teacher is able to notice during face-to-face learning.
In e-learning, eye tracking has been integrated into the online framework where the eye tracking hardware captures eye movement on the client, transmits the eye movement data to the server, then processes the data for direct analysis or to implement interactivity. Finally, the data are kept in storage.
Other than being integrated into the online learning framework, eye tracking is often utilized without adaptability and interactivity, simply as a tool to analyze specific characteristics of the learners and to perform post actions based on this analysis (Rakoczi and Pohl, 2012; Lupu and Ungureanu, 2013). Examples of utilizations include obtaining the cognitive (Eger, 2018) and emotional state of the users (Cantoni et al., 2012), evaluation of instructional design (Jarodzka and Brand-Gruwel, 2017; Yang et al., 2018) and user interface design (UID) (Ramakrisnan et al., 2012; Chivu et al., 2018), pattern recognition (Alhasan et al., 2018; Parikh and Kalva, 2018), strategic patterns (Tsai et al., 2012; Busjahn et al., 2014), etc.
Until now, eye tracking has yet to be implemented on a large scale because of hardware limitations. Almost all eye tracking articles cited herein are based on experiments in laboratory environments where separate and usually expensive eye tracking hardware is utilized. Most of these articles express confidence in the eventual reduction in cost and affordability of eye tracking hardware and the expectation that eye tracking will be implemented on a large scale in the future. In recent years, a few researchers have attempted to fulfill these expectations; for example, Sungkur et al. (2016) and Zheng and Usagawa (2018) developed eye tracking using web cameras. As almost all modern laptops are equipped with a web camera, and most people including students own laptops, eye tracking with a web camera is a key aspect in the quest for large-scale implementation.
Although most researchers prefer eye-tracking data, many mouse tracking articles have noted that mouse tracking is a viable alternative because eye tracking technology is too expensive and inconvenient (Cooke, 2006). There are investigators who have attempted to relate mouse tracking to eye tracking by utilizing exploratory studies (Rodden et al., 2008), correlation analysis (Chen et al., 2001; Rodden et al., 2008; Voßkühler et al., 2008; Liebling and Dumais, 2014; Demšar and Çöltekin, 2017), or prediction (Guo and Agichtein, 2010; Johnson et al., 2012; Huang et al., 2012; Navalpakkam and Churchill, 2012) to demonstrate the inaccuracy involved in correlating mouse tracking data to eye tracking data. In contrast, there is also active research that treats mouse tracking data independently (Navalpakkam and Churchill, 2012). There are also other rare studies that attempt to direct the eye gaze to the mouse cursor by restricting the user’s field of vision, thereby coupling the mouse tracking data with the eye tracking data (Tarasewich et al., 2005; Lagun and Agichtein, 2011; Maruya et al., 2015; Kim et al., 2017). Similar to eye tracking, mouse tracking is also conducted in the area of pattern recognition, search engine result page (SERP) (Rodden and Fu, 2007; Rodden et al., 2008; Guo and Agichtein, 2008; Huang et al., 2011, 2012; Lagun et al., 2014; Arapakis and Leiva, 2016), web evaluation and usability (Arroyo et al., 2006; Navalpakkam and Churchill, 2012; Manson et al., 2012), and education.
In the field of education, recent articles emphasize the need to record the time spent on a learning activity to obtain the user’s behavior patterns (Li and Tsai, 2017). Koh et al. (2018) emphasize the need to record the time spent on a particular section of a learning activity because the time spent on an entire page does not reflect the actual learning time given that the time spent on different sections varies. The authors further stated that mouse trajectories and scrolling can be used to determine the time spent on a particular section, although mouse tracking is capable of more than that. Mouse tracking can be used to record the trajectories, velocities, and many other variables of the mouse cursor, which can in turn be used to measure many things including cognitive load (Rheem et al., 2018). The data generated by mouse tracking can be overwhelming to examine, but this is no longer a major problem because there are many data mining and visualization applications that can be used to extract meaningful information from the data of the users (Poon et al., 2017).
The earliest article on mouse tracking implementation was presented by Mueller and Lockerd (2001), whereas in e-learning, the earliest article was presented around the year 2012.
As shown in the preceding sections, there are many interesting works on mouse tracking; however, very few have investigated the implementation and resource costs associated with the process, which has caused implementers to doubt the feasibility of large-scale implementation. Huang et al. (2011) conducted massive scale mouse tracking on Microsoft’s Bing search engine. By reducing the amount of mouse trajectories recorded, their massive scale experiment succeeded. However, they only discussed the data analysis afterward and neglected to consider the resource costs. Leiva and Huang (2015) and Martín-Albo et al. (2016) addressed the issue, but their discussion quickly shifted to the solution, which is primarily based on compression methods. To date, there are no articles, except this one, that consider the resource costs of mouse tracking implementation.
The mouse tracking application developed by the authors was designed to run online and in real-time. Online means that the mouse tracking is run remotely via the Internet, where the client runs the mouse tracking application when browsing a webpage and the associated data are sent to the server. Real-time means that the mouse tracking data are continuously sent by the client to the server during the mouse tracking process. Overall, this can be seen in the real-time online mouse tracking framework in Fig. 1.
The mouse tracking application developed as part of this work is a standalone application that can be implemented either on the server or on the client. In the case of the former, the mouse tracking code is incorporated into the webpage. A webpage mainly contains HTML, CSS, and JS. A more direct approach is to inject the mouse tracking code into the JS code. Another approach is to create a plugin for a certain content management system (CMS) or LMS. In this work, a Moodle mouse tracking plugin was developed, which can be in the form of an admin plugin, a theme plugin, or a block plugin, as shown in Fig. 2. A theme plugin usually applies to all Moodle pages and is managed by the administrator, while a block plugin applies to selected pages and is usually managed by managers and teachers.
The mouse tracking Moodle plugin was implemented on the authors’ laboratory server, which can be accessed at https://md.hicc.cs.kumamoto-u.ac.jp. The authors plan to publish the Moodle plugin on Moodle’s website in the future. To implement the application on the client, the mouse tracking code is incorporated into the browser’s code. This can be achieved by direct insertion or plugin installation. Fig. 3 shows a mouse tracking browser extension installed on Google’s Chrome Browser. The authors plan to publish the extension on the Chrome Web Store and other online stores.
Implementation on a server is more efficient because the mouse tracking code is only installed on the server, whereas implementation on the client requires installation on each client. However, server implementation limits the mouse tracking process to the server’s website only. The authors were able to identify users as they visited or left the website but were unable to perform tracking once the users left the website. In comparison, client implementation facilitates the recording of every detail of the browser activity of users, including mouse tracking on all visited websites.
In this section, the main features of the real-time online mouse tracking application are reviewed in detail. The guide for writing the code is available on jQuery’s website (jsf, 2019). Tracking is divided into the main event logging and other information loggings. A simple keyboard logging was also implemented. For mobile devices, a mouse is rarely used, therefore tracking of scrolls, touches, and zooms is preferred. Fig. 4 shows a demonstration of the loggings that are available online (Authors, 2019a).
The real-time online mouse tracking application was installed on the authors’ Moodle server. Three mouse tracking experiments were performed during which the clients participated in a ten-question quiz session on the server. The resource costs were then measured. The data rate of the network was measured using a tool called Wireshark. The CPU, RAM, and storage values were obtained from the default monitoring of the server’s operating system (OS), which is Ubuntu 18.04 LTS Server. The server is equipped with an Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz (with SSE4.2), 32 GB of DDR4 RAM, a 10 TB hard drive, and a network allocation of 2 MBps.
The first experiment was point-to-point (P2P) as illustrated in Fig. 6 where one client accessed the server directly without using the Internet. In this experiment, a laptop was directly connected to the server on an isolated P2P network to obtain clean data. The empirical data rate of one event (single click) was measured. As clean data were obtained, it was possible to derive a theoretical mouse tracking data rate. The other resource costs were not measured because the authors did not possess the necessary hardware, software, and knowledge to measure such small events.
The second experiment was a local experiment illustrated in Fig. 7 where five clients accessed the server through the Internet. This experiment was conducted inside the author’s laboratory. Five lab members including the main author tested the mouse tracking application and answered 10 questions during the quiz session. A resource costs profile of the five users was generated. There were no limits to the number of events per second that the clients were allowed to produce.
The third experiment was an overseas implementation, illustrated in Fig. 8, where 44 clients in Mongolia accessed the server in Japan. Unlike the previous experiment, this was a real implementation where students from the School of Engineering and Applied Science, National University of Mongolia participated in a real quiz session on the server in the Human Interface Cyber Communication Laboratory, Kumamoto University. In this case, there was also no limitation in terms of events per second on the clients. Apart from determining the resource costs profile for the real quiz session, useful mouse tracking data were obtained. Although this work discusses the mouse tracking data and demonstrates some simple analysis, further analyses are out of the scope of this investigation.
|Data rate (kB)||3.11||3.14||3.14||3.2||3.2||3.22||3.25||3.29||3.43||3.56||3.64||3.72|
Table 1 shows the resource cost of the authors’ real-time online mouse tracking application for one event on the server, for a variety of information. This data applies to the P2P experiment in which the client performed one click on the Moodle server where mouse tracking was activated. As more information was included in the event, the data rate increased. The data rate revealed an increase of approximately 12 bytes when new information was added. This behavior is expected because an increase in the amount of information results in an increase in the post data size. For example, in Table 1 there was a significant increase in the data rate when the variables “date” and “content_url” were included because they contain more characters compared to other variables. The authors also attempted to measure CPU and RAM activity, but the change was negligibly small. Although the result is limited to this application only, similar results are expected for other existing applications.
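The relationship between the number of logged variables and the post data size can be illustrated with a minimal Python sketch. It assumes, for illustration only, that an event is sent as a URL-encoded POST body; the field names other than `date` and `content_url`, and all field values, are hypothetical and are not taken from the authors' application. Only the body is measured here, not headers or lower-layer overhead, so the absolute numbers are not comparable to Table 1.

```python
from urllib.parse import urlencode


def post_body_size(event: dict) -> int:
    """Return the size in bytes of a URL-encoded POST body for one event."""
    return len(urlencode(event).encode("utf-8"))


# Hypothetical minimal event: event type and cursor position only.
base_event = {"event": "click", "x": 512, "y": 384}

# Same event with two longer variables added (values are made up).
full_event = dict(
    base_event,
    date="2019-01-01 12:00:00",
    content_url="https://example.org/mod/quiz/attempt.php?attempt=1",
)

base_size = post_body_size(base_event)
full_size = post_body_size(full_event)
print(f"base: {base_size} B, full: {full_size} B, "
      f"increase: {full_size - base_size} B")
```

Longer values such as a timestamp or a full URL add more bytes per event than short numeric variables, which matches the behavior described for Table 1.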
The addition of new data does not appear to significantly increase the data rate; however, this addition becomes consequential as the number of users increases, especially when they perform many activities. The rate of mouse tracking activities is measured in events per second or as a frequency in hertz (Hz). Although the frequency of mouse movement and scrolling is high, the rate usually does not exceed 70 events per second or 70 Hz (Rheem et al., 2018). Based on the empirical data in Table 1, it is possible to estimate the data rate of a planned mouse tracking implementation. The first step is to determine the number of events generated by users per second. The corresponding data rate is then read from Fig. 9 and multiplied by the number of users. The results are consistent with the statement that 1 MB of data can be generated from one minute of mouse activity (Leiva and Huang, 2015): from Fig. 9, when a user constantly generates five events per second (5 Hz), the data generated can reach 1 MB in approximately one minute.
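The estimation steps can be written as a short worked calculation in Python. This is a sketch under a stated assumption: the per-event size of 3.4 kB is not given explicitly in the text but is inferred from the figures quoted later (78 kBps at 23 Hz); the function and constant names are illustrative.

```python
# Assumed average size of one transmitted event in kB (inferred, not measured here).
KB_PER_EVENT = 3.4


def data_rate_kbps(events_per_second: float, users: int = 1,
                   kb_per_event: float = KB_PER_EVENT) -> float:
    """Estimated data rate in kB/s: events per second x event size x users."""
    return events_per_second * kb_per_event * users


# One user constantly generating 5 events per second (5 Hz).
rate = data_rate_kbps(5)      # about 17 kB/s
per_minute_kb = rate * 60     # about 1020 kB, i.e. roughly 1 MB per minute

print(f"{rate:.1f} kB/s, {per_minute_kb:.0f} kB per minute")
```

Under this assumption, a constant 5 Hz stream from a single user yields roughly 1 MB per minute, in line with the figure attributed to Leiva and Huang (2015).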
As previously stated, the real-time online mouse tracking application has a feature to limit the maximum number of events per second generated by a user. This can be set after allocating the network bandwidth of the mouse tracking process. Assuming that data are recorded for all available variables in mouse tracking, that there are 22 users, and that the network allocated to mouse tracking is 2 megabytes per second (MBps), the mouse tracking application should be limited to 25 events per second. However, this calculation is not realistic and is only relevant for measuring the worst-case scenario, where the aim is smooth implementation rather than optimal resource usage. This is because the events generated per second by users are dynamic and not static. For example, there are instances when the mouse cursor is not moved as the user stops scrolling to read. Likewise, there are instances when users move the mouse cursor and scroll to search for information. There are also occasions when users drag and drop objects during interactive activities. Consequently, users do not generate a fixed number of events per second.
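The worst-case limit can be reproduced with a short Python sketch. Two assumptions are made: the per-event size is taken as 3.72 kB, the largest value in Table 1, and 1 MB is taken as 1024 kB. The `EventThrottle` class is a hypothetical fixed-window limiter added for illustration; it is not the authors' implementation of the limiting feature.

```python
import math


def max_events_per_second(bandwidth_kbps: float, users: int,
                          kb_per_event: float) -> int:
    """Largest per-user event rate that keeps all users within the bandwidth."""
    return math.floor(bandwidth_kbps / (users * kb_per_event))


class EventThrottle:
    """Hypothetical limiter: allow at most `max_per_second` events per clock second."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second
        self.window = None   # current one-second window (integer second)
        self.count = 0       # events already allowed in this window

    def allow(self, timestamp: float) -> bool:
        window = int(timestamp)
        if window != self.window:
            # A new second has started, so reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False


# 2 MBps allocated, 22 users, 3.72 kB per event (assumed values, see lead-in).
limit = max_events_per_second(2 * 1024, 22, 3.72)

# Feed a burst of 98 events within a single second through the limiter.
throttle = EventThrottle(limit)
sent = sum(throttle.allow(i / 98) for i in range(98))
print(f"limit: {limit} events/s, events sent from a 98-event burst: {sent}")
```

A fixed one-second window is the simplest choice; a sliding window or token bucket would smooth bursts that straddle a window boundary.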
|Statistical||Local Mouse Tracking Experiment||Overseas Mouse Tracking Implementation|
|Metrics||CPU (%)||RAM (MB)||Data Rate (kB)||CPU (%)||RAM (MB)||Data Rate (kB)|
To obtain more realistic data and to perform reliable calculation, profile measurements should be acquired. The main question that is considered in profile measurement is “how often do users move their mouse and scroll?” The profile measurement used in this work is resource monitoring during local mouse tracking processes by five users followed by statistical analysis on the time series data. Although the data required is the average events per second during the mouse tracking processes, it is more convenient to immediately measure the average resource costs.
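A statistical summary of a resource time series, of the kind used for the profile measurement, can be sketched in Python as follows. The sample values are placeholders invented for illustration and are not measurements from this work; the `upper_skew` helper is a hypothetical indicator that compares the distance from the median to the maximum with the distance from the median to the minimum.

```python
from statistics import mean, median


def summarize(samples):
    """Return the minimum, median, mean, and maximum of a time series."""
    return {
        "min": min(samples),
        "median": median(samples),
        "mean": mean(samples),
        "max": max(samples),
    }


def upper_skew(summary):
    """Ratio of (max - median) to (median - min).

    Values well above 1 suggest short upward spikes rather than a
    constantly high level.
    """
    lower = summary["median"] - summary["min"]
    upper = summary["max"] - summary["median"]
    return float("inf") if lower == 0 else upper / lower


# Placeholder per-second data-rate samples in kB (NOT measured values).
samples = [10, 12, 15, 11, 300, 14, 13, 9, 12, 16]

summary = summarize(samples)
print(summary, f"upper skew: {upper_skew(summary):.1f}")
```

With a single large spike in otherwise low samples, the mean is pulled well above the median, which is why both are worth reporting for spiky data such as mouse activity.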
Unlike the P2P experiment, the local experiment measures not only the mouse tracking but also all the other processes, which include accessing the Moodle page and answering 10 questions. During this experiment, the CPU percentage (Fig. 10), the RAM usage (Fig. 11), and the data rate (Fig. 12) were rarely zero, indicating that idle activity was uncommon. For the local experiment with five users, the CPU usage averaged 10%, the RAM usage averaged 1.7 GB, and the data rate averaged 51 kBps. This indicates that there was reserve capacity for more users. It should be noted that the initial CPU percentage and RAM usage by the OS were 0% and 1.4 GB, respectively.
An interesting result is shown in Fig. 12 for the data rate. During the quiz session at approximately 17:02:40-17:14:20 (11 minutes and 40 seconds, or 700 seconds), a table of 6.1 MB with approximately 16287 rows (equivalent to 16287 events) and 17 columns was generated. Note that the number of columns is less than the number of variables introduced in Table 1 because tracking for mobile devices had not yet been developed at that time. In addition, the total data transmission may not be equal to the size of the data stored in the database because of factors such as the transmission methods, the unoptimized application, and traffic unrelated to mouse tracking, such as data transmitted when loading Moodle pages. Interestingly, the average number of events per second (16287 events divided by 700 seconds) was 23, or 23 Hz. If this value is plotted on Fig. 9, a result of 78 kBps is obtained, which is not far from the actual measurement of 51 kBps in Table 2.
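The arithmetic behind these figures can be checked with a short Python sketch. The per-event size of 3.4 kB is an assumption chosen because it reproduces the quoted 78 kBps at 23 Hz, and 1 MB is taken as 1000 kB for the storage estimate; variable names are illustrative.

```python
from datetime import datetime

# Assumed size of one transmitted event in kB (inferred from 78 kBps at 23 Hz).
KB_PER_EVENT = 3.4

# Quiz session window and table statistics quoted in the text.
start = datetime.strptime("17:02:40", "%H:%M:%S")
end = datetime.strptime("17:14:20", "%H:%M:%S")
duration_s = int((end - start).total_seconds())
rows = 16287
table_mb = 6.1

avg_hz = rows / duration_s                      # average events per second
estimated_kbps = round(avg_hz) * KB_PER_EVENT   # estimate from the event rate
stored_kbps = table_mb * 1000 / duration_s      # growth rate of the stored table

print(f"duration: {duration_s} s, average rate: {avg_hz:.1f} Hz")
print(f"estimated transmission: {estimated_kbps:.1f} kB/s")
print(f"stored data growth: {stored_kbps:.1f} kB/s")
```

The stored table grows far more slowly than the estimated transmission rate, which illustrates the point that transmitted data and stored data need not be equal in size.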
Can a single mouse swipe produce hundreds of mouse coordinates (Leiva and Huang, 2015)? The answer is “yes,” if we examine the spikes in Fig. 12. The highest spike occurred between 17:07:28-17:07:29, when 98 queries per second were received by the database. If plotted in Fig. 9, a result of 335 kBps is obtained, which is close to the actual measurement, where the maximum data rate from Table 2 is 312 kBps. Unfortunately, name and email identification was not available at the time; therefore, the identity and the number of the users who performed a large number of events are unknown. This shows that the speculation seems to be justified in the case where users are able to generate a high number of events per second momentarily. In other words, more than one user can perform many activities at once, mainly simultaneous mouse moves and scrolls. The upper spikes are very large, as Table 2 indicates that the difference between the median and the maximum is large compared to the difference between the median and the minimum. Moreover, the spikes indicate that a very high level of activity only occurs momentarily and not constantly. Network and server administrators have observed that mouse activities potentially generate a large amount of data, and there was concern that this high level of activity is constant. The presented results clearly demonstrate that high-level mouse activity is mostly temporary.
The authors were able to obtain mouse tracking data from 44 students in the School of Engineering and Applied Science, National University of Mongolia, for an online quiz session on the server in the Human Interface Cyber Communication Laboratory, Kumamoto University. Fig. 13 shows a screenshot of the mouse tracking data in the form of a table. The table size is approximately 145 MB, containing 393585 rows and 22 columns. Are the rumors that mouse tracking produces a notoriously large amount of data true? The answer to this question is “yes.” Half a year of Moodle log data for a similar number of students was only approximately 300 kB, while the mouse tracking data represented in Fig. 13 reached 145 MB after 3 hours and 30 minutes.
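To put these sizes in perspective, the stored bytes per event and the ratio to the Moodle log can be estimated from the figures above. This is a rough sketch that assumes decimal units (1 MB = 1,000,000 bytes and 1 kB = 1,000 bytes), which the text does not specify.

```python
TABLE_BYTES = 145 * 1_000_000      # about 145 MB of mouse tracking data
TABLE_ROWS = 393_585               # rows (events) in the table
MOODLE_LOG_BYTES = 300 * 1_000     # about 300 kB of half-year Moodle logs

bytes_per_row = TABLE_BYTES / TABLE_ROWS
size_ratio = TABLE_BYTES / MOODLE_LOG_BYTES

print(f"{bytes_per_row:.0f} bytes per row")        # 368 bytes per row
print(f"{size_ratio:.0f} times the Moodle log")    # 483 times the Moodle log
```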
The mouse tracking data contain so much information that a separate report is required to discuss the characteristics of the data and the types of possible analysis. There are many different types of analysis and discussions on mouse tracking data, and several examples were reviewed earlier. In this report, a sample visualization based on heatmaps and the trails of the mouse tracking data is presented, as shown in Fig. 14. As expected, left clicks occurred more often for selection of the questions. However, there were also left clicks associated with some of the questions, and the visualization shows that these left clicks were dragged. This can be interpreted as the students highlighting the questions. Middle clicks occurred most frequently for question four; however, the reason for this occurrence is not clear. Right clicks were most common at the top of the page, where some students probably decided to explore the available features. As expected, there were numerous trails, seemingly too many to visualize all at once. The heatmap indicates that most of the students placed the mouse cursor on the questions and choices. There were also a few students who placed the mouse cursor outside the questions. These were probably individuals who preferred to keep the mouse cursor away from the text while reading. Further analyses are outside the scope of this work.
Similar to the local mouse tracking experiment with five users, the resource costs were also measured, allowing us to determine whether large-scale implementation is possible. Although 44 students attempted the quiz, the session was divided into two sessions and each session contained only 22 students. The students were informed that the first session would start at 12:00, followed by a break at approximately 14:00. The second session started a few minutes later and finished at 15:30. As such, the entire process took 3 hours and 30 minutes (12600 seconds). Fig. 15, Fig. 16, and Fig. 17 are consistent with this schedule: a decrease in each graph was observed at 14:00 for a few minutes. The number of events generated during this time (12600 s) was 393585. The average number of events per second was 31 (393585 divided by 12600), or 31 Hz. When 31 Hz is plotted in Fig. 9, a result of 115 kBps is obtained, which is close to the measured average data rate of 105 kBps in Table 2.
Is large-scale mouse tracking implementation possible? It is possible if resource usage is balanced and distributed. Implementing real-time mouse tracking is a better choice than implementing non-real-time tracking. For example, if mouse tracking data are first accumulated and subsequently submitted all at once, this will cause a bottleneck at the server. In that case, Fig. 15, Fig. 16, and Fig. 17 would not show constant usage; they would show idle activity at the beginning, followed by constantly high usage in the middle. This is arguably an inefficient use of resources. Real-time implementation helps to evenly distribute the transmission of the mouse tracking data.
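The bottleneck argument can be illustrated with a small simulation that compares the peak server load under real-time transmission with the load when everything is submitted at once. The event counts below are randomly generated placeholders, not measured data, and the batched case assumes the worst situation in which all users submit within the same second.

```python
import random


def peak_loads(per_user_events):
    """Compare peak server load for real-time and batched submission.

    per_user_events: one list per user, holding that user's event count
    for each second of the session (all lists the same length).
    Returns (peak_realtime, peak_batched) in events per second.
    """
    seconds = len(per_user_events[0])
    # Real-time: the server receives each second's events as they occur.
    realtime = [sum(user[s] for user in per_user_events) for s in range(seconds)]
    # Batched (worst case): every stored event arrives in a single second.
    batched = sum(sum(user) for user in per_user_events)
    return max(realtime), batched


random.seed(0)
# Hypothetical session: 22 users, 600 seconds, 0 to 5 events per second each
users = [[random.randint(0, 5) for _ in range(600)] for _ in range(22)]
peak_realtime, peak_batched = peak_loads(users)
print(peak_realtime, peak_batched)
```

Under these assumptions the batched peak equals the total number of events in the session, which is far above any single second of real-time traffic.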
Compared to the local mouse tracking experiment with the five users, the resource costs were expected to increase because more users (22) were involved in this implementation. What was unexpected is that the standard deviation is very high. Not only are there many positive spikes, but there are also many negative spikes, which further indicates that the number of events per second generated by the users is very dynamic. It should be noted that there was no limitation on the number of events per second that the students were allowed to generate. Based on Table 1 and Fig. 9, the data rate should exceed 5 MBps in the worst-case scenario where 22 students simultaneously generate 70 events per second each. However, this scenario never occurred, as shown in Fig. 17, indicating high dynamics and a very low probability of the worst-case scenario.
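The worst-case figure can be approximated with a short calculation. The sketch below assumes the data rate scales linearly with the event rate and takes the per-event cost from the pairing of 31 Hz with roughly 115 kB per second given earlier; Fig. 9 itself is not reproduced here, so the result is only an estimate, with 1 MB taken as 1000 kB.

```python
STUDENTS = 22
MAX_EVENTS_PER_STUDENT = 70   # events/s, the per-user ceiling assumed in the text

# Assumption: linear scaling from the 31 Hz -> 115 kB/s pairing in the text
KB_PER_EVENT = 115 / 31

worst_case_events = STUDENTS * MAX_EVENTS_PER_STUDENT
worst_case_kbps = worst_case_events * KB_PER_EVENT

print(worst_case_events)                      # 1540
print(f"{worst_case_kbps / 1000:.1f} MBps")   # 5.7 MBps
```

The estimate is consistent with the statement that the worst case exceeds 5 MBps.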
In Table 2, not only does the standard deviation increase, indicating high dynamics, but the distance between the median and the maximum also increases, as represented by the taller spikes. The highest spike occurred at 14:28:40, when 228 events were submitted to the server; surprisingly, this was attributed to only two users. This occurrence either contradicts the assumption of the authors that a user can generate up to 70 events per second, or there was a delay in transmission and the submitted events were incorrectly aggregated. When 228 events per second are plotted in Fig. 9, a result of 849 kBps is obtained, which is close to the actual measurement in Table 2, where the maximum data rate during this implementation was 837 kBps.
The first conclusion is that the online mouse tracking application was successfully implemented. The overseas quiz session, which was monitored with real-time mouse tracking between the National University of Mongolia and Kumamoto University, was successfully conducted, and at present mouse tracking is still running on the server. The mouse tracking data containing mouse clicks, mouse movements, and mouse scrolls were obtained, but the analysis of these data will be challenging because of their large size. Additionally, this work demonstrates the possibility of tracking on a mobile device using scroll, touch, and zoom events.
Are the rumors concerning high resource cost in mouse tracking true? Can a single swipe generate hundreds of mouse coordinates? Does mouse tracking over a minute generate in excess of a megabyte of data? Based on the results of this investigation, the answer to these questions is “yes.” In that case, is mouse tracking implementable on a large scale? The answer is also “yes.” A single server, whose specification is given in the Experiment and Implementation section, was able to handle a classroom of users, and resource monitoring showed that there was much reserve capacity. Other institutions or corporations should have no difficulty in implementing mouse tracking because they typically have big data centers (large numbers of networks and servers, and distributed resources). For example, a corporation such as Google should not encounter difficulties, although this might be different for technologically challenged entities.
The second conclusion is that mouse tracking is implementable if resource usage is distributed. In this work, the mouse tracking data were transferred in real time to evenly distribute resource usage, instead of aggregating the data and transmitting them together, which may cause bottlenecks. Unfortunately, mouse activity is by nature difficult to predict, which makes it challenging to determine resource allocation. The data acquired as part of this work showed the highly dynamic characteristic of mouse activities, as reflected in the high standard deviations observed during monitoring of resource usage. When 22 students attempted the quizzes, the resource usage peaks were very high, but only temporarily. These peaks appeared as spikes. Both upper spikes and lower spikes were observed, where upper spikes indicate momentary high-level activity and lower spikes indicate the opposite.
If the available resources are limited, then the resource cost of mouse tracking can be reduced. The mouse tracking application developed in this work can limit the number of events recorded per second (the frequency). Additionally, it can exclude unnecessary data. Moreover, even prior to this investigation of mouse tracking resource usage, research on the compression of mouse tracking data already existed.
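One simple way to cap the event frequency is to keep only the first few events in each one-second window. The following is a minimal sketch of that idea, not the authors' implementation; the function name, the event format, and the sample events are hypothetical.

```python
def limit_events_per_second(events, max_per_second):
    """Keep at most max_per_second events from each one-second window.

    events: iterable of (timestamp_seconds, payload) pairs, sorted by
    timestamp, with non-negative timestamps.
    """
    kept = []
    current_second = None
    count = 0
    for timestamp, payload in events:
        second = int(timestamp)
        if second != current_second:
            # A new one-second window starts, so reset the counter.
            current_second = second
            count = 0
        if count < max_per_second:
            kept.append((timestamp, payload))
            count += 1
    return kept


# Hypothetical events: three in second 0, two in second 1
events = [(0.0, "move"), (0.1, "move"), (0.2, "move"), (1.0, "click"), (1.5, "scroll")]
print(len(limit_events_per_second(events, 2)))  # 4
```

Dropping the surplus events trades temporal resolution for a bounded data rate, so the limit would need to be chosen with the intended analysis in mind.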
This opens many paths for future work. Although real-time implementation assisted in the distribution of resource usage, the characteristics of the resource usage data showed how mouse tracking can potentially destabilize the system. The use of load balancing techniques can help stabilize the implementation. To determine the minimum system requirement for mouse tracking, more experiments with different machine specifications need to be conducted. In addition, resource measurement on the client side should be conducted to determine the minimum system requirement for the client. Even though the developed mouse tracking application was able to limit the level of activity recorded, the settings are still entered manually. Adaptive settings are required for optimal usage. Although it was useful to conduct an overseas implementation, more users and longer implementations are required to further evaluate the viability of real-time online mouse tracking.
The authors are very grateful to Muhammad Bagus Andra, Hamidullah Sokout, Irwansyah, and members of the School of Engineering and Applied Science, National University of Mongolia, for participating in the experiment. The authors would also like to thank Masayoshi Aritsugi, Hendarmawan, Hamidullah Sokout, Alhafiz Akbar Maulana, and Sari Dewi for inspiring this research topic. A special thanks to Muhammad Bagus Andra and Ni Nyoman Sri Indrawati for suggesting some interesting ideas. The authors would also like to thank Fahd Ouassarni for providing suggestions with respect to compressing the mouse tracking application codes. Finally, the authors would like to thank Alvin Fungai for initiating this research and for his assistance in proofreading.
Part of this work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research 19H1225100 and 15H02795.
The authors declare that they have no conflict of interest.
The datasets generated and/or analyzed during the current study are available in the Mendeley repository titled ’Data for: Implementation of Real-Time Online Mouse Tracking Case Study in a Small Online Quiz’.