TY - JOUR
T1 - Using computational visual attention models in evaluating audience attention in educational multimedia
TT - استفاده از مدل‌های محاسباتی توجه بینایی در ارزیابی توجه مخاطب در چندرسانه‌ای‌های آموزشی
JF - icss
JO - icss
VL - 24
IS - 3
UR - http://icssjournal.ir/article-1-1487-en.html
Y1 - 2022
SP - 88
EP - 104
KW - Visual attention
KW - Educational multimedia
KW - Saliency map
N2 - Introduction With advances in knowledge and technology, multimedia services for education have developed. The cognitive theory of multimedia learning proposes principles that help multimedia and e-learning designers produce optimal textual, graphic, visual, and auditory presentations. Each principle is based on comparing the results of multimedia learning research in different situations and indicates how effective it is in improving students' learning. The current research used five of the principles described in Mayer's multimedia learning book. In recent decades, many scientific studies have aimed at modeling the computational mechanisms that orient attention. One of the most overlooked issues is how well the audience's attention to the produced multimedia matches its original design goals. This study uses visual attention models to predict the audience's attention, with the aim of increasing the educational impact of these videos by providing scientific solutions for improving the quality of educational multimedia. Brain-inspired computational models of visual attention can fill this gap and serve as a powerful tool for evaluating the audience's points of interest in educational multimedia. The models used in this study are four bottom-up visual attention models with the best performance (6, 10, 20, 23), which can effectively improve the educational quality of multimedia production.
Methods In this study, the bottom-up visual attention models were tested on educational multimedia designed according to five of Mayer's 12 principles: the coherence principle (removing secondary and unnecessary elements), the signaling principle (using cues to make the essential elements more salient), the redundancy principle (using graphics and narration to learn better), the spatial contiguity principle (presenting words and images close together on one page), and the temporal contiguity principle (presenting words and images close together in time). The results were examined for each model and each type of experiment. Eye tracking was used to determine where the audience attended in the multimedia, and these data were used to assess the accuracy of the predictions made by the computational models of visual attention. The educational multimedia, designed with the help of the computational visual attention models, was evaluated to determine which parts a human viewer attends to when watching the video. Using computational models of visual attention, the current study attempts to identify the locations of the audience's attention and compare them with data obtained from an eye-tracking system, in order to increase the multimedia's impact on the audience and improve its quality. Computational models of visual attention are typically validated against the eye movements of human observers. Eye movements convey essential information about cognitive processes such as reading, visual search, and scene perception. Accordingly, this study assumes a model that produces a saliency map S, which is then compared with and evaluated against the recorded eye movements G (the fixations of the human eye). Evaluation criteria, including AUC, LCC, NSS, and SSIM, were used to determine how accurately the locations predicted by the model match the locations attended by the audience.
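As an illustration of comparing a saliency map S with eye-movement data G, the sketch below computes two of the criteria named above, NSS and LCC, using their commonly cited definitions. It is a minimal standard-library sketch and not the authors' implementation: the function names, the nested-list map format, and the (row, col) fixation format are assumptions made for this example. AUC and SSIM are omitted for brevity.

```python
from math import sqrt
from statistics import fmean, pstdev


def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels.

    saliency  : 2-D list of floats, the model's map S
    fixations : list of (row, col) fixation coordinates (assumed input format)
    """
    values = [v for row in saliency for v in row]
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return fmean((saliency[r][c] - mu) / sigma for r, c in fixations)


def lcc(saliency, ground_truth):
    """Linear correlation coefficient (Pearson's r) between S and a
    continuous ground-truth map G of the same size."""
    s = [v for row in saliency for v in row]
    g = [v for row in ground_truth for v in row]
    ms, mg = fmean(s), fmean(g)
    cov = sum((a - ms) * (b - mg) for a, b in zip(s, g))
    norm = sqrt(sum((a - ms) ** 2 for a in s) * sum((b - mg) ** 2 for b in g))
    if norm == 0:
        raise ValueError("one of the maps is constant; LCC is undefined")
    return cov / norm


# Toy 2x2 maps: the model puts all saliency on the bottom-right pixel.
S = [[0.0, 0.0], [0.0, 1.0]]
G_match = [[0.0, 0.0], [0.0, 1.0]]      # ground truth agrees with S
G_opposite = [[1.0, 1.0], [1.0, 0.0]]   # ground truth is the inverse of S

nss_hit = nss(S, [(1, 1)])     # fixation on the salient pixel: positive score
nss_miss = nss(S, [(0, 0)])    # fixation elsewhere: negative score
lcc_match = lcc(S, G_match)
lcc_opposite = lcc(S, G_opposite)
```

Higher NSS means fixations fall on pixels the model rated as salient, while LCC ranges from -1 to 1 and measures how linearly the two maps agree. For full 640 × 480 frames an array library would be the practical choice; plain lists are used here only to keep the sketch dependency-free.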
Each of the computational models of visual attention was implemented and evaluated on the visual saliency obtained from the multimedia produced according to the five Mayer principles. The values were computed with the above criteria for each shot (a sequence of consecutive frames that forms a scene). Notably, the multimedia has a resolution of 640 × 480 pixels at 30 frames per second. Results The ground-truth saliency map (GSM) for each frame was obtained from three participants with healthy vision, who viewed a five-minute video (30 frames per second) in free-viewing mode. The video contains 48 shots and 5069 frames. Each point on which a viewer's gaze lands is marked as a fixation point in the GSM. Then, to better match the region covered by human vision, a Gaussian filter is applied, producing the final GSM for each frame. The features used include color, intensity, orientation, and motion (obtained with the optical flow method). Furthermore, in some scenes, some surrounding areas of the image attract no visual attention and all of the viewer's attention goes to the central part, for two reasons: the first is the lack of information in those areas of the image (center-surround difference), and the second is that individuals tend to focus on the center of the image (center bias). In the test of the signaling principle, where the designer aimed to draw the viewer's attention to an expression, the viewer's attention was attracted to the intended mark. The reasons for the audience's attention are the motion of the mark and the center-surround difference. These results showed that better marking draws the audience more effectively to the desired location and increases the learning effect.
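The ground-truth map construction described in the Results (fixation points followed by a Gaussian filter) can be sketched as follows. This is an illustrative standard-library sketch under stated assumptions, not the authors' code: the function names, the (row, col) fixation format, the kernel radius of three standard deviations, and the final scaling of the map to a peak of 1 are all choices made for this example, and the source does not state the filter width.

```python
from math import exp


def gaussian_kernel(sigma):
    """1-D Gaussian weights, truncated at 3 * sigma (assumed) and normalized."""
    radius = max(1, int(3 * sigma))
    weights = [exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, treating outside pixels as 0."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
        out.append(acc)
    return out


def ground_truth_map(fixations, height, width, sigma):
    """Mark each fixation on an empty frame, then smooth with a separable Gaussian."""
    fixation_map = [[0.0] * width for _ in range(height)]
    for r, c in fixations:
        if 0 <= r < height and 0 <= c < width:
            fixation_map[r][c] += 1.0
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in fixation_map]        # horizontal pass
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]    # vertical pass
    blurred = [list(row) for row in zip(*cols)]                  # back to row-major
    peak = max(max(row) for row in blurred)
    if peak > 0:
        blurred = [[v / peak for v in row] for row in blurred]   # scale peak to 1
    return blurred


# One fixation at the center of a small 11x11 frame (toy size for illustration).
gsm = ground_truth_map([(5, 5)], height=11, width=11, sigma=1.0)
```

The blur spreads each fixation over a neighborhood, so a model is not penalized for predicting a location a pixel or two away from the recorded gaze point. In practice the filter width is usually chosen to approximate the area of sharp vision at the viewing distance.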
Two consecutive frames in which letters appear at the right time and place attract visual attention; this is an effect of the object's motion between the two frames, and the audience follows the intended content in line with the spatial and temporal contiguity principles. Under the signaling, coherence, and redundancy principles, the multimedia designer aims to attract the audience's attention to parts of the text and image. However, because of the low contrast with the background and the thinness of the text and the sign, the audience's visual attention is not captured well. Guided by the visual attention models, changes such as creating a difference from the surroundings, moving content closer to the center, and adding motion to the frames were made to attract attention better. Overall, the frames of each shot were evaluated against the saliency maps of each model to find the one that best matched the human eye data. The best performance was obtained by the Itti et al. model (10) in frames with a center-surround difference, by the GBVS model (23) in frames with center bias, and by the Wang model (20) in frames with motion and similar superpixels. Conclusion This study used existing bottom-up visual attention models to improve educational content. According to the results, computational models of visual attention can be used as a criterion for predicting what attracts the audience's gaze. Although the models do not give identical results, they perform well and agree closely in predicting the areas of human attention. It is therefore important to use the capabilities of these models to increase the quality of educational multimedia. Using a single visual attention model or a combination of models, the audience's attention can be anticipated, and by placing the necessary content at the locations predicted by the model, the quality and impact of the multimedia can be increased.
As the experiments show, no single model is highly effective in all cases, and the models give different results depending on the type of algorithm and the features used. Combining the mentioned models can help improve the overall performance. In future work, the authors aim to improve the values of these criteria by improving the above-mentioned models or creating a new computational model that better fits the actual data. In addition, examining brain signals and the effect of attention-attracting methods on cognitive reduction, as well as providing software that can be used to evaluate the educational multimedia produced, can be considered for future work. Ethical Considerations Compliance with ethical guidelines This article is taken from the Master's thesis of the first author. The present study was conducted in accordance with ethical principles such as participant confidentiality, and the participants were free to leave the study. Sufficient information about the research is provided, and the results are reported honestly, accurately, and completely. Authors' contributions Majid Shabani: First author, from whose Master's thesis this article is taken; responsible for researching, implementing, and analyzing the results. Alireza Bosaghzadeh: Corresponding author; guided the implementation of the research and the revision of the article. Reza Ebrahimpour: Provided guidance on the working method and data analysis. Seyed Hamid Amiri: Guided the implementation of the research and the revision of the article. Keyhan Latifzadeh: Responsible for researching and collecting samples. Funding This article was supported by the Cognitive Sciences & Technologies Council with Research Code 6880, approved 08/10/1397, and the research project of Shahid Rajaee Teacher Training University with contract number 39110.
Acknowledgments The authors thank all the participants in this study for their continued participation, and the respected professors who provided guidance and advice. Conflicts of interest The authors declare no conflicts of interest.
M3 - 10.30514/icss.24.3.88
ER -