Augmented reality emotion recognition for autism spectrum disorder children [version 1; peer review: awaiting peer review]

Autism spectrum disorder (ASD) is a lifelong neurodevelopmental disorder that affects brain development. The prevalence of ASD is one in 68 children. Low social motivation has been identified as the main cause of deficits in social communication skills. As a result, children with ASD find it difficult to express themselves and manage social interactions, and they may lack the ability to comfort others or share their own feelings. This study aimed to design a mobile application based on augmented reality (AR) that focuses on the social interaction and communication aspects of ASD. The scope is emotion recognition: the application uses emotional icons to help children with ASD improve their social skills, specifically by helping them recognize various emotions. The emotions are represented by emojis inspired by Dr. Paul Ekman, who identified the six basic emotions, namely happiness, sadness, disgust, fear, surprise, and anger. Additional emotions such as confounded face, winking with tongue, cold sweat, blowing kiss, flushed, sleepy, disappointed, and winking are included. AR is able to engage the children’s attention when they view the animated emojis. The application is intended to make children with ASD more willing to recognize different emotional expressions and to improve their social skills by expressing their own feelings. The scope of the study is limited to emotion recognition, and the application was developed from literature reviews without guidance from a certified ASD specialist. AR is an interactive technology that places digital information in the physical world in real time, providing precise registration in all three dimensions. Existing literature suggests that traditional face-to-face teaching methods have failed to increase the interest and ability of children with ASD because the teacher has full control in the classroom. This study adds value to existing work by incorporating AR as an additional intervention for children with ASD.


Introduction
Children with autism spectrum disorder (ASD) have difficulty learning to understand, recognize, and manage emotions. This challenge has to be addressed, as learning to understand emotions can help children with ASD respond to other people. Traditional teaching methods, such as everyday interactions and tools like emotion cards and social stories, lack the interactive elements needed to provide an immersive experience and hold the interest of children with ASD. As a result, these children have difficulty staying motivated and focused during learning, so they lose interest quickly and stop learning. Their development in study and in communicating with others may therefore be reduced or delayed by ineffective teaching.
Current intervention methods for children with ASD are limited. They are mostly in-person or social skill-group behavioral therapies delivered by professional therapists. Therapy normally takes place once a week over a period of at least 12 weeks. During a session, the therapist teaches specific social skills and conducts activities such as role playing and discussion. This practice gives children with ASD an opportunity to rehearse the social skills repeatedly. However, such therapy is usually difficult for parents or caregivers to access because of a shortage of trained professionals in this field. Moreover, these intervention methods are usually costly, and many families cannot afford them.
Technology has played a pivotal role in facilitating learning for children with ASD. It is well received and easily adopted, since most of the applications run on tablets. Emerging augmented reality technology has garnered attention, especially among children, because of its immersive and engaging experience. Although augmented reality (AR) applications are widely used in fields such as medical training, tourism, and the military, few AR applications are targeted at or developed for children with ASD. Those that exist often lack the therapeutic effects that would help these children familiarize themselves with various social situations.
AR is a technology that is still in early development; therefore, it is hard to find practical AR applications that focus on helping children with ASD recognize emotions. The existing applications are mostly at a preliminary stage or intended only for research purposes. Hence, the purpose of this research is to create a mobile application that uses AR to enable children with ASD to learn emotions in a more intuitive way.
The contribution of this research is a proposed user-friendly interface design integrated with AR functionality, which serves as an exciting motivator to extend the time that children with ASD spend learning. This paper is divided into several sections. Section 2 introduces the key concepts related to ASD and reviews several existing applications. Section 3 presents the methodology, and the proposed application screen designs are shown in Section 4. Finally, Section 5 presents the conclusion and the limitations of the study.
Literature review
ASD is a complex brain development disorder that causes difficulties in important developmental areas such as social skills and behavior. The prevalence of ASD is one in 68 children. It causes people to behave differently from their typically developing peers in communicating, interacting, and learning.
It has been suggested that low social motivation is the reason children with ASD have trouble developing social communication skills, as they hardly focus on social information such as other people's emotional behavior and faces [1]. Moreover, they are also less likely to express feelings such as pleasure and happiness when interacting with others. Children with ASD often struggle to understand and respond to emotion in other people's facial expressions [2]. As a result, it is difficult for these children to manage their social interactions even though they genuinely want to develop relationships [3]. Children with ASD may also lack the ability to comfort others or share their own emotions. Sometimes, they may interpret a situation incorrectly and respond with unintended behavior or emotions [4]. Communication has been the biggest challenge for parents of children with ASD, as the children lack the ability to respond and communicate appropriately with their parents [5].
The growing number of children with ASD has affected trends in modern educational systems [6]. Face-to-face teaching has reportedly failed to increase the interest and ability of children with ASD because the teacher has full control of the delivery [7]. AR has therefore been introduced as an alternative intervention for children with ASD [8]. AR has been around for decades and, in fundamental terms, it is the concept of blending the real and virtual worlds. It is an interactive technology that places digital information in our physical world in real time, providing precise registration in all three dimensions [9]. It is also worth mentioning that AR is considered a revolution that originated from virtual reality (VR). The basic difference is that AR is based on the camera's view of the real world, while VR is not. Through AR, new possibilities for educational learning have emerged, as the capability to combine the real and virtual worlds may further improve the quality of the education provided [10]. In this way, the engagement of children with ASD can be further enhanced, and their motivation and interest in learning can also be increased [11].
Researchers have proposed a number of prototypes and applications to improve emotion recognition in children with ASD. An intuitive augmentative and alternative communication application was created to help children with ASD express their emotions to their parents [12]. Users choose an emoji that represents their current emotion or needs. When the user taps on the emoji, text representing it is shown and the matching audio is played. Users can then send their message to selected contacts, who receive a speech output message that represents the user's current emotion or needs. The application uses emojis in a user-friendly way so that children with ASD can express themselves easily. Parents can also understand the emotions and needs of their children more easily, and act accordingly, through the feedback and actions recorded in the application.
Children with ASD struggle to understand other people's emotions and to express their own, including through facial expressions. AR-based self-facial modeling (ARSFM) is a system in which users see their own virtual three-dimensional (3D) facial expressions through an augmented mirror [13]. 3D head models are first created from users' frontal and side face photos in 3D Facial Studio 3.0. Next, the head models are animated in 3ds Max 2012 to create six facial expression models that correspond to the basic emotions according to the Facial Action Coding System, namely happiness, sadness, fear, disgust, surprise, and anger. These models are then finished with the Unity game engine and Vuforia. A therapist guides users through the ARSFM system. Users read through a scenario script and choose to wear the one of the six facial expression masks that matches the scenario. In the augmented mirror, users see the pre-animated facial expression overlaid on their own face to indicate the correct facial expression for the scenario.
An AR-based video modeling storybook was created to teach users to recognize patterns of emotion in the storybook and then watch the story's video clips on a tablet [14]. Users are required to concentrate on the corresponding social stimulus and mimic the facial expressions and emotions. They are also encouraged to pretend to feel those emotions. Observing facial expressions in this way helps children with ASD practice recognizing, understanding, and responding to facial emotions more appropriately in daily social scenarios.
Facial Emotion Expression Training (FEET) was developed to give users real-time feedback by detecting facial expressions and emotion problems [15]. With FEET, users proceed through four levels: level one displays a simple cartoon face with basic emotions, level two displays a short recording of a child actor showing emotions, level three displays only the facial expression that corresponds to a narrated short story with background music, and level four displays an avatar showing an emotion. At every level, users are required to mimic the facial expression of the displayed content using their own perception of the feeling, and FEET gives feedback throughout the session. The proposed system is reported to detect and categorize emotion in facial expressions accurately in real time and to improve facial expression recognition in children with ASD.
An AR puzzle-style application was created that requires users to interact with a tablet device with a front-facing camera [16]. There are two stages: a customization stage and an emotion guessing stage. In the customization stage, users create an emoji based on their own preferences with the help of various AR markers representing different emotions, colors, and shapes. Once users are satisfied with their emoji, they proceed to the second stage, which requires them to guess the facial expression of a different emoji using the markers. Constructive feedback is given when users guess correctly; otherwise, users are redirected to the right answer. With its clear visual cues, the AR puzzle-style application aimed to improve the emotion recognition skills of children with ASD and to keep their attention without confusion.

Methodology
Humans experience six basic emotions, and the application represents these feelings with emojis inspired by Dr. Paul Ekman, the psychologist who identified the six basic emotions in the 1970s. As presented in the Emoji Recognition Chart, these are happiness, sadness, disgust, fear, surprise, and anger. Humans also experience more complex feelings such as embarrassment, guilt, shame, trust, and anticipation. Hence, we have also included additional complex emojis such as confounded face, winking with tongue, cold sweat, blowing kiss, flushed, sleepy, disappointed, and winking.
The AR feature is embedded to motivate children with ASD to experiment with an array of emotions. The user needs to define an area in advance to trigger the AR feature. Indirectly, this lets the user practice and reinforce motor skills while setting the boundaries of the plane. The application is scripted in C# using Unity. The technical requirement to run the application is a touch-based device with Android version 5 and a camera of at least 3 megapixels.
For the system evaluation, a preliminary assessment was carried out with ten subjects. The objective was to assess the usability of the application, in terms of the subjects' perception, using a survey. Data collection involved three steps. First, we showed each respondent the mobile application and spent approximately 10-15 min instructing them. The respondent then spent another 15 min exploring the application. Finally, we conducted the survey through an online questionnaire. The whole process took 30-40 min to complete. The questionnaire contained nine questions, each rated on a 5-point Likert scale. Because children with ASD were inaccessible during the nationwide COVID-19 lockdown, the ten subjects were randomly chosen neurotypical children aged between 6 and 12 years. The respondents were fluent in using smartphone technology, which reduced unnecessary risk in accessing the application.

Results
Based on the research gaps found through the literature review, an AR emotion recognition solution for children with ASD was proposed. The prototype is shown in Figures 1 to 5. On the Main Menu page, users can choose to learn, trigger the AR feature, or play the quiz.
As shown in Figure 1, users can learn the basic emojis and emotions through 2D emoticons. The purpose is to ensure that the user understands and recognizes the emoji's facial features, such as a happy smile. The emoticons available are based on emoji recognition, namely happiness, sadness, disgust, fear, surprise, and anger. Additional emoticons with slightly more complicated emotions are included, such as confounded face, winking with tongue, cold sweat, blowing kiss, flushed, sleepy, disappointed, and winking.
AR, shown in Figure 2, is the main element of the proposed prototype. It uses AR functions to enable users to learn emotions and interact with emojis. It has two modes: single mode, which uses a single type of emoji, and multiple mode, which combines several different emojis. In single mode, the user is first shown a panel (pop-up message) that explains how to use the AR functions. After the panel is dismissed, the PLANE TOGGLE button and DETAILS button are shown. Users can instantiate the emoji by touching the screen. They can also interact with the emoji by selecting it, at which point a transparent bounding box is shown and users can scale, move, and rotate the emoji.
Users first tap the Dismiss button and move the smartphone around the floor, table, or carpet to draw a boundary for the plane. Once the plane area is set, users simply instantiate the emoji that they would like to see. In this case, the Happiness emoji is chosen. Users can keep instantiating the Happiness emoji and further copies will keep appearing. When the user taps on the Happiness emoji, a message appears describing its attributes: it is a smiling facial expression, it means a relaxed mood, and it shows a pleasant way of speaking. In multiple mode, users are also shown a panel (pop-up message) that explains how to use the AR functions. After the panel is dismissed, the PLANE TOGGLE button and EMOJI OPTION buttons are shown. Users can instantiate different types of emoji by touching the screen. They can also interact with an emoji by selecting it, at which point a transparent bounding box is shown and users can scale, move, and rotate the emoji. Users can also take a photo of their favorite emoji, or of an emoji that represents their emotions, to be saved in an album.
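The paper does not describe how a screen touch is mapped onto the user-defined plane. As an illustration only, the following Python sketch shows one common way to do it: intersect the ray from the camera through the touch point with a horizontal plane, and accept the hit only if it falls inside the scanned boundary. The function name, the rectangular boundary, and the horizontal-plane assumption are all hypothetical and are not taken from the application, which is written in C# with Unity.

```python
def place_on_plane(ray_origin, ray_dir, plane_y, bounds):
    """Return the (x, y, z) point where a touch ray meets a horizontal plane.

    Hypothetical sketch: ray_origin and ray_dir are 3-tuples in world space,
    plane_y is the height of the detected plane, and bounds is
    (min_x, max_x, min_z, max_z), the rectangle the user scanned.
    Returns None when no emoji should be placed.
    """
    ox, oy, oz = ray_origin
    dx, dy, dz = ray_dir
    if abs(dy) < 1e-9:
        return None  # ray runs parallel to the plane and never meets it
    t = (plane_y - oy) / dy
    if t <= 0:
        return None  # intersection lies behind the camera
    x, z = ox + t * dx, oz + t * dz
    min_x, max_x, min_z, max_z = bounds
    if not (min_x <= x <= max_x and min_z <= z <= max_z):
        return None  # touch landed outside the user-defined boundary
    return (x, plane_y, z)


# Camera held 1.5 units above the floor, pointing forward and down.
hit = place_on_plane((0.0, 1.5, 0.0), (0.0, -1.0, 1.0), 0.0, (-1.0, 1.0, 0.0, 2.0))
```

In this sketch, a touch whose ray lands outside the boundary, or points away from the plane, places nothing, which mirrors the requirement that the plane area be set before emojis are instantiated.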
The album stores all the photos that users have taken in AR mode. Users can view their collections here and delete photos that they do not want to keep. Parents can also view these photos to get a sense of their children's emotional state. The photos are kept in the phone's internal storage.
Users can take the quiz once they feel ready to challenge themselves, and collect emoji trophies along the way. Trophies awarded are stored in the Trophy room, as shown in Figure 5. The Quiz menu contains the PLAY, TROPHY ROOM, INFO, and RESET icons. Users need to finish the quiz in one attempt to collect all the trophies. After a wrong answer, a window tells users that the answer is wrong and the quiz automatically returns to the Quiz menu. After a correct answer, users are taken to a page that showcases the rotating 3D emoji trophy they have collected. The collected emojis are automatically stored in the Trophy room.
The Trophy room is initially empty, and an emoji trophy is added each time users complete a question. Each emoji in the Trophy room is animated differently to reflect the real-world actions associated with its emotion. The full range of emotions is displayed in the Trophy room once all quiz questions have been completed.
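The quiz flow described above can be sketched as a small state holder. This is a minimal Python illustration, not the application's C# code: the class name, the question prompts, and the choice to keep trophies earned before a wrong answer are assumptions, and the behavior of the RESET icon (clearing the Trophy room) is likewise assumed rather than stated in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class QuizSession:
    """Hypothetical model of the quiz and Trophy room."""
    questions: list  # (prompt, correct_emotion) pairs
    trophy_room: list = field(default_factory=list)

    def play(self, answers):
        """Run one attempt and return True only if every question is answered correctly.

        The attempt ends at the first wrong answer (return to the Quiz menu).
        Assumption: trophies earned earlier in the attempt are kept.
        """
        for (_prompt, correct), answer in zip(self.questions, answers):
            if answer != correct:
                return False
            if correct not in self.trophy_room:
                self.trophy_room.append(correct)
        return len(answers) >= len(self.questions)

    def reset(self):
        """Assumed behavior of the RESET icon: empty the Trophy room."""
        self.trophy_room.clear()


# Illustrative prompts only; the paper does not list the quiz questions.
questions = [
    ("Which emoji is smiling?", "happiness"),
    ("Which emoji is crying?", "sadness"),
    ("Which emoji looks startled?", "surprise"),
]
session = QuizSession(questions)
first_try = session.play(["happiness", "fear"])                # stops at question 2
second_try = session.play(["happiness", "sadness", "surprise"])  # completes the quiz
```

Under these assumptions, a user who fails partway through still sees the trophies already earned, and only a full run without mistakes fills the Trophy room in a single attempt.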
The preliminary evaluation examined the children's perception of the application. On whether the application is fun, six respondents agreed, one disagreed, and three were neutral. On whether the application helps them understand emotions better, six agreed, two disagreed, and two were neutral. Regarding the learning content, 30% of the respondents thought that it was useful. In all, 90% of the respondents agreed that the application interface was very easy to use and user friendly, while 80% were excited by the AR feature. Regarding the photo feature, 60% of the respondents enjoyed using it, while eight of them thought that the quiz was fun. Six respondents agreed that they liked the Trophy room feature, and 80% of the respondents would recommend the application to others. The results are summarized in Figure 6.
In summary, the preliminary evaluation yielded a mean rating of 3.98 on the 5-point scale, with a standard deviation of 0.9757. This suggests that, overall, the respondents were positive about the application's usefulness.
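The paper reports only the overall mean and standard deviation and does not say how they were computed. One plausible reading, sketched below in Python, pools every rating from all respondents and questions and takes the mean and the sample standard deviation. The function name is hypothetical, the use of the sample (rather than population) formula is an assumption, and the ratings shown are made-up illustration values, not the study's data.

```python
from statistics import mean, stdev


def summarize_likert(responses):
    """Pool all 1-5 ratings and return (mean, sample standard deviation).

    responses is a list of per-respondent lists, one rating per question.
    """
    pooled = [score for respondent in responses for score in respondent]
    if any(score not in (1, 2, 3, 4, 5) for score in pooled):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(pooled), stdev(pooled)


# Hypothetical ratings for illustration only (3 respondents x 3 questions);
# these are NOT the ten respondents' answers from the study.
example = [[4, 5, 3], [4, 4, 2], [5, 3, 4]]
overall_mean, overall_sd = summarize_likert(example)
```

With the study's actual 10 x 9 response matrix in place of `example`, the same call would give figures directly comparable to the reported values, provided the authors pooled the ratings in this way.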

Discussion and conclusion
The respondents showed a positive perception of the application. They felt that the application would help children with ASD recognize emotions more easily. In terms of fun, since the respondents were neurotypical children, they may be more accustomed than children with ASD to using emojis in daily life when sending messages through WhatsApp or Telegram. Hence, the emojis may not have appeared new or enticing to them. However, the respondents felt that the application is user friendly. This aspect is important, as a user-friendly tool will not burden children with ASD. One characteristic of children with ASD is that they are unable to express their emotions or do not feel an emotion; hence, the novel approach taken in the application enables them to share their emotions with their parents or peers by sharing pictures. In addition, we believe that the AR feature is interesting and engages the children's motor skills while overlaying the emojis on a fixed physical space.
AR is able to engage the children's attention when they view the animated emojis in the self-defined plane area. A snapshot of the scene can be saved to the photo album. The application is intended to make children with ASD more willing to recognize different emotional expressions and to improve their social skills by expressing their own feelings. The scope of the study is limited to emotion recognition, and the application was developed from literature reviews without guidance from a certified ASD specialist. AR is an interactive technology that places digital information in our physical world in real time, providing precise registration in all three dimensions. Existing literature suggests that traditional face-to-face teaching methods have failed to increase the interest and ability of children with ASD because the teacher has full control in the classroom. This study adds value to existing work by incorporating AR as an additional intervention for children with ASD.
Author contributions
H.F.-N. is the corresponding author and the supervisor for this project. She contributed to writing and editing the research paper. C.C.-Teo was responsible for proofreading the research paper, while Y.Q.F. contributed to application development and original draft preparation.
Ethics approval and consent to participate
This research obtained ethical approval (approval number EA0872021) from the Research Ethics Committee, Multimedia University. Written informed consent was obtained from the respondents' guardians prior to data collection.