Therefore, methods must be built around natural, unstaged displays of emotion. To meet this need, researchers have increasingly turned to real-world datasets.
The Facial Expression Recognition 2013 (FER-2013) database [65] was first introduced in the ICML 2013 Challenges in Representation Learning [64]. The database was built by matching a set of 184 emotion-related keywords to images retrieved with the Google Image Search API, covering the six basic expressions plus neutral. The images were downscaled to 48 × 48 pixels and converted to grayscale. The final collection contains 35,887 images, most of which were captured in natural, real-world conditions. Our previous work [56] used the FER-2013 dataset because it is one of the largest publicly available facial expression datasets collected in real-world settings. However, only 547 of the images in FER-2013 depict disgust, and because the faces are not registered, most facial landmark detectors cannot extract landmarks at this resolution and image quality. In addition, FER-2013 provides only the categorical model of emotion.
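For reference, a minimal loading sketch is given below. It assumes the widely circulated fer2013.csv release with columns emotion, pixels, and Usage; the file name, column names, and label order are assumptions about that distribution rather than part of the original challenge description.

```python
# Minimal sketch (not the authors' code): load the common fer2013.csv release
# and tally the class distribution, which makes the imbalance noted above
# visible (Disgust is expected to be by far the rarest class).
import csv
from collections import Counter

import numpy as np

LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def load_fer2013(csv_path):
    """Yield (48x48 uint8 grayscale image, label index, usage split) tuples."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            pixels = [int(p) for p in row["pixels"].split()]
            img = np.array(pixels, dtype=np.uint8).reshape(48, 48)
            yield img, int(row["emotion"]), row["Usage"]

if __name__ == "__main__":
    counts = Counter(LABELS[label] for _, label, _ in load_fer2013("fer2013.csv"))
    for name, n in counts.most_common():
        print(f"{name:9s} {n:6d}")
```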
Mehendale [66] proposed a CNN-based facial emotion recognition method and modified the original dataset by regrouping the images into five categories: Anger-Disgust, Fear-Surprise, Happiness, Sadness, and Neutral, with the Contempt category removed. The similarity of the Anger-Disgust and Fear-Surprise expressions in the upper part of the face supports this regrouping: when someone feels angry or disgusted, the eyebrows naturally lower, whereas when someone is scared or surprised, the eyebrows rise together. Removing the Contempt category can be justified because (1) it is not a central emotion in communication and (2) the expressiveness associated with contempt is localized in the mouth area and is therefore undetectable when the individual wears a face mask. As a result of this merging, the dataset becomes reasonably balanced.
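This regrouping can be expressed as a simple label remapping. The sketch below is an illustration under the assumption of FER-2013's standard label indices; it is not Mehendale's published code.

```python
# Hedged illustration of the five-class regrouping described above, written as
# a remapping of FER-2013 label indices (0=Angry, 1=Disgust, 2=Fear, 3=Happy,
# 4=Sad, 5=Surprise, 6=Neutral). FER-2013 itself has no Contempt class; in
# datasets that include one, those samples would simply be dropped.
MERGED_CLASSES = ["Anger-Disgust", "Fear-Surprise", "Happiness", "Sadness", "Neutral"]

FER_TO_MERGED = {
    0: "Anger-Disgust",   # Angry
    1: "Anger-Disgust",   # Disgust
    2: "Fear-Surprise",   # Fear
    5: "Fear-Surprise",   # Surprise
    3: "Happiness",       # Happy
    4: "Sadness",         # Sad
    6: "Neutral",         # Neutral
}

def remap_labels(labels):
    """Map 7-class FER-2013 label indices to the merged 5-class scheme."""
    return [MERGED_CLASSES.index(FER_TO_MERGED[y]) for y in labels]

# Example: original Angry (0) and Disgust (1) samples end up in the same class.
assert remap_labels([0, 1, 5, 2, 3]) == [0, 0, 1, 1, 2]
```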
In this study, we used the AffectNet dataset [24] to train the emotion recognition model. Since the aim of this study is to determine a person’s emotional state even when a mask covers the face, the second stage was to build an appropriate dataset in which a synthetic mask is attached to each individual’s face. To do this, we used the MaskTheFace algorithm. In short, this method estimates the angle of the face and then overlays a mask selected from a database of mask templates; the mask’s placement is then fine-tuned using six key facial features extracted from the face [67] (a conceptual sketch of this step is given below). The characteristics and features of existing facial emotion recognition datasets are summarized in
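The following is a conceptual sketch of the mask-overlay step, not the MaskTheFace implementation. It assumes dlib's 68-point landmark model file and a transparent (RGBA) mask template PNG are available locally; the chosen landmark indices and the plain alpha overlay are illustrative simplifications.

```python
# Conceptual sketch of fitting a mask template to a detected face: estimate the
# in-plane face angle from landmarks, rotate and scale the template, and alpha-
# blend it over the lower face. Assumes the face lies fully inside the frame.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def apply_mask(image_bgr, mask_rgba):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        pts = predictor(image_bgr, face)
        # Landmarks bounding the region a mask covers: jaw sides (2, 14),
        # chin (8), and a point on the nose bridge (28).
        left = np.array([pts.part(2).x, pts.part(2).y], dtype=float)
        right = np.array([pts.part(14).x, pts.part(14).y], dtype=float)
        chin = np.array([pts.part(8).x, pts.part(8).y], dtype=float)
        nose = np.array([pts.part(28).x, pts.part(28).y], dtype=float)
        # In-plane face angle, used to rotate the mask template accordingly.
        angle = np.degrees(np.arctan2(right[1] - left[1], right[0] - left[0]))
        w = int(np.linalg.norm(right - left))
        h = int(np.linalg.norm(chin - nose))
        mask = cv2.resize(mask_rgba, (w, h))
        rot = cv2.getRotationMatrix2D((w / 2, h / 2), -angle, 1.0)
        mask = cv2.warpAffine(mask, rot, (w, h))
        # Alpha-blend the rotated template over the lower-face region.
        x, y = int(left[0]), int(nose[1])
        roi = image_bgr[y:y + h, x:x + w]
        alpha = mask[:, :, 3:] / 255.0
        roi[:] = (1 - alpha) * roi + alpha * mask[:, :, :3]
    return image_bgr
```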