<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.159688.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Analyzing why AI struggles with drawing human hands with CLIP</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sarkar</surname>
                        <given-names>Meghna</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Chatterjee</surname>
                        <given-names>Siddhartha</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hazra</surname>
                        <given-names>Sudipta</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0009-0006-3083-3646</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sinha</surname>
                        <given-names>Anurag</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-1034-6334</uri>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Reza</surname>
                        <given-names>Md. Sazid</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Shah</surname>
                        <given-names>Mohd Asif</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <uri content-type="orcid">https://orcid.org/0009-0000-2821-5423</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a6">6</xref>
                    <xref ref-type="aff" rid="a7">7</xref>
                    <xref ref-type="aff" rid="a8">8</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Information Technology, Kalyani Government Engineering College, Nadia, West Bengal, India</aff>
                <aff id="a2">
                    <label>2</label>Department of Computer Science and Engineering, College of Engineering and Management- Kolaghat, Kolaghat, West Bengal, India</aff>
                <aff id="a3">
                    <label>3</label>Department of Computer Science and Engineering, Asansol Engineering College, Asansol, West Bengal, India</aff>
                <aff id="a4">
                    <label>4</label>Department of Computer Science and Engineering, ICFAI University Jharkhand, Ranchi, Jharkhand, India</aff>
                <aff id="a5">
                    <label>5</label>Department of Computer Science &amp; Engineering, Rajshahi University of Engineering &amp; Technology, Rajshahi, Bangladesh</aff>
                <aff id="a6">
                    <label>6</label>Centre for Research Impact &amp; Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India</aff>
                <aff id="a7">
                    <label>7</label>Department of Economics, Kardan University, Kabul, Kabul, Afghanistan</aff>
                <aff id="a8">
                    <label>8</label>Division of Research and Development, Lovely Professional University, Phagwara, Punjab, India</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:m.asif@kardan.edu.af">m.asif@kardan.edu.af</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>2</month>
                <year>2025</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2025</year>
            </pub-date>
            <volume>14</volume>
            <elocation-id>193</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>25</day>
                    <month>1</month>
                    <year>2025</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Sarkar M et al.</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/14-193/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>Artificial Intelligence (AI) has made significant strides in various domains, but generating realistic human hands remains a challenge. This study explores the limitations of AI in capturing the fine details and proportions of hands, using Contrastive Language Image Pretraining (CLIP) as a case study.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>Our analysis reveals that CLIP struggles to accurately represent hands due to inadequate training data, anatomical complexities, and practical challenges. We conducted a series of tests and analyses to identify the primary causes of CLIP&#x2019;s difficulties.</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>Our results show that CLIP&#x2019;s struggles stem from data biases and insufficient anatomical representation in training datasets. Specifically, we found distorted finger relationships, inaccurate proportions, and deviations from expected hand geometry.</p>
                </sec>
                <sec>
                    <title>Conclusion</title>
                    <p>This study aims to provide a comprehensive examination of the current limitations and to propose possible directions for future research. By leveraging CLIP for evaluation, control algorithms for structure enforcement, DALL-E for generation, AR for gesture tracking, and 3D modeling for anatomical accuracy, we can overcome the challenges of generating realistic human hands and advance AI&#x2019;s capabilities in artistic creativity.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Contrastive Model</kwd>
                <kwd>Control Algorithm</kwd>
                <kwd>CLIP</kwd>
                <kwd>AI</kwd>
                <kwd>DALL-E</kwd>
                <kwd>AR</kwd>
                <kwd>3D</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec5" sec-type="intro">
            <title>1. Introduction</title>
            <p>Image-generating AI is like a person confined to a museum since birth. It learns to perceive things from a single limited resource, the web. It cannot roam the real world as a human does, so it lacks the ability to analyze things in real-world scenarios. AI knows how hands look but not how hands work. The human hand is an intricate and versatile organ, capable of performing a wide range of tasks with precision and dexterity. Despite advances in AI and machine learning, accurately modelling and predicting the movements and interactions of human hands remains a significant challenge. This article examines the reasons behind these difficulties and explores the implications for applications including robotics, virtual reality, and healthcare. CLIP (Contrastive Language&#x2013;Image Pretraining) is an artificial intelligence model that learns to associate images with text, which allows it to relate visual content to textual inputs. When used in conjunction with generative models such as DALL-E or diffusion models, CLIP helps produce images that closely resemble the input descriptions and performs well in AI drawing applications, so that generated images follow specific prompts or styles. Although, because of its large training set, CLIP might not produce features that are very accurate or realistic, it can help by representing intricate items such as human hands. We can systematically address these challenges by using CLIP for evaluation, control algorithms for structure enforcement, DALL-E for generation, AR for gesture tracking, and 3D modeling for anatomical accuracy.</p>
        </sec>
        <sec id="sec6">
            <title>2. Related work</title>
            <p>The &#x2018;bad hands&#x2019; phenomenon, which artificial intelligence is responsible for creating, emphasizes the value of human creativity and media literacy. Although artificial intelligence has advanced to the point where manual drawing skills may no longer be required, educators and artists can still employ &#x2018;bad hands&#x2019; to push the limits of machine learning and redefine humanity in algorithms.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> Hand surgery requires precise techniques due to the hand&#x2019;s complexity. Generative AI (GenAI) can enhance this by analyzing data, creating detailed simulations, and personalizing procedures, potentially reducing complications. This review explores how GenAI could improve hand surgery, leading to better patient outcomes and setting new standards in the field.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> The Cascaded Deep Graphical Convolutional Neural Network (DCGCN) framework outperforms state-of-the-art models in accuracy and computational cost for 2D hand pose estimation in AI applications.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> Optimizing human hand gestures for AI systems reduces error rates and effort while maintaining the original gesture trajectory, improving interaction with AI systems.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Human/computer control of dexterous remote hands presents unique challenges, including grasp stabilization and non-anthropomorphic behaviour, but progress has been made in grasp planning and controlled-slip techniques.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> This paper reviews current research in hand and finger modeling and animation, highlighting progress towards convincing, detailed motions for virtual characters in areas like manipulation and communication.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> The FF-SSD deep learning network effectively detects and localizes hands in space human-robot interaction, outperforming state-of-the-art methods.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> Pixelor is a competitive drawing AI agent that can achieve human-level performance in a Pictionary-like sketching game by learning optimal stroke sequencing strategies and achieving recognizable results faster than humans.
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> A unified control framework for robotic hands can simplify and generalize their control, allowing for more advanced manipulation tasks in industries.
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> A knowledge-based approach using a three-phase scheme can effectively simulate human hand motion and grasping of arbitrary objects, reducing the search space and improving performance.
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> This paper
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> presents an algorithm for hand-drawn interfaces that simplifies designs by replacing multiple strokes with a single stroke, rationalizing the designer&#x2019;s creative intent. The hands-free human-computer interface using facial movements achieved high performance and accuracy, offering increased independence and confidence for patients with limited hand function.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> The designed grasping control strategy effectively adapts an anthropomorphic robotic hand to object contours, achieving human-like behaviour and robustness.
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> AI raises issues of responsibility attribution, including the problem of many hands and the temporal dimension of control, affecting transparency and explainability.
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup> Hand gesture recognition (HGR) is a research hotspot in HMI due to its high degree of differentiation, strong flexibility, and efficiency of information transmission.
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>
                </sup> The AI edge computing-based system uses gesture tracking and recognition techniques to detect the correctness of stroke trajectory during writing or drawing.
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup> Combinatorial generalization and structured representations are key to achieving human-like abilities in AI, such as drawing human hands.
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>
Figure 6. </label>
                    <caption>
                        <title>Joint angle accuracy using IMU.</title>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure6.gif"/>
                </fig>
            </p>
        </sec>
        <sec id="sec7" sec-type="methods">
            <title>3. Method</title>
            <sec id="sec8">
                <title>3.1 Mechanism of CLIP</title>
                <p>Various models can be used to explain how AI generates an image. One good example is CLIP, a contrastive model that acquires semantic information and contextual relationships by learning visual representations from large amounts of unstructured text paired with images. 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> gives a detailed overview of unCLIP. Above the dashed line is the CLIP training process, through which a joint representation space for text and images is learned. Below the dashed line is the text-to-image generation process: a CLIP text embedding is first fed to an autoregressive or diffusion prior to produce an image embedding, which then conditions the diffusion decoder that generates the final image. Note that the CLIP model is frozen while the prior and decoder are trained.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>
Figure 1. </label>
                    <caption>
                        <title>Mechanism of CLIP.</title>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure1.gif"/>
                </fig>
                <p>In addition, encoding and decoding images provides a tool for observing which image features CLIP detects or ignores. By combining a CLIP image-embedding decoder with a prior model that produces candidate CLIP image embeddings from arbitrary text, a full text-to-image generative model can be built. In the unCLIP work, this text-to-image system was compared with DALL-E and GLIDE; its samples were found to be similar in quality to those of GLIDE but more diverse. That work also showed that diffusion priors can be trained in latent space while matching the performance of autoregressive priors at better computational efficiency. Because the decoder inverts the CLIP image encoder, the full text-conditional image generation stack is known as unCLIP. The methodology in this article provides a detailed technical approach to analyzing why AI models, particularly those leveraging contrastive learning such as CLIP, struggle to draw hands accurately. The analysis involves model evaluation, latent space examination, quantitative metrics, and dataset scrutiny. The component flow works as follows:
                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Text encoder:</bold> The input is a textual description, for example &#x201c;Human hand gesture that shows all five fingers&#x201d;. This text is passed through a text encoder, which converts it into a latent vector representation.</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>CLIP objective:</bold> The CLIP (Contrastive Language-Image Pretraining) objective aligns the text and image representations in the same latent space. The text encoder and image encoder are trained together so that the latent vectors of matching text and images lie close to each other in that space. The encoded text vector is compared with encoded image vectors using this objective, with the goal that the text encoding is close to the encoding of the corresponding image (in this case, an image of a human hand with all five fingers visible).</p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>Image encoder:</bold> An image that corresponds to the text prompt is passed through an image encoder, producing an image latent vector. This vector representation of the image is used in conjunction with the text encoding to ensure alignment in the latent space via the CLIP objective.</p>
                        </list-item>
                        <list-item>
                            <label>iv)</label>
                            <p>

                                <bold>Prior network:</bold> The prior network generates a distribution of potential image representations based on the text encoding. This step is crucial for creating diverse image outputs from the same text prompt. It takes the text latent vector and processes it to generate a set of potential latent vectors that could correspond to images matching the description. Two different model classes for the prior model are available:
                                <list list-type="bullet">
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Autoregressive (AR) prior:</bold> The CLIP image embedding is converted into a sequence of discrete codes and predicted autoregressively, conditioned on the caption.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Diffusion prior:</bold> A Gaussian diffusion model conditioned on the caption is used to directly model the continuous vector.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                        <list-item>
                            <label>v)</label>
                            <p>

                                <bold>Decoder:</bold> The decoder takes the image latent vector generated by the prior network and converts it into a full-resolution image. In unCLIP this is a diffusion model conditioned on the image embedding, which decodes the latent representation into a high-quality image. The result is the final image that visually represents the input text prompt. In this case, it would be an image of a human hand gesture with all five fingers visible.</p>
                        </list-item>
                    </list>
                </p>
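                <p>The CLIP objective in step ii) can be illustrated with a minimal sketch of a symmetric contrastive loss. This is not the implementation used in this study: the function names, the toy three-dimensional vectors, and the temperature value of 0.07 are assumptions chosen for illustration, and only the Python standard library is used. The sketch shows that correctly paired text and image vectors yield a lower loss than mismatched pairs.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the maximum for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(text_vecs, image_vecs, temperature=0.07):
    """Symmetric contrastive loss over a batch of (text, image) pairs.

    Pair i is assumed to be (text_vecs[i], image_vecs[i]).
    The temperature value is an illustrative assumption.
    """
    texts = [l2_normalize(v) for v in text_vecs]
    images = [l2_normalize(v) for v in image_vecs]
    # logits[i][j] = scaled cosine similarity between text i and image j
    logits = [
        [sum(a * b for a, b in zip(t, im)) / temperature for im in images]
        for t in texts
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    text_to_image = cross_entropy_diagonal(logits)
    image_to_text = cross_entropy_diagonal(logits_transposed)
    return 0.5 * (text_to_image + image_to_text)


# Toy stand-ins for encoder outputs (hypothetical values, not real embeddings).
text_vecs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
aligned_images = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
swapped_images = [aligned_images[1], aligned_images[0]]

loss_aligned = clip_contrastive_loss(text_vecs, aligned_images)
loss_swapped = clip_contrastive_loss(text_vecs, swapped_images)
```
]]></code>
                <p>In this toy batch the aligned pairing gives a much smaller loss than the swapped pairing, which is the behaviour the objective rewards during training.</p>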
                <p>
                    <xref ref-type="fig" rid="f2">Figure 2</xref> illustrates the model architecture of CLIP and its interfaces, showing in simplified form how an image is generated from text.
                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>Text Encoder: This initial block takes the input text prompt (such as a description or caption) and converts it into a numerical representation. Think of it as translating words into a format that neural networks can understand.</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>CLIP (Contrastive Language-Image Pretraining): CLIP is a remarkable model that bridges the gap between language and vision. It learns to associate images and text by embedding them into a shared space. This allows it to understand both visual content and textual descriptions.</p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>Image Encoder: Once we have an image, the image encoder processes it and generates a feature vector. This vector captures relevant information about the image, which can be used for subsequent steps.</p>
                        </list-item>
                        <list-item>
                            <label>iv)</label>
                            <p>Prior and Diffusion Decoder Blocks: These are critical for image synthesis. The &#x201c;prior&#x201d; refers to a learned distribution of latent variables (essentially, hidden factors), while the &#x201c;diffusion decoder&#x201d; reconstructs the image from these latent variables. Together, they enable controlled image generation.</p>
                        </list-item>
                        <list-item>
                            <label>v)</label>
                            <p>Additional Conditioning Steps: These steps refine the process. They might involve fine-tuning based on specific attributes mentioned in the text prompt. For example, if the prompt specifies &#x201c;a sunny beach,&#x201d; the conditioning steps adjust the generated image accordingly.
</p>
                        </list-item>
                    </list>
                </p>
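                <p>The shared embedding space described in item ii) is what makes CLIP usable for evaluation: a candidate image can be scored against several captions by cosine similarity. The following is a minimal sketch under stated assumptions. The captions, the three-dimensional vectors, and the helper names <monospace>cosine</monospace> and <monospace>rank_captions</monospace> are hypothetical stand-ins, not outputs of a real CLIP encoder.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Return (caption, score) pairs sorted from most to least similar."""
    scores = {cap: cosine(image_vec, vec) for cap, vec in caption_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical text embeddings in a shared 3-D space (illustration only).
caption_vecs = {
    "a hand showing five fingers": [0.9, 0.1, 0.2],
    "a closed fist": [0.1, 0.9, 0.1],
    "a sunny beach": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of a generated hand image.
image_vec = [0.8, 0.2, 0.1]

ranking = rank_captions(image_vec, caption_vecs)
best_caption = ranking[0][0]
```
]]></code>
                <p>With real encoders, a low similarity between a generated hand image and its own prompt would flag the image for closer inspection.</p>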
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>
Figure 2. </label>
                    <caption>
                        <title>Model architecture of CLIP and its interfaces.</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure2.gif"/>
                </fig>
            </sec>
            <sec id="sec9">
                <title>3.2 Calculation</title>
                <p>

                    <bold>A. Data preparation</bold>
                </p>
                <p>

                    <bold>Dataset selection:</bold> A dataset D consisting of paired text descriptions T and corresponding images I of hands is used. This dataset should cover a diverse range of hand poses, shapes, and contexts to ensure a comprehensive analysis. Formally, the dataset of N pairs is defined by 
                    <xref ref-type="disp-formula" rid="e1">equation (1)</xref>.
                    <disp-formula id="e1">

                        <mml:math display="block">
                            <mml:mi>D</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:msubsup>
                                <mml:mrow>
                                    <mml:mo stretchy="true">{</mml:mo>
                                    <mml:mo stretchy="true">(</mml:mo>
                                    <mml:msub>
                                        <mml:mi>T</mml:mi>
                                        <mml:mi>i</mml:mi>
                                    </mml:msub>
                                    <mml:mo>,</mml:mo>
                                    <mml:msub>
                                        <mml:mi>I</mml:mi>
                                        <mml:mi>i</mml:mi>
                                    </mml:msub>
                                    <mml:mo stretchy="true">)</mml:mo>
                                    <mml:mo stretchy="true">}</mml:mo>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mi>i</mml:mi>
                                    <mml:mo>=</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mi>N</mml:mi>
                            </mml:msubsup>
                        </mml:math>

                        <label>(1)</label>
</disp-formula>
                </p>
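                <p>The paired structure of equation (1) can be sketched as a simple data container. This is an illustrative assumption about how such a dataset might be organized, not the data pipeline of this study: the class name <monospace>HandSample</monospace>, the helper <monospace>build_dataset</monospace>, the captions, and the file paths are all hypothetical.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HandSample:
    """One (T_i, I_i) pair: a text description and the image it describes."""
    text: str        # T_i, the caption
    image_path: str  # I_i, stored here as a file path (hypothetical layout)


def build_dataset(texts: List[str], image_paths: List[str]) -> List[HandSample]:
    """Pair captions with images by position; both lists must have length N."""
    if len(texts) != len(image_paths):
        raise ValueError("each text description needs exactly one image")
    return [HandSample(t, p) for t, p in zip(texts, image_paths)]


# Hypothetical captions and paths covering different poses and contexts.
dataset = build_dataset(
    [
        "open palm showing all five fingers",
        "hand holding a pencil",
        "two hands clasped together",
    ],
    ["hands/0001.png", "hands/0002.png", "hands/0003.png"],
)
N = len(dataset)
```
]]></code>
                <p>Keeping each caption and image in one record makes it harder for pairs to drift out of alignment when the dataset is shuffled or filtered.</p>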
                <p>

                    <bold>B. Model components</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Text encoder:</bold> A transformer-based text encoder 
                                <italic toggle="yes">E</italic><sub><italic toggle="yes">T</italic></sub> converts textual descriptions into latent vector representations 
                                <italic toggle="yes">z</italic><sub><italic toggle="yes">T</italic></sub>, as given by 
                                <xref ref-type="disp-formula" rid="e2">equation (2)</xref>.
                                <disp-formula id="e2">

                                    <mml:math display="block">
                                        <mml:msub>
                                            <mml:mi>z</mml:mi>
                                            <mml:mi>T</mml:mi>
                                        </mml:msub>
                                        <mml:mo>=</mml:mo>
                                        <mml:msub>
                                            <mml:mi>E</mml:mi>
                                            <mml:mi>T</mml:mi>
                                        </mml:msub>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:mi>T</mml:mi>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(2)</label>
</disp-formula>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

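                                <bold>Image encoder:</bold> An image encoder 
                                <italic toggle="yes">E</italic><sub><italic toggle="yes">I</italic></sub> converts an image 
                                <italic toggle="yes">I</italic> into a latent vector representation 
                                <italic toggle="yes">z</italic><sub><italic toggle="yes">I</italic></sub>, as given by 
                                <xref ref-type="disp-formula" rid="e3">equation (3)</xref>.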
                                <disp-formula id="e3">

                                    <mml:math display="block">
                                        <mml:msub><mml:mi>z</mml:mi><mml:mi>I</mml:mi></mml:msub>
                                        <mml:mo>=</mml:mo>
                                        <mml:msub><mml:mi>E</mml:mi><mml:mi>I</mml:mi></mml:msub>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:mi>I</mml:mi>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(3)</label>
</disp-formula>
                            </p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>CLIP objective:</bold> The CLIP (Contrastive Language-Image Pretraining) objective aligns the text and image representations in the latent space. For a batch of <italic toggle="yes">N</italic> text-image pairs, the objective function is defined by expression 
                                <xref ref-type="disp-formula" rid="e4">(4)</xref>: 
                                <disp-formula id="e4">

                                    <mml:math display="block">
                                        <mml:msub>
                                            <mml:mi>L</mml:mi>
                                            <mml:mtext>clip</mml:mtext>
                                        </mml:msub>
                                        <mml:mo>=</mml:mo>
                                        <mml:mo>&#x2212;</mml:mo>
                                        <mml:mfrac>
                                            <mml:mn>1</mml:mn>
                                            <mml:mi>N</mml:mi>
                                        </mml:mfrac>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>N</mml:mi>
                                        </mml:munderover>
                                        <mml:mo stretchy="true">[</mml:mo>
                                        <mml:mo>log</mml:mo>
                                        <mml:mfrac>
                                            <mml:mrow>
                                                <mml:mo>exp</mml:mo>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:mi>sim</mml:mi>
                                                    <mml:mspace width="0.25em"/>
                                                    <mml:mrow>
                                                        <mml:mo stretchy="true">(</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo stretchy="true">)</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mo>/</mml:mo>
                                                    <mml:mi mathvariant="normal">&#x03c4;</mml:mi>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                            <mml:mrow>
                                                <mml:msubsup>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mrow>
                                                        <mml:mi>j</mml:mi>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                    <mml:mi>N</mml:mi>
                                                </mml:msubsup>
                                                <mml:mo>exp</mml:mo>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:mi>sim</mml:mi>
                                                    <mml:mspace width="0.25em"/>
                                                    <mml:mrow>
                                                        <mml:mo stretchy="true">(</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo stretchy="true">)</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mo>/</mml:mo>
                                                    <mml:mi mathvariant="normal">&#x03c4;</mml:mi>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                        </mml:mfrac>
                                        <mml:mo>+</mml:mo>
                                        <mml:mo>log</mml:mo>
                                        <mml:mfrac>
                                            <mml:mrow>
                                                <mml:mo>exp</mml:mo>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:mi>sim</mml:mi>
                                                    <mml:mspace width="0.25em"/>
                                                    <mml:mrow>
                                                        <mml:mo stretchy="true">(</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo stretchy="true">)</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mo>/</mml:mo>
                                                    <mml:mi mathvariant="normal">&#x03c4;</mml:mi>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                            <mml:mrow>
                                                <mml:msubsup>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mrow>
                                                        <mml:mi>j</mml:mi>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                    <mml:mi>N</mml:mi>
                                                </mml:msubsup>
                                                <mml:mo>exp</mml:mo>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:mi>sim</mml:mi>
                                                    <mml:mspace width="0.25em"/>
                                                    <mml:mrow>
                                                        <mml:mo stretchy="true">(</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo stretchy="true">)</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mo>/</mml:mo>
                                                    <mml:mi mathvariant="normal">&#x03c4;</mml:mi>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                        </mml:mfrac>
                                        <mml:mo stretchy="true">]</mml:mo>
                                    </mml:math>

                                    <label>(4)</label>
</disp-formula>
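where sim(&#x00b7;, &#x00b7;) denotes cosine similarity and &#x03c4; (tau) is the temperature parameter. The first logarithmic term matches each text to its paired image among all images in the batch, and the second matches each image to its paired text among all texts.</p>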
                        </list-item>
                    </list>
                </p>
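                <p>As an illustration of expression (4), the loss can be sketched in NumPy as below. This is a minimal sketch and not the implementation used in this study: the function names <monospace>l2_normalize</monospace>, <monospace>log_softmax</monospace> and <monospace>clip_loss</monospace> are hypothetical, the embeddings are assumed to be precomputed arrays with one row per text-image pair, and the default temperature of 0.07 is an assumed value.
                    <code language="python" xml:space="preserve"><![CDATA[
```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so that dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(z_text, z_image, tau=0.07):
    """Symmetric contrastive loss of expression (4) for N paired embeddings.

    z_text, z_image: arrays of shape (N, d); row i of each forms a matched pair.
    tau: temperature (0.07 is an assumed default, not taken from the article).
    """
    z_t = l2_normalize(np.asarray(z_text, dtype=np.float64))
    z_i = l2_normalize(np.asarray(z_image, dtype=np.float64))
    # logits[i, j] = sim(z_Ti, z_Ij) / tau
    logits = z_t @ z_i.T / tau
    # First log term: text i against all images j (normalise over each row).
    text_to_image = np.diag(log_softmax(logits, axis=1))
    # Second log term: image i against all texts j (normalise over each column).
    image_to_text = np.diag(log_softmax(logits, axis=0))
    # Expression (4) averages the bracketed sum with a 1/N factor.
    return -np.mean(text_to_image + image_to_text)
```
]]></code>
                    With perfectly aligned, mutually orthogonal pairs the loss approaches zero, whereas a batch in which every embedding is identical gives 2 log <italic toggle="yes">N</italic>, because each softmax is then uniform.</p>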
                <p>

                    <bold>C. Analysis procedure</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Model performance analysis:</bold> Evaluate the performance of the text-to-image generation model specifically on hand images. The text encoder <italic toggle="yes">E</italic><sub><italic toggle="yes">T</italic></sub> processes descriptions of hand images, and the image encoder <italic toggle="yes">E</italic><sub><italic toggle="yes">I</italic></sub> processes the corresponding images. The alignment of each text-image pair in the latent space is then measured by the similarity of the two embeddings, as summarized in 
                                <xref ref-type="disp-formula" rid="e5">equation (5)</xref>:

                                <disp-formula id="e5">

                                    <mml:math display="block">
                                        <mml:mi>sim</mml:mi>
                                        <mml:mspace width="0.25em"/>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                            <mml:mo>,</mml:mo>
                                            <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                        <mml:mo>=</mml:mo>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">{</mml:mo>
                                            <mml:mtable>
                                                <mml:mtr>
                                                    <mml:mtd>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:msub><mml:mi>E</mml:mi><mml:mi>T</mml:mi></mml:msub>
                                                        <mml:mrow>
                                                            <mml:mo stretchy="true">(</mml:mo>
                                                            <mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub>
                                                            <mml:mo stretchy="true">)</mml:mo>
                                                        </mml:mrow>
                                                        <mml:mo>;</mml:mo>
                                                    </mml:mtd>
                                                </mml:mtr>
                                                <mml:mtr>
                                                    <mml:mtd>
                                                        <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:msub><mml:mi>E</mml:mi><mml:mi>I</mml:mi></mml:msub>
                                                        <mml:mrow>
                                                            <mml:mo stretchy="true">(</mml:mo>
                                                            <mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub>
                                                            <mml:mo stretchy="true">)</mml:mo>
                                                        </mml:mrow>
                                                        <mml:mo>;</mml:mo>
                                                    </mml:mtd>
                                                </mml:mtr>
                                            </mml:mtable>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(5)</label>
</disp-formula>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Latent space visualization:</bold> Use dimensionality reduction techniques such as t-SNE or PCA to visualize the latent space of the text and image encodings. The visualization shows how hand images and their textual descriptions cluster in the latent space. It follows <xref ref-type="disp-formula" rid="e6">expression (6)</xref>:

                                <disp-formula id="e6">

                                    <mml:math display="block">
                                        <mml:mtext>t-SNE</mml:mtext>
                                        <mml:mspace width="0.25em"/>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:msub>
                                                <mml:mi>Z</mml:mi>
                                                <mml:mi>T</mml:mi>
                                            </mml:msub>
                                            <mml:mo>,</mml:mo>
                                            <mml:msub>
                                                <mml:mi>Z</mml:mi>
                                                <mml:mi>I</mml:mi>
                                            </mml:msub>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                        <mml:mo>=</mml:mo>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">{</mml:mo>
                                            <mml:mtable>
                                                <mml:mtr>
                                                    <mml:mtd>
                                                        <mml:msub>
                                                            <mml:mi>Z</mml:mi>
                                                            <mml:mi>T</mml:mi>
                                                        </mml:msub>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:msubsup>
                                                            <mml:mrow>
                                                                <mml:mo stretchy="true">{</mml:mo>
                                                                <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                                <mml:mo stretchy="true">}</mml:mo>
                                                            </mml:mrow>
                                                            <mml:mrow>
                                                                <mml:mi>i</mml:mi>
                                                                <mml:mo>=</mml:mo>
                                                                <mml:mn>1</mml:mn>
                                                            </mml:mrow>
                                                            <mml:mi>N</mml:mi>
                                                        </mml:msubsup>
                                                        <mml:mo>;</mml:mo>
                                                    </mml:mtd>
                                                </mml:mtr>
                                                <mml:mtr>
                                                    <mml:mtd>
                                                        <mml:msub>
                                                            <mml:mi>Z</mml:mi>
                                                            <mml:mi>I</mml:mi>
                                                        </mml:msub>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:msubsup>
                                                            <mml:mrow>
                                                                <mml:mo stretchy="true">{</mml:mo>
                                                                <mml:msub><mml:mi>z</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:msub>
                                                                <mml:mo stretchy="true">}</mml:mo>
                                                            </mml:mrow>
                                                            <mml:mrow>
                                                                <mml:mi>i</mml:mi>
                                                                <mml:mo>=</mml:mo>
                                                                <mml:mn>1</mml:mn>
                                                            </mml:mrow>
                                                            <mml:mi>N</mml:mi>
                                                        </mml:msubsup>
                                                        <mml:mo>;</mml:mo>
                                                    </mml:mtd>
                                                </mml:mtr>
                                            </mml:mtable>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(6)</label>
</disp-formula>
                            </p>
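                            <p>The projection step can be illustrated with a short NumPy sketch. It uses PCA rather than t-SNE because PCA needs only a singular value decomposition; the function name <monospace>pca_project</monospace> is hypothetical, and the embeddings are assumed to be precomputed arrays with one row per sample. This is an illustration under those assumptions, not the procedure used to produce the figures in this article.
                                <code language="python" xml:space="preserve"><![CDATA[
```python
import numpy as np


def pca_project(z_text, z_image, n_components=2):
    """Project text and image embeddings onto their shared top principal components.

    z_text, z_image: arrays of shape (N, d) holding the sets Z_T and Z_I.
    Returns two arrays of shape (N, n_components), one per modality.
    """
    z_text = np.asarray(z_text, dtype=np.float64)
    z_image = np.asarray(z_image, dtype=np.float64)
    # Fit a single projection on both modalities so they share one coordinate system.
    stacked = np.vstack([z_text, z_image])
    centered = stacked - stacked.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    n_text = z_text.shape[0]
    return coords[:n_text], coords[n_text:]
```
]]></code>
                                The two returned arrays can be drawn as separate scatter series to see whether text and image embeddings of hands occupy the same region. A t-SNE implementation, for example the one in scikit-learn, could be applied to the same stacked matrix instead.</p>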
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>Qualitative analysis:</bold> Generate images from text descriptions of hands using the trained model. Visually inspect the generated images for common errors and patterns, focusing on aspects such as finger placement, proportions, and overall hand shape. The generation step is expressed by 
                                <xref ref-type="disp-formula" rid="e7">equation (7)</xref>:
                                <disp-formula id="e7">

                                    <mml:math display="block">
                                        <mml:mover accent="true">
                                            <mml:mi>I</mml:mi>
                                            <mml:mo>^</mml:mo>
                                        </mml:mover>
                                        <mml:mo>=</mml:mo>
                                        <mml:mi>D</mml:mi>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:mi>P</mml:mi>
                                            <mml:mspace width="0.25em"/>
                                            <mml:mrow>
                                                <mml:mo stretchy="true">(</mml:mo>
                                                <mml:msub><mml:mi>z</mml:mi><mml:mi>T</mml:mi></mml:msub>
                                                <mml:mo stretchy="true">)</mml:mo>
                                            </mml:mrow>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(7)</label>
</disp-formula>
                            </p>
                        </list-item>
                        <list-item>
                            <label>iv)</label>
                            <p>

                                <bold>Quantitative metrics:</bold> Employ quantitative metrics to assess the quality of the hand images generated by the model. The Structural Similarity Index (SSIM) and the Mean Squared Error (MSE) between generated and real hand images are used; they are defined in 
                                <xref ref-type="disp-formula" rid="e8">equations (8)</xref> and 
                                <xref ref-type="disp-formula" rid="e9">(9)</xref>, respectively.
                                <disp-formula id="e8">

                                    <mml:math display="block">
                                        <mml:mtext>SSIM</mml:mtext>
                                        <mml:mspace width="0.25em"/>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:msub>
                                                <mml:mi>I</mml:mi>
                                                <mml:mi>g</mml:mi>
                                            </mml:msub>
                                            <mml:mo>,</mml:mo>
                                            <mml:msub>
                                                <mml:mi>I</mml:mi>
                                                <mml:mi>r</mml:mi>
                                            </mml:msub>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:mrow>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:mn>2</mml:mn>
                                                    <mml:msub><mml:mi>&#x03bc;</mml:mi><mml:mi>g</mml:mi></mml:msub>
                                                    <mml:msub><mml:mi>&#x03bc;</mml:mi><mml:mi>r</mml:mi></mml:msub>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:mn>2</mml:mn>
                                                    <mml:msub><mml:mi>&#x03c3;</mml:mi><mml:mrow><mml:mi>g</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                            <mml:mrow>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:msubsup><mml:mi>&#x03bc;</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msubsup><mml:mi>&#x03bc;</mml:mi><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msubsup>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:msubsup><mml:mi>&#x03c3;</mml:mi><mml:mi>g</mml:mi><mml:mn>2</mml:mn></mml:msubsup>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msubsup><mml:mi>&#x03c3;</mml:mi><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msubsup>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                        </mml:mfrac>
                                    </mml:math>

                                    <label>(8)</label>
</disp-formula>

                                <disp-formula id="e9">

                                    <mml:math display="block">
                                        <mml:mi>MSE</mml:mi>
                                        <mml:mspace width="0.25em"/>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:msub>
                                                <mml:mi>I</mml:mi>
                                                <mml:mi>g</mml:mi>
                                            </mml:msub>
                                            <mml:mo>,</mml:mo>
                                            <mml:msub>
                                                <mml:mi>I</mml:mi>
                                                <mml:mi>r</mml:mi>
                                            </mml:msub>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:mn>1</mml:mn>
                                            <mml:mi>N</mml:mi>
                                        </mml:mfrac>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>N</mml:mi>
                                        </mml:munderover>
                                        <mml:msup>
                                            <mml:mrow>
                                                <mml:mo stretchy="true">(</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>I</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>g</mml:mi>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:mi>i</mml:mi>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>I</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>r</mml:mi>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:mi>i</mml:mi>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mo stretchy="true">)</mml:mo>
                                            </mml:mrow>
                                            <mml:mn>2</mml:mn>
                                        </mml:msup>
                                    </mml:math>

                                    <label>(9)</label>
</disp-formula>
                            </p>
                        </list-item>
                    </list>where 
                    <italic toggle="yes">I</italic>
                    <sub>

                        <italic toggle="yes">g</italic>
                    </sub> and 
                    <italic toggle="yes">I</italic>
                    <sub>

                        <italic toggle="yes">r</italic>
                    </sub> are the generated and real images, respectively, &#x03bc; and &#x03c3; represent the mean and variance, and 
                    <italic toggle="yes">C</italic><sub>1</sub> and 
                    <italic toggle="yes">C</italic><sub>2</sub> are constants that stabilize the division.
                    <list list-type="roman-lower">
                        <list-item>
                            <label>v)</label>
                            <p>

                                <bold>Error analysis:</bold> Perform detailed error analysis to categorize the types of mistakes made by the model. Errors can be classified into anatomical inaccuracies, unnatural poses, missing fingers, etc. (
                                <italic toggle="yes">E</italic>
                                <sub>anatomical</sub>, 
                                <italic toggle="yes">E</italic>
                                <sub>pose</sub>, 
                                <italic toggle="yes">E</italic>
                                <sub>missing</sub>
                                ).</p>
                        </list-item>
                        <list-item>
                            <label>vi)</label>
                            <p>

                                <bold>Dataset evaluation:</bold> Evaluate the dataset D to identify potential biases or gaps in the representation of hands. Assess whether the dataset includes a sufficient variety of hand poses, shapes, and contexts, and identify any types of hand images whose absence might contribute to the model&#x2019;s difficulties. This variety is quantified by 
                                <xref ref-type="disp-formula" rid="e10">expression (10)</xref>.
                                <disp-formula id="e10">

                                    <mml:math display="block">
                                        <mml:mtext>Variety</mml:mtext>
                                        <mml:mspace width="0.25em"/>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:mi>D</mml:mi>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:mrow>
                                                <mml:munderover>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mrow>
                                                        <mml:mi>k</mml:mi>
                                                        <mml:mo>=</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                    <mml:mi>K</mml:mi>
                                                </mml:munderover>
                                                <mml:mtext>unique</mml:mtext>
                                                <mml:mspace width="0.25em"/>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>T</mml:mi>
                                                        <mml:mi>k</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo>,</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>I</mml:mi>
                                                        <mml:mi>k</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                            </mml:mrow>
                                            <mml:mi>K</mml:mi>
                                        </mml:mfrac>
                                    </mml:math>

                                    <label>(10)</label>
</disp-formula>
                            </p>
                        </list-item>
                    </list>
                </p>
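                <p>The pixel-wise error of equation (9) and the variety score of expression (10) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the text: images are treated as flat lists of pixel intensities, each dataset entry is a (text, image identifier) pair, and &#x201C;unique&#x201D; is read as counting distinct pairs. The function names <monospace>image_mse</monospace> and <monospace>variety</monospace> and the sample data are hypothetical.</p>
                <p>
                    <code language="python">
```python
def image_mse(generated, real):
    """Mean squared error between two equally sized images (equation 9).

    Both images are flat sequences of pixel intensities (an assumption).
    """
    if len(generated) != len(real) or len(generated) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(generated)
    return sum((g - r) ** 2 for g, r in zip(generated, real)) / n


def variety(pairs):
    """Share of distinct (text, image) pairs in a dataset (expression 10).

    Reads 'unique' as 'count each distinct pair once' (an assumption).
    """
    if not pairs:
        raise ValueError("dataset must not be empty")
    return len(set(pairs)) / len(pairs)


# Hypothetical 4-pixel images with intensities in [0, 1].
generated = [0.0, 0.5, 1.0, 0.5]
real = [0.0, 0.0, 1.0, 1.0]
print(image_mse(generated, real))  # 0.125

# Hypothetical dataset with one duplicated entry.
dataset = [
    ("open palm", "img_001"),
    ("fist", "img_002"),
    ("open palm", "img_001"),
    ("pointing", "img_003"),
]
print(variety(dataset))  # 0.75
```
</code>
                </p>
                <p>Under this reading, a variety score close to 1 means that almost every entry is distinct, while a low score points to heavy duplication in D.</p>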
            </sec>
        </sec>
        <sec id="sec10">
            <title>4. Technical challenges</title>
            <sec id="sec11">
                <title>4.1 Complexity of hand anatomy</title>
                <p>

                    <bold>4.1.1 Challenges</bold>
                </p>
                <p>

                    <bold>4.1.1.1 Structural complexity</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Bones and joints:</bold> The human hand has 27 bones: 14 phalanges (finger bones), 5 metacarpals (palm bones), and 8 carpal bones (wrist bones). The joints, especially those in the fingers, allow a wide range of motion and poses.</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Movement dynamics:</bold> The fingers can bend, twist, and rotate in various directions. Accurately capturing these movements and the transitions between them is challenging.
                                <sup>
                                    <xref ref-type="bibr" rid="ref18">18</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.1.1.2 Surface anatomy</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Muscles and tendons:</bold> The hand&#x2019;s surface anatomy includes muscles, tendons, and veins that change appearance based on hand movements and poses.
                                <sup>
                                    <xref ref-type="bibr" rid="ref19">19</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Skin texture and wrinkles:</bold> The skin on the hand has unique textures, lines, and wrinkles, especially on the palms and knuckles. These details are crucial for realistic rendering.
                                <sup>
                                    <xref ref-type="bibr" rid="ref20">20</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.1.1.3 Articulation and posing</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Finger poses:</bold> Each finger can independently move, creating countless possible poses. The AI must understand the natural range of motion and how fingers interact.
                                <sup>
                                    <xref ref-type="bibr" rid="ref21">21</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Hand gestures:</bold> Hands can express a wide range of emotions and actions through gestures. Understanding and replicating these gestures adds complexity.
                                <sup>
                                    <xref ref-type="bibr" rid="ref22">22</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.1.1.4 Perspective and proportion</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Foreshortening:</bold> Drawing hands from different angles, especially when fingers are pointed towards or away from the viewer, requires accurate foreshortening to maintain realistic proportions.
                                <sup>
                                    <xref ref-type="bibr" rid="ref23">23</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Relative size:</bold> Each finger has a different length and thickness, and these proportions must be maintained from various perspectives.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.1.1.5 Inter-hand interaction and hand grips:</bold> When hands hold objects, touch each other, or interact with other body parts, the AI must accurately depict the contact points and the resulting deformations in the skin and muscles.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup>
                </p>
                <p>

                    <bold>4.1.1.6 Lighting and shadows</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Light interactions:</bold> The hand&#x2019;s complex structure creates intricate patterns of light and shadow, especially in the spaces between fingers and around joints. Capturing these details is crucial for realistic rendering.
                                <sup>
                                    <xref ref-type="bibr" rid="ref25">25</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Reflective and translucent properties:</bold> The skin of the hand has both reflective and translucent properties, which affect how light interacts with it.
                                <sup>
                                    <xref ref-type="bibr" rid="ref26">26</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.1.1.7 Symmetry and asymmetry</bold>
                </p>
                <p>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Bilateral symmetry:</bold> While the left and right hands are broadly mirror images of each other, minor asymmetries due to individual differences and hand use must be considered.
                                <sup>
                                    <xref ref-type="bibr" rid="ref27">27</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Dominance and wear:</bold> The dominant hand often shows different wear patterns and muscular development compared to the non-dominant hand.
                                <sup>
                                    <xref ref-type="bibr" rid="ref28">28</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.1.2 Quantitative measurements</bold>
                </p>
                <p>

                    <bold>4.1.2.1 Pose estimation accuracy:</bold>
                </p>
                <p>

                    <bold>a. Keypoint detection:</bold> Keypoint detection involves identifying specific points on the hand, such as joints and fingertip positions. To measure the accuracy of keypoint detection, the following metrics are commonly used:</p>
                <p>

                    <bold>i) Mean Squared Error (MSE):</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Description:</bold> MSE is used to measure the average squared difference between the predicted keypoint coordinates and the ground truth coordinates. This can be expressed by 
                                <xref ref-type="disp-formula" rid="e11">equation (11)</xref>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Formula:</bold>

                                <disp-formula id="e11">

                                    <mml:math display="block">
                                        <mml:mi mathvariant="italic">MSE</mml:mi>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:mn>1</mml:mn>
                                            <mml:mrow>
                                                <mml:mspace width="0.25em"/>
                                                <mml:mi>N</mml:mi>
                                            </mml:mrow>
                                        </mml:mfrac>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>N</mml:mi>
                                        </mml:munderover>
                                        <mml:mspace width="0em"/>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:msup>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>x</mml:mi>
                                                        <mml:mi>i</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo>&#x2212;</mml:mo>
                                                    <mml:msub>
                                                        <mml:mover accent="true">
                                                            <mml:mi>x</mml:mi>
                                                            <mml:mo>^</mml:mo>
                                                        </mml:mover>
                                                        <mml:mi>i</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                                <mml:mn>2</mml:mn>
                                            </mml:msup>
                                            <mml:mo>+</mml:mo>
                                            <mml:msup>
                                                <mml:mrow>
                                                    <mml:mo stretchy="true">(</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>y</mml:mi>
                                                        <mml:mi>i</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo>&#x2212;</mml:mo>
                                                    <mml:msub>
                                                        <mml:mover accent="true">
                                                            <mml:mi>y</mml:mi>
                                                            <mml:mo>^</mml:mo>
                                                        </mml:mover>
                                                        <mml:mi>i</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo stretchy="true">)</mml:mo>
                                                </mml:mrow>
                                                <mml:mn>2</mml:mn>
                                            </mml:msup>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(11)</label>
</disp-formula>
                            </p>
                        </list-item>
                    </list>where 
                    <italic toggle="yes">N</italic> is the number of keypoints, (
                    <italic toggle="yes">x</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>,
                    <italic toggle="yes">y</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>) are the ground truth coordinates, and (
                    <italic toggle="yes">x&#x0302;</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>,

                    <italic toggle="yes">y&#x0302;</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>) are the predicted coordinates.

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Application:</bold> MSE provides a straightforward indication of the overall prediction error. Lower MSE values indicate higher accuracy.</p>
                        </list-item>
                    </list>
                </p>
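                <p>As an illustration of equation (11), the following minimal Python sketch computes the keypoint MSE for a handful of 2D keypoints. The function name <monospace>keypoint_mse</monospace> and the pixel coordinates are hypothetical and chosen only to make the arithmetic easy to follow.</p>
                <p>
                    <code language="python">
```python
def keypoint_mse(truth, predicted):
    """Mean squared error over 2D keypoints, following equation (11).

    truth and predicted are equally long sequences of (x, y) pairs.
    """
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError("keypoint lists must be non-empty and of equal length")
    total = 0.0
    for (x, y), (x_hat, y_hat) in zip(truth, predicted):
        # Squared Euclidean distance between one ground truth/prediction pair.
        total += (x - x_hat) ** 2 + (y - y_hat) ** 2
    return total / len(truth)


# Hypothetical pixel coordinates for four hand keypoints.
truth = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0), (50.0, 50.0)]
predicted = [(13.0, 14.0), (20.0, 15.0), (30.0, 35.0), (56.0, 58.0)]

# Squared errors are 25, 0, 25 and 100, so the mean is 150 / 4.
print(keypoint_mse(truth, predicted))  # 37.5
```
</code>
                </p>
                <p>Because the errors are squared, the single keypoint that is 10 pixels off contributes two thirds of the total, which shows how strongly MSE penalizes outliers.</p>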
                <p>

                    <bold>ii) Percentage of Correct Keypoints (PCK):</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Description:</bold> PCK measures the percentage of keypoints that fall within a certain threshold distance from the ground truth and is expressed by 
                                <xref ref-type="disp-formula" rid="e12">equation (12)</xref>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Formula:</bold>

                                <disp-formula id="e12">

                                    <mml:math display="block">
                                        <mml:mi mathvariant="italic">PCK</mml:mi>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:mn>1</mml:mn>
                                            <mml:mrow>
                                                <mml:mspace width="0.25em"/>
                                                <mml:mi>N</mml:mi>
                                            </mml:mrow>
                                        </mml:mfrac>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>N</mml:mi>
                                        </mml:munderover>
                                        <mml:mspace width="0em"/>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mo stretchy="true">(</mml:mo>
                                            <mml:msqrt>
                                                <mml:mrow>
                                                    <mml:msup>
                                                        <mml:mrow>
                                                            <mml:mo stretchy="true">(</mml:mo>
                                                            <mml:msub>
                                                                <mml:mi>x</mml:mi>
                                                                <mml:mi>i</mml:mi>
                                                            </mml:msub>
                                                            <mml:mo>&#x2212;</mml:mo>
                                                            <mml:msub>
                                                                <mml:mover accent="true">
                                                                    <mml:mi>x</mml:mi>
                                                                    <mml:mo>^</mml:mo>
                                                                </mml:mover>
                                                                <mml:mi>i</mml:mi>
                                                            </mml:msub>
                                                            <mml:mo stretchy="true">)</mml:mo>
                                                        </mml:mrow>
                                                        <mml:mn>2</mml:mn>
                                                    </mml:msup>
                                                    <mml:mo>+</mml:mo>
                                                    <mml:msup>
                                                        <mml:mrow>
                                                            <mml:mo stretchy="true">(</mml:mo>
                                                            <mml:msub>
                                                                <mml:mi>y</mml:mi>
                                                                <mml:mi>i</mml:mi>
                                                            </mml:msub>
                                                            <mml:mo>&#x2212;</mml:mo>
                                                            <mml:msub>
                                                                <mml:mover accent="true">
                                                                    <mml:mi>y</mml:mi>
                                                                    <mml:mo>^</mml:mo>
                                                                </mml:mover>
                                                                <mml:mi>i</mml:mi>
                                                            </mml:msub>
                                                            <mml:mo stretchy="true">)</mml:mo>
                                                        </mml:mrow>
                                                        <mml:mn>2</mml:mn>
                                                    </mml:msup>
                                                </mml:mrow>
                                            </mml:msqrt>
                                            <mml:mo>&lt;</mml:mo>
                                            <mml:mi>&#x03b1;</mml:mi>
                                            <mml:mo stretchy="true">)</mml:mo>
                                        </mml:mrow>
                                    </mml:math>

                                    <label>(12)</label>
</disp-formula>
                            </p>
                        </list-item>
                    </list>where &#x03b1; is the threshold distance, 1(&#x00b7;) is the indicator function, and the rest are as defined above.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Application:</bold> PCK is often used to assess model performance under varying thresholds, providing insight into the robustness of the keypoint detection.</p>
                        </list-item>
                    </list>
                </p>
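                <p>The thresholding in equation (12) can be sketched in a few lines of Python. This is a minimal illustration, assuming that the threshold &#x03b1; is given in the same units as the coordinates (here pixels); the function name <monospace>pck</monospace> and the sample keypoints are hypothetical.</p>
                <p>
                    <code language="python">
```python
import math


def pck(truth, predicted, alpha):
    """Percentage of Correct Keypoints, following equation (12).

    A keypoint counts as correct when its Euclidean error is strictly
    below alpha. The result is returned as a fraction in [0, 1].
    """
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError("keypoint lists must be non-empty and of equal length")
    hits = 0
    for (x, y), (x_hat, y_hat) in zip(truth, predicted):
        distance = math.hypot(x - x_hat, y - y_hat)
        if alpha > distance:  # indicator function of equation (12)
            hits += 1
    return hits / len(truth)


# Hypothetical pixel coordinates; the per-keypoint errors are 5, 0, 5 and 10.
truth = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0), (50.0, 50.0)]
predicted = [(13.0, 14.0), (20.0, 15.0), (30.0, 35.0), (56.0, 58.0)]

print(pck(truth, predicted, 2.0))   # 0.25
print(pck(truth, predicted, 6.0))   # 0.75
print(pck(truth, predicted, 11.0))  # 1.0
```
</code>
                </p>
                <p>Sweeping &#x03b1; in this way yields a curve of PCK against threshold, which is one way to read the robustness mentioned above.</p>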
                <p>

                    <bold>b. Average Distance Error (ADE):</bold> ADE measures the average Euclidean distance between the predicted and ground truth keypoints, providing a more intuitive understanding of the prediction error.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Description:</bold> Each predicted keypoint contributes its Euclidean distance to the corresponding ground truth keypoint, and these distances are averaged over all keypoints, as expressed by 
                                <xref ref-type="disp-formula" rid="e13">equation (13)</xref>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Formula:</bold>

                                <disp-formula id="e13">

                                    <mml:math display="block">
                                        <mml:mi mathvariant="italic">ADE</mml:mi>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:mn>1</mml:mn>
                                            <mml:mrow>
                                                <mml:mspace width="0.25em"/>
                                                <mml:mi>N</mml:mi>
                                            </mml:mrow>
                                        </mml:mfrac>
                                        <mml:munderover>
                                            <mml:mo>&#x2211;</mml:mo>
                                            <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                <mml:mo>=</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                            <mml:mi>N</mml:mi>
                                        </mml:munderover>
                                        <mml:msqrt>
                                            <mml:mrow>
                                                <mml:msup>
                                                    <mml:mrow>
                                                        <mml:mo stretchy="true">(</mml:mo>
                                                        <mml:msub>
                                                            <mml:mi>x</mml:mi>
                                                            <mml:mi>i</mml:mi>
                                                        </mml:msub>
                                                        <mml:mo>&#x2212;</mml:mo>
                                                        <mml:msub>
                                                            <mml:mover accent="true">
                                                                <mml:mi>x</mml:mi>
                                                                <mml:mo>^</mml:mo>
                                                            </mml:mover>
                                                            <mml:mi>i</mml:mi>
                                                        </mml:msub>
                                                        <mml:mo stretchy="true">)</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:msup>
                                                <mml:mo>+</mml:mo>
                                                <mml:msup>
                                                    <mml:mrow>
                                                        <mml:mo stretchy="true">(</mml:mo>
                                                        <mml:msub>
                                                            <mml:mi>y</mml:mi>
                                                            <mml:mi>i</mml:mi>
                                                        </mml:msub>
                                                        <mml:mo>&#x2212;</mml:mo>
                                                        <mml:msub>
                                                            <mml:mover accent="true">
                                                                <mml:mi>y</mml:mi>
                                                                <mml:mo>^</mml:mo>
                                                            </mml:mover>
                                                            <mml:mi>i</mml:mi>
                                                        </mml:msub>
                                                        <mml:mo stretchy="true">)</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:msup>
                                            </mml:mrow>
                                        </mml:msqrt>
                                    </mml:math>

                                    <label>(13)</label>
</disp-formula>
                            </p>
                        </list-item>
                    </list>where 
                    <italic toggle="yes">N</italic> is the number of keypoints, (
                    <italic toggle="yes">x</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>,

                    <italic toggle="yes">y</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>) are the ground truth coordinates, and (
                    <italic toggle="yes">x&#x0302;</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>,

                    <italic toggle="yes">y&#x0302;</italic>
                    <sub>

                        <italic toggle="yes">i</italic>
                    </sub>) are the predicted coordinates.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Application:</bold> ADE gives a direct measure of the average error in prediction, making it easier to understand how far off the model&#x2019;s predictions are from the actual keypoints. Lower ADE values indicate higher accuracy.</p>
                        </list-item>
                    </list>
                </p>
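                <p>As an illustration, the ADE described above can be sketched in a few lines of Python. This is a minimal sketch, assuming that ADE is the mean Euclidean distance between matching ground truth and predicted 2D keypoints; the function name and the sample coordinates are hypothetical and are not taken from the study.</p>
                <code language="python">
```python
import math


def average_distance_error(truth, predicted):
    """Mean Euclidean distance between matching 2D keypoints.

    Assumption: ADE averages the per-keypoint distance over N keypoints.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("keypoint lists must be non-empty and equally long")
    total = 0.0
    for (x, y), (x_hat, y_hat) in zip(truth, predicted):
        # Distance between the true point and its prediction.
        total += math.hypot(x - x_hat, y - y_hat)
    return total / len(truth)


# Hypothetical keypoints in pixel units, for illustration only.
truth_points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
predicted_points = [(0.0, 0.0), (0.0, 4.0), (0.0, 1.0)]
ade = average_distance_error(truth_points, predicted_points)
print(f"ADE = {ade:.3f} pixels")
```
</code>
                <p>Because the result is expressed in the same units as the coordinates, it can be read directly as an average pixel offset per keypoint.</p>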
                <p>

                    <bold>4.1.2.2. Shape and proportion accuracy</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Procrustes analysis:</bold> Use Procrustes distance to measure the similarity between predicted hand shapes and ground truth shapes after removing differences in scale, rotation, and translation.
                                <sup>
                                    <xref ref-type="bibr" rid="ref29">29</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Aspect ratio consistency:</bold> Measure the consistency of aspect ratios of fingers and the overall hand structure.
                                <sup>
                                    <xref ref-type="bibr" rid="ref30">30</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
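                <p>The Procrustes comparison above can be sketched for 2D keypoints as follows. This is a minimal sketch under stated assumptions: points are treated as complex numbers, only translation, scale, and rotation are removed (reflections are not allowed, unlike some library implementations), and the function name and sample shapes are hypothetical.</p>
                <code language="python">
```python
def procrustes_disparity(truth, predicted):
    """Procrustes disparity between two 2D shapes given as (x, y) lists.

    Returns a value in [0, 1]; 0 means the shapes coincide after removing
    translation, scale, and rotation. Reflection is not removed (assumption).
    """
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError("shapes must be non-empty and equally long")

    def standardize(points):
        # Represent each point as the complex number x + iy.
        z = [complex(x, y) for x, y in points]
        centroid = sum(z) / len(z)
        z = [p - centroid for p in z]               # remove translation
        norm = sum(abs(p) ** 2 for p in z) ** 0.5
        if norm == 0:
            raise ValueError("shape has zero size")
        return [p / norm for p in z]                # remove scale

    a = standardize(truth)
    b = standardize(predicted)
    # The optimal rotation cancels the phase of this inner product, leaving
    # a squared residual of 1 - |inner|^2 for unit-norm shapes.
    inner = sum(p.conjugate() * q for p, q in zip(a, b))
    return max(0.0, 1.0 - abs(inner) ** 2)


# Hypothetical shapes: a square, the same square rotated, scaled and shifted,
# and a square with one corner displaced.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved_square = [(5.0, 5.0), (5.0, 7.0), (3.0, 7.0), (3.0, 5.0)]
distorted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
same_shape_disparity = procrustes_disparity(square, moved_square)
distorted_disparity = procrustes_disparity(square, distorted)
```
</code>
                <p>In this sketch a shape that differs only by position, size, and orientation yields a disparity of approximately zero, whereas a distorted shape yields a clearly positive value.</p>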
                <p>

                    <bold>4.1.2.3. Surface detail and texture accuracy</bold>
                </p>
                <p>
                    <xref ref-type="fig" rid="f3">Figure 3</xref> gives an overview of texture analysis. It outlines the steps involved in image classification, emphasizing the role of textural features and the Random Forest algorithm applied to a selected image segment. The flowchart proceeds as follows:
                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Input data (Labelled images):</bold> The process starts with a set of labelled images. These images have known class labels (e.g., &#x201c;cat,&#x201d; &#x201c;dog,&#x201d; &#x201c;car,&#x201d; etc.).</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Feature extraction (Textural features):</bold> Next, we extract relevant features from these labelled images. These features capture the visual characteristics of the images. Textural features play a crucial role in image classification. They describe patterns, textures, and spatial relationships within the image.</p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>Engineered &amp; learned features:</bold> The flowchart mentions both &#x201c;engineered&#x201d; and &#x201c;learned&#x201d; features. Engineered features are handcrafted features designed by domain experts. Examples include texture descriptors, color histograms, and edge-based features.
</p>
                        </list-item>
                    </list>
                </p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>
Figure 3. </label>
                    <caption>
                        <title>Texture analysis.</title>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure3.gif"/>
                </fig>
                <p>

                    <bold>Learned features:</bold> These are automatically learned by neural networks or other machine learning models during training. Convolutional Neural Networks (CNNs) excel at learning hierarchical features from raw pixel data.
                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Random forest algorithm:</bold> The flowchart includes a &#x201c;Random Forest&#x201d; algorithm. Random Forest is an ensemble learning approach that combines several decision trees. Each decision tree is trained on a random subset of the features and data points. The final prediction is based on the majority vote of the individual trees.</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Image labels (Classification):</bold> Using the extracted features, the Random Forest predicts the class labels for unlabelled images. The notation &#x201c;= f(Features)&#x201d; in the flowchart represents this classification process, in which the labels are a function of the features.</p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>Feature scores &amp; classification accuracy:</bold> The output of the Random Forest includes feature scores, which indicate the importance of each feature. Classification accuracy measures how well the model performs on unseen data.</p>
                        </list-item>
                    </list>
                </p>
                <p>
                    <xref ref-type="fig" rid="f4">Figure 4</xref> compares human skin with artificial skin, highlighting their respective structures and functions.
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>

                                <bold>Human skin (Left side):</bold> The cross-section of human skin reveals its layers: epidermis and dermis. Within these layers, various sensory receptors are labeled:
                                <list list-type="bullet">
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Meissner&#x2019;s corpuscles:</bold> Responsible for light touch and sensitivity.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Merkel cells:</bold> Involved in tactile discrimination.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Ruffini endings:</bold> Detect skin stretch.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Pacinian corpuscles:</bold> Detect pressure and vibration. These receptors contribute to our sense of touch and perception.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>

                                <bold>Artificial skin (Right side):</bold> The artificial skin structure consists of sensor nodes interconnected by lines, forming a network. An encapsulation layer covers these nodes. Icons below the illustrations compare functionalities:
                                <list list-type="bullet">
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Sensation:</bold> Human skin vs. artificial skin.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Regulation:</bold> Human skin maintains temperature; artificial skin aims to do the same.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>Protection:</bold> Both provide protective functions.</p>
                                    </list-item>
                                </list>
                            </p>
                            <p>

                                <bold>Additional icons represent advanced features of artificial skin</bold>
</p>
                        </list-item>
                    </list>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Super-sensing:</bold> Enhanced perception (depicted by an eye with circuit patterns).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Beyond-skin perception:</bold> Connectivity (depicted by a Wi-Fi symbol).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Feedback:</bold> Loop of information exchange.</p>
                        </list-item>
                    </list>For AI-generated hands, surface detail and texture can be assessed with the following measures:
                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Texture similarity metrics:</bold> Use metrics such as the Structural Similarity Index (SSIM) or Peak Signal-to-Noise Ratio (PSNR) to compare the textures of AI-generated hands with ground truth images.
                                <sup>
                                    <xref ref-type="bibr" rid="ref31">31</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Wrinkle and line detection:</bold> Measure the presence and accuracy of skin details such as wrinkles and lines using edge detection algorithms.
                                <sup>
                                    <xref ref-type="bibr" rid="ref32">32</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
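                <p>Of the texture similarity metrics named above, PSNR is the simplest to compute. The following is a minimal sketch, assuming 8-bit grayscale images of equal size stored as nested lists of pixel intensities; the function name and the sample patches are hypothetical. SSIM is more involved and is usually taken from an image processing library rather than written by hand.</p>
                <code language="python">
```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels between two grayscale images.

    Assumption: both images are nested lists with identical dimensions and
    intensities in the range 0..max_value.
    """
    ref = [value for row in reference for value in row]
    gen = [value for row in generated for value in row]
    if len(ref) != len(gen) or not ref:
        raise ValueError("images must be non-empty and equally sized")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 patches: every generated pixel is off by 5 intensity levels.
reference_patch = [[10, 200], [120, 50]]
generated_patch = [[15, 205], [125, 55]]
patch_psnr = psnr(reference_patch, generated_patch)
print(f"PSNR = {patch_psnr:.2f} dB")
```
</code>
                <p>Higher PSNR values indicate that the generated texture is closer to the reference; identical images give an infinite value in this sketch.</p>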
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>
Figure 4. </label>
                    <caption>
                        <title>Surface analysis.</title>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure4.gif"/>
                </fig>
                <p>

                    <bold>4.1.2.4 Movement dynamics and articulation</bold>
                </p>
                <p>

                    <bold>i) Temporal consistency:</bold> For sequences of hand movements, measure the temporal consistency of keypoints and shapes across frames.
                    <sup>
                        <xref ref-type="bibr" rid="ref33">33</xref>
                    </sup>
                </p>
                <p>Temporal consistency is crucial when working with video data. When spatial augmentations are applied to video, preserving temporal consistency maintains the coherence of the sequence and improves the quality of the learned representations. 
                    <xref ref-type="fig" rid="f5">Figure 5</xref> shows how temporal consistency affects spatial augmentation.
                    <sup>
                        <xref ref-type="bibr" rid="ref33">33</xref>
                    </sup> The figure has three rows, each demonstrating a different approach to spatial augmentation:
                    <sup>
                        <xref ref-type="bibr" rid="ref52">34</xref>
                    </sup>

                    <list list-type="alpha-lower">
                        <list-item>
                            <label>a)</label>
                            <p>

                                <bold>Original video clip (Top row):</bold> This row contains four frames showing a horse in various positions as it moves. These frames represent the natural progression of the video clip.</p>
                        </list-item>
                        <list-item>
                            <label>b)</label>
                            <p>

                                <bold>Frame-level spatial augmentation (Middle row):</bold> In this row, we also have four frames, but each frame has undergone individual augmentations. These augmentations include changes in brightness, contrast, and color saturation. However, the key issue here is that these augmentations were applied independently to each frame, without considering the context of the previous or next frame. As a result, the appearance across the sequence lacks consistency. This lack of temporal consistency can be problematic for AI models that learn from video data because it disrupts the natural flow of movement.</p>
                        </list-item>
                        <list-item>
                            <label>c)</label>
                            <p>

                                <bold>Temporally consistent spatial augmentation (Bottom row):</bold> The bottom row shows four frames where augmentations have been applied while maintaining temporal consistency. Temporally consistent augmentations smoothly transition from one frame to another. This ensures that the changes in brightness, contrast, and color saturation align with the video&#x2019;s natural progression. By preserving temporal consistency, AI models can learn more effectively from video clips.
</p>
                        </list-item>
                    </list>
                </p>
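                <p>For hand keypoints, one simple way to quantify temporal consistency across frames is to measure jitter. The following is a minimal sketch, assuming that jitter is defined as the mean magnitude of the second difference of each keypoint trajectory, so that smooth, constant-velocity motion scores zero; this definition, the function name, and the sample trajectories are assumptions and are not taken from the cited work.</p>
                <code language="python">
```python
import math


def mean_keypoint_jitter(frames):
    """Mean second-difference magnitude of keypoint trajectories.

    frames: list of frames, each a list of (x, y) keypoints in a fixed order.
    Assumption: low values indicate temporally consistent motion.
    """
    if len(frames) in (0, 1, 2):
        raise ValueError("at least three frames are required")
    total, count = 0.0, 0
    for prev, curr, nxt in zip(frames, frames[1:], frames[2:]):
        if not (len(prev) == len(curr) == len(nxt)):
            raise ValueError("every frame must hold the same keypoints")
        for (x0, y0), (x1, y1), (x2, y2) in zip(prev, curr, nxt):
            # Discrete acceleration of the keypoint at the middle frame.
            ax = x2 - 2.0 * x1 + x0
            ay = y2 - 2.0 * y1 + y0
            total += math.hypot(ax, ay)
            count += 1
    if count == 0:
        raise ValueError("frames contain no keypoints")
    return total / count


# Hypothetical single-keypoint trajectories over four frames.
smooth = [[(0.0, 0.0)], [(1.0, 0.0)], [(2.0, 0.0)], [(3.0, 0.0)]]
jittery = [[(0.0, 0.0)], [(1.0, 2.0)], [(2.0, -2.0)], [(3.0, 0.0)]]
smooth_jitter = mean_keypoint_jitter(smooth)
jittery_jitter = mean_keypoint_jitter(jittery)
```
</code>
                <p>A sequence generated frame by frame without temporal context would be expected to score higher on this measure than a sequence whose keypoints move smoothly.</p>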
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>
Figure 5. </label>
                    <caption>
                        <title>Temporal consistency.</title>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure5.gif"/>
                </fig>
                <p>

                    <bold>ii) Joint angle accuracy:</bold> Compare the predicted joint angles with ground truth angles using angular error metrics.
                    <sup>
                        <xref ref-type="bibr" rid="ref34">35</xref>
                    </sup> 
                    <xref ref-type="fig" rid="f6">Figure 6</xref> illustrates a model architecture for predicting lower limb joint angles and moments during gait using artificial neural networks. It compares two approaches: a feed-forward neural network and an LSTM (Long Short-Term Memory) neural network. Both networks can be used for this prediction task, but the LSTM is expected to perform better because it can take the temporal context of the IMU data into account. The process consists of the following steps:
                    <list list-type="alpha-lower">
                        <list-item>
                            <label>a)</label>
                            <p>

                                <bold>IMU data:</bold> The process starts with IMU (Inertial Measurement Unit) data, which likely captures information about acceleration and angular velocity during movement.</p>
                        </list-item>
                        <list-item>
                            <label>b)</label>
                            <p>

                                <bold>Neural networks:</bold> The architecture offers two alternative paths.</p>
                            <p>

                                <bold>1st. Feed-forward neural network:</bold> The IMU data is fed into a feed-forward neural network. This type of network processes data in one direction, from input to output, without forming loops or cycles.</p>
                            <p>

                                <bold>2nd. LSTM neural network:</bold> Alternatively, the IMU data is fed into an LSTM neural network. LSTMs are specifically designed to handle sequential data like time series, allowing them to capture temporal dependencies in the data.</p>
                        </list-item>
                        <list-item>
                            <label>c)</label>
                            <p>

                                <bold>Output:</bold> Both networks generate predictions for joint angles and moments. These represent the estimated positions and forces at the lower limb joints during gait.</p>
                        </list-item>
                    </list>
                </p>
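                <p>Applied to hands, the angular error comparison mentioned above can be sketched as follows. This is a minimal sketch, assuming that each joint angle is the angle formed at a joint by its two neighbouring keypoints and that the error is the mean absolute angular difference in degrees; the function names and sample coordinates are hypothetical.</p>
                <code language="python">
```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the segments b-a and b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("neighbouring keypoints must differ from the joint")
    cosine = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


def mean_angular_error(truth_angles, predicted_angles):
    """Mean absolute difference between two lists of angles in degrees."""
    if len(truth_angles) != len(predicted_angles) or not truth_angles:
        raise ValueError("angle lists must be non-empty and equally long")
    errors = []
    for t, p in zip(truth_angles, predicted_angles):
        diff = abs(t - p) % 360.0
        errors.append(min(diff, 360.0 - diff))  # shortest way around
    return sum(errors) / len(errors)


# Hypothetical finger: straight in the ground truth, bent in the prediction.
true_angle = joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
predicted_angle = joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
angle_error = mean_angular_error([true_angle], [predicted_angle])
```
</code>
                <p>The same functions accept 3D keypoints, since the angle is computed from vector dot products rather than from image-plane coordinates alone.</p>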
                <p>

                    <bold>4.1.2.5 Lighting and shadow realism</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Light direction and intensity consistency:</bold> Measure the accuracy of predicted lighting directions and intensities using photometric error metrics.
                                <sup>
                                    <xref ref-type="bibr" rid="ref35">36</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Shadow accuracy:</bold> Compare the predicted shadow patterns with ground truth shadows using metrics like Shadow Similarity Index.
                                <sup>
                                    <xref ref-type="bibr" rid="ref36">37</xref>
                                </sup>
                            </p>
                            <p>
                                <xref ref-type="fig" rid="f7">Figure 7</xref> shows how an ARShadowGAN-like training scheme
                                <sup>
                                    <xref ref-type="bibr" rid="ref36">37</xref>
                                </sup> generates realistic shadows in an image (lighting and shadow realism). This process ensures that the generated shadows blend seamlessly into the scene, enhancing visual realism.

                                <list list-type="alpha-lower">
                                    <list-item>
                                        <label>a)</label>
                                        <p>

                                            <bold>Shadow-free image and mask:</bold> Start with a shadow-free image (an image without any shadows) and a mask that highlights the object of interest.</p>
                                    </list-item>
                                    <list-item>
                                        <label>b)</label>
                                        <p>

                                            <bold>Attention module:</bold> The attention module analyzes the input and produces attention maps. These attention maps include a mask for neighboring objects and their shadows.</p>
                                    </list-item>
                                    <list-item>
                                        <label>c)</label>
                                        <p>

                                            <bold>Shadow generation module:</bold> Based on the attention maps, the shadow generation module creates a shadow for the object.</p>
                                    </list-item>
                                    <list-item>
                                        <label>d)</label>
                                        <p>

                                            <bold>Refinement module ground truth:</bold> The generated shadow undergoes further refinement to make it realistic.</p>
                                    </list-item>
                                    <list-item>
                                        <label>e)</label>
                                        <p>

                                            <bold>Discriminator:</bold> The discriminator compares the refined shadow with a real image to assess its authenticity.
</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                    </list>
                </p>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>
Figure 7. </label>
                    <caption>
                        <title>How an ARShadowGAN-like training scheme generates realistic shadows (Lighting and Shadow Realism) in a picture.</title>
                    </caption>
                    <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/175451/c102d881-0c94-42f0-b4f6-425e59e926c2_figure7.gif"/>
                </fig>
            </sec>
            <sec id="sec13">
                <title>4.2 Qualitative assessments</title>
                <p>

                    <bold>4.2.1 Expert evaluation</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Human evaluators:</bold> Have experts (e.g., artists, anatomists) assess the realism and accuracy of AI-generated hand drawings based on various criteria such as anatomical correctness, proportion, and movement.
                                <sup>
                                    <xref ref-type="bibr" rid="ref37">38</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Visual Turing test:</bold> Evaluate whether human observers can distinguish between AI-generated and real hand drawings.
                                <sup>
                                    <xref ref-type="bibr" rid="ref38">39</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.2.2 User studies:</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Perceptual studies:</bold> Conduct studies with users to gather subjective feedback on the realism and accuracy of hand drawings.
                                <sup>
                                    <xref ref-type="bibr" rid="ref39">40</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Preference tests:</bold> Compare AI-generated hand drawings with human-drawn hands to see which ones users prefer.
                                <sup>
                                    <xref ref-type="bibr" rid="ref40">41</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.2.3 Comparative analysis:</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>Benchmarking against datasets:</bold> Compare AI-generated hand drawings against established benchmarks and datasets (e.g., Human3.6M, RHD) to measure performance against known standards.
                                <sup>
                                    <xref ref-type="bibr" rid="ref41">42</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>A/B Testing:</bold> Perform A/B tests with different versions of AI-generated hand drawings to determine improvements and preferences.
                                <sup>
                                    <xref ref-type="bibr" rid="ref42">43</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec14">
                <title>4.3 Tools and techniques</title>
                <p>

                    <bold>4.3.1 3D Hand models:</bold> Use 3D hand models and motion capture data to create accurate ground truth references for measuring AI performance. Employ 3D reconstruction techniques to compare predicted hand poses with 3D ground truth data.</p>
                <p>

                    <bold>4.3.2 Machine learning metrics:</bold> Utilize common machine learning metrics such as accuracy, precision, recall, F1 score, and area under the curve (AUC) for classification tasks related to hand gesture recognition.</p>
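                <p>For a hand gesture classifier, precision, recall, and F1 score can be computed directly from predicted and true labels. The following is a minimal sketch for a single positive class; the function name, the gesture labels, and the sample predictions are hypothetical.</p>
                <code language="python">
```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 score for one class treated as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must be equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gesture labels, for illustration only.
true_labels = ["fist", "fist", "open", "open", "fist"]
predicted_labels = ["fist", "open", "open", "fist", "fist"]
precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels, "fist")
```
</code>
                <p>For multi-class gesture recognition, the same computation can be repeated with each gesture as the positive class and the results averaged.</p>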
                <p>

                    <bold>4.3.3 Computer vision techniques:</bold> Implement computer vision algorithms for keypoint detection, segmentation, and texture analysis to evaluate the quality of AI-generated hand drawings.</p>
            </sec>
            <sec id="sec15">
                <title>4.4 Data collection and annotation</title>
                <p>Collecting high-quality data on hand movements is a significant hurdle. Traditional motion capture systems can be cumbersome and expensive, while video-based methods often lack the necessary precision. Additionally, annotating hand movement data requires expert knowledge and can be time-consuming, which limits the availability of the large annotated datasets that are essential for training AI models. The following subsections give a detailed breakdown of data collection and annotation for measuring the complexity and accuracy of AI-generated human hand drawings.</p>
                <p>

                    <bold>4.4.1 Data collection</bold>
                </p>
                <p>

                    <bold>4.4.1.1 Publicly available datasets</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>MPII+NZ Hand Pose Dataset:</bold> Contains hand images with annotated keypoints and 3D poses.</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>FreiHAND:</bold> Includes color images, depth maps, and corresponding 3D hand models.</p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>Rendered Hand Pose Dataset (RHD):</bold> Offers synthetic images of hands with keypoint annotations.</p>
                        </list-item>
                        <list-item>
                            <label>iv)</label>
                            <p>

                                <bold>CMU Panoptic Hand Dataset:</bold> Provides multi-view images and 3D keypoints of hand poses.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.1.2 Custom data collection</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>High-resolution imaging:</bold> Capture images using high-resolution cameras to ensure detailed features of hands are recorded.</p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Diverse subjects:</bold> Include a variety of subjects with different hand shapes, sizes, skin tones, and ages to create a comprehensive dataset.</p>
                        </list-item>
                        <list-item>
                            <label>iii)</label>
                            <p>

                                <bold>Varied poses:</bold> Ensure hands are captured in a wide range of poses, including open, closed, gripping objects, and interacting with other hands or objects.</p>
                        </list-item>
                        <list-item>
                            <label>iv)</label>
                            <p>

                                <bold>Lighting conditions:</bold> Collect data under different lighting conditions to help the model learn how lighting affects hand appearance.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.1.3 Extended data</bold>
                </p>
                <p>Some resources can be adapted from a simple implementation of OpenAI CLIP, which builds CLIP models from scratch in PyTorch based on Keras code.
                    <sup>
                        <xref ref-type="bibr" rid="ref56">44</xref>
                    </sup> Although OpenAI has open-sourced parts of CLIP (e.g., a dataset of OpenAI&#x2019;s CLIP model, VIT-LARGE-14-PATCH
                    <sup>
                        <xref ref-type="bibr" rid="ref57">45</xref>
                    </sup>), the code can be complex and overwhelming.
                </p>
                <p>

                    <bold>4.4.1.4 3D hand models</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>

                                <bold>3D scanning:</bold> Use 3D scanners like Artec Eva or Structure Sensor to capture high-resolution 3D models of hands in various poses.
                                <sup>
                                    <xref ref-type="bibr" rid="ref43">46</xref>
                                </sup>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>

                                <bold>Synthetic data generation:</bold> Create synthetic hand models using software like Blender or Unity. Apply different textures and poses to these models to augment the dataset.
                                <sup>
                                    <xref ref-type="bibr" rid="ref44">47</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.2 Data annotation</bold>
                </p>
                <p>

                    <bold>4.4.2.1 Keypoint annotation</bold>
                </p>
                <p>

                    <bold>i) Manual annotation:</bold> Use tools like Labelbox, VGG Image Annotator, or custom software to annotate keypoints such as the wrist, knuckles, and finger joints (21 keypoints: 4 per finger, 1 at the wrist). For example, label the base, the two intermediate joints, and the tip of each finger, plus the wrist joint.</p>
                <p>

                    <bold>ii) Automated annotation tools:</bold> Use pre-trained models like OpenPose to predict keypoints, then manually correct them for accuracy.
                    <sup>
                        <xref ref-type="bibr" rid="ref45">48</xref>
                    </sup>
                </p>
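                <p>The 21-keypoint layout described above can be sketched as a small data structure. This is a minimal illustration only: the finger names, the numbered joint labels, and the <monospace>HandAnnotation</monospace> class are hypothetical and do not correspond to the export format of any particular annotation tool.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass

# Hypothetical naming scheme: 1 wrist point plus 4 points on each of 5 fingers.
FINGERS = ("thumb", "index", "middle", "ring", "little")
POINTS_PER_FINGER = 4  # matches "4 per finger" in the text


def keypoint_names():
    """Return the 21 keypoint names: the wrist, then 4 points per finger."""
    names = ["wrist"]
    for finger in FINGERS:
        for k in range(1, POINTS_PER_FINGER + 1):
            names.append(f"{finger}_{k}")  # 1 = base of the finger, 4 = tip
    return names


@dataclass
class HandAnnotation:
    image_id: str
    points: dict  # maps keypoint name to (x, y) in pixels, or None if unlabeled

    def missing(self):
        """Names of keypoints that have not been labeled yet."""
        return [n for n in keypoint_names() if self.points.get(n) is None]
```
]]></preformat>
                <p>A check such as <monospace>missing()</monospace> lets a reviewer flag incomplete annotations before they enter the training set.</p>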
                <p>

                    <bold>4.4.2.2 3D pose annotation</bold>
                </p>
                <p>

                    <bold>i) Motion capture:</bold> Motion capture can be performed with systems like Vicon or OptiTrack. Record hand movements and generate 3D keypoints from the recordings, ensuring accurate calibration for precise annotations.</p>
                <p>

                    <bold>ii) Multi-view stereo:</bold> A multi-view stereo setup captures images from multiple angles using synchronized cameras. The 3D hand poses are then reconstructed using stereo vision techniques.</p>
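                <p>As a rough illustration of the reconstruction step, the sketch below triangulates one matched keypoint from a two-camera setup. It assumes an idealized, rectified stereo pair with identical pinhole cameras, so that depth follows from the standard relation Z = f B / d; the function and parameter names are illustrative, and a real multi-camera rig would also need calibration and lens-distortion handling.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen in a rectified stereo pair (Z = f * B / d)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def triangulate(focal_px, baseline_m, cx, cy, x_left, y_left, x_right):
    """Back-project a matched keypoint to 3D coordinates in the left camera frame.

    (cx, cy) is the principal point in pixels; x_left and x_right are the
    horizontal pixel positions of the same keypoint in the two images.
    """
    z = depth_from_disparity(focal_px, baseline_m, x_left - x_right)
    x = (x_left - cx) * z / focal_px
    y = (y_left - cy) * z / focal_px
    return (x, y, z)
```
]]></preformat>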
                <p>

                    <bold>4.4.2.3 Surface detail and texture annotation:</bold> For manual annotation, software like Adobe Photoshop or custom annotation tools can be used. Fine details such as skin texture, wrinkles, and veins are annotated by hand.</p>
                <p>

                    <bold>4.4.2.4 Shadow and lighting annotation</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Tools:</bold> Use software like Labelbox or custom annotation tools.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Process:</bold> Annotate regions of shadows and light sources in the images.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.3 Annotation tools and software</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Labeling software: </bold>Tools like Label Studio
                                <sup>
                                    <xref ref-type="bibr" rid="ref47">49</xref>
                                </sup> can be used for this purpose. VIA
                                <sup>
                                    <xref ref-type="bibr" rid="ref48">50</xref>
                                </sup> and CVAT
                                <sup>
                                    <xref ref-type="bibr" rid="ref49">51</xref>,
                                    <xref ref-type="bibr" rid="ref58">52</xref>
                                </sup> are also good choices, as they are free and open source.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>3D Modeling software: </bold>Software like Maya
                                <sup>
                                    <xref ref-type="bibr" rid="ref50">53</xref>
                                </sup> and Godot
                                <sup>
                                    <xref ref-type="bibr" rid="ref51">54</xref>
                                </sup> is well suited to 3D modeling and animation.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Motion capture systems: </bold>Suitable options include OpenPose
                                <sup>
                                    <xref ref-type="bibr" rid="ref51">54</xref>
                                </sup> for real-time human pose and keypoint detection in AI, VR/AR, and research; OpenMoCap
                                <sup>
                                    <xref ref-type="bibr" rid="ref53">55</xref>
                                </sup> for 2D/3D motion tracking from cameras or video footage; and Kinovea
                                <sup>
                                    <xref ref-type="bibr" rid="ref54">56</xref>
                                </sup> for simple 2D motion analysis in sports and rehabilitation.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.4 Quality control</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Inter-annotator agreement process:</bold> Have multiple annotators label the same data and calculate Cohen&#x2019;s Kappa to assess consistency.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Annotation validation process:</bold> Have experts review and correct the annotations in a validation set. Regularly update annotations to maintain high quality.</p>
                        </list-item>
                    </list>
                </p>
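                <p>The inter-annotator agreement check can be sketched as follows. Cohen&#x2019;s kappa compares two annotators on categorical labels, so this example assumes each annotator has assigned one label per item (for instance, a hypothetical visible or occluded flag per keypoint); continuous keypoint coordinates would need a distance-based measure instead.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```
]]></preformat>
                <p>Values near 1 indicate strong agreement, while values near 0 indicate agreement no better than chance.</p>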
                <p>

                    <bold>4.4.5 Data augmentation</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Synthetic augmentation and transformations:</bold> Apply rotations, scaling, translations, and color adjustments to existing images using OpenCV or similar imaging libraries.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>3D Augmentation and Software:</bold> Use Blender
                                <sup>
                                    <xref ref-type="bibr" rid="ref46">57</xref>
                                </sup> or Unity
                                <sup>
                                    <xref ref-type="bibr" rid="ref55">58</xref>
                                </sup> to create new poses, textures, and lighting conditions for 3D hand models.</p>
                        </list-item>
                    </list>
                </p>
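                <p>When an image is rotated, scaled, or translated, its keypoint annotations have to be moved with it. The sketch below, written with the Python standard library only, builds one 2x3 affine matrix and applies it to a list of keypoints; the function names are illustrative, and the same matrix would also have to be applied to the image by whatever imaging library is used.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import math


def affine_matrix(angle_deg, scale, tx, ty, cx, cy):
    """2x3 matrix that rotates and scales about (cx, cy), then translates.

    The rotation follows the mathematical convention; in image coordinates,
    where y points down, a positive angle appears clockwise on screen.
    """
    a = math.radians(angle_deg)
    cos_a = scale * math.cos(a)
    sin_a = scale * math.sin(a)
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy + tx],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy + ty],
    ]


def transform_keypoints(points, m):
    """Apply the affine matrix m to a list of (x, y) keypoints."""
    return [
        (m[0][0] * x + m[0][1] * y + m[0][2],
         m[1][0] * x + m[1][1] * y + m[1][2])
        for x, y in points
    ]
```
]]></preformat>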
                <p>

                    <bold>4.4.6 Documentation and metadata</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Annotation guidelines and documentation:</bold> Create detailed guidelines for annotators, specifying how to label key points, 3D poses, textures, and interactions. Include examples and edge cases.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.7 Annotation guidelines</bold>
                </p>
                <p>

                    <bold>4.4.7.1 Keypoint annotation</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Wrist:</bold> The joint where the hand connects to the forearm.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Knuckles:</bold> The joints at the base of each finger.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Finger joints:</bold> Annotate the base, middle, and tip of each finger.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.4.7.2 3D Pose annotation</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Use motion capture systems to record hand movements. Ensure accurate calibration for precise annotations.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Metadata:</bold> Maintain metadata for each annotated image or 3D model, including information about the subject (e.g., age, gender), pose, lighting conditions, and annotation quality.</p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec30">
                <title>4.5 Real-time processing</title>
                <p>Real-time processing in the context of AI-generated human hand drawings involves the rapid detection, analysis, and generation of hand images or movements. This is crucial for applications like virtual reality (VR), augmented reality (AR), and real-time interaction systems. The technical aspects involved are outlined below.</p>
                <p>

                    <bold>4.5.1 Real-time hand detection and tracking</bold>

                    <list list-type="roman-lower">
                        <list-item>
                            <label>i)</label>
                            <p>Hand detection is performed by object detection models such as:

                                <list list-type="bullet">
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>YOLO (You Only Look Once):</bold> Efficient for real-time object detection, including hands.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>SSD (Single Shot MultiBox Detector):</bold> Another real-time object detection framework.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                        <list-item>
                            <label>ii)</label>
                            <p>Keypoint detection is performed by pose estimation models such as:
                                <list list-type="bullet">
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>MediaPipe Hands:</bold> A high-performance, real-time hand tracking solution by Google.</p>
                                    </list-item>
                                    <list-item>
                                        <label>&#x2022;</label>
                                        <p>

                                            <bold>OpenPose:</bold> Multi-person keypoint detection, including hand keypoints.</p>
                                    </list-item>
                                </list>
                            </p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.5.2 Real-time 3D pose estimation</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Depth cameras:</bold> Use depth cameras like Intel RealSense or Microsoft Kinect to capture depth information for 3D pose estimation.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Stereo vision:</bold> Employ stereo cameras to calculate depth and reconstruct 3D hand poses.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.5.3 Real-time gesture recognition with gesture classification models:</bold> Use trained machine learning models to classify hand gestures in real time. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks can be used for sequence prediction.</p>
                <p>

                    <bold>4.5.4 Real-time rendering and visualization</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Graphics libraries:</bold> Use OpenGL, DirectX, or Vulkan for rendering hand models and animations in real-time.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Game engines:</bold> Unity
                                <sup>
                                    <xref ref-type="bibr" rid="ref55">58</xref>
                                </sup> or Unreal Engine can be used for real-time rendering in VR/AR applications.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.5.5 Real-time interaction and feedback</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Haptic feedback:</bold> Use haptic devices to provide real-time tactile feedback based on hand interactions.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Real-time collaboration:</bold> Enable multiple users to interact with hand gestures in a shared virtual environment.</p>
                        </list-item>
                    </list>
                </p>
                <p>

                    <bold>4.5.6 Performance optimization</bold>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Hardware acceleration:</bold> Use GPUs or specialized hardware like NVIDIA Jetson for faster processing.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Model optimization:</bold> Apply model compression techniques like quantization and pruning to reduce latency.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>

                                <bold>Parallel processing:</bold> Use multi-threading or parallel processing to handle multiple tasks simultaneously.</p>
                        </list-item>
                    </list>
                </p>
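                <p>To make the quantization idea concrete, the sketch below maps a list of floating-point weights to 8-bit integer codes and back. It is a simplified illustration of affine (asymmetric) post-training quantization on plain Python lists; the function names are illustrative, and production frameworks apply the same idea per tensor or per channel with their own APIs.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def quantize(weights, num_bits=8):
    """Map floats to unsigned integer codes in the range [0, 2**num_bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    # Step size between adjacent codes; fall back to 1.0 for constant input.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]
```
]]></preformat>
                <p>The reconstruction error of each weight is at most half a quantization step, which is the accuracy traded for the smaller memory footprint and faster integer arithmetic.</p>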
            </sec>
        </sec>
        <sec id="sec16">
            <title>5. Anatomical and biological considerations</title>
            <sec id="sec17">
                <title>5.1 Variability in hand shapes and sizes</title>
                <p>Human hands vary widely in shape, size, and dexterity. AI models trained on a limited dataset may not generalize well to the diverse range of human hands. This variability necessitates the creation of more robust models that can accommodate different hand anatomies, which is a challenging task given the current state of AI technology.</p>
                <p>

                    <bold>5.1.1 Anatomical complexity and bone structure (3D skeletal models):</bold> AI systems should use detailed 3D skeletal models that include the 27 bones of the hand: the carpals, metacarpals, and phalanges. These models can be created from 3D scans of real hands using tools like photogrammetry or depth sensors. For example, a 3D scanner like Artec Eva can capture hand models, and software like Blender
                    <sup>
                        <xref ref-type="bibr" rid="ref46">57</xref>
                    </sup> or Autodesk Maya
                    <sup>
                        <xref ref-type="bibr" rid="ref50">53</xref>
                    </sup> can be used to create and manipulate these models.</p>
                <p>

                    <bold>5.1.1.1 Joint flexibility (articulated hand models):</bold> Implement articulated hand models with kinematic chains to represent the finger joints. This allows each finger joint to move independently within physiological limits, using forward and inverse kinematics for realistic hand movements. For example, a physics engine like Bullet or PhysX can enforce joint constraints and drive movements once a hand model described in URDF has been loaded.</p>
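                <p>The idea of a kinematic chain with joint limits can be illustrated without a physics engine. The sketch below computes planar forward kinematics for a single finger modeled as three hinge joints; the segment lengths and joint limits are placeholder values chosen for illustration, not anatomical measurements, and all names are hypothetical.</p>
                <preformat preformat-type="code"><![CDATA[
```python
import math

# Illustrative values only: segment lengths in cm and flexion limits in degrees.
SEGMENT_LENGTHS = (4.5, 2.5, 2.0)
JOINT_LIMITS_DEG = ((0.0, 90.0), (0.0, 110.0), (0.0, 80.0))


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def finger_forward_kinematics(angles_deg, lengths=SEGMENT_LENGTHS,
                              limits_deg=JOINT_LIMITS_DEG):
    """Return the 2D positions of the finger base, each joint, and the tip.

    Each angle is measured relative to the previous segment and is clamped
    to its joint limits before being applied.
    """
    x = y = 0.0
    heading = 0.0
    positions = [(x, y)]
    for length, angle, (low, high) in zip(lengths, angles_deg, limits_deg):
        heading += math.radians(clamp(angle, low, high))
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions
```
]]></preformat>
                <p>A requested angle outside the limits is silently clamped, so an impossible pose such as a finger bent backwards cannot be produced by this model.</p>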
                <p>

                    <bold>5.1.1.2 Musculature and tendons (muscle simulation):</bold> Simulate muscle contraction and tendon forces to produce realistic hand movements. This can be achieved with biomechanical models that translate muscle activations into joint torques. For example, the Finite Element Method (FEM) can simulate the deformation of muscles and tendons, starting with the definition of the mesh and function space for the simulation.</p>
                <p>

                    <bold>5.1.2 Biological diversity (dataset diversity and comprehensive data collection):</bold> Collect datasets with diverse hand shapes, sizes, and conditions, ensuring representation across age, gender, ethnicity, and hand condition. For example, data augmentation techniques can be used to artificially increase dataset diversity.</p>
                <p>

                    <bold>5.1.2.1 Adaptive algorithms (adaptive neural networks):</bold> Utilize models like Adaptive Resonance Theory (ART) or dynamic neural networks that can adjust to new data during inference. For example, implement online learning algorithms to continuously update the model with new hand data.</p>
                <p>

                    <bold>5.1.2.2 Hierarchical models (layered representation):</bold> Develop a hierarchical hand model in which bones, joints, muscles, and skin are modelled separately but interact with one another. For example, a physics engine like PyBullet can simulate interactions between these layers, such as muscle contraction affecting joint angles.</p>
            </sec>
            <sec id="sec18">
                <title>5.2 Sensory feedback and adaptation</title>
                <p>Human hands rely heavily on sensory feedback from the environment to perform tasks. This feedback loop allows for continuous adjustment and adaptation, enabling precise control of hand movements. AI systems, however, lack this inherent sensory feedback mechanism, making it difficult for them to adapt to dynamic environments and perform tasks with the same level of precision as human hands.</p>
                <p>

                    <bold>5.2.1 Proprioception and tactile sensation (sensor integration for proprioception):</bold> Integrate sensors, such as IMUs (Inertial Measurement Units) and joint angle sensors, to capture hand position and movement data. For example, implement a sensor fusion algorithm such as a Kalman filter to combine data from multiple sensors for improved accuracy.</p>
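                <p>A scalar Kalman filter is enough to show the principle of sensor fusion. The sketch below assumes one joint angle is tracked, with an angular rate (for example from an IMU gyroscope) driving the prediction step and a direct joint angle reading driving the correction step; the class name and the noise variances are illustrative assumptions, not tuned values.</p>
                <preformat preformat-type="code"><![CDATA[
```python
class AngleKalmanFilter:
    """One-dimensional Kalman filter that fuses a rate sensor with an angle sensor."""

    def __init__(self, angle=0.0, variance=1.0,
                 process_var=1e-3, measurement_var=1e-2):
        self.angle = angle                  # current angle estimate
        self.variance = variance            # uncertainty of that estimate
        self.process_var = process_var      # noise added by each prediction
        self.measurement_var = measurement_var  # noise of the angle sensor

    def predict(self, rate, dt):
        """Integrate the measured angular rate over dt; uncertainty grows."""
        self.angle += rate * dt
        self.variance += self.process_var

    def update(self, measured_angle):
        """Blend in an angle reading, weighted by the Kalman gain."""
        gain = self.variance / (self.variance + self.measurement_var)
        self.angle += gain * (measured_angle - self.angle)
        self.variance *= 1.0 - gain
        return self.angle
```
]]></preformat>
                <p>In use, <monospace>predict</monospace> would be called for every rate sample and <monospace>update</monospace> whenever an angle reading arrives, so the estimate stays smooth between readings but does not drift.</p>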
                <p>

                    <bold>5.2.1.1 Tactile sensation and haptic feedback:</bold> Use haptic devices to simulate tactile sensations. Devices like the Geomagic Touch provide force feedback to simulate touch. For example, implement haptic rendering algorithms to convert virtual interactions into haptic feedback.</p>
                <p>

                    <bold>5.2.1.2 Real-time motor control algorithms:</bold> Use PID controllers or neural network-based controllers to adjust hand movements in real time based on sensory feedback. For example, implement PID control for precise hand movement adjustments.</p>
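                <p>A minimal discrete PID controller is sketched below as one way such feedback control could look. The class name, the gains, and the optional output limits are illustrative assumptions; a real hand controller would need gains tuned for its actuators and usually some protection against integral wind-up.</p>
                <preformat preformat-type="code"><![CDATA[
```python
class PID:
    """Discrete PID controller with optional output clamping."""

    def __init__(self, kp, ki, kd, out_min=None, out_max=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement, dt):
        """Return the control output for one time step of length dt."""
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, when no previous error exists.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = (self.kp * error
                  + self.ki * self.integral
                  + self.kd * derivative)
        if self.out_min is not None:
            output = max(self.out_min, output)
        if self.out_max is not None:
            output = min(self.out_max, output)
        return output
```
]]></preformat>
                <p>Called once per control cycle with the target and measured joint angle, the controller returns a command (for example a joint velocity) that drives the error towards zero.</p>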
                <p>

                    <bold>5.2.2 Neural plasticity and learning (adaptive models):</bold> Develop models that can adapt to changes in sensory input over time. Use reinforcement learning or continual learning techniques to improve performance with experience. For example, train agents using reinforcement learning to adapt to dynamic environments and varying sensory feedback.</p>
                <p>

                    <bold>5.2.2.1 Experience and learning (supervised and unsupervised learning):</bold> Combine supervised learning for initial training with unsupervised learning to refine the model based on new data. For example, use self-supervised learning techniques to label data automatically and improve the model&#x2019;s performance without extensive manual labeling.</p>
            </sec>
        </sec>
        <sec id="sec19">
            <title>6. Practical implications and applications</title>
            <sec id="sec20">
                <title>6.1 Robotics</title>
                <p>

                    <bold>6.1.1 Human-robot interaction:</bold> In grasping and manipulation, robots equipped with advanced AI hand models can perform complex tasks involving human-like dexterity, such as handling delicate objects, opening containers, or assembling intricate components. For example, a robot arm with an adaptive hand model can pick up various objects, from fragile glassware to irregularly shaped tools, by dynamically adjusting its grip based on real-time feedback. Another example is loading a robot model and configuring its hand, which involves initializing models, calibrating sensors, and setting up real-time control and feedback mechanisms to ensure precise and adaptive functionality. The steps are: load the robot model (initialization, hand configuration), calibrate the sensors (tactile and position/force sensors), set up real-time feedback and control (control algorithms, feedback loops), handle object interaction and adaptation (dynamic grip adjustment), and carry out testing and validation (task simulation).</p>
                <p>

                    <bold>6.1.2 Adaptive control systems:</bold> These systems make real-time adjustments, using real-time control algorithms and sensory feedback to adapt robot actions to dynamic environments or varying object properties. For example, a robotic hand can use PID controllers to adjust its grip strength in response to changes in object texture or weight.</p>
                <p>

                    <bold>6.1.3 Collaborative robots:</bold> Human-robot collaboration involves implementing robots that can work alongside humans, using AI-driven hand models to perform tasks that complement human abilities. For example, a collaborative robot (cobot) on a manufacturing line can assist human workers by handling heavy or repetitive tasks while adapting to the worker&#x2019;s movements and actions.</p>
            </sec>
            <sec id="sec21">
                <title>6.2 Virtual and augmented reality</title>
                <p>

                    <bold>6.2.1 Realistic interaction:</bold> AI-generated hand models can make VR/AR experiences more immersive by providing realistic and responsive interactions with virtual objects. For example, in VR, accurate hand models enable users to manipulate virtual objects with natural gestures and movements, improving the realism of the experience. Another example is simulating hand interaction in a virtual environment.</p>
                <p>

                    <bold>6.2.2 Haptic feedback:</bold> AI can be used to simulate tactile sensations in VR/AR environments, allowing users to feel textures, resistances, and forces. For example, haptic gloves or controllers equipped with AI can provide feedback corresponding to virtual objects, enhancing the sense of touch and improving user interaction.</p>
                <p>

                    <bold>6.2.3 Training and simulation:</bold> VR/AR can support skill development in training scenarios that require precise hand movements or interactions, such as surgical simulations or mechanical repairs. For example, surgeons can practice complex procedures in a virtual environment with realistic hand movements and haptic feedback, improving their skills without real-world consequences.</p>
            </sec>
            <sec id="sec22">
                <title>6.3 Healthcare and rehabilitation</title>
                <p>

                    <bold>6.3.1 Rehabilitation:</bold> AI-driven hand models can be integrated into assistive rehabilitation devices to help patients regain hand function after injuries or surgeries. For example, robotic exoskeletons with adaptive hand models can assist patients in performing exercises, adjusting the level of assistance based on real-time feedback from the patient. Loading an exoskeleton model and simulating rehabilitation exercises involves initializing the model and simulating various exercises to evaluate and enhance the effectiveness of the exoskeleton in supporting patient recovery. The steps are: load the exoskeleton model (initialize and configure the model) and simulate rehabilitation exercises (define the exercises, run the simulations).</p>
                <p>

                    <bold>6.3.2 Prosthetics:</bold> Advanced prosthetic hands can be developed to mimic the complexity of natural hand movements, offering improved functionality and user experience. For example, AI-driven prosthetics with adaptive hand models can provide more natural grasping and manipulation capabilities, allowing users to perform daily tasks with greater ease.</p>
                <p>

                    <bold>6.3.3 Diagnosis and monitoring:</bold> In gesture analysis, AI is used to analyse hand gestures and movements for diagnosing conditions or monitoring recovery progress. For example, AI systems can assess hand tremors or dexterity levels to help diagnose neurological conditions or track the effectiveness of rehabilitation interventions.</p>
            </sec>
        </sec>
        <sec id="sec23" sec-type="conclusion">
            <title>7. Conclusion</title>
            <p>For several interrelated reasons, AI finds it difficult to draw human hands, particularly when using models like CLIP. Human hands differ widely in size, shape, and movement; obtaining a good approximation therefore requires a large and diverse training set. The intricate musculature, joints, and bones of the hand add to the difficulty, and replicating movement and detail accurately requires complex models. AI models find it challenging to adjust dynamically to changing settings because they frequently lack the sensory feedback associated with human control. Even with recent improvements, real-time hand detection and keypoint tracking are still unable to fully capture the subtleties of hand interactions and motions. Applications like virtual and augmented reality, robotics, and healthcare all depend on high-fidelity hand modeling. The current limitations of AI in this field highlight the need for continual advancements in data collection, model training, and real-time processing. Such advancements could enhance AI&#x2019;s ability to mimic and draw human hands more accurately and efficiently.</p>
        </sec>
        <sec id="sec24">
            <title>8. Future Scope</title>
            <p>One of the biggest challenges in accurately modeling the human hand is capturing the tissue of the hand. Improvements in this area would enable AI systems to better understand and reproduce even simple movements, resulting in more precise and accurate gestures. Crowdsourcing platforms provide a cost-effective way to collect such data and help ensure that AI models generalize to diverse populations. Research on neural network architectures such as Transformers and graph neural networks could provide powerful tools for modeling human musculature. These architectures improve vision and shape recognition, resulting in more accurate hand models. Training models on multiple types of data, including images, statistics, and motion, can provide a comprehensive understanding of hand movements and improve the ability of AI to reproduce hand movements and everyday behaviours. Developing real-time tracking and rendering software can reduce latency and improve the user experience in applications such as VR and AR. Highly optimized, low-latency algorithms for analyzing hand movements in real time are needed so that AI systems can run smoothly on consumer devices. Future work could also include additional tactile feedback and adaptive control systems. Combined with an advanced tactile system, different sensations can be simulated, making AI-generated hand models more realistic and effective. These improvements can provide a more immersive and interactive experience in virtual environments. Developing control systems that adapt AI output based on real-time input will lead to more accurate manual interactions. This approach can improve the fidelity of AI-generated hand movements in assistive applications such as medical simulation and prosthetic design. 
Analyzing the diversity of hand shape, size, and movement patterns across different populations will help create more inclusive and accurate AI models. For human-robot interactions, this can improve the grasping algorithms in robots to handle various objects of different shapes, sizes and textures, thus increasing the effectiveness of clever hand patterns. These avatars can provide users with a natural and immersive experience, making the virtual environment even more immersive. Using artificial intelligence to develop smart functions that can adapt to a user&#x2019;s unique movement patterns and provide sensory feedback improves performance and user experience. These prosthetics can provide natural and intelligent interactions and improve the lives of their users. The next innovation is the use of artificial intelligence to create interactive learning platforms that interpret hand movements to enable hands-on learning experiences in virtual environments. Other efforts include developing features to make interactive wearable technology more accessible to people with disabilities, which is important for inclusive design. This ability allows everyone to benefit from the advances in hand-sensing modeling, regardless of physical ability. Working with neuroscientists to better understand the brain&#x2019;s control of hand movements and incorporating these insights into artificial intelligence models can demonstrate the accuracy and precision of the hand. Developing cognitive models that simulate human cognitive processes related to hand movements can improve artificial intelligence to predict and repeat complex movements. These models provide a deeper understanding of human hand dynamics and improve AI performance. Other research suggests research and development of systems that allow humans and AI to adapt, learn from each other to improve manual interaction and control in over time, AI systems will become more intelligent and effective.</p>
        </sec>
        <sec id="sec25">
            <title>Ethics and consent</title>
            <p>All data, figures, and diagrams used in this study were either generated by the author(s) or obtained from publicly available repositories on platforms such as Kaggle and GitHub.</p>
            <p>The data used from these platforms are subject to the respective licensing terms provided by the original contributors. The author(s) confirm that:
                <list list-type="bullet">
                    <list-item>
                        <label>&#x2022;</label>
                        <p>For data obtained from 
                            <bold>Kaggle</bold>, usage complied with the terms of the associated license specified by the dataset creator. Any restrictions or conditions set forth by the dataset provider have been respected.</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>
For code or resources obtained from 
                            <bold>GitHub</bold>, usage adhered to the terms of the repository&#x2019;s stated license (e.g., MIT License, Apache License, GPL). Proper credit has been provided to the original contributors where required.</p>
                    </list-item>
                </list>
            </p>
            <p>No sensitive or personally identifiable information is included in the data. As the datasets and resources are publicly available and appropriately licensed, no additional ethical approval was required for their use in this study.</p>
            <p>The author(s) affirm that all figures, diagrams, and outputs derived from these sources were created with due consideration of copyright, licensing, and usage rights. If requested, the detailed license information and attribution for any third-party data or code used can be provided.</p>
            <p>No ethical approval was needed, as the data used were obtained from online repositories.</p>
        </sec>
    </body>
    <back>
        <sec id="sec29" sec-type="data-availability">
            <title>Data availability</title>
            <p>O&#x2019;Reilly: Deep Learning with TensorFlow and Keras - Third Edition 
                <ext-link ext-link-type="uri" xlink:href="https://www.oreilly.com/library/view/deep-learning-with/9781803232911/">https://www.oreilly.com/library/view/deep-learning-with/9781803232911/</ext-link>
            </p>
            <p>OpenAI: CLIP: Connecting text and images 
                <ext-link ext-link-type="uri" xlink:href="https://openai.com/index/clip/">https://openai.com/index/clip/</ext-link>
            </p>
            <p>For code or resources obtained from GitHub (<ext-link ext-link-type="uri" xlink:href="https://github.com/openai/CLIP">https://github.com/openai/CLIP</ext-link>), usage adhered to the terms of the repository&#x2019;s stated license (MIT License).</p>
            <p>Some resources can be adapted from a simple implementation of OpenAI CLIP that builds the CLIP model from scratch in PyTorch, drawing on Keras code examples.
                <sup>
                    <xref ref-type="bibr" rid="ref56">44</xref>
                </sup> Although OpenAI has open-sourced parts of CLIP, e.g., a dataset of OpenAI&#x2019;s CLIP model, VIT-LARGE-14-PATCH,
                <sup>
                    <xref ref-type="bibr" rid="ref57">45</xref>
                </sup> the code can be complex and overwhelming (<ext-link ext-link-type="uri" xlink:href="https://github.com/moein-shariatnia/OpenAI-CLIP">https://github.com/moein-shariatnia/OpenAI-CLIP</ext-link>; MIT License).</p>
            <p>We have neither used nor generated any extended data. The reference link is as follows: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/moein-shariatnia/OpenAI-CLIP">https://github.com/moein-shariatnia/OpenAI-CLIP</ext-link>
            </p>
        </sec>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Keyes</surname>
                            <given-names>OK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hyland</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Hands are hard: unlearning how we talk about machine learning in the arts.</article-title>
                    <source>

                        <italic toggle="yes">Tradition Innovations in Arts, Design, and Media Higher Education.</italic>
</source>
                    <year>2023</year>;<volume>1</volume>(<issue>1</issue>):<fpage>4</fpage>.
                    <pub-id pub-id-type="doi">10.9741/2996-4873.1004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rajaratnam</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>May</surname>
                            <given-names>STS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jerome</surname>
                            <given-names>JTJ</given-names>
                        </name>
</person-group>:
                    <article-title>Precision at hand: Revolutionising surgery with generative AI.</article-title>
                    <source>

                        <italic toggle="yes">J. Hand Microsurg.</italic>
</source>
                    <year>2024</year>;<volume>16</volume>:<fpage>100090</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.jham.2024.100090</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Salman</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zakir</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Takahashi</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Cascaded deep graphical convolutional neural network for 2D hand pose estimation.</article-title>
                    <year>2023</year>;<volume>12592</volume>: pp.<fpage>1259215</fpage>&#x2013;<lpage>1259215-6</lpage>.</mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schneider</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Optimizing human hand gestures for AI-systems.</article-title>
                    <source>

                        <italic toggle="yes">AI Commun.</italic>
</source>
                    <year>2022</year>;<volume>35</volume>:<fpage>153</fpage>&#x2013;<lpage>169</lpage>.
                    <pub-id pub-id-type="doi">10.3233/AIC-210081</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Salisbury</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>Issues in human/computer control of dexterous remote hands.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Aerosp. Electron. Syst.</italic>
</source>
                    <year>1988</year>;<volume>24</volume>:<fpage>591</fpage>&#x2013;<lpage>596</lpage>.
                    <pub-id pub-id-type="doi">10.1109/7.9687</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wheatland</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Song</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>State of the Art in Hand and Finger Modeling and Animation.</article-title>
                    <source>

                        <italic toggle="yes">Computer Graphics Forum.</italic>
</source>
                    <year>2015</year>;<volume>34</volume>.</mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gao</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ju</surname>
                            <given-names>Z</given-names>
                        </name>
</person-group>:
                    <article-title>Robust real-time hand detection and localization for space human-robot interaction based on deep learning.</article-title>
                    <source>

                        <italic toggle="yes">Neurocomputing.</italic>
</source>
                    <year>2020</year>;<volume>390</volume>:<fpage>198</fpage>&#x2013;<lpage>206</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.neucom.2019.02.066</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bhunia</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Das</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Muhammad</surname>
                            <given-names>U</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Pixelor: a competitive sketching AI agent. So you think you can sketch?</article-title>
                    <source>

                        <italic toggle="yes">ACM Trans. Graph.</italic>
</source>
                    <year>2020</year>;<volume>39</volume>:<fpage>166:1</fpage>&#x2013;<lpage>166:15</lpage>.
                    <pub-id pub-id-type="doi">10.1145/3414685.3417840</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gioioso</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salvietti</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Malvezzi</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Mapping Synergies From Human to Robotic Hands With Dissimilar Kinematics: An Approach in the Object Domain.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Robot.</italic>
</source>
                    <year>2013</year>;<volume>29</volume>:<fpage>825</fpage>&#x2013;<lpage>837</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TRO.2013.2252251</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rijpkema</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Girard</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Computer animation of knowledge-based human grasping.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 18th annual conference on Computer graphics and interactive techniques.</italic>
</source>
                    <year>1991</year>.</mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <chapter-title>Research on rapid generation of 3D models based on art and design cognitive models.</chapter-title>
                    <source>

                        <italic toggle="yes">SHS Web of Conferences.</italic>
</source>
                    <year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lu</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Hands-Free Human-Computer Interface Based on Facial Myoelectric Pattern Recognition.</article-title>
                    <source>

                        <italic toggle="yes">Front. Neurol.</italic>
</source>
                    <year>2019</year>;<volume>10</volume>.
                    <pub-id pub-id-type="pmid">31114539</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fneur.2019.00444</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6503102</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ficuciello</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Synergy-Based Control of Underactuated Anthropomorphic Hands.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Industr. Inform.</italic>
</source>
                    <year>2019</year>;<volume>15</volume>:<fpage>1144</fpage>&#x2013;<lpage>1152</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TII.2018.2841043</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Coeckelbergh</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Artificial Intelligence, Responsibility Attribution, and a Relational Justification of Explainability.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Eng. Ethics.</italic>
</source>
                    <year>2019</year>;<volume>26</volume>:<fpage>2051</fpage>&#x2013;<lpage>2068</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s11948-019-00146-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lu</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yao</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Human-Machine Interaction Sensing Technology Based on Hand Gesture Recognition: A Review.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Hum. Mach. Syst.</italic>
</source>
                    <year>2021</year>;<volume>51</volume>:<fpage>300</fpage>&#x2013;<lpage>309</lpage>.
                    <pub-id pub-id-type="doi">10.1109/THMS.2021.3086003</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <chapter-title>An AI Edge Computing-Based Intelligent Hand Painting Teaching System.</chapter-title>
                    <source>

                        <italic toggle="yes">2022 IEEE 11th Global Conference on Consumer Electronics (GCCE).</italic>
</source>
                    <year>2022</year>; pp.<fpage>942</fpage>&#x2013;<lpage>943</lpage>.</mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Battaglia</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hamrick</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bapst</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Relational inductive biases, deep learning, and graph networks.</article-title>
                    <source>

                        <italic toggle="yes">ArXiv, abs/1806.01261.</italic>
</source>
                    <year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mattar</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>A survey of bio-inspired robotics hands implementation: New directions in dexterous manipulation.</article-title>
                    <source>

                        <italic toggle="yes">Robot. Auton. Syst.</italic>
</source>
                    <year>2013</year>;<volume>61</volume>(<issue>5</issue>):<fpage>517</fpage>&#x2013;<lpage>544</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.robot.2012.12.005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nguyen</surname>
                            <given-names>CC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thai</surname>
                            <given-names>MT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Advanced user interfaces for teleoperated surgical robotic systems.</article-title>
                    <source>

                        <italic toggle="yes">Adv. Sensor Res.</italic>
</source>
                    <year>2023</year>;<volume>2</volume>(<issue>4</issue>):<fpage>2200036</fpage>.
                    <pub-id pub-id-type="doi">10.1002/adsr.202200036</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hazra</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chatterjee</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Future Prospects Analysis in Healthcare Management Using Machine Learning Algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Eng. Sci. Invention (IJESI).</italic>
</source>
                    <issn>2319-6734</issn>.</mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carf&#x00ec;</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mastrogiovanni</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Gesture-based human&#x2013;machine interaction: Taxonomy, problem definition, and analysis.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Cybern.</italic>
</source>
                    <year>2021</year>;<volume>53</volume>(<issue>1</issue>):<fpage>497</fpage>&#x2013;<lpage>513</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TCYB.2021.3129119</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nyatsanga</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kucherenko</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ahuja</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Comprehensive Review of Data-Driven Co-Speech Gesture Generation.</article-title>
                    <source>

                        <italic toggle="yes">Computer Graphics Forum.</italic>
</source>
                    <year>2023</year>;<volume>42</volume>(<issue>2</issue>):<fpage>569</fpage>&#x2013;<lpage>596</lpage>.
                    <pub-id pub-id-type="doi">10.1111/cgf.14776</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmitz</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>R&#x00f6;sch</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zingsheim</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Interactive pose and shape editing with simple sketches from different viewing angles.</article-title>
                    <source>

                        <italic toggle="yes">Comput. Graph.</italic>
</source>
                    <year>2023</year>;<volume>114</volume>:<fpage>347</fpage>&#x2013;<lpage>356</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.cag.2023.06.024</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Balaji</surname>
                            <given-names>AN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Peh</surname>
                            <given-names>LS</given-names>
                        </name>
</person-group>:
                    <article-title>AI-On-Skin: Towards Enabling Fast and Scalable On-body AI Inference for Wearable On-Skin Interfaces.</article-title>
                    <source>

                        <italic toggle="yes">Proc. ACM Hum.-Comput. Interact.</italic>
</source>
                    <year>2023</year>;<volume>7</volume>(<issue>EICS</issue>):<fpage>1</fpage>&#x2013;<lpage>34</lpage>.
                    <pub-id pub-id-type="doi">10.1145/3593239</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Deng</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Hand pose understanding with large-scale photo-realistic rendering dataset.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Image Process.</italic>
</source>
                    <year>2021</year>;<volume>30</volume>:<fpage>4275</fpage>&#x2013;<lpage>4290</lpage>.
                    <pub-id pub-id-type="pmid">33826515</pub-id>
                    <pub-id pub-id-type="doi">10.1109/TIP.2021.3070439</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>MK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Artificial intelligence in meta-optics.</article-title>
                    <source>

                        <italic toggle="yes">Chem. Rev.</italic>
</source>
                    <year>2022</year>;<volume>122</volume>(<issue>19</issue>):<fpage>15356</fpage>&#x2013;<lpage>15413</lpage>.
                    <pub-id pub-id-type="pmid">35750326</pub-id>
                    <pub-id pub-id-type="doi">10.1021/acs.chemrev.2c00012</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9562283</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chormai</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Machine learning of large-scale multimodal brain imaging data reveals neural correlates of hand preference.</article-title>
                    <source>

                        <italic toggle="yes">NeuroImage.</italic>
</source>
                    <year>2022</year>;<volume>262</volume>:<fpage>119534</fpage>.
                    <pub-id pub-id-type="pmid">35931311</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.neuroimage.2022.119534</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mathew</surname>
                            <given-names>SP</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Predicting Functional Use of the Non-Dominant Hand using Machine Learning and Wearable Accelerometers.</italic>
</source>
                    <publisher-loc>Canada</publisher-loc>:
                    <publisher-name>University of Toronto</publisher-name>;<year>2022</year>. (Master&#x2019;s thesis).</mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hampali</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rad</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oberweger</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>HOnnotate: A method for 3D annotation of hand and object poses.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.</italic>
</source>
                    <year>2020</year>; pp.<fpage>3196</fpage>&#x2013;<lpage>3206</lpage>.</mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chevtchenko</surname>
                            <given-names>SF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vale</surname>
                            <given-names>RF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Macario</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A convolutional neural network with feature fusion for real-time hand posture recognition.</article-title>
                    <source>

                        <italic toggle="yes">Appl. Soft Comput.</italic>
</source>
                    <year>2018</year>;<volume>73</volume>:<fpage>748</fpage>&#x2013;<lpage>766</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.asoc.2018.09.010</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Deng</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2407.10575.</italic>
</source>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Naji</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jalab</surname>
                            <given-names>HA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kareem</surname>
                            <given-names>SA</given-names>
                        </name>
</person-group>:
                    <article-title>A survey on skin detection in colored images.</article-title>
                    <source>

                        <italic toggle="yes">Artif. Intell. Rev.</italic>
</source>
                    <year>2019</year>;<volume>52</volume>:<fpage>1041</fpage>&#x2013;<lpage>1087</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s10462-018-9664-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hazra</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <chapter-title>Review on Social and Ethical Concerns of Generative AI and IoT.</chapter-title>
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Raza</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ahmad</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Singh</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>, editors.
                    <source>

                        <italic toggle="yes">Generative AI: Current Trends and Applications. Studies in Computational Intelligence.</italic>
</source> Vol. <volume>1177</volume>.
                    <publisher-loc>Singapore</publisher-loc>:
                    <publisher-name>Springer</publisher-name>;<year>2024</year>.
                    <pub-id pub-id-type="doi">10.1007/978-981-97-8460-8_13</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <label>34</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Thien</surname>
                            <given-names>NP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dang</surname>
                            <given-names>CN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Trinh</surname>
                            <given-names>TT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Machine Learning for Enhanced Exercise Performance and Planning.</chapter-title>
                    <source>

                        <italic toggle="yes">International Conference on Future Data and Security Engineering.</italic>
</source>
                    <publisher-loc>Singapore</publisher-loc>:
                    <publisher-name>Springer Nature Singapore</publisher-name>;<year>2024</year>, <month>November</month>; pp.<fpage>249</fpage>&#x2013;<lpage>263</lpage>.</mixed-citation>
            </ref>
            <ref id="ref34">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mundt</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thomsen</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Witter</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Prediction of lower limb joint angles and moments during gait using artificial neural networks.</article-title>
                    <source>

                        <italic toggle="yes">Med. Biol. Eng. Comput.</italic>
</source>
                    <year>2020</year>;<volume>58</volume>:<fpage>211</fpage>&#x2013;<lpage>225</lpage>.
                    <pub-id pub-id-type="pmid">31823114</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s11517-019-02061-3</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kruisselbrink</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dangol</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rosemann</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Photometric measurements of lighting quality: An overview.</article-title>
                    <source>

                        <italic toggle="yes">Build. Environ.</italic>
</source>
                    <year>2018</year>;<volume>138</volume>:<fpage>42</fpage>&#x2013;<lpage>52</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.buildenv.2018.04.028</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kruisselbrink</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dangol</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rosemann</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Photometric measurements of lighting quality: An overview.</article-title>
                    <source>

                        <italic toggle="yes">Build. Environ.</italic>
</source>
                    <year>2018</year>;<volume>138</volume>:<fpage>42</fpage>&#x2013;<lpage>52</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.buildenv.2018.04.028</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>38</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sai</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gaur</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sai</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">Generative AI for transformative healthcare: A comprehensive study of emerging models, applications, case studies and limitations.</italic>
</source>
                    <publisher-name>IEEE Access</publisher-name>;<year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref38">
                <label>39</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pennanen</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Linkola</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kantosalo</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>From Product to Producer: The Impact of Perceptual Evidence and Machine Embodiment on the Human Assessment of AI Creativity.</article-title>
                    <source>

                        <italic toggle="yes">PsyArXiv.</italic>
</source>
                    <year>2023</year> <month>October</month> <day>6</day>.</mixed-citation>
            </ref>
            <ref id="ref39">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dey</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Billinghurst</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lindeman</surname>
                            <given-names>RW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A systematic review of 10 years of augmented reality usability studies: 2005 to 2014.</article-title>
                    <source>

                        <italic toggle="yes">Front. Robot. AI.</italic>
</source>
                    <year>2018</year>;<volume>5</volume>:<fpage>37</fpage>.
                    <pub-id pub-id-type="pmid">33500923</pub-id>
                    <pub-id pub-id-type="doi">10.3389/frobt.2018.00037</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7805955</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <label>41</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gupta</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Detecting multimedia generated by large AI models: A survey.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2402.00045.</italic>
</source>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref41">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>SH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Enhancing 3D hand pose estimation using SHaF: synthetic hand dataset including a forearm.</article-title>
                    <source>

                        <italic toggle="yes">Appl. Intell.</italic>
</source>
                    <year>2024</year>;<volume>54</volume>:<fpage>9565</fpage>&#x2013;<lpage>9578</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s10489-024-05665-x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <label>43</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Duan</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhai</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2405.07346.</italic>
</source>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref56">
                <label>44</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kapoor</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gulli</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pal</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">Deep Learning with TensorFlow and Keras: Build and deploy supervised, unsupervised, deep, and reinforcement learning models.</italic>
</source>
                    <publisher-name>Packt Publishing Ltd.</publisher-name>;<year>2022</year>.</mixed-citation>
            </ref>
            <ref id="ref57">
                <label>45</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shen</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>ViTamin: Designing Scalable Vision Models in the Vision-Language Era.</chapter-title>
                    <source>

                        <italic toggle="yes">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</italic>
</source>
                    <year>2024</year>; pp.<fpage>12954</fpage>&#x2013;<lpage>12966</lpage>.</mixed-citation>
            </ref>
            <ref id="ref43">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Redaelli</surname>
                            <given-names>DF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Barsanti</surname>
                            <given-names>SG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Biffi</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Comparison of geometrical accuracy of active devices for 3D orthopaedic reconstructions.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Adv. Manuf. Technol.</italic>
</source>
                    <year>2021</year>;<volume>114</volume>(<issue>1</issue>):<fpage>319</fpage>&#x2013;<lpage>342</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s00170-021-06778-0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Deng</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Hand pose understanding with large-scale photo-realistic rendering dataset.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans. Image Process.</italic>
</source>
                    <year>2021</year>;<volume>30</volume>:<fpage>4275</fpage>&#x2013;<lpage>4290</lpage>.
                    <pub-id pub-id-type="pmid">33826515</pub-id>
                    <pub-id pub-id-type="doi">10.1109/TIP.2021.3070439</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <label>48</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jin</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Whole-body human pose estimation in the wild.</chapter-title>
                    <source>

                        <italic toggle="yes">Computer Vision&#x2013;ECCV 2020: 16th European Conference, Glasgow, UK, August 23&#x2013;28, 2020, Proceedings, Part IX 16.</italic>
</source>
                    <publisher-name>Springer International Publishing</publisher-name>;<year>2020</year>; pp.<fpage>196</fpage>&#x2013;<lpage>214</lpage>.</mixed-citation>
            </ref>
            <ref id="ref47">
                <label>49</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sharma</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Angleraud</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pieters</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <chapter-title>Multi-label Annotation for Visual Multi-Task Learning Models.</chapter-title>
                    <source>

                        <italic toggle="yes">2023 Seventh IEEE International Conference on Robotic Computing (IRC).</italic>
</source>
                    <publisher-name>IEEE</publisher-name>;<year>2023, December</year>; pp.<fpage>31</fpage>&#x2013;<lpage>34</lpage>.</mixed-citation>
            </ref>
            <ref id="ref48">
                <label>50</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Praklja&#x010d;i&#x0107;</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Grbi&#x0107;</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vranje&#x0161;</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Tool for image annotation in context of modern object detection.</chapter-title>
                    <source>

                        <italic toggle="yes">2024 Zooming Innovation in Consumer Technologies Conference (ZINC).</italic>
</source>
                    <publisher-name>IEEE</publisher-name>;<year>2024, May</year>; pp.<fpage>48</fpage>&#x2013;<lpage>53</lpage>.</mixed-citation>
            </ref>
            <ref id="ref49">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hansen</surname>
                            <given-names>US</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Landau</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Patel</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Novel artificial intelligence-driven software significantly shortens the time required for annotation in computer vision projects.</article-title>
                    <source>

                        <italic toggle="yes">Endosc. Int. Open.</italic>
</source>
                    <year>2021</year>;<volume>09</volume>(<issue>04</issue>):<fpage>E621</fpage>&#x2013;<lpage>E626</lpage>.
                    <pub-id pub-id-type="doi">10.1055/a-1341-0689</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref58">
                <label>52</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hejabi</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Padte</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Golazizian</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>CVAT-BWV: A Web-Based Video Annotation Platform for Police Body-Worn Video.</chapter-title>
                    <source>

                        <italic toggle="yes">International Joint Conferences on Artificial Intelligence Organization.</italic>
</source>
                    <year>2024, August</year>.</mixed-citation>
            </ref>
            <ref id="ref50">
                <label>53</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>WC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>MH</given-names>
                        </name>
</person-group>:
                    <chapter-title>The Influence of Applying Digital Toolkits to Assist 3D Software Development on Individual Work Performance and Perceived Stress.</chapter-title>
                    <source>

                        <italic toggle="yes">2024 10th International Conference on Applied System Innovation (ICASI).</italic>
</source>
                    <publisher-name>IEEE</publisher-name>;<year>2024, April</year>; pp.<fpage>37</fpage>&#x2013;<lpage>39</lpage>.</mixed-citation>
            </ref>
            <ref id="ref51">
                <label>54</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mal&#x00fd;</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Real-time strategy videogame toolkit for Godot Engine.</article-title>
                    <year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref53">
                <label>55</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hachaj</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ogiela</surname>
                            <given-names>MR</given-names>
                        </name>
</person-group>:
                    <article-title>RMoCap: an R language package for processing and kinematic analyzing motion capture data.</article-title>
                    <source>

                        <italic toggle="yes">Multimed. Syst.</italic>
</source>
                    <year>2020</year>;<volume>26</volume>(<issue>2</issue>):<fpage>157</fpage>&#x2013;<lpage>172</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s00530-019-00633-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref54">
                <label>56</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gupta</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hazra</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hazra</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Mathematical Models of Heterogeneous Machine Learning Techniques for Ransomware Protection in Cyber-Physical Systems.</chapter-title>
                    <source>

                        <italic toggle="yes">2024 IEEE International Conference on Communication, Computing and Signal Processing (IICCCS).</italic>
</source>
                    <publisher-loc>Asansol, India</publisher-loc>:
                    <publisher-name>IEEE</publisher-name>;<year>2024</year>; pp.<fpage>1</fpage>&#x2013;<lpage>5</lpage>.
                    <pub-id pub-id-type="doi">10.1109/IICCCS61609.2024.10763581</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <label>57</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Villar</surname>
                            <given-names>O</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Learning Blender.</italic>
</source>
                    <publisher-name>Addison-Wesley Professional</publisher-name>;<year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref55">
                <label>58</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jeong-Shick</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>Unity: A Powerful Tool for 3D Computer Animation Production.</article-title>
                    <source>

                        <italic toggle="yes">J. Korea Comput. Graphics Soc.</italic>
</source>
                    <year>2023</year>;<volume>29</volume>(<issue>3</issue>):<fpage>45</fpage>&#x2013;<lpage>57</lpage>.
                    <pub-id pub-id-type="doi">10.15701/kcgs.2023.29.3.45</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report367392">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.175451.r367392</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Cheng</surname>
                        <given-names>Mingyong</given-names>
                    </name>
                    <xref ref-type="aff" rid="r367392a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-9830-9652</uri>
                </contrib>
                <aff id="r367392a1">
                    <label>1</label>University of California San Diego, San Diego, California, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>3</month>
                <year>2025</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Cheng M</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport367392" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.159688.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This paper lacks scientific rigor, empirical evidence, and a coherent research structure. Although it claims to analyze CLIP&#x2019;s ability to generate hands, it presents no dataset, experimental setup, or meaningful evaluation. Instead of offering concrete findings, it reiterates well-known limitations of AI-generated hands without substantiating them with data. The methodology section is vague and reads more like a generic overview of CLIP than a description of a structured research process. Key elements such as dataset selection, baseline comparisons, control variables, and evaluation metrics are either absent or mentioned superficially without being applied. The 
                <italic>Related Work</italic> section is an unstructured compilation of papers, many of them unrelated to the topic, with no synthesis of how they inform the study. The writing is also redundant: concepts recur across sections without gaining depth. The paper anthropomorphizes AI in an unhelpful manner, using phrases like "AI is like a human trapped in a museum," which add little technical value. The conclusion and future scope sections propose vague solutions without showing how they could be implemented or validated.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>No</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Creative AI Application, Generative AI &amp; Art</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
</article>
