Normative data for Lezak’s Tinkertoy test in healthy Italian adults

The Tinkertoy test is a tool for the neuropsychological assessment of executive functions and a predictor of employability. Originally a children’s toy comprising pieces to assemble freely, the Tinkertoy test examines organizational abilities, planning, and response flexibility. It allows subjects to use their own initiative and does not force them to choose from a series of predetermined alternatives. Tinkertoy test normative values were collected from 256 neurologically healthy Italian subjects. Multivariable analysis showed sex and education to have significant confounding effects. Adjusted and inferential cut-off points were determined and converted into equivalent scores, applying a distribution-free technique.

Corresponding author: Franca Crippa (franca.crippa@unimib.it)
How to cite this article: Crippa F, Cesana L and Daini R. Normative data for Lezak’s Tinkertoy test in healthy Italian adults [version 1; referees: 2 approved with reservations]. F1000Research 2016, 5:727 (doi: 10.12688/f1000research.8409.1)
Copyright: © 2016 Crippa F et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
Grant information: The author(s) declared that no grants were involved in supporting this work.
Competing interests: No competing interests were disclosed.
First published: 22 Apr 2016, 5:727 (doi: 10.12688/f1000research.8409.1)


Introduction
Executive functions are the cognitive capacities that control lower-level functions and are essential to future-oriented thought and behaviour. They can be impaired by head injuries 24 or by a focal frontal lesion, either cortical 5,24,25 or subcortical 13. In particular, the term executive functions refers to cognitive, emotional and behavioural aspects of conduct involved in achieving a specific purpose. Executive functions include processes that are complex, intertwined and in constant interaction. They facilitate the optimum adaptation of the individual to the environment 1,16,23,24. Lezak 16 suggested dividing these competencies into four specific components: volition, planning, goal-oriented behaviour and effective performance. Several psychometric instruments are available for evaluating executive functions during neuropsychological examinations. However, most of them are highly structured (the task and the stimuli set the goal and the processes required to achieve that goal) 15,16. Moreover, none of the tools currently in use for evaluating performance in the domain of executive functions can assess how well patients formulate a goal and plan how to pursue it, abilities that are prerequisites for a return to work as well as for social life.
The Tinkertoy was originally created as a children's toy made of various wooden and plastic pieces (wooden dowels, knobs, wheels, connectors, caps, points), to be assembled freely in order to make constructions. Based on this toy, Lezak created a test for the neuropsychological assessment of executive functions, which gives subjects the opportunity to use their own initiative and does not force them to choose from a series of predetermined alternatives. In fact, one of the most relevant characteristics of frontal lobe syndrome is environment-dependent behaviour, which makes it difficult to cope with the requirements of everyday life. In this respect, Lezak's Tinkertoy test (TTT) stands out, because it was specifically conceived to examine the ability to generate the most achievable goal, to organize, to plan and act, and to respond in a flexible way in a given context 15. Early studies of the TTT showed that it could be considered a useful predictor of employability 3,15. In particular, some researchers found that the TTT complexity score correlated more strongly with the employability of patients with traumatic brain damage than did other tests of executive functions, such as the Trail Making Test B, maze tracking, and several WAIS-R subtests 9,10. Ownsworth and Shum 20 showed that the difference in TTT scores between employed and unemployed patients after stroke was highly significant (p < 0.005 in a group of 27 subjects). According to the authors, the TTT seems to describe productivity outcomes better than other tests of executive functioning (i.e. the FAS test and the five-point test), independent of the presence of hemiplegia and the time elapsed since the stroke. Furthermore, the TTT has been shown to be useful both for differentiating between types of dementia and for evaluating the severity of dementia 12,17. In addition, subjects generally find the TTT interesting or amusing, so the test is easy to carry out even with patients who are not very cooperative. Although the TTT is commonly used in clinical contexts, the only normative data available refer to a very small sample of non-Italians 16. Given both the potential relevance of this instrument for neuropsychological practice and the lack of any validation so far, the present study aimed at setting TTT normative values in Italian adults, in order to determine first the inferential cut-off points and their tolerance limits, and then the equivalent scores, applying a statistical technique developed for neuropsychological tests 7,8,22.

Participants
Two hundred and fifty-six neurologically unimpaired Italian subjects (mean age: 44.6 years; SD: 20.85; range: 15-86 years) enrolled in this study on a voluntary basis, with verbal consent. The Research Ethics Committee of the University of Milano-Bicocca approved the study (permit number: RM-2016-40) as a minimal-risk study, for which a signed consent document was not required. The subjects were nearly equally distributed according to sex (126 women and 130 men) and age class (range: 15 to 86 years). The level of education (from primary school to university) was recorded in years. No subject showed a history or evidence of psychiatric disorders or dementia. The demographic distribution of the sample is shown in Table 1.

Stimuli and procedure
Test items, administration procedure and scoring criteria in this study followed those described by Lezak in the fourth edition of the Neuropsychological Assessment Handbook 16. The test items were selected from the classic version of the Tinkertoy set. Namely, 50 items were used: 24 wooden dowels (4 red, 4 green, 4 orange, 6 blue, 6 yellow), 10 wooden knobs, 4 wooden wheels, 4 wooden caps, 4 wooden connectors, and 4 plastic yellow connectors (Figure 1).
Each subject was individually presented with the aforementioned 50 pieces in different colours and forms, placed at random on a clean surface, and was told to build whatever construction he or she wanted, with a 5-minute minimum time limit but no maximum time limit. On completion, the subject was asked to say what the construction represented. Assessment took into account 7 performance variables:
1. Made construction(s): whether the subject made any combination of pieces;
2. Number of pieces: total number of pieces used;
3. Name: whether and when the subject gave a name appropriate to the construction's appearance;
4. Mobility (wheels working) and moving parts;
5. Three-dimensionality: whether the subject's construction had three dimensions;
6. Free-standing: whether the subject's construction stayed standing;
7. Errors: pieces forced together (misfit), connections not properly made (incomplete fit), and pieces dropped and not picked up (see Table 2).
At the end, a complexity score was given, determined by the sum of the points earned in each of the performance variables, with a maximum of 12 points (for two examples, see Figure 2).
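The summation behind the complexity score can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring sheet: the bands for the number of pieces and the 0-3 range for the name follow the scoring table of this article, whereas the point values for the remaining variables (one point each for a made construction, working wheels, moving parts, three-dimensionality and free-standing, minus one point per error) are assumptions chosen so that the maximum is 12. The function names are hypothetical.

```python
def pieces_points(n_pieces: int) -> int:
    """Points for 'Number of pieces', using the bands of the scoring table."""
    if not 0 <= n_pieces <= 50:
        raise ValueError("the test set contains 50 pieces")
    if n_pieces == 0:
        return 0  # assumption: no pieces used, no points
    if n_pieces <= 20:
        return 1
    if n_pieces <= 30:
        return 2
    if n_pieces <= 40:
        return 3
    return 4


def complexity_score(made: bool, n_pieces: int, name_points: int,
                     working_wheels: bool, moving_parts: bool,
                     three_d: bool, free_standing: bool, errors: int) -> int:
    """Sum the points earned on the seven performance variables."""
    if not 0 <= name_points <= 3:
        raise ValueError("name is scored from 0 to 3")
    score = int(made)                 # assumed: 1 point for any construction
    score += pieces_points(n_pieces)  # 1 to 4 points
    score += name_points              # 0 to 3 points
    score += int(working_wheels)      # assumed: 1 point
    score += int(moving_parts)        # assumed: 1 point
    score += int(three_d)             # assumed: 1 point
    score += int(free_standing)       # assumed: 1 point
    score -= errors                   # assumed: 1 point lost per error
    return score


# A construction using all pieces, aptly named, mobile, 3D, standing, no errors.
best = complexity_score(True, 50, 3, True, True, True, True, 0)
print(best)
```

Under these assumed point values the best possible performance reaches the maximum of 12 stated above, and each error lowers the total by one point.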

Data analysis
The choice of the equivalent scores procedure was prompted by the need to obtain norms that could be directly compared to the already available norms of a wide set of other neuropsychological tests. First, the influence of age, education and sex (the latter dichotomised) was evaluated through a multiple linear regression model estimated by least squares. Several monotonic transformations of the independent variables were analysed, and the one most effective in reducing the residual variance was adopted. The effect of each variable was studied by partialling out the effect held in common with the other variables, after discarding age, which was not significant as a covariate. In this way, it was possible to estimate the effects of confounding factors on the raw scores and, based on these estimates, adjusted scores were computed by adding or subtracting the contribution of each significant confounding effect. After ranking the adjusted scores, Wilks' nonparametric procedure was applied to set tolerance limits 26,27 for the pathological TTT equivalent score 0 (the lower 5% of the population). The maximum equivalent score, 4, was set with the analogous procedure for the upper 5% of the population, whereas equivalent scores 1, 2 and 3 were determined based on the ranking. SPSS 21 (Statistical Package for the Social Sciences) was used for linear model estimation and ranking; Wilks' tolerance intervals were computed by means of the R package 'tolerance' 27.

Normality criteria are generally appraised by comparing one subject's performance to that of all the other subjects. This implies homogeneity across the subjects in the comparison, and hence requires that all possible factors influencing performance have been taken into account and removed from the raw scores. From a statistical point of view, this aim can be accomplished using stratification, which nonetheless, in some cases, raises problems concerning the sample size in each stratum. Alternatively, the effect of confounding factors can be removed from raw scores in a multiple regression model 4,8,22. In order to set correction grids for the raw values of participants' complexity scores, a linear model for the simultaneous effect of sex, age and educational level in years was fitted. Apart from sex, coded as a dummy variable, all dependent and independent variables were centred; centring each variable on its mean corrected for any overlap with the effect of other terms of the model. The multiple regression proved significant (F(2, 242) = 9.08; p < 0.001; adjusted R² = 0.45). With regard to regression coefficients, sex and education proved significant (p = 0.002 and p = 0.008 respectively), whereas age did not, owing to multicollinearity with education (r_p = 0.422; p < 0.001), and it was therefore discarded. On average, females obtained lower scores than males (8.71 versus 9.40, SD 1.743 and 1.645 respectively). Education played a positive but modest role: moving from one education class to the adjacent one raised the score by approximately half a point (Table 3).
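The regression-based adjustment described above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the data are synthetic (the generating coefficients are invented and are not the estimates of Table 3), the function name `adjust_scores` is hypothetical, and the reference category is taken to be a male with average education.

```python
import numpy as np

# Synthetic sample, for illustration only; none of these values are study data.
rng = np.random.default_rng(0)
n = 256
sex = rng.integers(0, 2, size=n)          # 0 = male, 1 = female
education = rng.integers(5, 19, size=n)   # years of education
raw = (9.0 - 0.6 * sex + 0.1 * (education - education.mean())
       + rng.normal(0.0, 1.5, size=n))


def adjust_scores(raw, sex, education):
    """Fit raw ~ intercept + sex + centred education by least squares and
    subtract the estimated contribution of the two confounders."""
    edu_c = education - education.mean()               # centre education
    X = np.column_stack([np.ones_like(edu_c), sex, edu_c])
    coef, *_ = np.linalg.lstsq(X, raw, rcond=None)     # [intercept, sex, edu]
    adjusted = raw - coef[1] * sex - coef[2] * edu_c
    return adjusted, coef


adjusted, coef = adjust_scores(raw, sex, education)
print("estimated coefficients (intercept, sex, education):", coef)
```

After the adjustment, refitting the same model on the adjusted scores gives sex and education coefficients of zero up to rounding, which is the sense in which the confounding effects have been removed.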

Results
Let y_f and y_m indicate the scores of a female and a male, respectively, and x the number of years of education. Then, the estimated impact of confounding variables on the centred raw TTT complexity scores can be expressed as a linear function of the confounding variables: sex, as a dichotomous variable (males coded as 0, females as 1), and centred years of education.
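In generic form, a linear function of this kind and the adjustment derived from it might be written as follows. The notation is an assumption of this sketch and not the authors': s_i is the sex dummy (0 for males, 1 for females), x_i the years of education with sample mean \bar{x}, and the coefficients \beta are left symbolic because no estimated values are implied here.

```latex
\[
\begin{aligned}
y_i &= \beta_0 + \beta_1 s_i + \beta_2 \,(x_i - \bar{x}) + \varepsilon_i, \\
y_i^{\mathrm{adj}} &= y_i - \hat{\beta}_1 s_i - \hat{\beta}_2 \,(x_i - \bar{x}).
\end{aligned}
\]
```

With this coding, a male with average education receives no correction, while every other subject's raw score is shifted by the estimated contribution of sex and education.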
The estimation of the linear regression for the raw scores gives equations (1) and (2) [not reproduced in this text]. Accordingly, adjustment was performed by subtracting the estimated contribution of the confounding variables from each raw score, separately for females (sex coded as 1) and for males (sex coded as 0) in (2). To provide the adjustment to be applied to patients' raw scores in rehabilitation practice, Table 4 shows the correction grid with the points to be added to raw complexity scores in order to calculate adjusted scores. Once the adjusted distribution had been computed, the identification of a cut-off point separating normality from impairment was a crucial step 19. The appropriate criterion was the interval underlying the lowest 5% tail of the cumulative distribution of adjusted scores.
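One textbook way of bracketing a 5% cut-off with distribution-free tolerance limits uses the binomial distribution of the number of observations falling below the population quantile. The sketch below follows that approach with the Python standard library only. It is an assumption-laden illustration, not the authors' computation (they report using the R package 'tolerance'): the function names are hypothetical, the 95% confidence level is an assumption, and the ranks returned need not coincide with those the authors report.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(min(k, n) + 1))


def tolerance_ranks(n: int, p: float = 0.05, conf: float = 0.95):
    """Ranks (1-based) of the order statistics used as outer and inner limits.

    outer: largest r such that, with confidence >= conf, at most a fraction p
           of the population lies below the r-th smallest observation.
    inner: smallest r such that, with confidence >= conf, at least a fraction p
           of the population lies below the r-th smallest observation.
    Either rank is None when the sample is too small to attain the confidence.
    """
    outer = None
    inner = None
    for r in range(1, n + 1):
        fewer_than_r = binom_cdf(r - 1, n, p)  # P(X <= r - 1)
        if 1.0 - fewer_than_r >= conf:
            outer = r                          # keep the largest qualifying rank
        if inner is None and fewer_than_r >= conf:
            inner = r                          # first qualifying rank
    return outer, inner


outer_rank, inner_rank = tolerance_ranks(256)
print("outer rank:", outer_rank, "inner rank:", inner_rank)
```

The adjusted scores at these two ranks, once sorted in ascending order, would then serve as the outer and inner limits around the empirical fifth percentile.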
However, misclassification of performance may arise and needs to be taken into account. In using the widely accepted value of the lower 5% of the normal population (regarded as a reasonable criterion for classifying subjects that are probably not normal) there is an inherent risk of incorrect categorization. The estimation of inferential tolerance limits enables one to obtain the thresholds above (or below) which there is at least (or at most) a desired percentage of the population, and the estimation of these limits keeps errors in performance assessment under control 7,18. With the thirteenth observation, corresponding to the value of 6.25, representing the fifth percentile of the cumulative distribution function, the tenth and the sixteenth observations were identified as the outer and the inner limits, yielding the values of 5.86 and 6.44 respectively. Values equal to, or lower than, the outer tolerance limit indicate a pathological performance, with a controlled error risk. In order to compare the performance in this test to those in other tests, the standardization issue needs to be faced. The commonly used z-scores raise various difficulties, such as an alteration of the statistical dispersion of adjusted scores and problems with floor and ceiling values 7. Distribution-free techniques are required here, since the best standpoint seems to be that of regarding adjusted scores as raw estimates of performance and hence converting them into an ordinal scale with just a few ordinal values, by means of the cumulative function of adjusted scores. A 5-point scale (from 0 to 4), termed equivalent scores, is widely used, where 0 indicates a score that lies below the outer non-parametric tolerance limit of adjusted scores. The equivalent score 4 indicates a performance equal to or superior to the median, thus no longer distinguishing between scores found in the upper half of the distribution. Equivalent scores 1, 2 and 3 are intermediate between 0 and 4 on a quasi-interval scale; that is, they are obtained from the cumulative distribution of adjusted scores. An equivalent score equal to 0 is considered below the normal range, with a controlled error risk. This contracted scale of equivalent scores is then measured on a quasi-interval scale 8 and may be viewed as a standardisation of adjusted scores. Table 5 shows the equivalent score limits, the density (i.e. the number of subjects within each equivalent score), and the cumulative frequency of subjects from 0 to 4 equivalent scores.

[...] executive functions, such as the Wisconsin card sorting and Weigl tests 14 and show an effect of culture and learning in structuring high-level functions. The relationship with education was also found by Apollonio and collaborators with the FAB 2. Adjusted scores and inferential cut-off scores were calculated. Moreover, adjusted scores were transformed into equivalent scores, since the availability of equivalent scores makes it possible to evaluate whether a patient presents a homogeneous cognitive profile, or whether he/she presents selective deficits in one or more cognitive areas. Therefore, it is now possible to compare the performance of brain-damaged patients directly on the TTT and on other neuropsychological tests, using normative data with equivalent scores.
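The conversion of an adjusted score into an equivalent score described in the Results can be sketched as a simple threshold lookup. This is a minimal sketch under explicit assumptions: only the outer tolerance limit (5.86) is taken from the text, whereas the boundaries between equivalent scores 1, 2 and 3 and the median are hypothetical placeholders, since the limits of Table 5 are not reproduced here. The function name and the treatment of values falling exactly on an intermediate boundary are also assumptions.

```python
OUTER_LIMIT = 5.86   # outer tolerance limit reported in the text
BOUND_1_2 = 7.0      # hypothetical boundary between equivalent scores 1 and 2
BOUND_2_3 = 8.0      # hypothetical boundary between equivalent scores 2 and 3
MEDIAN = 9.0         # hypothetical median of the adjusted scores


def equivalent_score(adjusted: float,
                     outer_limit: float = OUTER_LIMIT,
                     bound_1_2: float = BOUND_1_2,
                     bound_2_3: float = BOUND_2_3,
                     median: float = MEDIAN) -> int:
    """Map an adjusted complexity score onto the 0-4 equivalent score scale."""
    if adjusted <= outer_limit:
        return 0      # at or below the outer tolerance limit
    if adjusted >= median:
        return 4      # equal to or above the median
    if adjusted <= bound_1_2:
        return 1
    if adjusted <= bound_2_3:
        return 2
    return 3


# The two corrected scores quoted in the caption of Figure 2.
print(equivalent_score(11.64), equivalent_score(4.33))
```

With these placeholder boundaries, the two corrected scores quoted in the caption of Figure 2 (11.64 and 4.33) fall in equivalent scores 4 and 0, in line with the values given there; any real use would substitute the limits of Table 5.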

Open Peer Review
In this vein, the word "adults" in the title seems to be not completely appropriate. Even more importantly, I don't see any reason why ethnicity should impact performance on this test. To sum up, I strongly suggest a shorter and more appealing title: Normative data for Lezak's Tinkertoy test.
Actually, so far normative data for this test are limited and based on a relatively small sample. Thus, I commend the authors for their intent to address this lack of evidence. There are, however, some shortcomings that should be addressed in a new version of the manuscript.
If my understanding is correct, the authors treated age as a continuous rather than discrete variable. If so, Table 1 is not necessary since it leaves the reader the idea that the age effect is spurious, due to the fact that the different age groups are not the same size. Alternatively, the authors could consider the opportunity to use statistical analysis like ANOVA (quite robust against different group sizes) rather than linear regression.
The main problem with the present paper, however, has to do with the fact that the authors missed a great opportunity to demonstrate that the Tinkertoy test has good construct validity. Previous studies reported that among elderly, demented, and traumatic brain injury patients the Tinkertoy test score has a significant, positive correlation with performances on the Trail Making and Wisconsin Card Sorting Test. In particular, the Tinkertoy complexity score turned out to be very sensitive to disorders of executive functions. In no way, however, can this be taken as convincing evidence that the Tinkertoy test is a reliable and valid instrument to assess executive functions in healthy people. I encourage the authors to provide further data on this topic.
Finally, I want to focus authors' attention on some grammatical and lexical issues. As to grammar: Methods section, second paragraph: subject and verb are not in agreement as to number "Each subject…… were told…". As to lexicon: For more than 100 years, in the field of experimental psychology, the term "subjects" has been used to describe people who take part in research and its use is still widely accepted. In the last decades, however, several psychological societies argued that the term "subject" is disrespectful, and recommended to replace it by "participants". Authors could consider this possibility. Analogously, notwithstanding the taxonomic label "frontal lobe syndrome" is still very popular among

Figure 1. Tinkertoy items used by Lezak 16 for the Tinkertoy test.

1. Made construction(s) (MD): whether S makes any combination of pieces
2. Number of pieces (NP): total number of pieces used
• 1 = NP ≤ 20
• 2 = NP ≤ 30
• 3 = NP ≤ 40
• 4 = NP ≤ 50
3. Name (N): whether and when S gives a name appropriate to the construction's appearance
• 0 = none
• 1 = description or post hoc naming
• 2 = vague or inappropriate N
• 3 = appropriate N
4. Mobility (M a) and moving parts (M b)
• M a): working wheels
• M b): moving parts

Figure 2. Two examples of performance at the TTT by a neurologically unimpaired subject and a TBI patient recruited exclusively for this comparison. The first is a male, 46 years old, with 13 years of education; his performance has been evaluated as 11.64 (corrected score) and 4 as equivalent score, according to Table 2. The TBI patient is a female, 36 years old, with 13 years of education; her performance has been evaluated as 4.33 (corrected score) and 0 as equivalent score.
of Neuroscience, Psychology, Drug Research, Child Health, University of Florence, Florence, Italy

This study provides normative data on Lezak's Tinkertoy test for the Italian population, ranging in age from adolescence to older adulthood (range: 15-86 years).

Table 1. Distribution of the experimental sample (n=256) according to age and education level.
Values represent the number of subjects.