Last week, I wrote about the problems with the evaluation tools used to determine eligibility in early intervention. The full post is here. But the highlights are: 1) the primary protocol used is more than 10 years old and the secondary protocol is 8 years old; 2) the primary protocol does not allow providers to account for prematurity; and 3) the primary and secondary protocols were developed for children who are growing up in middle-class, mainstream-culture, monolingual (and mono-dialectal) Mainstream American English environments.
There is very little consideration given to DEI factors in the determination of eligibility for EI services as currently implemented.
This lack of DEI leads to overidentification of very young children as eligible for early intervention services. That’s due to what’s known as sensitivity and specificity. A test, protocol, or procedure has good sensitivity when it is able to correctly identify a problem. In our case, good sensitivity would be correctly identifying very young children who actually have developmental delays. A test, protocol, or procedure has good specificity when it correctly determines there is no problem when no problem exists. For the purposes of our discussion, good specificity would be correctly determining that no problem exists when very young children are, in fact, typically developing. It should be immediately obvious that the protocol has issues with both sensitivity and specificity, especially specificity. The primary protocol and procedures as currently used have very poor specificity. That is, they routinely identify children as having developmental delays when even the test developers acknowledge that differences in opportunities to engage in different skills/activities can affect young children’s performance.
I’ll refer you back to my previous post for examples of differences in opportunities so we can keep going.
When we use standardized tests, we administer the test the same way to every client. That’s why it’s called “standardized.” Standardized tests generally require the examiner to determine the examinee’s basal (or base) level of performance (often, the highest point at which the examinee answered 3-5 consecutive questions correctly) and the ceiling level of performance (the lowest point at which the examinee answered 3-5 consecutive questions incorrectly). The raw score (the number correct on most standardized tests) is calculated by subtracting the number of errored responses between the basal and ceiling levels from the ceiling item number, with items below the basal credited as correct.

If a standardized test is normed, the examinee’s raw score is used to calculate values like a standard score or percentile rank that can be compared to the normative sample, usually on the basis of the examinee’s age or grade level. Examiners compare the individual’s performance to the normative sample to determine 1) whether a problem exists and 2) the severity of the problem if it does. Standard scores and percentile ranks provide this information in slightly different ways.

Standard scores convert the individual’s raw score (generally) to a scale with a mean of 100 and a standard deviation of 15. That gives a range of ‘typical development’ of 85-115. From an interpretation standpoint, if a child achieves a standard score of 94, that means their performance fell within normal or typical limits. I may want to monitor this child’s progress periodically because this performance level is on the lower end of typical expectations, but I generally would not consider this child to demonstrate any delays that require intervention. If a child performs above a standard score of 115, their performance is well above age- or grade-level expectations and I’m not concerned about their skills from a clinical perspective. (I might be worried about them becoming bored in a classroom setting, etc., however!)
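As a rough sketch of the basal/ceiling arithmetic described above, here is a minimal version in code. The 3-consecutive-item stopping rule, testing from item 1, and one-point-per-item scoring are illustrative assumptions only; they do not come from any specific protocol.

```python
def raw_score(responses, run=3):
    """Compute a raw score from a list of item responses (1 = correct,
    0 = incorrect), administered in order starting from item 1.

    The ceiling is the item that completes the first run of `run`
    consecutive incorrect responses; testing stops there. The raw score
    is the ceiling item number minus all errored responses up to the
    ceiling -- i.e., the number of items answered correctly.
    """
    streak = 0
    ceiling = len(responses)  # if no ceiling is reached, use the last item
    for item_number, response in enumerate(responses, start=1):
        streak = streak + 1 if response == 0 else 0
        if streak == run:
            ceiling = item_number
            break
    errors = sum(1 for r in responses[:ceiling] if r == 0)
    return ceiling - errors
```

In practice, examiners also establish a basal and credit all items below it without administering them; that bookkeeping is omitted here to keep the sketch short.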
Percentile ranks are another way to measure an individual’s performance against a larger group to determine if a problem exists. As the name “percentile rank” implies, an individual’s performance is ‘ranked’ in terms of where it falls when compared to the larger group. When percentile ranks are reported, their interpretation is relatively straightforward. If an examinee performs at the 50th percentile, that means 50% of other people in the same group performed above that level of performance and 50% performed below it. In other words, someone who performs at the 50th percentile is right in the middle of the pack. If someone performs at the 99th percentile, that means they performed better than 99% of other people in the same group. Conversely, if someone performs at the 10th percentile, that means they performed better than only 10% of the people they were compared to. Put another way, 90% of people in the larger group performed better than the person you just tested. Note that percentages are already present in the definition and interpretation of percentile rank. That will become important below.
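The ranking idea can be sketched directly: given an individual’s score and the comparison group’s scores, the percentile rank is the percentage of the group scoring below the individual. This is a simplified convention for illustration; published tests derive percentile ranks from smoothed normative tables rather than raw counts.

```python
def percentile_rank(score, group_scores):
    """Percentage of the comparison group that scored below `score`."""
    below = sum(s < score for s in group_scores)
    return 100.0 * below / len(group_scores)
```

For example, `percentile_rank(50, list(range(100)))` returns `50.0`: half the comparison group scored below, placing the individual in the middle of the pack.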
There are relations between standard scores and percentile ranks because both are based on the normal distribution from statistics (i.e., the bell-shaped curve). A good explanation of the relations between standard scores and percentile ranks can be found here (along with information about developmental language disorder). The short version is this: just as standard scores between 85-115 are considered to be within normal/typical limits, performance at or above the 16th percentile generally is considered to be within normal/typical limits.
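Because both scales sit on the same normal distribution, the conversion between them can be sketched with the normal cumulative distribution function, assuming the usual mean of 100 and standard deviation of 15:

```python
from math import erf, sqrt

def standard_score_to_percentile(ss, mean=100.0, sd=15.0):
    """Convert a standard score to its percentile rank via the normal CDF."""
    z = (ss - mean) / sd                       # distance from the mean in SDs
    return 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

A standard score of 85 lands at roughly the 16th percentile and 115 at roughly the 84th, which is why the 85-115 range and the 16th-percentile cutoff describe the same band of typical performance.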
On many standardized protocols, the raw score also can be used to determine an age-equivalent. These scores are based upon statistical formulas that require substantial data transformations. The important thing to understand is that age equivalents are based on the average raw score performance of all participants from the normative sample at different ages. The raw score is converted to the age equivalent by determining the statistically calculated age at which children generally achieved that number of raw score points. For example, a child who is 36 months old (3 years) may achieve the same raw score as children who generally were 60 months old (5 years old). Does that mean we should treat the 3 year old like a 5 year old? No, the 3 year old is a 3 year old developmentally, but their language skills may be precocious. The same is true in the opposite situation. A child who is 36 months old may have achieved the same raw score as children who are 18 months old. Does that mean a 3 year old should be treated like a toddler? No, it does not. Researchers have long advocated for discontinuing the use of age equivalents because they are statistical artifacts and do not represent the actual strengths and weaknesses of the individual examinee. For example, did the 3 year old whose age equivalency was that of a 5 year old perform well on vocabulary items? Did they have some early literacy knowledge that kept the examiner from being able to obtain the 3-5 consecutive incorrect responses to stop testing? Was the 3 year old whose age equivalency was that of an 18 month old judged on the basis of their performance in English without taking their primary Spanish language development into account?
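To see why an age equivalent is a lookup rather than a description of the child, here is a sketch with an invented norm table. The numbers below are made up for illustration and do not come from any real test.

```python
# Hypothetical mean raw scores by age in months -- invented for illustration,
# not drawn from any actual normative sample.
MEAN_RAW_BY_AGE = {12: 8, 18: 12, 24: 17, 30: 22, 36: 27, 48: 35, 60: 42}

def age_equivalent(raw):
    """Return the age (in months) whose average raw score is closest to `raw`."""
    return min(MEAN_RAW_BY_AGE, key=lambda age: abs(MEAN_RAW_BY_AGE[age] - raw))
```

With this table, a 36 month old with a raw score of 42 gets an age equivalent of 60 months, and a 36 month old with a raw score of 12 gets 18 months: exactly the two situations described above. Nothing in the lookup says which items were passed, or why.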
So, why are age equivalents still included in the scoring and reporting of performance on standardized tests? The answer is that many EI programs require that a “percent delay” based on the age equivalent be used to determine eligibility (see the previous post for the ways in which a child can be found eligible for EI services). When completing evaluations to determine eligibility for EI, only the raw score and the age equivalent based on that raw score are calculated. The gap between the child’s chronological (not adjusted) age and the age equivalent is used as the numerator, while the chronological age is used as the denominator. In the case of a 24 month old child who performed at the “12 month” level in the communication domain, the math looks like this: (24 months chronological age − 12 month age equivalency) / 24 months chronological age = 50% delay. Exactly how should parents and professionals interpret this information based on statistical artifacts?
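The “percent delay” arithmetic can be sketched in a few lines. The formula here is inferred from the worked example above, not quoted from any particular state’s regulations.

```python
def percent_delay(age_equivalent_months, chronological_age_months):
    """Gap between chronological age and age equivalent, expressed as a
    percentage of chronological age."""
    gap = chronological_age_months - age_equivalent_months
    return 100.0 * gap / chronological_age_months
```

`percent_delay(12, 24)` returns `50.0`, the “50% delay” from the example. Note how sensitive the figure is to the age-equivalent lookup: an age equivalent of 18 months instead of 12 would halve the reported “delay” to 25%, even though both values rest on the same statistical artifact.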
The thing is, we’ve judged a child’s eligibility for EI using protocols and procedures with poor specificity (i.e., a high false-positive rate). Then, the problem is compounded by the way in which the “percent delay” is calculated using a statistically mythical model child. Current procedures used to determine eligibility for EI result in the over-identification and enrollment in services of many children who live in culturally and linguistically diverse households. Is anyone else bothered by this? I certainly am!
If we took this hypothetical 24 month old who performed at the “12 month” level with a “50% delay” in communication and instead looked at the corresponding standard scores and percentile ranks, we would come away with a very different interpretation of the data. This child’s raw scores would be 13 on the receptive language subtest and 12 on the expressive language subtest. When those raw scores are converted to standard scores, the child achieved a standard score of 79 receptively and 80 expressively. The corresponding percentile ranks are 8 and 9, respectively. Remember, standard scores between 85-115 are considered to be within typical limits. Standard scores of 79 and 80 are lower than the typical range, but nothing like “50%” below. These standard scores fall approximately 1.3 to 1.4 standard deviations below the mean. The percentile ranks of 8 and 9 are below the 16th percentile and indicate that 92% and 91% of same-age children, respectively, performed better on these tasks.
For what it’s worth, if this child were 36 months old and being evaluated for eligibility for services under an Individualized Education Program (IEP), they would not be eligible. To ensure better equity in determining eligibility, the public schools (generally) have adopted the following criterion: performance below the 7th percentile [or where more than 93% of the larger normative group performed better than the individual] on at least 2 measures of the area being evaluated.
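That criterion is simple enough to state as a rule. The 7th-percentile cutoff and two-measure minimum come from the description above; the function name and shape are mine.

```python
def school_eligible(percentile_ranks, cutoff=7, min_measures=2):
    """Eligible if performance falls below the cutoff percentile on at
    least `min_measures` measures of the area being evaluated."""
    return sum(pr < cutoff for pr in percentile_ranks) >= min_measures
```

The hypothetical child above, with percentile ranks of 8 and 9, would not qualify: `school_eligible([8, 9])` returns `False`, while a child at the 5th and 6th percentiles (`school_eligible([5, 6])`) would.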
What’s to be done? First and foremost, EI providers need to be aware of the impact of current policies and procedures on children who are growing up in CLD environments. Growing up in a non-MAE environment does not mean that a child’s development is delayed. But it does mean that EI providers must be culturally responsive and take these factors into consideration during evaluations. EI providers should receive additional training in the use and interpretation of information included in examiner’s manuals of standardized protocols. EI providers must be aware of the limitations of the standardized protocol they are administering to children and their families. State-level agencies should give serious consideration to adopting eligibility procedures that provide better specificity (e.g., allowing providers to choose standardized protocols that evaluate the child in the dominant language of the home and/or account for prematurity). State-level agencies also should consider adopting the use of standard scores and/or percentile ranks as the measures by which the presence or absence of delay is determined, and then providing a measure of the severity of that delay. Doing so would provide more, and more useful, information to parents, EI providers, and other professionals involved in the child’s care (who already use standard scores and percentile ranks), and would allow for better continuity of services when a child transitions from EI to the school system at age 3 years.
And, last but certainly not least, researchers and test developers need to rethink the items that are included on protocols at these early developmental stages. One simple change would be to add a “no opportunity” option to the scoring of existing protocols. Currently, if a child has had no opportunity to demonstrate a particular skill, that item has to be scored as a “0” and counts against the total raw score. There is a difference between not having an opportunity to do something and not being able to do it. There also should not be 7 items on a cognitive subtest that differentiate performance between children who identify as African American/Black and children who identify as Caucasian/White. These items should be investigated as potentially implicitly biased (possibly on the basis of “no opportunity”). It may well be that different forms need to be developed for children learning different languages and/or different dialects. The developmental sequence of certain early language markers is different in English and Spanish because the languages are different. Children who are growing up learning more than one language have a different developmental trajectory than children growing up speaking only one language. The same is true for children learning non-MAE dialects of English.
In the next post, I’ll tackle what I see as the over-reliance on receptive language outcomes over expressive language outcomes despite parents’ explicitly stated concerns about their children’s expressive language development.
As always, thank you for reading and I look forward to your comments!