Opinion

It’s Time to Re-evaluate the Apgar Score

When Virginia Apgar, MD, proposed her now-universal scoring system for newborns in 1953, her primary purpose was to focus attention on the newborn because, as she wrote, “Nine months observation of the mother surely warrants one-minute observation of the baby.”1 After the national Collaborative Perinatal Study showed that low Apgar scores occurred more frequently in newborns who died in the neonatal period or had neurological morbidity at 1 year of life, use of the Apgar score spread to the point that it is now assigned to newborns in almost every country in the world. A PubMed search for Apgar score yields almost 12 000 publications.

However, almost 40 years ago, questions arose about the (mis)use of the Apgar score, mainly based on studies that found limitations when using the score to predict short-term or long-term morbidity and mortality.2 In these studies, scores were often parsed into low (0-3), medium (4-6), and high (7-10) groups, but to our knowledge, these divisions have never been validated. Warnings about equating low scores with asphyxia or telling parents what was likely to happen in the future to their newborn based on the Apgar scores have been sounded for decades, including by the American Academy of Pediatrics, which in 1986 stated, “the scores alone should not be considered either evidence of or consequent to substantial asphyxia.”3 Nevertheless, recent publications still use the score to predict outcome in groups of newborns.

While problems with poor prognostic ability are well documented and referenced, other very important concerns about the clinical use of the Apgar score have become apparent over the last 20 years. First, a number of studies have demonstrated high levels of interobserver variability. There are even geographic or cultural differences in assigning the Apgar score. For example, while 8.8% of newborns in Latvia receive a 5-minute score of 10, that score is assigned to 92.7% of newborns in France.4 Second, there is the issue of prematurity. The original Apgar score did not specify how to assess parameters such as reflex irritability and tone in premature newborns. As a result, there is no consensus on how to score a newborn born at 24 weeks’ gestation, even if tone and reflex irritability are completely normal for this stage of development.

Third, once people began to pay attention to the newborn, they began to respond when the newborn had bradycardia or apnea, but there is no consensus on how to account for interventions when assigning the Apgar score. Apgar did not specify how to score newborns receiving an intervention, even though 7% of the newborns in her first article were ventilated,1 and this omission has resulted in large variation in how these newborns are scored. Acceptable oxygen saturation levels defined in the latest edition of the Neonatal Resuscitation Program (NRP)5 could result in a newborn being marked down for color despite having a normal oxygen level. Finally, and most significantly, with the advent of the NRP and its requirement that “at least 1 qualified individual…whose only responsibility is the management of the newly born baby”5 be at every delivery, Apgar’s initial goal for her score, to focus attention on the newborn, has been achieved. The Apgar score is not used in the NRP.

If a medical procedure or test had similar problems with accuracy, reproducibility, universality, and even utility, there would be calls for its retirement, and indeed there have been several such calls for the Apgar score over the years. Yet as recently as 2015, the American Academy of Pediatrics and the American College of Obstetricians and Gynecologists endorsed the continued use of the Apgar score.6 They also recommended reporting resuscitative interventions along with the traditional scoring, which they termed the Expanded Apgar score. It consists of 7 possible treatments and subtracts points for resuscitation interventions performed at each time point. Another modification is the Specified Apgar score, in which the definitions of the 5 parameters of the original score are specified so that they assess only the newborn’s condition, regardless of any interventions needed, and the neurological responses are judged relative to gestational age rather than compared with what a term newborn would look like. The Combined Apgar score incorporates the Specified and the Expanded Apgar scores.7 These alternatives, and others, have only been assessed in relatively small prospective studies in selected populations, to our knowledge.

Given the multiple concerns and limitations, we believe that the time has come to re-examine the Apgar score to determine if it is still useful, if it needs to be revised or replaced by a different system, or if we should do away with scoring newborns after delivery altogether. However, to do so, we must first define the purpose of a scoring system. There are 3 potential goals for any quantitative assessment of a newborn after delivery: (1) it can be an easy way to describe the condition (and interventions needed to achieve that condition) in a reproducible (and numerical) way and help in clinical communication; (2) it can measure the immediate clinical results of delivery and resuscitation; and (3) it can quantify the individual condition (and intervention) of a newborn at different points and thus make groups of newborns comparable, becoming a tool for classifying risk of short-term and long-term complications, research, and quality improvement.

Based on these scoring goals, useful end points for a large prospective clinical trial can be defined. To do so, we suggest borrowing from the experience of researchers investigating plasma transfusion practices in the pediatric intensive care unit, who needed to define clinically significant bleeding before embarking on any large prospective trials. A panel of international experts developed a set of clinical definitions or scenarios, and these were assessed by more than 500 practitioners using a web-based survey platform. The expert panel then formulated bleeding definitions that reflected a high level of agreement among the survey respondents. A second expert panel used a Delphi process until there was agreement on all the elements.

To begin this process, a library of potential outcomes important to measure in a clinical trial is needed. We invite readers to send their suggestions about the purpose(s) that a scoring system for all newborns should serve and/or the outcomes that should be measured in a clinical trial to determine the best scoring system, if any.

(Mario Rüdiger, Henry J. Rozycki, JAMA Pediatr. Published online February 24, 2020. doi:10.1001/jamapediatrics.2019.6016)

References

1. Pediatr Clin North Am. 1966;13(3):645-650.
2. Pediatrics. 1981;68(1):36-44.
3. Pediatrics. 1986;78(6):1148-1149.
4. Paediatr Perinat Epidemiol. 2017;31(4):338-345.
5. Circulation. 2015;132(18)(suppl 2):S543-S560.
6. Pediatrics. 2015;136(4):819-822.
7. BMC Pediatr. 2015;15:18.