NHANES is not a longitudinal study; that is, it does not follow participants over time. Rather, the data collected in any NHANES cycle can be viewed as a snapshot of the US population from the period corresponding to that cycle. Analyses that combine data across cycles typically assume that the underlying population characteristics have not changed across the cycles being combined.
However, as NHANES has now collected data over more than two decades, it may also contain evidence of characteristics that have changed over time. In this analysis, we consider a specific question: whether the distribution of BMI, a standard indicator of obesity, has changed across cycles. We ask this question separately for various ethnicities and genders, as BMI is known to vary substantially across population subgroups. As an illustration of best practices, we use methods from the survey package that take into account the complex sample selection design of NHANES.
Relevant variables
To identify variables that contain information about BMI and the NHANES tables they are available in, we can use the nhanesSearch() function in the nhanesA package.
library(nhanesA)
nhanesOptions(log.access = TRUE)
nhanesSearch("body mass index")
# A tibble: 12 × 7
   Variable.Name Variable.Description      Data.File.Name Data.File.Description
   <chr>         <chr>                     <chr>          <chr>
 1 BMXBMI        Body Mass Index (kg/m**2) BMX            Body Measures
 2 BMXBMI        Body Mass Index (kg/m**2) BMX_B          Body Measures
 3 BMXBMI        Body Mass Index (kg/m**2) BMX_C          Body Measures
 4 BMXBMI        Body Mass Index (kg/m**2) BMX_D          Body Measures
 5 BMXBMI        Body Mass Index (kg/m**2) BMX_E          Body Measures
 6 BMXBMI        Body Mass Index (kg/m**2) BMX_F          Body Measures
 7 BMXBMI        Body Mass Index (kg/m**2) BMX_G          Body Measures
 8 BMXBMI        Body Mass Index (kg/m**2) BMX_H          Body Measures
 9 BMXBMI        Body Mass Index (kg/m**2) BMX_I          Body Measures
10 BMXBMI        Body Mass Index (kg/m**2) BMX_J          Body Measures
11 BMXBMI        Body Mass Index (kg/m**2) P_BMX          Body Measures
12 BMXBMI        Body Mass Index (kg/m**2) BMX_L          Body Measures
# ℹ 3 more variables: Begin.Year <int>, EndYear <int>, Component <chr>
These results tell us that BMI measurements are available as the BMXBMI variable in the BMX tables. For any reasonable analysis, we will need to combine these with, at a minimum, the demographic information available in the DEMO tables.
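A minimal sketch of this combining step is given below. It assumes that the two tables for a cycle have already been obtained as data frames (for example with nhanes() from nhanesA) and that they share the respondent identifier SEQN. The helper combine_cycle() and the choice of demographic columns are hypothetical and only reflect what the later steps of this analysis appear to need.

```r
# Hypothetical helper (not part of nhanesA): join the demographic and
# body-measures tables of one cycle on the respondent identifier SEQN.
combine_cycle <- function(demo, bmx, cycle) {
  # Demographic and design variables assumed to be needed later
  demo_vars <- c("SEQN", "RIAGENDR", "RIDAGEYR", "RIDRETH1",
                 "SDMVPSU", "SDMVSTRA", "WTMEC2YR")
  # Keep only columns that are present; names can differ in some cycles
  demo_vars <- intersect(demo_vars, names(demo))
  out <- merge(demo[, demo_vars, drop = FALSE],
               bmx[, c("SEQN", "BMXBMI")],
               by = "SEQN")
  out$cycle <- cycle   # label used to tell cycles apart after stacking
  out
}
```

With nhanesA loaded, a call such as combine_cycle(nhanes("DEMO_C"), nhanes("BMX_C"), cycle = "2003-2004") would download and join the two tables for a single cycle. Weight variable names may differ in some cycles, so the column list should be checked against the table documentation.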
BMI by age
To compare the distribution of BMI across cycles, we need to first understand the factors that affect its distribution within a cycle. Natural covariates are gender and ethnicity, and possibly age. To understand the dependence on age, we choose a particular demographic subgroup (white females) from a particular cycle, and plot BMI vs age.
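A plot of this kind could be drawn with lattice roughly as follows. This is a sketch only: plot_bmi_by_age() is a hypothetical helper, and the labels "Female" and "Non-Hispanic White" are assumptions about how RIAGENDR and RIDRETH1 are coded in the combined data frame. The smooth it draws is unweighted.

```r
library(lattice)

# Hypothetical helper: scatter plot of BMI against age, with a LOESS
# smooth, for one demographic subgroup. The default label values are
# assumptions about how the demographic variables are coded.
plot_bmi_by_age <- function(dat, gender = "Female",
                            ethnicity = "Non-Hispanic White") {
  keep <- dat$RIAGENDR == gender & dat$RIDRETH1 == ethnicity &
    !is.na(dat$BMXBMI)
  sub <- dat[which(keep), , drop = FALSE]   # which() drops NA matches
  xyplot(BMXBMI ~ RIDAGEYR, data = sub,
         type = c("p", "smooth"),           # points plus a LOESS curve
         grid = TRUE, col.line = "black",
         xlab = "Age (years)", ylab = "BMI (kg/m^2)")
}
```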
The smooth line is a LOESS line giving a nonparametric estimator of the average BMI as a function of age for this population subgroup. Unfortunately, the data shown in this figure are not an i.i.d. sample from the population, and so the estimated smooth may be biased and misleading. To take the complex survey design of NHANES into account, we can use tools in the survey package, which implements variants of many standard statistical analysis tools appropriate for survey data.
The survey package does not implement a survey variant of LOESS, although it does implement local polynomial smoothing (see ?svysmooth). We will instead use a parametric variant that supports “non-linear” mean functions via basis splines. Before doing this, we first need to set up a survey design object with suitable weights, id, and strata information.
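The two steps just described might look as follows. This is a sketch under stated assumptions, not necessarily the exact code behind the figures: make_design() and smooth_bmi() are hypothetical helpers, the weight variable is assumed to be WTMEC2YR, and the knot positions and the age range 20–80 are chosen purely for illustration. The knots are fixed explicitly so that the spline basis is the same at fitting and prediction time.

```r
library(survey)
library(splines)

# Hypothetical helper: design object for one NHANES cycle.
# SDMVPSU (PSU) and SDMVSTRA (stratum) are the NHANES design variables;
# using WTMEC2YR as the weight is an assumption and may not hold for
# every cycle.
make_design <- function(dat) {
  dat <- dat[!is.na(dat$WTMEC2YR), , drop = FALSE]
  svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
            nest = TRUE, data = dat)
}

# Hypothetical helper: design-based regression of BMI on a B-spline
# basis in age, evaluated on a grid of ages within 20-80.
smooth_bmi <- function(design, ages = 20:80) {
  adults <- subset(design, RIDAGEYR >= 20 & RIDAGEYR <= 80)
  # Illustrative fixed knots; these are not taken from the original analysis
  fit <- svyglm(BMXBMI ~ bs(RIDAGEYR, knots = c(35, 50, 65),
                            Boundary.knots = c(20, 80)),
                design = adults)
  pred <- predict(fit, newdata = data.frame(RIDAGEYR = ages))
  data.frame(Age = ages, AvgBMI = as.vector(coef(pred)))
}
```

To obtain a curve for a single subgroup, the design would first be restricted with subset() before being passed to smooth_bmi().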
This smooth represents an estimate of the expected BMI as a function of age for white females. What we are really interested in is how this function changes over time, that is across cycles, for this and other population subgroups.
To do this, we first obtain the survey design objects for each cycle, restricting our attention to (non-Hispanic) white and black adults.
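One way to sketch this step is shown below, assuming a named list with one combined data frame per cycle. The helper designs_by_cycle(), the ethnicity labels, the minimum adult age of 20, and the weight variable WTMEC2YR are all assumptions made for illustration.

```r
library(survey)

# Hypothetical helper: one design object per cycle, restricted to
# non-Hispanic white and non-Hispanic black adults.
designs_by_cycle <- function(cycle_data,
                             groups = c("Non-Hispanic White",
                                        "Non-Hispanic Black"),
                             min_age = 20) {
  lapply(cycle_data, function(dat) {
    dat <- dat[!is.na(dat$WTMEC2YR), , drop = FALSE]
    # Flag the rows of interest before building the design
    dat$in_scope <- dat$RIDRETH1 %in% groups &
      !is.na(dat$RIDAGEYR) & dat$RIDAGEYR >= min_age
    full <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA,
                      weights = ~WTMEC2YR, nest = TRUE, data = dat)
    # Subset the design rather than the data frame, so that variance
    # estimation still accounts for the full sampling structure
    subset(full, in_scope)
  })
}
```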
The resulting smooths can be compared using the following plot.
xyplot(smoothDF, AvgBMI ~ Age | gender + ethnicity, type = "l",
       groups = cycle, grid = TRUE, auto.key = TRUE)
Figure 3
Although it is not straightforward to interpret this plot, it is clear that average BMI tends to initially increase with age, after which it tends to stabilize. For this reason, we choose to consider only a middle age group, namely, adults aged 40–59.
Average BMI in 40–59 year olds
Once we decide on a specific age group to consider, we can ignore the dependence on age and simply compare the average BMI across cycles for various population subgroups. Of course, the estimated average BMI values are not of much use unless we also calculate standard errors or confidence intervals, which are nontrivial to compute. For this, we again use the survey package, and specifically its svymean() function, for which the associated confidence intervals can be easily obtained.
As the BMI values are somewhat right-skewed, we calculate the average and confidence interval for log-BMI, and transform them back to the BMI scale before plotting.
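The calculation could be sketched as follows, assuming a survey design object for one cycle that contains BMXBMI, RIDAGEYR, RIAGENDR and RIDRETH1. The helper bmi_ci() is hypothetical, and svyby() is used here as one convenient way to apply svymean() within each subgroup. The output column names are chosen to match those used in the plotting call that follows. Note that exponentiating the mean of log-BMI gives a geometric mean on the BMI scale.

```r
library(survey)

# Hypothetical helper: design-based mean of log(BMI) for ages 40-59 in
# each gender x ethnicity subgroup, with a 95% confidence interval,
# all transformed back to the BMI scale.
bmi_ci <- function(design, cycle) {
  mid <- subset(design, RIDAGEYR >= 40 & RIDAGEYR <= 59 & !is.na(BMXBMI))
  mid <- update(mid, logBMI = log(BMXBMI))
  est <- svyby(~logBMI, ~RIAGENDR + RIDRETH1, design = mid, FUN = svymean)
  ci <- confint(est, level = 0.95)   # interval on the log scale
  data.frame(cycle = cycle,
             RIAGENDR = est$RIAGENDR,
             RIDRETH1 = est$RIDRETH1,
             estBMI = exp(est$logBMI),
             LCL = exp(ci[, 1]),
             UCL = exp(ci[, 2]),
             row.names = NULL)
}
```

Applying bmi_ci() to the design for each cycle and stacking the results with rbind() would give a single data frame with one row per cycle and subgroup.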
If we now plot the confidence intervals across cycles, we see that although successive confidence intervals are mostly overlapping, there is a distinct general trend of increasing average BMI values over time.
segplot(factor(cycle) ~ LCL + UCL | interaction(RIAGENDR, RIDRETH1),
        data = CI, # level = estimate,
        draw.bands = FALSE, lwd = 2, horizontal = FALSE,
        ylab = "95% Confidence Intervals for Average (log) BMI \n (for Age Group 40-59 years)",
        scales = list(x = list(rot = 90)),
        centers = estBMI) +
    layer_(panel.grid(v = -1, h = 0))