Introduction to NHANES

The National Health and Nutrition Examination Survey (NHANES) is a program of the National Center for Health Statistics (NCHS), which is part of the US Centers for Disease Control and Prevention (CDC). It measures the health and nutritional status of adults and children in the United States in a series of surveys that combine interviews and physical examinations.

Although the program began in the early 1960s, its structure was changed in the 1990s. Since 1999, the program has been conducted on an ongoing basis: a nationally representative sample of about 5,000 persons (across 15 counties) is examined each year, and public-use data are released in two-year cycles. This phase of the program is referred to as continuous NHANES.

The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. Although the details of the responses recorded vary from cycle to cycle, there is a substantial amount of consistency, making it possible to compare data across cycles. Sampling weights are provided along with demographic details for each participant; see the NHANES analytic guidelines for details. NHANES is a rich resource that has been used extensively in epidemiological research.

Public-use data: web resources

NHANES makes a large volume of data available for download. However, rather than a single download, these data are provided as a number of separate SAS transport files, referred to as “data files” in the NHANES ecosystem, for each cycle. Each such data file or table contains records for several related variables. A comprehensive manifest of downloadable data files is published on the NHANES website, along with subsets broken up into the following “components”: Demographics, Dietary, Examination, Laboratory, and Questionnaire.

For each data table listed in these manifests, a link to a “Doc File” (an HTML webpage describing the data file) and a link to a SAS transport file are provided. An additional list of limited access data files is also documented on the NHANES website, but the corresponding data file download links are not available.

An additional manifest of variables is separately available for each component, and gives more detailed information about both the variables and the data files they are recorded in, although these tables do not provide download links directly: Demographics, Dietary, Examination, Laboratory, Questionnaire.

A search interface is also available.

For reasons not specified, NHANES releases data files as SAS transport files, and provides links to proprietary Windows-only software that can supposedly be used to convert these files to CSV files.

Public-use data: R resources

One of the goals of the Epiconnector project is to provide and document an alternative access path to NHANES data and documentation via the R ecosystem. It builds on the nhanesA R package, along with utilities such as SQL databases and Docker, to enable efficient and reproducible analyses of NHANES data.

The nhanesA package

The nhanesA package provides a user-friendly interface to download and process data and documentation files from the NHANES website. To use the utilities in this package, we first need to know a few more details about how NHANES data and documentation are structured.

Each available data file, which we henceforth call an NHANES table, can be identified uniquely by a name. Generally speaking, each public-use table has a corresponding data file (a SAS transport file, with extension xpt) and a corresponding documentation file (a webpage, with extension htm). The URLs from which these files can be downloaded can usually be predicted from the table name, and the cycle it belongs to. Cycles are typically of 2-year duration, starting from 1999-2000.

Although there are exceptions, a table that is available for one cycle will typically be available for other cycles as well, with a suffix appended to the name of the table indicating the cycle. To make these details concrete, let us use the nhanesManifest() function in the nhanesA package to download the list of available tables and look at the names and URLs for the DEMO data files, which contain demographic information and sampling weights for each study participant.

library(nhanesA)
manifest <- nhanesManifest("public") |> sort_by(~ Table)
subset(manifest, startsWith(Table, "DEMO"))
     Table                                             DocURL
370   DEMO   /Nchs/Data/Nhanes/Public/1999/DataFiles/DEMO.htm
369 DEMO_B /Nchs/Data/Nhanes/Public/2001/DataFiles/DEMO_B.htm
368 DEMO_C /Nchs/Data/Nhanes/Public/2003/DataFiles/DEMO_C.htm
366 DEMO_D /Nchs/Data/Nhanes/Public/2005/DataFiles/DEMO_D.htm
367 DEMO_E /Nchs/Data/Nhanes/Public/2007/DataFiles/DEMO_E.htm
371 DEMO_F /Nchs/Data/Nhanes/Public/2009/DataFiles/DEMO_F.htm
372 DEMO_G /Nchs/Data/Nhanes/Public/2011/DataFiles/DEMO_G.htm
373 DEMO_H /Nchs/Data/Nhanes/Public/2013/DataFiles/DEMO_H.htm
374 DEMO_I /Nchs/Data/Nhanes/Public/2015/DataFiles/DEMO_I.htm
375 DEMO_J /Nchs/Data/Nhanes/Public/2017/DataFiles/DEMO_J.htm
377 DEMO_L /Nchs/Data/Nhanes/Public/2021/DataFiles/DEMO_L.htm
                                               DataURL     Years
370   /Nchs/Data/Nhanes/Public/1999/DataFiles/DEMO.xpt 1999-2000
369 /Nchs/Data/Nhanes/Public/2001/DataFiles/DEMO_B.xpt 2001-2002
368 /Nchs/Data/Nhanes/Public/2003/DataFiles/DEMO_C.xpt 2003-2004
366 /Nchs/Data/Nhanes/Public/2005/DataFiles/DEMO_D.xpt 2005-2006
367 /Nchs/Data/Nhanes/Public/2007/DataFiles/DEMO_E.xpt 2007-2008
371 /Nchs/Data/Nhanes/Public/2009/DataFiles/DEMO_F.xpt 2009-2010
372 /Nchs/Data/Nhanes/Public/2011/DataFiles/DEMO_G.xpt 2011-2012
373 /Nchs/Data/Nhanes/Public/2013/DataFiles/DEMO_H.xpt 2013-2014
374 /Nchs/Data/Nhanes/Public/2015/DataFiles/DEMO_I.xpt 2015-2016
375 /Nchs/Data/Nhanes/Public/2017/DataFiles/DEMO_J.xpt 2017-2018
377 /Nchs/Data/Nhanes/Public/2021/DataFiles/DEMO_L.xpt 2021-2023
            Date.Published
370 Updated September 2009
369 Updated September 2009
368 Updated September 2009
366 Updated September 2009
367         September 2009
371         September 2011
372   Updated January 2015
373           October 2015
374         September 2017
375          February 2020
377         September 2024
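The pattern visible in this output can be sketched as a small helper that guesses the relative URLs from a table name alone. This is an illustration only: guess_nhanes_urls() is a hypothetical function, not part of nhanesA, and it assumes that the cycle is encoded as a trailing one-letter suffix (none for 1999-2000, _B for 2001-2002, and so on). Tables that follow other naming conventions will not be handled correctly, so the manifest remains the authoritative source.

```r
# Hypothetical helper (not part of nhanesA): guess the relative URLs of the
# documentation and data files for a single table name.
guess_nhanes_urls <- function(table) {
    # Assumption: no suffix means the first cycle ("A"); otherwise the
    # letter after the last underscore identifies the cycle.
    suffix <- if (grepl("_[A-Z]$", table)) sub("^.*_", "", table) else "A"
    # Cycles are assumed to start every two years from 1999
    begin_year <- 1999 + 2 * (match(suffix, LETTERS) - 1)
    base <- sprintf("/Nchs/Data/Nhanes/Public/%d/DataFiles/%s",
                    begin_year, table)
    c(DocURL = paste0(base, ".htm"), DataURL = paste0(base, ".xpt"))
}

guess_nhanes_urls("DEMO_C")
```

For the DEMO tables listed above, the guessed URLs agree with the DocURL and DataURL columns of the manifest.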

The nhanesA package allows both data and documentation files to be accessed, either by specifying their URL explicitly, or simply using the table name, in which case the relevant URL is constructed from it. For example,

demo_b <- nhanesFromURL("/Nchs/Data/Nhanes/Public/2001/DataFiles/DEMO_B.xpt",
                        translated = FALSE)
demo_c <- nhanes("DEMO_C", translated = FALSE)
str(demo_b[1:10])
'data.frame':   11039 obs. of  10 variables:
 $ SEQN    : num  9966 9967 9968 9969 9970 ...
 $ SDDSRVYR: num  2 2 2 2 2 2 2 2 2 2 ...
 $ RIDSTATR: num  2 2 2 2 2 2 2 2 2 1 ...
 $ RIDEXMON: num  2 1 1 2 2 2 1 2 1 NA ...
 $ RIAGENDR: num  1 1 2 2 1 2 1 2 1 1 ...
 $ RIDAGEYR: num  39 23 84 51 16 14 44 63 13 80 ...
 $ RIDAGEMN: num  472 283 1011 612 200 ...
 $ RIDAGEEX: num  473 284 1012 612 200 ...
 $ RIDRETH1: num  3 4 3 3 2 2 3 1 4 3 ...
 $ RIDRETH2: num  1 2 1 1 5 5 1 3 2 1 ...
str(demo_c[1:10])
tibble [10,122 × 10] (S3: tbl_df/tbl/data.frame)
 $ SEQN    : int [1:10122] 21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 ...
 $ SDDSRVYR: int [1:10122] 3 3 3 3 3 3 3 3 3 3 ...
 $ RIDSTATR: int [1:10122] 2 2 2 2 2 2 2 2 2 2 ...
 $ RIDEXMON: int [1:10122] 1 2 1 2 2 2 1 2 1 2 ...
 $ RIAGENDR: int [1:10122] 1 2 2 1 1 2 1 1 2 1 ...
 $ RIDAGEYR: int [1:10122] 19 16 14 17 55 52 0 63 13 3 ...
 $ RIDAGEMN: int [1:10122] 232 203 172 208 671 633 3 765 163 42 ...
 $ RIDAGEEX: int [1:10122] 233 205 172 209 672 634 4 766 164 42 ...
 $ RIDRETH1: int [1:10122] 4 4 3 4 3 3 1 4 4 4 ...
 $ RIDRETH2: int [1:10122] 2 2 1 2 1 1 3 2 2 2 ...

The data in these files appear as numeric codes, and must be interpreted using codebooks available in the documentation files, which can be parsed as follows.

demo_b_codebook <-
    nhanesCodebookFromURL("/Nchs/Data/Nhanes/Public/2001/DataFiles/DEMO_B.htm")
demo_b_codebook$RIDSTATR 
$`Variable Name:`
[1] "RIDSTATR"

$`SAS Label:`
[1] "Interview/Examination Status"

$`English Text:`
[1] "Interview and Examination Status of the Sample Person."

$`Target:`
[1] "Both males and females 0 YEARS -\r 150 YEARS"

$RIDSTATR
# A tibble: 3 × 5
  `Code or Value` `Value Description`            Count Cumulative `Skip to Item`
  <chr>           <chr>                          <int>      <int> <lgl>         
1 1               Interviewed Only                 562        562 NA            
2 2               Both Interviewed and MEC exam… 10477      11039 NA            
3 .               Missing                            0      11039 NA            
demo_b_codebook$RIAGENDR
$`Variable Name:`
[1] "RIAGENDR"

$`SAS Label:`
[1] "Gender"

$`English Text:`
[1] "Gender of the sample person"

$`Target:`
[1] "Both males and females 0 YEARS -\r 150 YEARS"

$RIAGENDR
# A tibble: 3 × 5
  `Code or Value` `Value Description` Count Cumulative `Skip to Item`
  <chr>           <chr>               <int>      <int> <lgl>         
1 1               Male                 5331       5331 NA            
2 2               Female               5708      11039 NA            
3 .               Missing                 0      11039 NA            
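To see how such a codebook is used, the following minimal sketch decodes a few raw values by hand. The codes and labels are copied from the RIAGENDR codebook table above, and the raw values are the first ten RIAGENDR entries of DEMO_B printed earlier. The lookup via match() is only an illustration of the idea and is not necessarily how nhanesA implements translation internally.

```r
# Codes and labels copied from the RIAGENDR codebook table shown above
codes  <- c("1", "2", ".")
labels <- c("Male", "Female", "Missing")

# First ten raw RIAGENDR values from DEMO_B, as printed by str() earlier
raw <- c(1, 1, 2, 2, 1, 2, 1, 2, 1, 1)

# Look up each raw code in the codebook; codes with no match become NA
decoded <- labels[match(as.character(raw), codes)]
table(decoded)
```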

By default, the data access step converts the raw data into more meaningful values using the corresponding codebook.

demo_c <- nhanes("DEMO_C", translated = TRUE)
DT::datatable(demo_c)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

Further analysis can be performed on the resulting datasets, which are regular R data frames. Simple examples of such analyses, and other functionality in the nhanesA package such as search utilities, are described in Ale et al., 2024.

Limitations of this approach

The nhanesA package is designed to access NHANES data on demand from the CDC website. The efficiency of such an approach is naturally limited by available bandwidth. Another limitation that is not obvious at first glance is apparent when we try to combine data across multiple cycles. Not all variables are measured in all cycles, and even when they are, they may not be included in the same tables (and sometimes they are included in multiple tables). Analyzing the availability of variables of interest is difficult with the rudimentary search facilities available on the NHANES website.
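The difficulty of combining cycles can be illustrated with a small sketch. The two data frames below are toy stand-ins for the same table in two cycles: the SEQN, RIAGENDR, and RIDAGEYR values are taken from the DEMO_B and DEMO_C output shown earlier, but dropping RIDRETH2 from the second one is a made-up scenario used purely for illustration. Restricting to the shared columns is one simple strategy, not the only one.

```r
# Toy stand-ins for one table in two cycles (first two rows only)
cycle_b <- data.frame(SEQN = c(9966, 9967), RIAGENDR = c(1, 1),
                      RIDAGEYR = c(39, 23), RIDRETH2 = c(1, 2))
# Assumption for illustration: RIDRETH2 is absent from this cycle
cycle_c <- data.frame(SEQN = c(21005, 21006), RIAGENDR = c(1, 2),
                      RIDAGEYR = c(19, 16))

# Variables recorded in both cycles, and those that would be lost
common <- intersect(names(cycle_b), names(cycle_c))
setdiff(names(cycle_b), common)

# Stack the cycles using only the shared variables
combined <- rbind(cycle_b[common], cycle_c[common])
combined
```

With real NHANES tables the same check has to be repeated for every pair of cycles, and possibly across several tables, which is where a searchable local copy of the metadata becomes useful.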

Another subtle issue that is important from the perspective of reproducible research is the possibility of data updates (see below). NHANES is an ongoing program, so new datasets are released on a regular basis. More importantly from a reproducibility angle, previously released datasets are sometimes updated. Older versions are not retained on the NHANES website. This means that an analysis performed on a given date may be impossible to recreate on a later date, unless the relevant data sets have been retained.
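One lightweight way to notice that a retained file differs from a later download is to record a checksum alongside the analysis. The sketch below uses tools::md5sum() on a temporary file that stands in for a downloaded data file; the file contents are placeholders. It illustrates the general idea and is not a description of how any of the tools discussed here detect updates.

```r
# Stand-in for a downloaded data file; a real workflow would point at
# the retained copy of the SAS transport file instead
path <- tempfile(fileext = ".csv")
writeLines(c("SEQN,RIAGENDR", "9966,1"), path)
checksum_then <- unname(tools::md5sum(path))

# Simulate an upstream update that changes the file contents
writeLines(c("SEQN,RIAGENDR", "9966,2"), path)
checksum_now <- unname(tools::md5sum(path))

# FALSE signals that the data no longer match what the analysis used
identical(checksum_then, checksum_now)
```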

Efficient and reproducible analyses of NHANES data

To address these limitations, we have developed several tools, each building on the previous ones, to create a user-friendly platform for analysts who are comfortable with R as a data analysis platform. Briefly,

  • The cachehttp package enables local caching of NHANES data and documentation files that are only re-downloaded if they have been updated.

  • The nhanes-snapshot repository is used to download and periodically update raw data (as compressed CSV files) and documentation (as HTML files) with timestamps, so that they can serve as a snapshot of NHANES data available on specific dates.

  • The nhanes-postgres repository uses these snapshots to populate a PostgreSQL database inside a Docker container.

  • The nhanesA package has been modified to recognize the database when it is available, and use it as an alternative data source for both data and documentation, bypassing the NHANES website. Using nhanesA in this mode leads to a speedup of several orders of magnitude while requiring almost no change in user code.

  • The phonto package provides more advanced analysis tools that take advantage of the local database.

The easiest way to get started with these tools is to run the nhanes-postgres Docker image as described in the README. In addition to the PostgreSQL database, the container includes R and RStudio Server along with versions of nhanesA and phonto configured to use the database. Once the included instance of RStudio Server is accessed through a browser, one can use it as a regular R session without the need to explicitly interact with the backend database in any way. This is not, however, the only way, and advanced users may prefer to use only the database from the container, accessing it from outside via port forwarding.

Other articles on this site describe more detailed examples of analyses using these tools, as well as other checks and utilities that help with such analyses.

Frequency of NHANES data releases

We conclude this document with a brief look at how frequently NHANES data files are published and / or updated, based on the information contained in the table manifest.

Recall from above that the NHANES table manifest includes a Date.Published column. This allows us to tabulate NHANES data release dates. We expect that bulk releases of tables happen all together, generally at two-year intervals, while some tables may be released or updated on an as-needed basis.

The release information (available by month of release) can be summarized by tabulating the Date.Published field:

xtabs(~ Date.Published, manifest) |> sort() |> tail(20)
Date.Published
        December 2007             July 2010             June 2020 
                   13                    13                    13 
 Updated October 2014             July 2022           August 2021 
                   14                    15                    17 
        December 2018         November 2007         November 2021 
                   17                    17                    18 
Updated November 2020              May 2004             June 2002 
                   19                    21                    34 
         October 2015        September 2011        September 2013 
                   37                    38                    38 
       September 2017        September 2009         February 2020 
                   40                    41                    48 
       September 2024    Updated April 2022 
                   55                    59 

Parsing these dates systematically, we get

pubdate <- manifest$Date.Published
updates <- startsWith(pubdate, "Updated")
datesplit <- strsplit(pubdate, split = "[[:space:]]")
datesplit[updates] <- lapply(datesplit[updates], "[", -1)
pub_summary <-
    data.frame(updated = updates,
               year = sapply(datesplit, "[[", 2) |> as.numeric(),
               month = sapply(datesplit, "[[", 1) |> factor(levels = month.name))
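To make the parsing steps concrete, here is the same logic applied to three Date.Published values copied from the DEMO rows of the manifest shown earlier. The variable names carry an ex_ prefix only to avoid overwriting the objects created above.

```r
# Three Date.Published values copied from the DEMO rows of the manifest
ex_pubdate <- c("Updated September 2009", "September 2011",
                "Updated January 2015")

# Flag updates, split on whitespace, and drop the leading "Updated"
ex_updates <- startsWith(ex_pubdate, "Updated")
ex_split <- strsplit(ex_pubdate, split = "[[:space:]]")
ex_split[ex_updates] <- lapply(ex_split[ex_updates], "[", -1)

# After dropping "Updated", element 1 is the month and element 2 the year
data.frame(updated = ex_updates,
           year = sapply(ex_split, "[[", 2) |> as.numeric(),
           month = sapply(ex_split, "[[", 1) |> factor(levels = month.name))
```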

There are too many distinct months to label each one, but we can plot the number of releases and updates by month as follows, labeling only the months in which more than 30 tables were published or updated.

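library(lattice) # provides xyplot(); latticeExtra is accessed via ::
# Cross-tabulate month-year combinations against update status
pubfreq <- xtabs(~ interaction(month, year, sep = "-") + updated, pub_summary)
npub <- rowSums(pubfreq) # releases + updates per month
# Represent each month by its first day (%B relies on English month names)
npub.date <- as.Date(paste0("01", "-", names(npub)), format = "%d-%B-%Y")
xyplot(npub ~ npub.date, type = "h", grid = TRUE,
       xlab = "Month", ylab = "Number of tables published / updated") +
    # Label only the months with more than 30 tables
    latticeExtra::layer(panel.text(x[y > 30], y[y > 30],
                                   format(x[y > 30], "%Y-%m"),
                                   pos = 3, cex = 0.75))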

We can also plot the release / update frequency by year as follows.

xtabs(~ year + updated, pub_summary) |>
    barchart(horizontal = FALSE, ylab = "Number of tables",
             auto.key = list(text = c("Original", "Update"), columns = 2),
             scales = list(x = list(rot = 45)))

A full table of the number of releases and updates by month is given by the following, showing that there is at least one release or update in almost every month.

pubfreq0 <- pubfreq[rowSums(pubfreq) > 0, , drop = FALSE]
pubfreq0
                                   updated
interaction(month, year, sep = "-") FALSE TRUE
                     June-2002         34    0
                     February-2003      0    1
                     September-2003     1    0
                     January-2004       5    0
                     May-2004          21    3
                     June-2004          2    0
                     July-2004         10    1
                     September-2004     5    3
                     November-2004      2    0
                     December-2004      2    0
                     January-2005       3    1
                     February-2005      4    2
                     April-2005         1    0
                     June-2005          1    3
                     August-2005        1    0
                     October-2005       0    2
                     November-2005      7    0
                     December-2005      7    1
                     January-2006       2    0
                     February-2006      6    0
                     March-2006         3    3
                     April-2006         7    6
                     May-2006           2    2
                     June-2006          6    2
                     July-2006          8    1
                     August-2006       11    7
                     September-2006     4    1
                     November-2006      1    0
                     December-2006      4    0
                     January-2007       1    2
                     February-2007      1    0
                     March-2007         1    5
                     May-2007           1    1
                     June-2007          0    2
                     July-2007          2    3
                     August-2007        0    3
                     September-2007     0    1
                     October-2007       2    3
                     November-2007     17    7
                     December-2007     13    2
                     January-2008      12    1
                     February-2008      4    0
                     March-2008        12    2
                     April-2008        12    2
                     May-2008           4    2
                     June-2008          4    3
                     July-2008          7    0
                     August-2008        1    0
                     September-2008     2    1
                     October-2008       3    0
                     December-2008      4    0
                     January-2009       2    1
                     February-2009      1    0
                     March-2009         4    0
                     April-2009         4    0
                     May-2009           1    0
                     June-2009          1    4
                     July-2009          0    5
                     August-2009        1    1
                     September-2009    41    5
                     October-2009       3    1
                     December-2009      2    3
                     January-2010       8    0
                     February-2010      1    1
                     March-2010         5    0
                     April-2010         2    5
                     May-2010           2    5
                     June-2010          3    1
                     July-2010         13    4
                     August-2010        3    2
                     September-2010     6    1
                     October-2010       1    3
                     November-2010      2    2
                     March-2011         0    1
                     April-2011         0    3
                     June-2011          3    0
                     August-2011        2    1
                     September-2011    38    6
                     October-2011       4    1
                     November-2011      1    0
                     December-2011      5    0
                     January-2012      11    5
                     February-2012      3    1
                     March-2012         1    6
                     April-2012         6    0
                     May-2012           2    0
                     June-2012         12    0
                     July-2012          2    0
                     August-2012        4    1
                     September-2012     3    1
                     October-2012       1    0
                     November-2012      2    0
                     December-2012      1    0
                     January-2013       3    1
                     February-2013      4    1
                     March-2013         2    1
                     April-2013         2    2
                     May-2013           0    8
                     June-2013          3    3
                     July-2013          5    0
                     August-2013        0    1
                     September-2013    38    0
                     October-2013       2    2
                     November-2013      8    0
                     December-2013      1    1
                     January-2014       4    0
                     February-2014      3    2
                     March-2014         7    1
                     April-2014         1    1
                     May-2014           0    1
                     June-2014          1    0
                     July-2014          4    1
                     August-2014        1    1
                     September-2014     6    1
                     October-2014       0   14
                     November-2014      1    0
                     December-2014      4    4
                     January-2015       4    2
                     February-2015      4    6
                     March-2015         1    0
                     May-2015           0    5
                     June-2015          0    2
                     July-2015          1    0
                     August-2015        1    0
                     September-2015     1    1
                     October-2015      37    5
                     November-2015      3    0
                     December-2015      3    0
                     January-2016      10    1
                     February-2016      3    0
                     March-2016         7    3
                     April-2016         4    0
                     May-2016           3    1
                     June-2016          4    0
                     July-2016          2    0
                     August-2016        4    3
                     September-2016     5    4
                     October-2016       1    2
                     November-2016      0    1
                     December-2016      6    6
                     February-2017      3    1
                     March-2017         4    3
                     April-2017         4    0
                     June-2017          1    0
                     August-2017        1    4
                     September-2017    40    3
                     October-2017       2    0
                     December-2017      6    5
                     January-2018       1    0
                     February-2018      3    1
                     March-2018         3    0
                     April-2018         5    3
                     May-2018           3    0
                     June-2018          9    0
                     July-2018          5    0
                     September-2018     5    0
                     October-2018       4    0
                     November-2018      7    1
                     December-2018     17    0
                     January-2019       6    0
                     February-2019      4    6
                     March-2019         0    2
                     April-2019         5    1
                     May-2019           5    0
                     June-2019          1    1
                     August-2019        2    0
                     September-2019     9    0
                     October-2019       0    2
                     November-2019      3    4
                     December-2019      6    3
                     January-2020       2    1
                     February-2020     48    1
                     March-2020        11    0
                     April-2020         2    1
                     May-2020           3    0
                     June-2020         13    1
                     July-2020          5    0
                     August-2020        8    1
                     October-2020       5    0
                     November-2020      7   19
                     December-2020      3    0
                     February-2021      1    1
                     March-2021         0    1
                     April-2021         7    0
                     May-2021          10    1
                     June-2021         12    1
                     July-2021          8    0
                     August-2021       17    1
                     September-2021     9    1
                     October-2021       6    0
                     November-2021     18    7
                     December-2021      4    1
                     January-2022       2    0
                     February-2022      1    6
                     March-2022         6    0
                     April-2022         1   59
                     May-2022           1    5
                     June-2022          4    0
                     July-2022         15    9
                     August-2022        5    2
                     September-2022     8    0
                     October-2022       0    2
                     November-2022      2    0
                     December-2022      0    2
                     January-2023       1    0
                     February-2023      3    0
                     March-2023         1    0
                     April-2023         1    0
                     May-2023           2    3
                     July-2023          1    0
                     September-2023     3    2
                     October-2023       0    2
                     November-2023      2    0
                     January-2024       1    0
                     May-2024           1    0
                     June-2024          1    0
                     July-2024          1    0
                     September-2024    55    0
                     October-2024       8    0
                     December-2024      4    0
                     February-2025      4    0
                     April-2025         0    2

Session information

print(sessionInfo(), locale = FALSE)
R Under development (unstable) (2025-04-06 r88113)
Platform: x86_64-apple-darwin22.2.0
Running under: macOS Ventura 13.1

Matrix products: default
BLAS:   /usr/local/Cellar/openblas/0.3.29/lib/libopenblasp-r0.3.29.dylib 
LAPACK: /Users/deepayan/local/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nhanesA_1.3      kableExtra_1.4.0 lattice_0.22-7   knitr_1.43      

loaded via a namespace (and not attached):
 [1] sass_0.4.8          utf8_1.1.4          generics_0.1.3     
 [4] xml2_1.3.2          jpeg_0.1-10         stringi_1.5.3      
 [7] hms_1.1.3           digest_0.6.34       magrittr_2.0.3     
[10] RColorBrewer_1.1-2  evaluate_0.21       grid_4.6.0         
[13] timechange_0.2.0    fastmap_1.1.0       blob_1.2.4         
[16] plyr_1.8.6          jsonlite_1.8.8      RPostgres_1.4.6    
[19] DBI_1.2.2           httr_1.4.2          rvest_1.0.3        
[22] purrr_1.0.2         selectr_0.4-2       crosstalk_1.1.1    
[25] viridisLite_0.4.1   scales_1.2.1        jquerylib_0.1.4    
[28] codetools_0.2-20    cli_3.6.2           rlang_1.1.3        
[31] dbplyr_2.5.0        bit64_4.0.5         munsell_0.5.0      
[34] cachem_1.0.4        yaml_2.2.1          tools_4.6.0        
[37] deldir_1.0-6        dplyr_1.1.4         interp_1.1-4       
[40] colorspace_2.0-0    DT_0.18             curl_4.3.1         
[43] png_0.1-7           vctrs_0.6.5         R6_2.5.1           
[46] lifecycle_1.0.3     lubridate_1.9.3     stringr_1.5.1      
[49] htmlwidgets_1.5.3   bit_4.0.5           foreign_0.8-90     
[52] pkgconfig_2.0.3     bslib_0.6.1         pillar_1.10.0      
[55] glue_1.6.2          Rcpp_1.0.10         systemfonts_1.0.6  
[58] xfun_0.49           tibble_3.2.1        tidyselect_1.2.1   
[61] latticeExtra_0.6-30 rstudioapi_0.13     htmltools_0.5.7    
[64] rmarkdown_2.26      svglite_2.1.1       compiler_4.6.0     

References

  • Laha Ale, Robert Gentleman, Teresa Filshtein Sonmez, Deepayan Sarkar, Christopher Endres (2024). nhanesA: achieving transparency and reproducibility in NHANES research. Database, Volume 2024, baae028, https://doi.org/10.1093/database/baae028