This material was developed as part of a course to learn R and RStudio and gain proficiency in data analysis. The sample datasets used are tabular: tables of data collected by the National Health and Nutrition Examination Survey (NHANES), a survey research program conducted by the National Center for Health Statistics (NCHS) to assess the health and nutritional status of adults and children in the United States and to track changes over time.
Readers can use the material as a series of “self-paced tutorials”, since the chapters are written so that all commands and data files are provided.
The material is suited for beginners. The course has the following goals:
– Install and run R and RStudio software with additional packages
– Understand programming concepts such as variables, conditional statements, data stream, and pipelines
– Examine, compare and contrast data
– Illustrate analyses with graphics and plots
– Compose reproducible reports that can be automated
At the end of the course you’ll have acquired sufficient proficiency and independence to use the software R within the RStudio graphical interface to analyze complex environmental datasets in tabular form and create useful and reproducible reports with annotated graphics.
In addition, this course provides a guide to both “classic R” and the newer “Tidyverse” methods of analysis. Students also explore creating “dynamic documents” with “markdown”, following the “literate programming” philosophy that is now recognized as a useful step toward “reproducible research.”
Tutorial Course – Tabular data analysis with R and Tidyverse
The course material is available online as HTML for easy copy/paste and as a single PDF document for easy printing. Links are also live in the PDF, which has an additional clickable index to help find topics of interest.
Updated version (2024)
- HTML (Bookdown form)
- PDF (21.5 MB, 244 pages)
- New: Slides with preview – (or: simple list)
Previous version (2022)
- HTML (Bookdown form)
- PDF (21.5 MB, 244 pages)
Datasets
Instructions for downloading the five datasets from the NHANES source are included in the material. For convenience, the files are also available for download here. See the details within the material to understand how to use these files, which are in the .XPT “transport” format from the SAS software.
- All 5 files: XPTs.zip (1 MB)
- Demographics: DEMO_I.XPT (3.6 MB) – (Data info)
- Body Mass Index: BMX_I.XPT (1.9 MB) – (Data info)
- Total Cholesterol: TCHOL_I.XPT (189 KB) – (Data info)
- Albumin/Creatinine: ALB_CR_I.XPT (540 KB) – (Data info)
- Perfluoroalkyl and Polyfluoroalkyl Substances: PFAS_I.XPT (377 KB) – (Data info)
The merged “Master4” file created during the course can also be downloaded as a .csv file with either: