This package aims to make it really easy to tidy data retrieved from Gapminder.
Once you have loaded the package, you are in possession of two superpowers (functions): tidy_indice and tidy_bunch.
The tidy_indice function tidies a data sheet downloaded from Gapminder. The data sheet can be in either csv or xlsx format, the two formats offered on the Gapminder site. tidy_indice takes the path to the file as its argument and returns the data as a tidy data frame:
filepath <- system.file("extdata", "life_expectancy_years.csv", package = "tidygapminder")
# From .............................
df <- readr::read_csv(filepath)
#> Rows: 187 Columns: 220
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): country
#> dbl (219): 1800, 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810,...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df)
#> # A tibble: 6 × 220
#> country `1800` `1801` `1802` `1803` `1804` `1805` `1806` `1807` `1808` `1809`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Afghani… 28.2 28.2 28.2 28.2 28.2 28.2 28.1 28.1 28.1 28.1
#> 2 Albania 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4 35.4
#> 3 Algeria 28.8 28.8 28.8 28.8 28.8 28.8 28.8 28.8 28.8 28.8
#> 4 Andorra NA NA NA NA NA NA NA NA NA NA
#> 5 Angola 27 27 27 27 27 27 27 27 27 27
#> 6 Antigua… 33.5 33.5 33.5 33.5 33.5 33.5 33.5 33.5 33.5 33.5
#> # ℹ 209 more variables: `1810` <dbl>, `1811` <dbl>, `1812` <dbl>, `1813` <dbl>,
#> # `1814` <dbl>, `1815` <dbl>, `1816` <dbl>, `1817` <dbl>, `1818` <dbl>,
#> # `1819` <dbl>, `1820` <dbl>, `1821` <dbl>, `1822` <dbl>, `1823` <dbl>,
#> # `1824` <dbl>, `1825` <dbl>, `1826` <dbl>, `1827` <dbl>, `1828` <dbl>,
#> # `1829` <dbl>, `1830` <dbl>, `1831` <dbl>, `1832` <dbl>, `1833` <dbl>,
#> # `1834` <dbl>, `1835` <dbl>, `1836` <dbl>, `1837` <dbl>, `1838` <dbl>,
#> # `1839` <dbl>, `1840` <dbl>, `1841` <dbl>, `1842` <dbl>, `1843` <dbl>, …
# To................................
ti_df <- tidy_indice(filepath)
head(ti_df)
#> # A tibble: 6 × 3
#> country year life_expectancy_years
#> <chr> <dbl> <dbl>
#> 1 Afghanistan 1800 28.2
#> 2 Afghanistan 1801 28.2
#> 3 Afghanistan 1802 28.2
#> 4 Afghanistan 1803 28.2
#> 5 Afghanistan 1804 28.2
#> 6 Afghanistan 1805 28.2
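To make the wide-to-long reshaping above more concrete, here is a minimal sketch of the same kind of transformation done by hand with tidyr::pivot_longer. It is not the package's actual implementation, only an illustration under the assumption that tidying means one row per country and year. The toy data frame `wide` and the object name `long` are hypothetical; the values are the first two years for Afghanistan and Albania shown above.

```r
library(tidyr)

# Hypothetical toy table in the wide Gapminder layout:
# one row per country, one column per year.
wide <- data.frame(
  country = c("Afghanistan", "Albania"),
  `1800` = c(28.2, 35.4),
  `1801` = c(28.2, 35.4),
  check.names = FALSE  # keep the year column names as they are
)

# Reshape to one row per country-year pair.
# The value column name is chosen by hand here to match the file name.
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "life_expectancy_years"
)

# Year column names are character strings after pivoting; convert to numbers.
long$year <- as.numeric(long$year)

long
```

The result has three columns (country, year, life_expectancy_years), the same shape as the tidy_indice output above.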
tidy_bunch makes use of tidy_indice to tidy a whole set of data sheets in a directory, with the option to merge all the resulting data frames into one big data frame by setting merge to TRUE:
dir_path <- system.file("extdata", package = "tidygapminder")
# From ................................
list.files(dir_path)
#> [1] "agriculture_land.xlsx" "life_expectancy_years.csv"
# To ..................................
td_dp <- tidy_bunch(dir_path, merge = TRUE)
head(td_dp)
#> country year Agricultural land (% of land area) life_expectancy_years
#> 1 Afghanistan 1800 NA 28.2
#> 2 Afghanistan 1801 NA 28.2
#> 3 Afghanistan 1802 NA 28.2
#> 4 Afghanistan 1803 NA 28.2
#> 5 Afghanistan 1804 NA 28.2
#> 6 Afghanistan 1805 NA 28.2
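The NA values in the merged output appear where one indicator has no data for a given country and year. The sketch below shows one way such a merge could be done in base R, joining tidy data frames on country and year and keeping every row. This is an assumption about the general technique, not a description of how tidy_bunch is implemented; the data frames `life` and `other`, and the column `made_up_indicator` with its value, are invented for illustration.

```r
# Hypothetical tidy data frames sharing the key columns country and year.
life <- data.frame(
  country = c("Afghanistan", "Afghanistan"),
  year = c(1800, 1801),
  life_expectancy_years = c(28.2, 28.2)
)
other <- data.frame(
  country = "Afghanistan",
  year = 1801,
  made_up_indicator = 10  # invented value, only covers 1801
)

tidy_list <- list(life, other)

# Full outer join of all data frames in the list, two at a time.
# all = TRUE keeps rows that are missing from one side and fills them with NA.
merged <- Reduce(
  function(x, y) merge(x, y, by = c("country", "year"), all = TRUE),
  tidy_list
)

merged
```

Here the 1800 row gets NA for made_up_indicator, because only the life expectancy table covers that year.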
Enjoy!!!