A simple look into tidygapminder

This package aims to make it really easy to tidy data retrieved from Gapminder. Start by loading the package:

library(tidygapminder)

Once the package is loaded, you are in possession of two super powers (functions): tidy_indice and tidy_bunch.

tidy_indice

The tidy_indice function tidies a data sheet downloaded from Gapminder. The data sheet can be in either csv or xlsx format, as offered on the Gapminder site.

tidy_indice takes the path to the file as its argument and returns the data as a tidy data frame.

filepath <- system.file("extdata", "life_expectancy_years.csv", package = "tidygapminder")

# From .............................
df <- readr::read_csv(filepath)
#> Rows: 187 Columns: 220
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr   (1): country
#> dbl (219): 1800, 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810,...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(df)
#> # A tibble: 6 × 220
#>   country  `1800` `1801` `1802` `1803` `1804` `1805` `1806` `1807` `1808` `1809`
#>   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 Afghani…   28.2   28.2   28.2   28.2   28.2   28.2   28.1   28.1   28.1   28.1
#> 2 Albania    35.4   35.4   35.4   35.4   35.4   35.4   35.4   35.4   35.4   35.4
#> 3 Algeria    28.8   28.8   28.8   28.8   28.8   28.8   28.8   28.8   28.8   28.8
#> 4 Andorra    NA     NA     NA     NA     NA     NA     NA     NA     NA     NA  
#> 5 Angola     27     27     27     27     27     27     27     27     27     27  
#> 6 Antigua…   33.5   33.5   33.5   33.5   33.5   33.5   33.5   33.5   33.5   33.5
#> # ℹ 209 more variables: `1810` <dbl>, `1811` <dbl>, `1812` <dbl>, `1813` <dbl>,
#> #   `1814` <dbl>, `1815` <dbl>, `1816` <dbl>, `1817` <dbl>, `1818` <dbl>,
#> #   `1819` <dbl>, `1820` <dbl>, `1821` <dbl>, `1822` <dbl>, `1823` <dbl>,
#> #   `1824` <dbl>, `1825` <dbl>, `1826` <dbl>, `1827` <dbl>, `1828` <dbl>,
#> #   `1829` <dbl>, `1830` <dbl>, `1831` <dbl>, `1832` <dbl>, `1833` <dbl>,
#> #   `1834` <dbl>, `1835` <dbl>, `1836` <dbl>, `1837` <dbl>, `1838` <dbl>,
#> #   `1839` <dbl>, `1840` <dbl>, `1841` <dbl>, `1842` <dbl>, `1843` <dbl>, …

# To................................

ti_df <- tidy_indice(filepath)

head(ti_df)
#> # A tibble: 6 × 3
#>   country      year life_expectancy_years
#>   <chr>       <dbl>                 <dbl>
#> 1 Afghanistan  1800                  28.2
#> 2 Afghanistan  1801                  28.2
#> 3 Afghanistan  1802                  28.2
#> 4 Afghanistan  1803                  28.2
#> 5 Afghanistan  1804                  28.2
#> 6 Afghanistan  1805                  28.2
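
To make the reshaping concrete, here is a minimal base R sketch of the wide-to-long step on a two-country, two-year excerpt of the table above. It only illustrates the idea and is not necessarily how tidy_indice is implemented; the names wide, year_cols and long are made up for this example.

```r
# A small wide table: one row per country, one column per year
# (values copied from the life expectancy excerpt shown above).
wide <- data.frame(
  country = c("Afghanistan", "Albania"),
  `1800` = c(28.2, 35.4),
  `1801` = c(28.2, 35.4),
  check.names = FALSE  # keep the year column names as they are
)

# Every column except `country` holds one year of observations.
year_cols <- setdiff(names(wide), "country")

# Long (tidy) layout: one row per country-year pair.
# t() puts years in rows and countries in columns, so as.vector()
# lists all years of the first country, then all years of the second.
long <- data.frame(
  country = rep(wide$country, each = length(year_cols)),
  year = rep(as.numeric(year_cols), times = nrow(wide)),
  life_expectancy_years = as.vector(t(as.matrix(wide[year_cols])))
)

print(long)
```

Each country-year pair ends up on its own row, which is the same three-column shape that tidy_indice returns.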

tidy_bunch

tidy_bunch uses tidy_indice to tidy a whole set of data sheets. It also has the option to merge all the resulting data frames into one big data frame when merge is set to TRUE:

dir_path <- system.file("extdata", package = "tidygapminder")

# From ................................
list.files(dir_path)
#> [1] "agriculture_land.xlsx"     "life_expectancy_years.csv"

# To ..................................
td_dp <- tidy_bunch(dir_path, merge = TRUE)

head(td_dp)
#>       country year Agricultural land (% of land area) life_expectancy_years
#> 1 Afghanistan 1800                                 NA                  28.2
#> 2 Afghanistan 1801                                 NA                  28.2
#> 3 Afghanistan 1802                                 NA                  28.2
#> 4 Afghanistan 1803                                 NA                  28.2
#> 5 Afghanistan 1804                                 NA                  28.2
#> 6 Afghanistan 1805                                 NA                  28.2
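
The NA values in the merged output suggest that rows are kept even when one indicator has no value for a given country and year. The sketch below reproduces that behavior with a base R full join on country and year. It is an assumption about how the merge behaves, not the package's actual code, and the agricultural_land values are toy numbers, not real Gapminder data.

```r
# Two tidy data frames sharing the keys `country` and `year`.
life <- data.frame(
  country = "Afghanistan",
  year = c(1800, 1801),
  life_expectancy_years = c(28.2, 28.2)
)

# Toy values for illustration only (not real Gapminder data).
agri <- data.frame(
  country = "Afghanistan",
  year = c(1801, 1802),
  agricultural_land = c(10, 20)
)

# all = TRUE keeps rows found in either data frame (a full join);
# missing combinations are filled with NA.
merged <- merge(life, agri, by = c("country", "year"), all = TRUE)

print(merged)
```

Years that appear in only one of the two inputs are kept, with NA in the column that has no observation for them.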

Enjoy!!!