--- title: "Datasets in geofi-package" author: "Markus Kainu, Leo Lahti & Joona Lehtomäki" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Datasets in geofi-package} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, fig.height = 7, fig.width = 7, dpi = 75 ) ``` **`geofi`-package provides access to multiple dataset of different types and for different use. In this vignette we introduce the different datas and explain their use cases. Vignette *Making maps using `geofi`-package* provides multiple real-world examples of their usage.** **Packages installation** `geofi` can be installed from CRAN using ```{r, eval = FALSE} # install from CRAN install.packages("geofi") # Install development version from GitHub remotes::install_github("ropengov/geofi") ``` ```{r include = FALSE, eval = TRUE} # Let's first create a function that checks if the suggested # packages are available check_namespaces <- function(pkgs){ return(all(unlist(sapply(pkgs, requireNamespace,quietly = TRUE)))) } apiacc <- geofi::check_api_access() pkginst <- check_namespaces(c("geofacet","ggplot2","dplyr")) apiacc_pkginst <- all(apiacc,pkginst) ``` ## Municipality keys Official administrative regions in Finland are based on municipalities. In 2021 there are 309 municipalities in Finland and the number is decreasing over time through mergers.\ \ Each municipality belongs to a higher level regional classifications such as regions (maakunta) or health care districts (sairaanhoitopiiri). `municipality_key_`-datasets are based on Statistics Finland [Statistical classification](https://data.stat.fi/api/classifications/v2/) -api with few modification and provided on yearly basis. ```{r municipality_keys, eval = pkginst} library(geofi) library(dplyr) d <- data(package = "geofi") as_tibble(d$results) %>% select(Item,Title) %>% filter(grepl("municipality_key", Item)) ``` Looking at the names of \`municipality_key_2023\` there is 69 different variables from each municipality. ```{r municipality_key_names, eval = TRUE} names(geofi::municipality_key_2023) ``` With these municipality keys you can easily aggregate municipalities for plotting or you can list different regional breakdowns.\ ```{r municipality_key_maakunta, eval = pkginst} geofi::municipality_key_2023 %>% count(maakunta_code,maakunta_name_fi,maakunta_name_sv,maakunta_name_en) ``` Municipality keys are joined with the municipality spatial data by default, meaning that data returned by `get_municipality()` can be aggregated as it is. ## Spatial data Spatial data is provided as administrative regions (polygons), population and statistical grids (polygons) and municipality centers (points). ### Municipality borders Municipality borders are provided yearly from 2013 and in two scales 1: 1 000 000 and 1:4 500 000. Use `1000` or `4500` as value for `scale`-argument, respectively. ```{r municipality_map, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} municipalities <- get_municipalities(year = 2023, scale = 4500) plot(municipalities["municipality_name_fi"], border = NA) ``` ### Municipality borders with population In 2022 a new data source is introduced that provides you municipality borders with municipality population data. Spatial data is provided in 1:4 500 000 scale. Calling the function with year = 2019 returns population data from 2019-12-31 with spatial data on borders from 2020. The statistical variables in the data are: total population (vaesto), share of the total population (vaesto_p), number of men (miehet), men's share of the population in an area (miehet_p) and women (naiset), women's share (naiset_p), those aged under 15: number (ika_0_14), share (ika_0_14p), those aged 15 to 64: number (ika_15_64), share (ika_15_64p), and aged 65 or over: number (ika_65_), share (ika_65_p). To plot men's share at the municipality level in 2020 (2021 municipality borders) you can simply to this. ```{r muni_pop_map1, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} get_municipality_pop(year = 2022) %>% subset(select = miehet_p) %>% plot() ``` Aggregating the absolute population numbers is straightforward: to plot population at Wellbeing service county level you can do. ```{r muni_pop_map2, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} get_municipality_pop(year = 2022) %>% group_by(hyvinvointialue_name_fi) %>% summarise(vaesto = sum(vaesto)) %>% select(vaesto) %>% plot() ``` To plot the men's share at wellbeing service country level you have to add one more step ```{r muni_pop_map3, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} get_municipality_pop(year = 2022) %>% dplyr::group_by(hyvinvointialue_name_fi) %>% summarise(vaesto = sum(vaesto), miehet = sum(miehet)) %>% mutate(share = miehet/vaesto*100) %>% select(share) %>% plot() ``` ### Zipcodes Zipcodes are provided in a single resolution from 2015. ```{r zipcode_map, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} zipcodes <- get_zipcodes(year = 2023) plot(zipcodes["nimi"], border = NA) ``` ### Statistical grid [Grid net for statistics](https://stat.fi/org/avoindata/paikkatietoaineistot/tilastoruudukko_1km_en.html) both in 1 km x 1 km and 5 km x 5km covers whole of Finland. The grid net includes all grid squares in Finland. Statistics Finland [proprietary grid database](https://stat.fi/tup/ruututietokanta/index_en.html) provides the attribute statistical data for these grid nets. ```{r statisticsl_grid_data, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} stat_grid <- get_statistical_grid(resolution = 5, auxiliary_data = TRUE) plot(stat_grid["euref_x"], border = NA) ``` ### Population grid Number of population by both 1 km x 1 km and 5 km x 5 km grids. The number of population on the last day of the reference year (31 December) by age group. Data includes only inhabited grids. The statistical variables of the data are: Total population (`vaesto`), number of men (`miehet`) and women (`naiset`), under 15 year olds (`ika_0_14`), 15-64 year olds (`ika_15_64`), and aged over 65 (`ika_65_`). Only the number of population is reported for grids of under 10 inhabitants. See [Population grid data](https://stat.fi/org/avoindata/paikkatietoaineistot/vaestoruutuaineisto_5km_en.html). The data describes the population distribution independent of administrative areas (such as municipal borders). The data is suitable for examination of population distribution and making various spatial analysis. ```{r population_grid_data, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} pop_grid <- get_population_grid(year = 2018, resolution = 5) plot(pop_grid["kunta"], border = NA) ``` ### Central localities of municipalities [National Land Survey of Finland](https://maanmittauslaitos.fi) maintains [Topological Database](https://www.maanmittauslaitos.fi/en/maps-and-spatial-data/datasets-and-interfaces/product-descriptions/topographic-database) that contains a wide range of layers from which you can access the locations of central localities of each municipality in Finland. ```{r central_localities, fig.height = 7, fig.width = 4, eval = apiacc_pkginst} plot(municipality_central_localities["teksti"]) ``` ## Custom geofacet grid data From Ryan Hafen's [blog](https://ryanhafen.com/blog/geofacet/): > The [geofacet](https://hafen.github.io/geofacet/) package extends [ggplot2](https://ggplot2.tidyverse.org/) in a way that makes it easy to create geographically faceted visualizations in R. To geofacet is to take data representing different geographic entities and apply a visualization method to the data for each entity, with the resulting set of visualizations being laid out in a grid that mimics the original geographic topology as closely as possible. `geofi`-package contains custom grids to be used with various Finnish administrative breakdowns as listed below. ```{r geofacets, eval = apiacc_pkginst} d <- data(package = "geofi") as_tibble(d$results) %>% select(Item,Title) %>% filter(grepl("grid", Item)) %>% print(n = 100) ``` Here is an example where population data at municipality level is pulled from THL from 2000 to 2022, then aggregated at the levels of regions (`maakunta`) and then plotted with ggplot2 using grid `geofi::grid_maakunta`. Population data is provided as part of geofi package as `geofi::sotkadata_population`. ```{r geofacet, fig.height = 8, fig.width = 10, eval = apiacc_pkginst} # Let pull population data from THL sotkadata <- geofi::sotkadata_population # lets aggregate population data dat <- left_join(geofi::municipality_key_2023 %>% select(-year), sotkadata) %>% group_by(maakunta_code, maakunta_name_fi,year) %>% summarise(population = sum(primary.value, na.rm = TRUE)) %>% na.omit() %>% ungroup() %>% rename(code = maakunta_code, name = maakunta_name_fi) library(geofacet) library(ggplot2) ggplot(dat, aes(x = year, y = population/1000, group = name)) + geom_line() + facet_geo(facets = ~name, grid = grid_maakunta, scales = "free_y") + theme(axis.text.x = element_text(size = 6)) + scale_x_discrete(breaks = seq.int(from = 2000, to = 2023, by = 5)) + labs(title = unique(sotkadata$indicator.title.fi), y = "%") ```