Design Principles for {ColOpenData}

This vignette outlines the design decisions that have been taken during the development of the ColOpenData R package, and provides some of the reasoning, and possible pros and cons of each decision.

This document is primarily intended to be read by those interested in understanding the code within the package and for potential package contributors.

Scope

ColOpenData provides an easy access to Colombian Open Government Data from two main data sources: Departamento Administrativo Nacional de Estadística (DANE), and Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM). These include Demographic, Geospatial, Climate, and Population Projections’ datasets. The package centralizes information an provides harmonized and cleaned datasets for easier data analysis.

In addition, the package includes a series of documentation and dictionaries to help the user navigate the open datasets available. Further information regarding included documentation, naming conventions, and uses can be found here: Documentation and Datasets

Output

The output for each function is a data.frame with the requested data. When downloading geospatial data, which includes maps, an sf object is created.

Design Decisions

A manual cleaning process was necessary to address issues like inconsistent headers in the original datasets, while harmonization ensured alignment with tidy data standards and national naming conventions. To avoid repeating these steps at each download, and ensure stable and independent access to open data, the already processed datasets were stored on a private server for later redistribution. The datasets list and dictionaries are stored inside the package data, to avoid unnecessary downloads of frequently consulted datasets.

Additional particularities include:

Dependencies

The package dependencies are:

Additional Considerations

The datasets included in the package contain changes which may alter the structure, format, or content, meaning the data does not reflect the official dataset. The package is developed independently, with no endorsement or involvement from these institutions or any Colombian government body. The authors of ColOpenData are not liable for how users utilize the data, and users are responsible for any outcomes from their use or analysis of the data.