{densityarea}
To get started with using {densityarea}
, we’ll need to
load some packages, and some data to work with.
{densityarea}
is meant to play nicely with tidyverse-style
data processing, in addition to loading the package itself, we’ll also
load {dplyr}
. We have the option of working with the
density polygons in the form of simple features from {sf}
,
so we’ll load that as well. Finally, we’ll load {ggplot2}
and {ggdensity}
for the sake of data visualization.
The dataset s01
is a data frame of vowel formant
measurements.
data(s01, package = "densityarea")
head(s01)
#> # A tibble: 6 × 10
#> name age sex word vowel plt_vclass ipa_vclass F1 F2 dur
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 s01 y f OKAY EY eyF ejF 764. 2088. 0.2
#> 2 s01 y f UM AH uh ʌ 700. 1881. 0.19
#> 3 s01 y f I'M AY ay aj 889. 1934. 0.07
#> 4 s01 y f LIVED IH i ɪ 556. 1530. 0.05
#> 5 s01 y f IN IH i ɪ 612. 2323. 0.06
#> 6 s01 y f COLUMBUS AH @ ə 612. 1904. 0.07
Let’s plot the original, raw data from s01
, with the
Highest Density Regions overlaid (thanks to the {ggdensity}
package).
ggplot(data = s01,
aes(x = F2,
y = F1)
)+
geom_point(alpha = 0.1)+
stat_hdr(probs = c(0.8, 0.5),
aes(fill = after_stat(probs)),
color = "black",
alpha = 0.8)+
scale_y_reverse()+
scale_x_reverse()+
scale_fill_brewer(type = "seq")+
coord_fixed()
The function ggdensity::get_hdr()
is perfect for quickly
adding interpretable densities to your plots. To work with these
densities as polygons, we can use
densityarea::density_polygons()
.
Per the name of the package, we can get the area within each of these
density polygons with density_area()
.
As a first data processing step, let’s log transform and flip our
F1
and F2
values.
To get the area within the 80% density polygon for the entire data
set, we’ll pass s01
through a dplyr::reframe()
function.
s01 |>
group_by(name) |>
reframe(
density_area(lF2, lF1, probs = 0.8)
)
#> # A tibble: 1 × 4
#> name level_id prob area
#> <chr> <int> <dbl> <dbl>
#> 1 s01 1 0.8 0.406
Or, if we wanted the areas associated with subsets of the data (say,
for each vowel
) we’d just change our
dplyr::group_by()
call.
Let’s rearrange the order of rows to see the largest areas first.
vowel_areas |>
arrange(desc(area))
#> # A tibble: 15 × 5
#> name vowel level_id prob area
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 s01 AO 1 0.8 0.488
#> 2 s01 AA 1 0.8 0.278
#> 3 s01 AH 1 0.8 0.274
#> 4 s01 AE 1 0.8 0.258
#> 5 s01 UW 1 0.8 0.229
#> 6 s01 EH 1 0.8 0.226
#> 7 s01 IY 1 0.8 0.206
#> 8 s01 UH 1 0.8 0.203
#> 9 s01 OW 1 0.8 0.197
#> 10 s01 IH 1 0.8 0.186
#> 11 s01 AY 1 0.8 0.171
#> 12 s01 AW 1 0.8 0.170
#> 13 s01 ER 1 0.8 0.145
#> 14 s01 EY 1 0.8 0.101
#> 15 s01 OY 1 0.8 0.0904
In the simplest approach, we can use density_polygons()
to return a data frame for just one probability level, 60%.
s01 |>
group_by(name) |>
reframe(
density_polygons(lF2, lF1, probs = 0.6)
)->
sixty_poly_df
head(sixty_poly_df)
#> # A tibble: 6 × 7
#> name level_id id prob lF2 lF1 order
#> <chr> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 s01 1 1 0.6 -7.44 -6.78 1
#> 2 s01 1 1 0.6 -7.42 -6.76 2
#> 3 s01 1 1 0.6 -7.41 -6.75 3
#> 4 s01 1 1 0.6 -7.40 -6.72 4
#> 5 s01 1 1 0.6 -7.40 -6.69 5
#> 6 s01 1 1 0.6 -7.40 -6.69 6
Now, it’s possible for the HDR polygon to actually come in multiple pieces, but in this case, there’s just one polygon, so we can plot it.
To get polygons associated with multiple probability levels, we
simply pass a vector of values to probs
.
s01 |>
group_by(name) |>
reframe(
density_polygons(lF2,
lF1,
probs = c(0.6, 0.8))
)->
multi_poly_df
head(multi_poly_df)
#> # A tibble: 6 × 7
#> name level_id id prob lF2 lF1 order
#> <chr> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 s01 2 1 0.8 -7.78 -6.01 1
#> 2 s01 2 1 0.8 -7.80 -6.01 2
#> 3 s01 2 1 0.8 -7.82 -6.02 3
#> 4 s01 2 1 0.8 -7.85 -6.02 4
#> 5 s01 2 1 0.8 -7.87 -6.02 5
#> 6 s01 2 1 0.8 -7.90 -6.02 6
We can also get density_polygons()
to return the
polygons as simple features, as defined in the {sf}
package, by passing it the argument as_sf = TRUE
.
s01 |>
group_by(name) |>
reframe(
density_polygons(lF2,
lF1,
probs = c(0.8, 0.6),
as_sf = TRUE)
) |>
st_sf()->
multi_poly_sf
The final function there, sf::st_sf()
, wasn’t strictly
necessary, but makes life a little easier for plotting. Here’s what the
result looks like:
multi_poly_sf
#> Simple feature collection with 2 features and 3 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -7.93703 ymin: -6.913825 xmax: -7.208434 ymax: -6.009484
#> CRS: NA
#> # A tibble: 2 × 4
#> name level_id prob geometry
#> <chr> <int> <dbl> <POLYGON>
#> 1 s01 1 0.6 ((-7.636315 -6.19764, -7.645598 -6.201069, -7.65986 -6.2…
#> 2 s01 2 0.8 ((-7.777586 -6.009484, -7.801131 -6.010429, -7.824676 -6…
And here’s a plot.