Nodal Attribute Specification in ERGM Terms

The Statnet Team

2024-11-06

This document provides some examples of how to specify nodal attributes and their transformations in ergm terms. For the R help on the topic, see ?nodal_attributes; for help on implementing terms that use this interface, see API?nodal_attributes.

Extraction and transformation

It is sometimes desirable to specify a transformation of a nodal attribute as a covariate in a model term. Most ergm terms now support a new Tidyverse-inspired user interface to do so. Arguments using this interface are typically called attr, attrs, by, or on and are interpreted depending on their type:

character string

Extract the vertex attribute with this name.

character vector of length greater than 1

Extract the vertex attributes and either paste them together, separated by dots, if the term expects categorical attributes, or (typically) combine them into a covariate matrix if it expects quantitative attributes.

function

The function is called on the LHS network, expected to return a vector or matrix of appropriate dimension. (Shorter vectors and matrix columns will be recycled as needed.)

formula

Borrowing the interface from , the expression on the right-hand side of the formula is evaluated in an environment of the vertex attributes of the network, expected to return a vector or matrix of appropriate dimension. (Shorter vectors and matrix columns will be recycled as needed.) Within this expression, the network itself is accessible as either or . For example, in the example below, would return the absolute difference of each actor’s “Grade” attribute from its network-wide mean, divided by the network size.

AsIs object created by I()

Use as is, checking only for correct length and type, with an optional attribute "name" indicating the predictor’s name.
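As a quick check of the character-vector rule above, the sketch below compares a length-2 character vector with a formula that pastes the same two attributes together by hand. It assumes the pasting is equivalent to `paste(..., sep = ".")` applied to the attributes in the order given, which is our reading of the description rather than a documented guarantee; since the term labels differ, only the counts are compared. It also checks that, with all levels retained, the nodefactor counts sum to twice the number of edges, because each undirected edge adds to the degree of both endpoints. `network.edgecount()` comes from the network package, which ergm attaches.

```r
library(ergm)
data(faux.mesa.high)

# Categorical attributes given as a character vector: pasted with dots.
by_names <- summary(faux.mesa.high ~ nodefactor(c("Grade", "Sex"), levels = TRUE))

# The same combination built explicitly in a formula (assumed equivalent).
by_formula <- summary(faux.mesa.high ~ nodefactor(~paste(Grade, Sex, sep = "."),
                                                  levels = TRUE))

# Labels differ, so compare only the counts.
stopifnot(isTRUE(all.equal(unname(by_names), unname(by_formula))))

# Each undirected edge adds 1 to the degree of each endpoint, so the
# activity counts over all levels sum to 2 * (number of edges).
stopifnot(sum(by_names) == 2 * network.edgecount(faux.mesa.high))
sum(by_names)
```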

Any of these arguments may also be wrapped in COLLAPSE_SMALLEST(attr, n, into), a convenience function that will transform the attribute by collapsing the smallest n categories into one, naming it according to the into argument. Note that into must be of the same type (numeric, character, etc.) as the vertex attribute in question. This is compatible with using magrittr’s pipes for improved readability, i.e., attr %>% COLLAPSE_SMALLEST(n, into). This is illustrated in the next section.

For example, taking the faux.mesa.high dataset’s actor attribute Grade (the student’s grade), we can evaluate the linear effect of grade on an actor’s overall activity in any of the following equivalent ways:

library(ergm)
data(faux.mesa.high)
summary(faux.mesa.high~nodecov("Grade")) # String
## nodecov.Grade 
##          3491
summary(faux.mesa.high~nodecov(~Grade)) # Formula
## nodecov.Grade 
##          3491
summary(faux.mesa.high~nodecov(function(nw) nw%v%"Grade")) # Function
## nodecov.nw%v%"Grade" 
##                 3491
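To make concrete what is being counted, the sketch below adds a fourth, AsIs-based specification (`I()` on a precomputed vector, as in the last entry of the list above) and recomputes the statistic by hand. It assumes that, for an undirected network, nodecov sums x_i + x_j over all edges {i, j}; `as.edgelist()` is from the network package, which ergm attaches.

```r
library(ergm)
data(faux.mesa.high)

grade <- faux.mesa.high %v% "Grade"

s_string  <- summary(faux.mesa.high ~ nodecov("Grade"))
s_formula <- summary(faux.mesa.high ~ nodecov(~Grade))
s_asis    <- summary(faux.mesa.high ~ nodecov(I(grade)))  # precomputed vector

# Manual computation: for each edge, add the Grade of both endpoints.
el <- as.edgelist(faux.mesa.high)  # two-column matrix of vertex indices
manual <- sum(grade[el[, 1]] + grade[el[, 2]])

stopifnot(unname(s_string) == manual,
          unname(s_formula) == manual,
          unname(s_asis) == manual)
manual
```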

Taking advantage of nodecov’s new ability to take matrix-valued arguments, we might also evaluate a polynomial effect of Grade:

summary(faux.mesa.high~nodecov(~cbind(Grade,Grade^2)))
## Warning: In term 'nodecov' in package 'ergm': Attribute specification
## '~cbind(Grade, Grade^2)' is a matrix with some column names set and others
## not; you may need to set them manually. See example(nodal_attributes) for
## more information.
## nodecov.cbind(Grade,Grade^2).1 nodecov.cbind(Grade,Grade^2).2 
##                           3491                          31123

Notice the warning. It arises from the way cbind() assigns column names: the name of the second column will be blank unless we set it explicitly, in which case it can be anything:

x <- 1:2
cbind(x,x^2)
##      x  
## [1,] 1 1
## [2,] 2 4
cbind(x,x2=x^2)
##      x x2
## [1,] 1  1
## [2,] 2  4
cbind(x,`x^2`=x^2) # Backticks for arbitrary names.
##      x x^2
## [1,] 1   1
## [2,] 2   4

As the warning suggested, we can ensure that all columns have names, in which case they are not replaced with numbers:

summary(faux.mesa.high~nodecov(~cbind(Grade,Grade2=Grade^2)))
##  nodecov.Grade nodecov.Grade2 
##           3491          31123

General functions, such as stats::poly(), can also be used:

summary(faux.mesa.high~nodecov(~poly(Grade,2)))
## nodecov.poly(Grade,2).1 nodecov.poly(Grade,2).2 
##               -2.412818                4.974174
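The values above differ from the cbind() version because stats::poly() returns orthogonal polynomials by default. A minimal check, relying only on standard stats::poly() behaviour: with raw = TRUE the columns are Grade and Grade^2 themselves, so the statistics should match the hand-named cbind() specification.

```r
library(ergm)
data(faux.mesa.high)

raw_poly <- summary(faux.mesa.high ~ nodecov(~poly(Grade, 2, raw = TRUE)))
named    <- summary(faux.mesa.high ~ nodecov(~cbind(Grade, Grade2 = Grade^2)))

# Same statistics; only the labels differ.
stopifnot(isTRUE(all.equal(unname(raw_poly), unname(named))))
raw_poly
```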

We can even pass a random nodal covariate. Notice that setting an attribute “name” gives it a label:

randomcov <- structure(I(rbinom(network.size(faux.mesa.high),1,0.5)), name="random")
summary(faux.mesa.high~nodefactor(I(randomcov)))
## nodefactor.random.1 
##                 197
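Because the covariate above is drawn at random, its count changes from run to run. A reproducible variant fixes the seed (the seed value is arbitrary). Retaining both levels lets us check the counts against the edge count, and the "name" attribute supplies the label, as in the output above.

```r
library(ergm)
data(faux.mesa.high)

set.seed(1)  # arbitrary seed, for reproducibility only
n <- network.size(faux.mesa.high)
randomcov <- structure(I(rbinom(n, 1, 0.5)), name = "random")

# Default level selection: the first level (0) is dropped.
s_default <- summary(faux.mesa.high ~ nodefactor(I(randomcov)))
# All levels retained: counts for 0 and 1 together cover every edge end.
s_all <- summary(faux.mesa.high ~ nodefactor(I(randomcov), levels = TRUE))

stopifnot(names(s_default) == "nodefactor.random.1",
          sum(s_all) == 2 * network.edgecount(faux.mesa.high))
s_all
```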

Level selection

For categorical attributes, to select which levels are of interest and their ordering, use the argument levels. Selection of nodes (from the appropriate vector of nodal indices) is handled in the same way as the selection of levels, using the argument . These arguments are interpreted as follows:

AsIs object created by I()

Use the given list of levels as is.

numeric or logical vector

Used to index a list of all possible levels (typically, the unique values of the attribute) in default order (typically lexicographic). In particular, levels=TRUE will retain all levels. Negative values exclude. Another special value is LARGEST, which refers to the most frequent category; so, to set that category as the baseline, pass levels=-LARGEST. In addition, LARGEST(n) refers to the n largest categories. SMALLEST works analogously. Note that ties in frequencies are broken arbitrarily. To specify numeric or logical levels literally, wrap them in I().

NULL

Retain all possible levels; usually equivalent to passing TRUE.

character vector

Use as is.

function

The function is called on the list of unique values of the attribute, the values of the attribute themselves, and the network itself, depending on its arity. Its return value is interpreted as above.

formula

The expression on the right hand side of the formula is evaluated in an environment in which the network itself is accessible as , the list of unique values of the attribute as or as , and the attribute vector itself as . Its return value is interpreted as above.

Note that or often has a default that is sensible for the term in question.
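As a sketch of how several of these specifications can select the same levels, the block below checks that the term default, a negative index, a logical index, literal levels wrapped in I(), and -LARGEST all drop Grade 7 for nodefactor; the target counts are taken from the output shown later in this document. The LARGEST(2) check compares names as a set, since we do not assume the order in which the selected levels are returned.

```r
library(ergm)
data(faux.mesa.high)

target <- c(75, 65, 36, 49, 28)  # Grades 8-12, from this document's output

specs <- list(
  default = summary(faux.mesa.high ~ nodefactor(~Grade)),               # term default
  negidx  = summary(faux.mesa.high ~ nodefactor(~Grade, levels = -1)),  # drop 1st level
  logical = summary(faux.mesa.high ~ nodefactor(~Grade,
                      levels = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE))),
  literal = summary(faux.mesa.high ~ nodefactor(~Grade, levels = I(8:12))),
  largest = summary(faux.mesa.high ~ nodefactor(~Grade, levels = -LARGEST))
)

for (s in specs) stopifnot(isTRUE(all.equal(unname(s), target)))

# LARGEST(n): the two most frequent grades are 7 and 9 (62 and 42 students).
top2 <- summary(faux.mesa.high ~ nodefactor(~Grade, levels = LARGEST(2)))
stopifnot(setequal(names(top2), c("nodefactor.Grade.7", "nodefactor.Grade.9")))
```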

Returning to the faux.mesa.high example, and treating Grade as a categorical variable, we can use a number of combinations:

# Activity by grade with a baseline grade excluded:
summary(faux.mesa.high~nodefactor(~Grade))
##  nodefactor.Grade.8  nodefactor.Grade.9 nodefactor.Grade.10 
##                  75                  65                  36 
## nodefactor.Grade.11 nodefactor.Grade.12 
##                  49                  28
# Retain all levels:
summary(faux.mesa.high~nodefactor(~Grade, levels=TRUE)) # or levels=NULL
##  nodefactor.Grade.7  nodefactor.Grade.8  nodefactor.Grade.9 
##                 153                  75                  65 
## nodefactor.Grade.10 nodefactor.Grade.11 nodefactor.Grade.12 
##                  36                  49                  28
# Use the most frequent grade as the baseline (which is also Grade 7):
table(faux.mesa.high %v% "Grade")
## 
##  7  8  9 10 11 12 
## 62 40 42 25 24 12
summary(faux.mesa.high~nodefactor(~Grade, levels=-LARGEST))
##  nodefactor.Grade.8  nodefactor.Grade.9 nodefactor.Grade.10 
##                  75                  65                  36 
## nodefactor.Grade.11 nodefactor.Grade.12 
##                  49                  28
# Collapse the smallest two grades (11 and 12) into a new category, 99.
library(magrittr) # For the %>% operator.
summary(faux.mesa.high~nodefactor((~Grade) %>% COLLAPSE_SMALLEST(2, 99)))
##  nodefactor.Grade.8  nodefactor.Grade.9 nodefactor.Grade.10 
##                  75                  65                  36 
## nodefactor.Grade.99 
##                  77
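The pipe is only a readability aid: the sketch below assumes COLLAPSE_SMALLEST(attr, n, into) can equally be called directly with the formula as its first argument, as its signature above suggests, and checks that the collapsed category's count is the sum of the counts of the two grades it absorbs (49 + 28 = 77).

```r
library(ergm)
data(faux.mesa.high)
library(magrittr)  # for %>%

piped  <- summary(faux.mesa.high ~ nodefactor((~Grade) %>% COLLAPSE_SMALLEST(2, 99)))
nested <- summary(faux.mesa.high ~ nodefactor(COLLAPSE_SMALLEST(~Grade, 2, 99)))
stopifnot(isTRUE(all.equal(unname(piped), unname(nested))))

# The new category's count is the sum of the counts of the grades it absorbed.
full <- summary(faux.mesa.high ~ nodefactor(~Grade, levels = TRUE))
stopifnot(piped[["nodefactor.Grade.99"]] ==
            full[["nodefactor.Grade.11"]] + full[["nodefactor.Grade.12"]])
```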
# Mixing between lower and upper grades:
summary(faux.mesa.high~mm(~Grade>=10))
## mm[Grade>=10=FALSE,Grade>=10=TRUE]  mm[Grade>=10=TRUE,Grade>=10=TRUE] 
##                                 27                                 43
# Mixing between grades 7 and 8 only:
summary(faux.mesa.high~mm("Grade", levels=I(c(7,8))))
## mm[Grade=7,Grade=8] mm[Grade=8,Grade=8] 
##                   0                  33
# or
summary(faux.mesa.high~mm("Grade", levels=1:2))
## mm[Grade=7,Grade=8] mm[Grade=8,Grade=8] 
##                   0                  33
# or using levels2 (see ?mm) to filter the combinations of levels:
summary(faux.mesa.high~mm("Grade",
        levels2=~sapply(.levels,
                        function(l)
                          l[[1]]%in%c(7,8) && l[[2]]%in%c(7,8))))
## mm[Grade=7,Grade=7] mm[Grade=7,Grade=8] mm[Grade=8,Grade=8] 
##                  75                   0                  33
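Each mm() cell counts edges whose endpoints fall in the given pair of categories; for an undirected network the pair is unordered. Under that reading, the sketch below reproduces several of the Grade counts shown above directly from the edge list (`as.edgelist()` is from the network package, attached by ergm).

```r
library(ergm)
data(faux.mesa.high)

grade <- faux.mesa.high %v% "Grade"
el <- as.edgelist(faux.mesa.high)
g1 <- grade[el[, 1]]
g2 <- grade[el[, 2]]

# Edges between two 8th graders, and between a 7th and an 8th grader.
both8   <- sum(g1 == 8 & g2 == 8)
mixed78 <- sum((g1 == 7 & g2 == 8) | (g1 == 8 & g2 == 7))
# Upper (>= 10) vs lower grades, as in mm(~Grade>=10).
both_up <- sum(g1 >= 10 & g2 >= 10)
one_up  <- sum(xor(g1 >= 10, g2 >= 10))

s <- summary(faux.mesa.high ~ mm(~Grade >= 10))
stopifnot(unname(s) == c(one_up, both_up))
c(both8 = both8, mixed78 = mixed78, both_up = both_up, one_up = one_up)
```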

Generally, levels2= selects from among the combinations of levels selected by levels=. Here are some examples, using the attribute Sex (which has two levels):

# Here is the full list of combinations of sexes in an undirected network:
summary(faux.mesa.high~mm("Sex", levels2=TRUE))
## mm[Sex=F,Sex=F] mm[Sex=F,Sex=M] mm[Sex=M,Sex=M] 
##              82              71              50
# Select only the second combination:
summary(faux.mesa.high~mm("Sex", levels2=2))
## mm[Sex=F,Sex=M] 
##              71
# Equivalently,
summary(faux.mesa.high~mm("Sex", levels2=-c(1,3)))
## mm[Sex=F,Sex=M] 
##              71
# or
summary(faux.mesa.high~mm("Sex", levels2=c(FALSE,TRUE,FALSE)))
## mm[Sex=F,Sex=M] 
##              71
# Select all *but* the second one:
summary(faux.mesa.high~mm("Sex", levels2=-2))
## mm[Sex=F,Sex=F] mm[Sex=M,Sex=M] 
##              82              50
# We can select via a mixing matrix: (Network is undirected and
# attributes are the same on both sides, so we can use either M or
# its transpose.)
(M <- matrix(c(FALSE,TRUE,FALSE,FALSE),2,2))
##       [,1]  [,2]
## [1,] FALSE FALSE
## [2,]  TRUE FALSE
summary(faux.mesa.high~mm("Sex", levels2=M)+mm("Sex", levels2=t(M)))
## mm[Sex=F,Sex=M] mm[Sex=F,Sex=M] 
##              71              71
# Select via an index of a cell:
idx <- cbind(1,2)
summary(faux.mesa.high~mm("Sex", levels2=idx))
## mm[Sex=F,Sex=M] 
##              71
# Or, select by specific attribute value combinations, though note the
# names 'row' and 'col' and the order for undirected networks:
summary(faux.mesa.high~mm("Sex",
                          levels2 = I(list(list(row="M",col="M"),
                                           list(row="M",col="F"),
                                           list(row="F",col="M")))))
## Warning: In term 'mm' in package 'ergm': Selected cells '[M,F]' are
## redundant (below the diagonal) in the mixing matrix and will have count 0.
## mm[Sex=M,Sex=M] mm[Sex=M,Sex=F] mm[Sex=F,Sex=M] 
##              50               0              71
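Since Sex has only two levels and the network is undirected, every edge should fall into exactly one of the three cells F-F, F-M, and M-M. The sketch below recomputes those cells from the edge list and checks that they add up to the edge count; it assumes Sex has no missing values in this dataset.

```r
library(ergm)
data(faux.mesa.high)

sex <- faux.mesa.high %v% "Sex"
el <- as.edgelist(faux.mesa.high)
s1 <- sex[el[, 1]]
s2 <- sex[el[, 2]]

manual <- c(FF = sum(s1 == "F" & s2 == "F"),
            FM = sum(s1 != s2),              # unordered: one F, one M
            MM = sum(s1 == "M" & s2 == "M"))

s <- summary(faux.mesa.high ~ mm("Sex", levels2 = TRUE))
stopifnot(unname(s) == unname(manual),
          sum(s) == network.edgecount(faux.mesa.high))  # each edge in one cell
manual
```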

The attribute specification of the mm() term can also be a two-sided formula, with a different attribute for the rows (left-hand side) and the columns (right-hand side):

summary(faux.mesa.high~mm(Grade~Race, levels2=TRUE))
##  mm[Grade=7,Race=Black]  mm[Grade=8,Race=Black]  mm[Grade=9,Race=Black] 
##                       1                       6                       5 
## mm[Grade=10,Race=Black] mm[Grade=11,Race=Black] mm[Grade=12,Race=Black] 
##                       4                       7                       3 
##   mm[Grade=7,Race=Hisp]   mm[Grade=8,Race=Hisp]   mm[Grade=9,Race=Hisp] 
##                      92                      15                      28 
##  mm[Grade=10,Race=Hisp]  mm[Grade=11,Race=Hisp]  mm[Grade=12,Race=Hisp] 
##                      10                      19                      14 
##  mm[Grade=7,Race=NatAm]  mm[Grade=8,Race=NatAm]  mm[Grade=9,Race=NatAm] 
##                      37                      53                      25 
## mm[Grade=10,Race=NatAm] mm[Grade=11,Race=NatAm] mm[Grade=12,Race=NatAm] 
##                      16                      15                      10 
##  mm[Grade=7,Race=Other]  mm[Grade=8,Race=Other]  mm[Grade=9,Race=Other] 
##                       0                       0                       1 
## mm[Grade=10,Race=Other] mm[Grade=11,Race=Other] mm[Grade=12,Race=Other] 
##                       0                       0                       0 
##  mm[Grade=7,Race=White]  mm[Grade=8,Race=White]  mm[Grade=9,Race=White] 
##                      23                       1                       6 
## mm[Grade=10,Race=White] mm[Grade=11,Race=White] mm[Grade=12,Race=White] 
##                       6                       8                       1
# It is possible to have collapsing functions in the formula; note
# the parentheses around "~Race": this is because the formula
# operator (~) has lower precedence than the pipe (%>%):
summary(faux.mesa.high~mm(Grade~(~Race) %>% COLLAPSE_SMALLEST(3,"BWO"), levels2=TRUE))
##    mm[Grade=7,Race=BWO]    mm[Grade=8,Race=BWO]    mm[Grade=9,Race=BWO] 
##                      24                       7                      12 
##   mm[Grade=10,Race=BWO]   mm[Grade=11,Race=BWO]   mm[Grade=12,Race=BWO] 
##                      10                      15                       4 
##   mm[Grade=7,Race=Hisp]   mm[Grade=8,Race=Hisp]   mm[Grade=9,Race=Hisp] 
##                      92                      15                      28 
##  mm[Grade=10,Race=Hisp]  mm[Grade=11,Race=Hisp]  mm[Grade=12,Race=Hisp] 
##                      10                      19                      14 
##  mm[Grade=7,Race=NatAm]  mm[Grade=8,Race=NatAm]  mm[Grade=9,Race=NatAm] 
##                      37                      53                      25 
## mm[Grade=10,Race=NatAm] mm[Grade=11,Race=NatAm] mm[Grade=12,Race=NatAm] 
##                      16                      15                      10