1 Introduction

2 Collaboration

2.1 Loading data

  • Now, we need to import data on collaboration relationships.
  • In order to let R find the data, we need to move the data file to the directory where our RStudio project is located (i.e., the working directory).

If you don’t remember what working directory you’re in:

getwd()
## [1] "/Users/federico/Desktop/lab"
  • Let’s inspect the data structure: we can open the file either outside RStudio or from within it

  • It’s a .txt file. What do you see?

  • It’s an adjacency matrix (see slides)

  • Values are 0 and 1 and are separated by a tab (\t)

  • In order to load it, we need a function to read a table data structure.

  • read.table is base R, while readr::read_delim is faster. Let’s load the readr package, which is included in the tidyverse.

# library(readr)
library(tidyverse)

Now we can inspect the read_delim function from the console by typing ?read_delim

?read_delim

What information do we need?

  • the file name: collaboration.txt
  • the column separator: "\t"
  • whether we have column names: no
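With those three pieces of information, a direct call would look like this (a sketch, assuming collaboration.txt sits in the working directory; we will build a more robust path in a moment):

```r
# Read a tab-separated file with no header row
read_delim("collaboration.txt", delim = "\t", col_names = FALSE)
```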

Where is my data file?

dir()
## [1] "collaboration.txt" "data"              "lab_files"        
## [4] "lab.html"          "lab.nb.html"       "lab.Rmd"          
## [7] "lab.Rproj"         "support.csv"

In order to maximise reproducibility, the path needs to work for any colleague who opens the project. We can use the here::here function, which builds paths relative to the project root rather than to our personal working directory. Let’s load the here package.

library(here)
## here() starts at /Users/federico/Desktop/lab

The function here() finds the project root and builds a full path from it to the specified arguments.

here("data", "collaboration.txt")
## [1] "/Users/federico/Desktop/lab/data/collaboration.txt"

Now we’re ready to load the collaboration data as a new object. We can use read_tsv, which takes "\t" as the default value of the delim parameter.

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE)

What kind of object have we stored our data in?

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  class()
## Rows: 29 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (29): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

It’s a tibble, the tidyverse version of a data.frame.

Let’s inspect our data

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  str()
## Rows: 29 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (29): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## spc_tbl_ [29 × 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ X1 : num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X2 : num [1:29] 0 0 0 1 0 0 1 1 1 1 ...
##  $ X3 : num [1:29] 0 0 0 0 0 1 0 1 1 1 ...
##  $ X4 : num [1:29] 0 1 0 0 0 0 0 0 1 1 ...
##  $ X5 : num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X6 : num [1:29] 0 0 1 0 0 0 0 0 1 1 ...
##  $ X7 : num [1:29] 0 1 0 0 0 0 0 0 0 0 ...
##  $ X8 : num [1:29] 0 1 1 0 0 0 0 0 0 1 ...
##  $ X9 : num [1:29] 0 1 1 1 0 1 0 0 0 0 ...
##  $ X10: num [1:29] 0 1 1 1 0 1 0 1 0 0 ...
##  $ X11: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X12: num [1:29] 1 0 0 0 0 0 0 0 0 0 ...
##  $ X13: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X14: num [1:29] 0 1 1 1 0 0 0 0 1 0 ...
##  $ X15: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X16: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X17: num [1:29] 0 0 0 0 0 1 0 0 0 0 ...
##  $ X18: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X19: num [1:29] 0 0 1 0 0 1 0 1 1 1 ...
##  $ X20: num [1:29] 0 0 1 0 0 1 0 0 1 1 ...
##  $ X21: num [1:29] 0 0 0 0 0 0 1 1 0 1 ...
##  $ X22: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X23: num [1:29] 0 0 1 0 0 1 0 0 1 1 ...
##  $ X24: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X25: num [1:29] 0 0 1 0 0 0 0 0 0 0 ...
##  $ X26: num [1:29] 0 0 1 0 0 0 0 0 0 1 ...
##  $ X27: num [1:29] 0 0 1 0 0 0 0 1 0 1 ...
##  $ X28: num [1:29] 0 0 1 0 0 0 0 1 0 1 ...
##  $ X29: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   X1 = col_double(),
##   ..   X2 = col_double(),
##   ..   X3 = col_double(),
##   ..   X4 = col_double(),
##   ..   X5 = col_double(),
##   ..   X6 = col_double(),
##   ..   X7 = col_double(),
##   ..   X8 = col_double(),
##   ..   X9 = col_double(),
##   ..   X10 = col_double(),
##   ..   X11 = col_double(),
##   ..   X12 = col_double(),
##   ..   X13 = col_double(),
##   ..   X14 = col_double(),
##   ..   X15 = col_double(),
##   ..   X16 = col_double(),
##   ..   X17 = col_double(),
##   ..   X18 = col_double(),
##   ..   X19 = col_double(),
##   ..   X20 = col_double(),
##   ..   X21 = col_double(),
##   ..   X22 = col_double(),
##   ..   X23 = col_double(),
##   ..   X24 = col_double(),
##   ..   X25 = col_double(),
##   ..   X26 = col_double(),
##   ..   X27 = col_double(),
##   ..   X28 = col_double(),
##   ..   X29 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
  • It’s interpreted as a case-by-variable matrix, with 29 variables.
  • Instead, what we need is an adjacency matrix.
here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix()
## Rows: 29 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (29): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##       X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
##  [1,]  0  0  0  0  0  0  0  0  0   0   0   1   0   0   0   0   0   0   0   0
##  [2,]  0  0  0  1  0  0  1  1  1   1   0   0   1   1   1   1   0   0   0   0
##  [3,]  0  0  0  0  0  1  0  1  1   1   0   0   0   1   0   0   0   0   1   1
##  [4,]  0  1  0  0  0  0  0  0  1   1   0   0   0   1   0   0   0   0   0   0
##  [5,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [6,]  0  0  1  0  0  0  0  0  1   1   0   0   1   0   1   1   1   0   1   1
##  [7,]  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [8,]  0  1  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
##  [9,]  0  1  1  1  0  1  0  0  0   0   0   0   0   1   0   0   0   0   1   1
## [10,]  0  1  1  1  0  1  0  1  0   0   0   0   1   0   1   1   0   0   1   1
## [11,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [12,]  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [13,]  0  1  0  0  0  1  0  0  0   1   0   0   0   0   1   1   0   0   0   1
## [14,]  0  1  1  1  0  0  0  0  1   0   0   0   0   0   0   0   0   0   0   0
## [15,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   0   1   0   0   0   1
## [16,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   1   0   0   0   0   1
## [17,]  0  0  0  0  0  1  0  0  0   0   0   0   0   0   0   0   0   0   0   1
## [18,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [19,]  0  0  1  0  0  1  0  1  1   1   0   0   0   0   0   0   0   0   0   1
## [20,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   1   1   0   1   0
## [21,]  0  0  0  0  0  0  1  1  0   1   0   0   0   0   0   0   0   0   0   0
## [22,]  0  0  0  0  0  0  0  0  0   0   0   0   1   0   1   1   0   0   0   0
## [23,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   0   1   0   1   1
## [24,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   1   0
## [25,]  0  0  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   1   1   0
## [26,]  0  0  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
## [27,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   1
## [28,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   0
## [29,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##       X21 X22 X23 X24 X25 X26 X27 X28 X29
##  [1,]   0   0   0   0   0   0   0   0   0
##  [2,]   0   0   0   0   0   0   0   0   0
##  [3,]   0   0   1   0   1   1   1   1   0
##  [4,]   0   0   0   0   0   0   0   0   0
##  [5,]   0   0   0   0   0   0   0   0   0
##  [6,]   0   0   1   0   0   0   0   0   0
##  [7,]   1   0   0   0   0   0   0   0   0
##  [8,]   1   0   0   0   0   0   1   1   0
##  [9,]   0   0   1   0   0   0   0   0   0
## [10,]   1   0   1   0   0   1   1   1   0
## [11,]   0   0   0   0   0   0   0   0   0
## [12,]   0   0   0   0   0   0   0   0   0
## [13,]   0   1   1   0   0   0   0   0   0
## [14,]   0   0   0   0   0   0   0   0   0
## [15,]   0   1   1   0   0   0   0   0   0
## [16,]   0   1   0   0   0   0   0   0   0
## [17,]   0   0   1   0   0   0   0   0   0
## [18,]   0   0   0   0   1   0   0   0   0
## [19,]   0   0   1   1   1   1   1   1   0
## [20,]   0   0   1   0   0   0   1   0   0
## [21,]   0   0   0   0   0   0   0   0   0
## [22,]   0   0   0   0   0   0   0   0   0
## [23,]   0   0   0   0   0   0   1   0   0
## [24,]   0   0   0   0   0   1   0   0   0
## [25,]   0   0   0   0   0   1   1   0   0
## [26,]   0   0   0   1   1   0   1   0   0
## [27,]   0   0   1   0   1   1   0   0   0
## [28,]   0   0   0   0   0   0   0   0   0
## [29,]   0   0   0   0   0   0   0   0   0

Now our data are interpreted by R as a matrix.

(collab_data <- 
  here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix())
## Rows: 29 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (29): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##       X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
##  [1,]  0  0  0  0  0  0  0  0  0   0   0   1   0   0   0   0   0   0   0   0
##  [2,]  0  0  0  1  0  0  1  1  1   1   0   0   1   1   1   1   0   0   0   0
##  [3,]  0  0  0  0  0  1  0  1  1   1   0   0   0   1   0   0   0   0   1   1
##  [4,]  0  1  0  0  0  0  0  0  1   1   0   0   0   1   0   0   0   0   0   0
##  [5,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [6,]  0  0  1  0  0  0  0  0  1   1   0   0   1   0   1   1   1   0   1   1
##  [7,]  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [8,]  0  1  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
##  [9,]  0  1  1  1  0  1  0  0  0   0   0   0   0   1   0   0   0   0   1   1
## [10,]  0  1  1  1  0  1  0  1  0   0   0   0   1   0   1   1   0   0   1   1
## [11,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [12,]  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [13,]  0  1  0  0  0  1  0  0  0   1   0   0   0   0   1   1   0   0   0   1
## [14,]  0  1  1  1  0  0  0  0  1   0   0   0   0   0   0   0   0   0   0   0
## [15,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   0   1   0   0   0   1
## [16,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   1   0   0   0   0   1
## [17,]  0  0  0  0  0  1  0  0  0   0   0   0   0   0   0   0   0   0   0   1
## [18,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [19,]  0  0  1  0  0  1  0  1  1   1   0   0   0   0   0   0   0   0   0   1
## [20,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   1   1   0   1   0
## [21,]  0  0  0  0  0  0  1  1  0   1   0   0   0   0   0   0   0   0   0   0
## [22,]  0  0  0  0  0  0  0  0  0   0   0   0   1   0   1   1   0   0   0   0
## [23,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   0   1   0   1   1
## [24,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   1   0
## [25,]  0  0  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   1   1   0
## [26,]  0  0  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
## [27,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   1
## [28,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   0
## [29,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##       X21 X22 X23 X24 X25 X26 X27 X28 X29
##  [1,]   0   0   0   0   0   0   0   0   0
##  [2,]   0   0   0   0   0   0   0   0   0
##  [3,]   0   0   1   0   1   1   1   1   0
##  [4,]   0   0   0   0   0   0   0   0   0
##  [5,]   0   0   0   0   0   0   0   0   0
##  [6,]   0   0   1   0   0   0   0   0   0
##  [7,]   1   0   0   0   0   0   0   0   0
##  [8,]   1   0   0   0   0   0   1   1   0
##  [9,]   0   0   1   0   0   0   0   0   0
## [10,]   1   0   1   0   0   1   1   1   0
## [11,]   0   0   0   0   0   0   0   0   0
## [12,]   0   0   0   0   0   0   0   0   0
## [13,]   0   1   1   0   0   0   0   0   0
## [14,]   0   0   0   0   0   0   0   0   0
## [15,]   0   1   1   0   0   0   0   0   0
## [16,]   0   1   0   0   0   0   0   0   0
## [17,]   0   0   1   0   0   0   0   0   0
## [18,]   0   0   0   0   1   0   0   0   0
## [19,]   0   0   1   1   1   1   1   1   0
## [20,]   0   0   1   0   0   0   1   0   0
## [21,]   0   0   0   0   0   0   0   0   0
## [22,]   0   0   0   0   0   0   0   0   0
## [23,]   0   0   0   0   0   0   1   0   0
## [24,]   0   0   0   0   0   1   0   0   0
## [25,]   0   0   0   0   0   1   1   0   0
## [26,]   0   0   0   1   1   0   1   0   0
## [27,]   0   0   1   0   1   1   0   0   0
## [28,]   0   0   0   0   0   0   0   0   0
## [29,]   0   0   0   0   0   0   0   0   0

2.2 From data to graphs

In order to apply SNA functions, we need the igraph package, which handles network data as igraph objects to which its functions can be applied

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
  • We have to turn our data into an igraph object.
  • There’s a family of functions whose names start with graph_from_ that can be useful for us
  • We need to specify that the matrix should be interpreted as an undirected graph
(collab_net <- graph_from_adjacency_matrix(collab_data, mode = "undirected"))
## Warning: The `adjmatrix` argument of `graph_from_adjacency_matrix()` must be symmetric
## with mode = "undirected" as of igraph 1.6.0.
## ℹ Use mode = "max" to achieve the original behavior.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## IGRAPH c98a2b1 UN-- 29 82 -- 
## + attr: name (v/c)
## + edges from c98a2b1 (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## + ... omitted several edges

The summary shows us:

  • that the network is undirected and “named” (UN)
  • 29 nodes
  • 82 links
  • the list of the edges
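The warning above suggests that the matrix may not be perfectly symmetric. We can check this directly (a sketch; unname() drops the column names so that isSymmetric() compares only the values):

```r
# Check whether the loaded matrix is symmetric, ignoring dimnames
isSymmetric(unname(collab_data))
# If FALSE, mode = "max" keeps an undirected tie whenever either
# direction of the pair reports one
```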

An igraph object is a complex R object made of lists

str(collab_net)
## Class 'igraph'  hidden list of 10
##  $ : num 29
##  $ : logi FALSE
##  $ : num [1:82] 11 3 6 7 8 9 12 13 14 15 ...
##  $ : num [1:82] 0 1 1 1 1 1 1 1 1 1 ...
##  $ : NULL
##  $ : NULL
##  $ : NULL
##  $ : NULL
##  $ :List of 4
##   ..$ : num [1:3] 1 0 1
##   ..$ : Named list()
##   ..$ :List of 1
##   .. ..$ name: chr [1:29] "X1" "X2" "X3" "X4" ...
##   ..$ : Named list()
##  $ :<environment: 0x1347e2888>

Network data can be managed more efficiently with the tidygraph package, which allows us to apply the tidyverse logic to igraph objects

library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:stats':
## 
##     filter

We can make a tbl_graph object, structured into two tables: one for node attributes and one for the edge list.

as_tbl_graph(collab_net)
## # A tbl_graph: 29 nodes and 82 edges
## #
## # An undirected simple graph with 5 components
## #
## # Node Data: 29 × 1 (active)
##    name 
##    <chr>
##  1 X1   
##  2 X2   
##  3 X3   
##  4 X4   
##  5 X5   
##  6 X6   
##  7 X7   
##  8 X8   
##  9 X9   
## 10 X10  
## # ℹ 19 more rows
## #
## # Edge Data: 82 × 2
##    from    to
##   <int> <int>
## 1     1    12
## 2     2     4
## 3     2     7
## # ℹ 79 more rows

Let’s use the table of node attributes to change the node names

(collab_net <- 
  collab_net %>% 
  as_tbl_graph() %>% 
  activate(nodes) %>% 
  mutate(name = as.character(1:vcount(collab_net))))
## # A tbl_graph: 29 nodes and 82 edges
## #
## # An undirected simple graph with 5 components
## #
## # Node Data: 29 × 1 (active)
##    name 
##    <chr>
##  1 1    
##  2 2    
##  3 3    
##  4 4    
##  5 5    
##  6 6    
##  7 7    
##  8 8    
##  9 9    
## 10 10   
## # ℹ 19 more rows
## #
## # Edge Data: 82 × 2
##    from    to
##   <int> <int>
## 1     1    12
## 2     2     4
## 3     2     7
## # ℹ 79 more rows

Let’s inspect subsets of our network using the edge list

collab_net %>% 
  activate(edges) %>% 
  filter(to != 10 & from != 10)
## # A tbl_graph: 29 nodes and 67 edges
## #
## # An undirected simple graph with 6 components
## #
## # Edge Data: 67 × 2 (active)
##     from    to
##    <int> <int>
##  1     1    12
##  2     2     4
##  3     2     7
##  4     2     8
##  5     2     9
##  6     2    13
##  7     2    14
##  8     2    15
##  9     2    16
## 10     3     6
## # ℹ 57 more rows
## #
## # Node Data: 29 × 1
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## # ℹ 26 more rows

2.3 Visualization

We can use base R visualization, but it’s not very effective

plot(collab_net)

More efficient and elegant visualizations can be produced with the ggraph package, which allows us to use the ggplot2 grammar on igraph objects

library(ggraph)

Here’s a first attempt with the Fruchterman-Reingold layout

collab_plot <- collab_net %>% 
  ggraph(layout = "fr") +
  geom_edge_link(
    colour = "grey30",
    width = .5,
    alpha = .5
  ) +
  geom_node_point(
    size = 2,
    color = "red"
  ) +
  geom_node_text(aes(label = name))

2.4 Connectivity

A basic graph-level network property we could be interested in is how connected a network is

Let’s start from the number of edges

ecount(collab_net)
## [1] 82

We can also retrieve the whole list of edges

E(collab_net)
## + 82/82 edges from c98a2b1 (vertex names):
##  [1] 1 --12 2 --4  2 --7  2 --8  2 --9  2 --10 2 --13 2 --14 2 --15 2 --16
## [11] 3 --6  3 --8  3 --9  3 --10 3 --14 3 --19 3 --20 3 --23 3 --25 3 --26
## [21] 3 --27 3 --28 4 --9  4 --10 4 --14 6 --9  6 --10 6 --13 6 --15 6 --16
## [31] 6 --17 6 --19 6 --20 6 --23 7 --21 8 --10 8 --19 8 --21 8 --27 8 --28
## [41] 9 --14 9 --19 9 --20 9 --23 10--13 10--15 10--16 10--19 10--20 10--21
## [51] 10--23 10--26 10--27 10--28 13--15 13--16 13--20 13--22 13--23 15--16
## [61] 15--20 15--22 15--23 16--20 16--22 17--20 17--23 18--25 19--20 19--23
## [71] 19--24 19--25 19--26 19--27 19--28 20--23 20--27 23--27 24--26 25--26
## [81] 25--27 26--27

The number of edges depends on the number of vertices

vcount(collab_net)
## [1] 29

We can also retrieve the list of vertices

V(collab_net)
## + 29/29 vertices, named, from c98a2b1:
##  [1] 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29

2.4.1 Density

  • In order to infer the level of connectivity from the number of edges, we need to compare it to the theoretical maximum number of possible links
  • We combine each vertex with all other possible vertices
  • Since our network is undirected, we divide by 2, so that we don’t count pairs twice
  • We’ll store the calculation output in a new object because we’re going to use it later again and we want to stay D.R.Y.
(max_links <- vcount(collab_net) * (vcount(collab_net) - 1) / 2)
## [1] 406

Now, we can calculate density as the ratio between the number of actual links and the number of possible links.

(density <- ecount(collab_net) / max_links)
## [1] 0.2019704

Actually, we don’t need to perform this calculation every time. There is a specific function in igraph

edge_density(collab_net)
## [1] 0.2019704

3 Support

3.1 Loading data

  • Data on support relationships are stored with a different structure
  • It’s an edgelist (see slides)
  • We still need to assign data to a matrix object (not an adjacency matrix, though)
(sup_dat <- 
  here("data", "support.csv") %>% 
  read_delim(delim = " ", col_names = FALSE) %>% 
  as.matrix())
## Rows: 99 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: " "
## chr (2): X1, X2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##        X1    X2   
##   [1,] "V1"  "V29"
##   [2,] "V2"  "V4" 
##   [3,] "V2"  "V14"
##   [4,] "V2"  "V15"
##   [5,] "V2"  "V28"
##   [6,] "V3"  "V10"
##   [7,] "V3"  "V13"
##   [8,] "V3"  "V15"
##   [9,] "V3"  "V16"
##  [10,] "V3"  "V25"
##  [11,] "V3"  "V26"
##  [12,] "V3"  "V27"
##  [13,] "V4"  "V2" 
##  [14,] "V5"  "V24"
##  [15,] "V6"  "V12"
##  [16,] "V6"  "V13"
##  [17,] "V6"  "V15"
##  [18,] "V6"  "V16"
##  [19,] "V6"  "V17"
##  [20,] "V6"  "V20"
##  [21,] "V6"  "V23"
##  [22,] "V7"  "V1" 
##  [23,] "V7"  "V12"
##  [24,] "V8"  "V2" 
##  [25,] "V8"  "V4" 
##  [26,] "V8"  "V10"
##  [27,] "V8"  "V13"
##  [28,] "V8"  "V15"
##  [29,] "V8"  "V25"
##  [30,] "V8"  "V27"
##  [31,] "V8"  "V28"
##  [32,] "V9"  "V2" 
##  [33,] "V9"  "V14"
##  [34,] "V10" "V3" 
##  [35,] "V10" "V13"
##  [36,] "V10" "V15"
##  [37,] "V10" "V19"
##  [38,] "V10" "V27"
##  [39,] "V12" "V1" 
##  [40,] "V12" "V2" 
##  [41,] "V12" "V10"
##  [42,] "V12" "V20"
##  [43,] "V13" "V16"
##  [44,] "V14" "V2" 
##  [45,] "V14" "V8" 
##  [46,] "V14" "V9" 
##  [47,] "V14" "V25"
##  [48,] "V15" "V6" 
##  [49,] "V15" "V10"
##  [50,] "V15" "V13"
##  [51,] "V15" "V16"
##  [52,] "V15" "V22"
##  [53,] "V16" "V13"
##  [54,] "V16" "V15"
##  [55,] "V16" "V22"
##  [56,] "V17" "V6" 
##  [57,] "V17" "V20"
##  [58,] "V18" "V25"
##  [59,] "V19" "V3" 
##  [60,] "V19" "V25"
##  [61,] "V19" "V27"
##  [62,] "V20" "V6" 
##  [63,] "V20" "V12"
##  [64,] "V20" "V17"
##  [65,] "V20" "V19"
##  [66,] "V20" "V23"
##  [67,] "V21" "V6" 
##  [68,] "V21" "V7" 
##  [69,] "V21" "V10"
##  [70,] "V21" "V29"
##  [71,] "V22" "V10"
##  [72,] "V22" "V12"
##  [73,] "V22" "V13"
##  [74,] "V22" "V15"
##  [75,] "V22" "V16"
##  [76,] "V23" "V6" 
##  [77,] "V23" "V17"
##  [78,] "V23" "V20"
##  [79,] "V24" "V5" 
##  [80,] "V24" "V12"
##  [81,] "V25" "V1" 
##  [82,] "V25" "V19"
##  [83,] "V25" "V27"
##  [84,] "V26" "V3" 
##  [85,] "V26" "V10"
##  [86,] "V26" "V19"
##  [87,] "V26" "V24"
##  [88,] "V26" "V25"
##  [89,] "V26" "V27"
##  [90,] "V27" "V19"
##  [91,] "V27" "V25"
##  [92,] "V28" "V2" 
##  [93,] "V28" "V4" 
##  [94,] "V28" "V8" 
##  [95,] "V29" "V1" 
##  [96,] "V29" "V4" 
##  [97,] "V29" "V9" 
##  [98,] "V29" "V20"
##  [99,] "V29" "V21"

3.2 From data to graph

(support <- graph_from_edgelist(sup_dat))
## IGRAPH 21f724c DN-- 28 99 -- 
## + attr: name (v/c)
## + edges from 21f724c (vertex names):
##  [1] V1 ->V29 V2 ->V4  V2 ->V14 V2 ->V15 V2 ->V28 V3 ->V10 V3 ->V13 V3 ->V15
##  [9] V3 ->V16 V3 ->V25 V3 ->V26 V3 ->V27 V4 ->V2  V5 ->V24 V6 ->V12 V6 ->V13
## [17] V6 ->V15 V6 ->V16 V6 ->V17 V6 ->V20 V6 ->V23 V7 ->V1  V7 ->V12 V8 ->V2 
## [25] V8 ->V4  V8 ->V10 V8 ->V13 V8 ->V15 V8 ->V25 V8 ->V27 V8 ->V28 V9 ->V2 
## [33] V9 ->V14 V10->V3  V10->V13 V10->V15 V10->V19 V10->V27 V12->V1  V12->V2 
## [41] V12->V10 V12->V20 V13->V16 V14->V2  V14->V8  V14->V9  V14->V25 V15->V6 
## [49] V15->V10 V15->V13 V15->V16 V15->V22 V16->V13 V16->V15 V16->V22 V17->V6 
## [57] V17->V20 V18->V25 V19->V3  V19->V25 V19->V27 V20->V6  V20->V12 V20->V17
## + ... omitted several edges
  • Why are there only 28 nodes? We were expecting 29
  • One of the nodes might be an isolate
  • It’s always good practice to store the list of edges and the list of nodes in two different objects, then integrate them in a graph object
  • The nodes are coded in the edge list file as “V1”, “V2”, …, “V29”
  • So, we’ll generate a character vector by combining “V” with the numbers from 1 to 29
sup_edges <- sup_dat
sup_nodes <- paste0("V", 1:29)

Now we have a vector of nodes (node list) and a matrix of edges (edge list). We can then generate a graph object in a safer way through graph_from_data_frame, which takes a node list (vector) and an edge list (matrix) as inputs.

(sup_net <- graph_from_data_frame(d = sup_edges, vertices = sup_nodes))
## IGRAPH 9af0edf DN-- 29 99 -- 
## + attr: name (v/c)
## + edges from 9af0edf (vertex names):
##  [1] V1 ->V29 V2 ->V4  V2 ->V14 V2 ->V15 V2 ->V28 V3 ->V10 V3 ->V13 V3 ->V15
##  [9] V3 ->V16 V3 ->V25 V3 ->V26 V3 ->V27 V4 ->V2  V5 ->V24 V6 ->V12 V6 ->V13
## [17] V6 ->V15 V6 ->V16 V6 ->V17 V6 ->V20 V6 ->V23 V7 ->V1  V7 ->V12 V8 ->V2 
## [25] V8 ->V4  V8 ->V10 V8 ->V13 V8 ->V15 V8 ->V25 V8 ->V27 V8 ->V28 V9 ->V2 
## [33] V9 ->V14 V10->V3  V10->V13 V10->V15 V10->V19 V10->V27 V12->V1  V12->V2 
## [41] V12->V10 V12->V20 V13->V16 V14->V2  V14->V8  V14->V9  V14->V25 V15->V6 
## [49] V15->V10 V15->V13 V15->V16 V15->V22 V16->V13 V16->V15 V16->V22 V17->V6 
## [57] V17->V20 V18->V25 V19->V3  V19->V25 V19->V27 V20->V6  V20->V12 V20->V17
## + ... omitted several edges

3.3 Connectivity

Calculating density:

edge_density(sup_net)
## [1] 0.1219212
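As with the collaboration network, we can verify this by hand. Since the support network is directed, the maximum number of arcs is n(n - 1): every ordered pair of distinct vertices can host one arc, so we no longer divide by 2.

```r
# Manual density check for a directed graph: m / (n * (n - 1))
# 99 / (29 * 28) gives the same value as edge_density(sup_net)
ecount(sup_net) / (vcount(sup_net) * (vcount(sup_net) - 1))
```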

4 Node-level connectivity

4.1 Degree (undirected)

The degree of a vertex is its number of incident edges (= the size of its personal network)

Let’s look at the degree in collaboration of our nodes

igraph::degree(collab_net)
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  1  9 12  4  0 10  2  7  8 15  0  1  8  4  8  7  3  1 12 11  3  3 10  2  5  6 
## 27 28 29 
##  8  4  0
mean(igraph::degree(collab_net))
## [1] 5.655172
sd(igraph::degree(collab_net))
## [1] 4.177113
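As a sanity check, the mean degree of an undirected graph can also be obtained from the edge count: each of the m edges contributes one unit of degree to both of its endpoints, so the mean is 2m/n.

```r
# Mean degree = 2m / n: each edge adds degree at both endpoints
# 2 * 82 / 29 reproduces the mean computed above
2 * ecount(collab_net) / vcount(collab_net)
```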
  • This is the degree sequence of the network
  • It’s a named numeric vector (each value is labelled with its node name), so various operations can be applied to it, e.g.
sort(igraph::degree(collab_net), decreasing = TRUE)
## 10  3 19 20  6 23  2  9 13 15 27  8 16 26 25  4 14 28 17 21 22  7 24  1 12 18 
## 15 12 12 11 10 10  9  8  8  8  8  7  7  6  5  4  4  4  3  3  3  2  2  1  1  1 
##  5 11 29 
##  0  0  0

If we are interested in the relative frequency of each degree value (the degree distribution, starting from degree 0):

degree_distribution(collab_net)
##  [1] 0.10344828 0.10344828 0.06896552 0.10344828 0.10344828 0.03448276
##  [7] 0.03448276 0.06896552 0.13793103 0.03448276 0.06896552 0.03448276
## [13] 0.06896552 0.00000000 0.00000000 0.03448276

It’s easier to visualize it using base R plotting functions

collab_net %>% 
  igraph::degree() %>% 
  hist(
    col = "firebrick",
    xlab = "Degree",
    ylab = "Frequency",
    main = "Professional collaboration",
    breaks = 0:max(.)
  )

4.1.1 In- and out-degree (directed)

If the network is directed:

  • in-degree (popularity): number of arcs directed to a vertex
  • out-degree (activity): number of arcs directed away from a vertex
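Since every arc leaves exactly one vertex and enters exactly one vertex, the in-degrees and the out-degrees must sum to the same total: the number of arcs. A quick check on the support network:

```r
# Every arc contributes one unit of in-degree and one of out-degree,
# so both sums equal ecount(sup_net)
sum(igraph::degree(sup_net, mode = "in"))
sum(igraph::degree(sup_net, mode = "out"))
```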

Let’s look at the support network

sup_net %>% 
  igraph::degree(mode = "in") %>% 
  hist(
    col = "red",
    xlab = "In-degree",
    ylab = "Frequency",
    main = "Support",
    breaks = 0:max(.)
  )

Out-degree

sup_net %>% 
  igraph::degree(mode = "out") %>% 
  hist(
    col = "red",
    xlab = "Out-degree",
    ylab = "Frequency",
    main = "Support",
    breaks = 0:max(.)
  )

4.2 Eigenvector centrality

Example data

(net1 <- read_rds(here("data", "cent_example_1.rds")))
## This graph was created by an old(er) igraph version.
## ℹ Call `igraph::upgrade_graph()` on it to use with the current igraph version.
## For now we convert it on the fly...
## IGRAPH e0b5591 U--- 11 17 -- 
## + attr: cent (v/n)
## + edges from e0b5591:
##  [1] 1--11 2-- 4 3-- 5 3--11 4-- 8 5-- 9 5--11 6-- 7 6-- 8 6--10 6--11 7-- 9
## [13] 7--10 7--11 8-- 9 8--10 9--10

If we plot the network with node size proportional to degree centrality, who is the most central node?

V(net1)$names <- 1:vcount(net1)

my_plot <- 
  net1 %>% 
  ggraph(layout = "auto") +
  geom_edge_link(
    color = "grey",
    width = .5
  ) +
  geom_node_text(aes(label = V(net1)$names))
## Using "stress" as default layout
my_plot +
  geom_node_point(
    size = igraph::degree(net1),
    color = "red",
    alpha = .3
  )

Who would you choose for a policy intervention to let something diffuse more rapidly?

We need to calculate eigenvector centrality (see slides)

eigen_centrality(net1)
## $vector
##  [1] 0.2259630 0.0645825 0.3786244 0.2415182 0.5709057 0.9846544 1.0000000
##  [8] 0.8386195 0.9113529 0.9986474 0.8450304
## 
## $value
## [1] 3.739685
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 11
## 
## $options$which
## [1] "LA"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 3000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 4
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 19
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 13

Retrieving the vector only (scaled values)

eigen_centrality(net1)$vector
##  [1] 0.2259630 0.0645825 0.3786244 0.2415182 0.5709057 0.9846544 1.0000000
##  [8] 0.8386195 0.9113529 0.9986474 0.8450304
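As a check on the definition (see slides), the same scores can be obtained with base R: eigenvector centrality is the leading eigenvector of the adjacency matrix, which igraph rescales so that the maximum score is 1. A sketch:

```r
# Leading eigenvector of the adjacency matrix, rescaled to max = 1
# (abs() because the sign of an eigenvector is arbitrary)
A <- as_adjacency_matrix(net1, sparse = FALSE)
ev <- eigen(A)$vectors[, 1]
abs(ev) / max(abs(ev))
```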
my_plot +
  geom_node_point(
    aes(size = eigen_centrality(net1)$vector),
    color = "red",
    alpha = .5,
    show.legend = FALSE
  )

Compare degree centrality and eigenvector centrality

tibble(
  node = 1:vcount(net1),
  degree_cent = igraph::degree(net1),
  eigen_cent = eigen_centrality(net1)$vector
) %>%
  arrange(desc(degree_cent))

4.3 Betweenness centrality

Calculate

(btw_net1 <- betweenness(net1, normalized = TRUE))
##  [1] 0.00000000 0.00000000 0.00000000 0.20000000 0.08518519 0.21851852
##  [7] 0.05925926 0.36296296 0.16296296 0.02962963 0.32592593
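Setting normalized = TRUE divides each raw score by the number of vertex pairs that could route a geodesic through the focal node, (n - 1)(n - 2)/2 in an undirected graph, so values are comparable across networks of different sizes:

```r
# Undirected normalization: raw betweenness / ((n - 1) * (n - 2) / 2)
n <- vcount(net1)
betweenness(net1) / ((n - 1) * (n - 2) / 2)
```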

Plot

my_plot +
  geom_node_point(
    aes(size = btw_net1),
    color = "red",
    alpha = .5,
    show.legend = FALSE
  )

Compare degree centrality, eigenvector centrality, and betweenness centrality

tibble(
  node = 1:vcount(net1),
  degree_cent = igraph::degree(net1),
  eigen_cent = eigen_centrality(net1)$vector,
  btw_cent = btw_net1
) %>%
  arrange(desc(degree_cent))

4.4 Clustering

  • clustering is one of the most important processes in network evolution
  • the outcome of clustering is the formation of cohesive subgroups

How many clusters? Discuss

my_plot

  • Cliques (defined for undirected networks; in directed networks only reciprocated ties count, since a clique would otherwise require full reciprocation)
  • Let’s list them

It doesn’t make much sense. At least 4

cliques(net1, min = 3)
## [[1]]
## + 3/11 vertices, from e0b5591:
## [1]  6  8 10
## 
## [[2]]
## + 3/11 vertices, from e0b5591:
## [1]  8  9 10
## 
## [[3]]
## + 3/11 vertices, from e0b5591:
## [1]  6  7 10
## 
## [[4]]
## + 3/11 vertices, from e0b5591:
## [1]  6  7 11
## 
## [[5]]
## + 3/11 vertices, from e0b5591:
## [1]  7  9 10
## 
## [[6]]
## + 3/11 vertices, from e0b5591:
## [1]  3  5 11
cliques(collab_net, min = 6)
## [[1]]
## + 6/29 vertices, named, from c98a2b1:
## [1] 3  10 19 20 23 27
## 
## [[2]]
## + 6/29 vertices, named, from c98a2b1:
## [1] 6  10 13 15 20 23
## 
## [[3]]
## + 6/29 vertices, named, from c98a2b1:
## [1] 3  6  9  19 20 23
## 
## [[4]]
## + 6/29 vertices, named, from c98a2b1:
## [1] 3  6  10 19 20 23
## 
## [[5]]
## + 6/29 vertices, named, from c98a2b1:
## [1] 6  10 13 15 16 20
  • It’s more useful to avoid a priori definitions
  • Bottom-up classification of social/political reality
    • categories are concepts that we analysts (or lay people) apply to single objects by deduction
    • however, they emerge from induction-based inferences by analogy (abstracting away from idiosyncratic elements) (Harrison C. White)
    • this is what community detection algorithms do: partitioning a graph into dense network regions loosely connected to each other
  • Several algorithms can do that: some of them aim to optimize the partition modularity, Q (i.e., the number of within-group edges compared to a random distribution of the groups)
  • The most popular one is the Girvan-Newman algorithm (Girvan & Newman, 2002), based on edge betweenness, i.e., the number of pairs of nodes whose geodesics (shortest paths) run through an edge
    1. set the max number of cohesive subsets required, k
    2. find the edge with highest edge betweenness
    3. delete it and count the number of components
    4. if the number of components > k, stop, otherwise back to 2.
  • For directed networks, the algorithm treats ties as undirected (using reciprocated ties)
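The steps above can be sketched directly with igraph primitives. This is a didactic sketch, not the optimized `cluster_edge_betweenness` implementation; `g` and `k` are placeholders:

```r
library(igraph)

# Didactic sketch of the Girvan-Newman procedure:
# repeatedly delete the edge with the highest edge betweenness
# until the graph splits into at least k components.
girvan_newman_sketch <- function(g, k) {
  g <- as.undirected(g, mode = "collapse")     # directed ties treated as undirected
  while (count_components(g) < k) {
    eb <- edge_betweenness(g)                  # step 2: edge betweenness
    g  <- delete_edges(g, which.max(eb))       # step 3: delete the top edge
  }                                            # step 4: stop at k components
  components(g)$membership                     # cluster membership per node
}
```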
(comm <- cluster_edge_betweenness(collab_net, directed = FALSE))
## IGRAPH clustering edge betweenness, groups: 16, mod: 0.058
## + groups:
##   $`1`
##   [1] "1"  "12"
##   
##   $`2`
##   [1] "2"
##   
##   $`3`
##    [1] "3"  "6"  "9"  "10" "13" "15" "16" "19" "20" "23" "27"
##   
##   $`4`
##   + ... omitted several groups/vertices

We can retrieve each node’s membership in a cluster

sort(membership(comm))
##  1 12  2  3  6  9 10 13 15 16 19 20 23 27  4 14  5  7 21  8 11 17 18 22 24 25 
##  1  1  2  3  3  3  3  3  3  3  3  3  3  3  4  4  5  6  6  7  8  9 10 11 12 13 
## 26 28 29 
## 14 15 16
modularity(comm)
## [1] 0.05844735

Fast & Greedy doesn’t search the network as thoroughly: it partitions the network and then re-shuffles nodes until it reaches a high modularity

cluster_fast_greedy(collab_net)
## IGRAPH clustering fast greedy, groups: 7, mod: 0.3
## + groups:
##   $`1`
##   [1] "2"  "4"  "7"  "9"  "14" "21"
##   
##   $`2`
##    [1] "3"  "8"  "10" "18" "19" "24" "25" "26" "27" "28"
##   
##   $`3`
##   [1] "6"  "13" "15" "16" "17" "20" "22" "23"
##   
##   $`4`
##   + ... omitted several groups/vertices

For directed networks, Infomap relies on random walks

cluster_infomap(sup_net)
## IGRAPH clustering infomap, groups: 7, mod: 0.47
## + groups:
##   $`1`
##   [1] "V1"  "V7"  "V21" "V29"
##   
##   $`2`
##   [1] "V2"  "V4"  "V8"  "V9"  "V14" "V28"
##   
##   $`3`
##   [1] "V3"  "V18" "V19" "V25" "V26" "V27"
##   
##   $`4`
##   + ... omitted several groups/vertices

For larger undirected networks, consider Louvain or Leiden

cluster_louvain(collab_net)
## IGRAPH clustering multi level, groups: 7, mod: 0.3
## + groups:
##   $`1`
##   [1] "1"  "12"
##   
##   $`2`
##   [1] "2"  "4"  "7"  "9"  "14"
##   
##   $`3`
##    [1] "3"  "8"  "10" "18" "19" "21" "24" "25" "26" "27" "28"
##   
##   $`4`
##   + ... omitted several groups/vertices
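To choose among algorithms, one can compare the modularity each achieves on the same network. A sketch, assuming `collab_net` is the igraph object used above:

```r
library(igraph)

# Modularity reached by each community-detection algorithm on collab_net
algos <- list(
  girvan_newman = cluster_edge_betweenness(collab_net, directed = FALSE),
  fast_greedy   = cluster_fast_greedy(collab_net),
  louvain       = cluster_louvain(collab_net)
)
sapply(algos, modularity)
```

Higher modularity is not the whole story (resolution limits, stability), but it is a useful first comparison.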

Plot

collab_net %>% 
  mutate(comp = group_components()) %>% 
  filter(comp == 1) %>% 
  mutate(subgroups = group_fast_greedy()) %>% 
  ggraph(layout = "fr") +
  geom_edge_fan(
    width = .5,
    color = "grey30"
  ) +
  geom_node_point(
    aes(color = factor(subgroups)),
    size = 5
  )

Load attributes

node_attributes <- 
  here("data", "attributes.csv") %>% 
  read_delim(delim = ";")
## New names:
## Rows: 29 Columns: 12
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: ";" chr
## (5): id, education, gender, family, children dbl (7): ...1, age, age3,
## seniority, seniority_rec, soc_capital, satisfaction
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
vertex_attr(collab_net) <- node_attributes

Plotting the network: cohesive subgroups, gender, seniority

collab_net %>% 
  mutate(comp = group_components()) %>% 
  filter(comp == 1) %>% 
  mutate(subgroups = group_fast_greedy()) %>% 
  ggraph(layout = "fr") +
  geom_edge_fan(
    width = .5,
    color = "grey30"
  ) +
  geom_node_point(
    aes(
      color = factor(subgroups),
      size = seniority,
      shape = gender
    )
  )

5 Example on ego-nets

Node attributes

(ego_attr <- read_csv2(file = here("data", "ego-class.csv")))
## ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
## Rows: 229 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (12): random, gender, marital, educ, occ.typ, tel.int, dis.phys.count.ca...
## dbl  (3): ego_id, age, mmse
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Alters’ attributes

(alter_attr <- read_csv(file = here("data", "alter-data.csv")))
## Rows: 2729 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): freq.rec, week.daily, alter.sex, alter.cores, role, role.rec, intgen
## dbl (2): ego_id, alter_id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Alter-alter ties

(alter_alter <- read_csv(here("data", "alter-ties.csv")))
## Rows: 8303 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): ego_id, altsource_id, alttarg_id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

More efficient: creating an egor object and generating a list of igraph networks

library(egor)

(
whole_df <- 
  threefiles_to_egor(
    egos = ego_attr,
    alters.df = alter_attr,
    edges = alter_alter,
    ID.vars = list(
      ego = "ego_id",
      alter = "alter_id",
      source = "altsource_id",
      target = "alttarg_id"
    )
  )
)
## # EGO data (active): 229 × 15
##   .egoID random   age gender marital    educ  occ.typ tel.int dis.phys.count.cat
## *  <dbl> <chr>  <dbl> <chr>  <chr>      <chr> <chr>   <chr>   <chr>             
## 1      1 No        84 Male   Not marri… midd… 3. Ret… No      One               
## 2      2 No        81 Male   Married    prim… 1. Ele… No      One               
## 3      3 No        83 Male   Married    prim… 3. Ret… No      Multiple          
## 4      4 No        77 Female Married    prim… 1. Ele… No      One               
## 5      5 No        93 Male   Married    prim… 1. Ele… No      Multiple          
## # ℹ 224 more rows
## # ℹ 6 more variables: hcare.count.cat <chr>, adl.rec <chr>, gds.rec <chr>,
## #   demen <chr>, rsa <chr>, mmse <dbl>
## # ALTER data: 2,729 × 9
##   .altID .egoID freq.rec week.daily alter.sex alter.cores role  role.rec intgen
## *  <dbl>  <dbl> <chr>    <chr>      <chr>     <chr>       <chr> <chr>    <chr> 
## 1    101      1 yearly   No         Male      No          child imm.fam  Yes   
## 2    102      1 yearly   No         Male      No          child imm.fam  Yes   
## 3    103      1 yearly   No         Male      No          child imm.fam  Yes   
## # ℹ 2,726 more rows
## # AATIE data: 8,303 × 3
##   .egoID .srcID .tgtID
## *  <dbl>  <dbl>  <dbl>
## 1      1    101    102
## 2      1    101    103
## 3      1    101    104
## # ℹ 8,300 more rows

What kind of object is it?

Different parts can be inspected

whole_df[[3]]

Now we can easily convert all ego-networks into igraph objects

graph_list <- as_igraph(whole_df)

As a list, we can inspect each ego network

graph_list[[203]]
## IGRAPH be74ba1 UN-- 7 10 -- 
## + attr: .egoID (g/n), name (v/c), freq.rec (v/c), week.daily (v/c),
## | alter.sex (v/c), alter.cores (v/c), role (v/c), role.rec (v/c),
## | intgen (v/c)
## + edges from be74ba1 (vertex names):
##  [1] 20601--20602 20601--20603 20601--20604 20601--20605 20602--20603
##  [6] 20602--20604 20602--20605 20603--20604 20603--20605 20604--20605

Calculating size of ego-networks

whole_df[[1]] <- 
  whole_df[[1]] %>% 
  mutate(net_size = map_int(graph_list, vcount))

whole_df[[1]] %>% 
  summarise(
    mean_size = mean(net_size),
    sd_size = sd(net_size)
  )

Calculating the number of cohesive subgroups in each ego-network with the Girvan-Newman algorithm, and integrating it into the egor object

whole_df[[1]] <- 
  whole_df[[1]] %>%
    mutate(
      clusters = graph_list %>%
        map_int(\(egonet) 
          cluster_edge_betweenness(egonet) %>%
            length()
        )
    )

count(whole_df[[1]], clusters)

Testing the hypothetical association between number of cohesive sub-groups and cognitive abilities on the Sociable dataset

whole_df[[1]] %>%
  ggplot(aes(mmse)) +
#  scale_fill_grey() +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  geom_vline(xintercept = median(whole_df[[1]]$mmse, na.rm = TRUE), color = "black", linetype = "dashed") +
  theme_bw() +
  labs(x = "Cognitive functioning (MMSE)", y = "Frequency")
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_bin()`).

whole_df[[1]] %>%
  ggplot(aes(clusters)) +
  geom_bar() + 
  geom_vline(xintercept = median(whole_df[[1]]$clusters, na.rm = TRUE), color = "red", linetype = "dashed") +
  scale_x_continuous(breaks = 0:max(whole_df[[1]]$clusters)) +
  labs(y = "") +
  xlab("# of cohesive subgroups")

Association of MMSE with number of cohesive subgroups

whole_df[[1]] %>% 
  ggplot(aes(clusters, mmse)) +
  geom_point() +
  geom_smooth() +
  geom_jitter() +
  scale_x_continuous(breaks = 0:16) +
  geom_hline(yintercept = 27, color = "red") +
  geom_hline(yintercept = median(whole_df[[1]]$mmse, na.rm = TRUE), linetype = "dashed") +
  geom_vline(xintercept = median(whole_df[[1]]$clusters), linetype = "dashed") +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 3 rows containing missing values or values outside the scale range
## (`geom_point()`).

Final model adjusting for network size and demographic covariates

m <- mmse ~ clusters + net_size + age + educ + gender
ols <- lm(m, data = whole_df[[1]])
sjPlot::tab_model(ols, vcov.fun = "HC", show.se = TRUE)
Dependent variable: mmse

Predictors            Estimates  std. Error  CI              p
(Intercept)           37.01      8.02        21.21 – 52.80   <0.001
clusters              0.64       0.20        0.24 – 1.04     0.002
net size              0.11       0.07        -0.03 – 0.25    0.116
age                   -0.18      0.09        -0.36 – -0.01   0.042
educ [middle]         -2.40      1.34        -5.05 – 0.25    0.075
educ [primary]        -1.83      1.09        -3.98 – 0.32    0.094
educ [uni or higher]  1.08       1.19        -1.25 – 3.42    0.361
gender [Male]         0.02       0.86        -1.68 – 1.71    0.985
Observations          226
R2 / R2 adjusted      0.145 / 0.117

Visualize estimates

p <- jtools::plot_summs(
  ols,
  coefs = c(
    "cohesive subgroups" = "clusters",
    "network size" = "net_size"
    # ,
    # "age" = "age",
    # "gender: male" = "gendermale",
    # "education: lower secondary" = "education2",
    # "education: higher secondary" = "education3",
    # "education: tertiary" = "education4",
    # "assisted living facility" = "rsa1",
    # "GDS > 1" = "gds.2yes"
  ),
  colors = "black",
  scale = FALSE,
  robust = "HC"
  # ,
  # inner_ci_level = .9
  # ,
  # colors = "rainbow"
)
## Registered S3 method overwritten by 'future':
##   method               from      
##   all.equal.connection parallelly
p + 
  theme_bw() +
  labs(x = "", y = "")

6 Statistical inference

6.1 Case study: is there solidarity in my coworking space?

  • We assume that support reciprocation is an obligation in case of high solidarity

For statistical inference we need a different suite of packages

library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## Loading required package: network
## 
## 'network' 1.19.0 (2024-12-08), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
##     is.directed, list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute
## 
## 'ergm' 4.8.1 (2025-01-20), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.
## Loading required package: networkDynamic
## 
## 'networkDynamic' 0.11.5 (2024-11-21), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Registered S3 method overwritten by 'tergm':
##   method                   from
##   simulate_formula.network ergm
## 
## 'tergm' 4.2.1 (2024-10-08), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 
## Attaching package: 'tergm'
## The following object is masked from 'package:ergm':
## 
##     snctrl
## Loading required package: ergm.count
## 
## 'ergm.count' 4.1.2 (2024-06-15), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following object is masked from 'package:ergm':
## 
##     snctrl
## The following objects are masked from 'package:base':
## 
##     attr, order
## sna: Tools for Social Network Analysis
## Version 2.8 created on 2024-09-07.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## 
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census
## Loading required package: tsna
## 
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information
## unable to reach CRAN

Creating a network object

(support_net <- 
  network(
    x = as_adjacency_matrix(sup_net), 
    vertex.attr = node_attributes
  ))
##  Network attributes:
##   vertices = 29 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 99 
##     missing edges= 0 
##     non-missing edges= 99 
## 
##  Vertex attribute names: 
##     ...1 age age3 children education family gender id satisfaction seniority seniority_rec soc_capital vertex.names 
## 
## No edge attributes

Another way (provided the intergraph package is installed):

vertex_attr(sup_net) <- node_attributes
(support_net <- intergraph::asNetwork(sup_net))
##  Network attributes:
##   vertices = 29 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 99 
##     missing edges= 0 
##     non-missing edges= 99 
## 
##  Vertex attribute names: 
##     ...1 age age3 children education family gender id satisfaction seniority seniority_rec soc_capital vertex.names 
## 
## No edge attributes

Inspecting attributes

support_net %v% "age"
##  [1] 40 41 25 28 28 31 39 28 31 28 33 52 31 27 31 31 32 31 34 40 28 31 24 27 31
## [26] 24 34 27 36

Different functions to get the usual descriptive metrics:

gden(support_net)
## [1] 0.1219212

Some of them are masked by igraph:

(indeg_sup <- sna::degree(support_net, gmode = "digraph", cmode = "indegree"))
##  [1] 4 6 3 4 1 5 1 2 2 7 0 5 7 2 7 5 3 0 5 5 1 2 2 2 7 1 6 2 2

Luckily, ggraph supports network objects:

support_net %>% 
  ggraph(layout = "nicely") +
    geom_edge_fan(
    arrow = arrow(length = unit(2, "mm")),
    end_cap = circle(3, "mm"),
    color = "grey30",
    width = .3
  ) +
  geom_node_point(
    aes(color = gender),
    size = indeg_sup,
    alpha = .5,
    show.legend = FALSE
  ) +
  theme_void()

Let’s find the traces left by reciprocation in our network data.

Let’s count the complete (mutual) dyads, i.e., pairs of nodes with reciprocal ties. Warning: the number of ties in those dyads is twice the number of complete dyads!

mutuality(support_net)
## Mut 
##  25

What’s the state of the other dyads in the network?

sna::dyad.census(support_net)
##      Mut Asym Null
## [1,]  25   49  332

812 ordered dyads, i.e., 406 unordered pairs of nodes (29 * 28 / 2). Of these, 74 dyads contain at least one tie: 25 are mutual (accounting for 50 ties) and 49 are asymmetric (49 ties), so 50 + 49 = 99 ties in total
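The bookkeeping can be verified with a quick base-R check (numbers taken from the dyad census above):

```r
mut  <- 25    # mutual dyads
asym <- 49    # asymmetric dyads
null <- 332   # null dyads
n    <- 29    # nodes

# every unordered pair of nodes is accounted for
mut + asym + null == n * (n - 1) / 2   # TRUE: 406 pairs

# ties: 2 per mutual dyad, 1 per asymmetric dyad
2 * mut + asym                          # 99 ties
```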

What should we compare these numbers to?

  • fix number of nodes
  • assume a data-generating process

Let’s compare it with a random network with 29 nodes. We need to specify a probability for ties to occur (let’s start with 0.5). We then generate a random graph according to an Erdős-Rényi model with \(n =\) 29 and \(p =\) 0.5.

random_net <- rgraph(n = 29, m = 1, tprob = .5)

We then count the complete dyads in the generated random graph and compare it to the empirical network

mutuality(random_net)
## Mut 
## 102

Instead of just one instance of a random graph model, we generate a distribution of random graphs according to an E-R model with \(p =\) 0.5.

random_net <- rgraph(n = 29, m = 1000, tprob = .5)

Let’s compare the observed number of complete dyads with the distribution of complete dyads according to the model

hist(mutuality(random_net), xlim = c(0, 200))
abline(v = mutuality(support_net), col = "blue")

Let’s test reciprocation in our observed network against a more realistic distribution: an E-R model with p equal to the raw probability of ties occurring in our network (i.e., the density)

random_net <- rgraph(n = 29, m = 1000, tprob = gden(support_net))
hist(mutuality(random_net), xlim = c(0, 50))
abline(v = mutuality(support_net), col = "blue")
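The visual comparison can be summarized as a Monte Carlo p-value: the share of simulated networks with at least as many mutual dyads as the observed one. A sketch, assuming `support_net` and the sna functions loaded above:

```r
library(sna)

# Simulate 1000 E-R graphs at the observed density and compute the
# proportion with at least as many mutual dyads as support_net
sim <- rgraph(n = 29, m = 1000, tprob = gden(support_net))
mean(mutuality(sim) >= mutuality(support_net))
```

A small proportion suggests the observed reciprocation is unlikely under the density-only model.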

All these models assume independent observations, but observing both \(x_{ij}\) and \(x_{ji}\) is not independent of observing \(x_{ij}\) or \(x_{ji}\) alone. We need a model that accounts for stochastic interdependencies in the data.

7 Exponential Random Graph Models

Specifying a baseline model

m0 <- support_net ~ edges
summary(m0)
## edges 
##    99

Fitting the model: MPLE is enough because the model doesn’t assume any interdependencies

fit0 <- ergm(m0)
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
summary(fit0)
## Call:
## ergm(formula = m0)
## 
## Maximum Likelihood Results:
## 
##       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges  -1.9744     0.1073      0  -18.41   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  602.1  on 811  degrees of freedom
##  
## AIC: 604.1  BIC: 608.8  (Smaller is better. MC Std. Err. = 0)

We can compute the tie probability from the log-odds (inverse logit)

coef(fit0)
##     edges 
## -1.974362
exp(coef(fit0)) / (1 + exp(coef(fit0)))
##     edges 
## 0.1219212

It’s equal to the density of the network.
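Base R’s `plogis` performs the same inverse-logit transformation in one step:

```r
# Inverse logit of the edges coefficient:
# plogis(x) = exp(x) / (1 + exp(x))
plogis(-1.974362)
# equals the network density, ~0.1219
```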

Dyadic independence model

m1 <- update.formula(m0, . ~ . + mutual)
summary(m1)
##  edges mutual 
##     99     25

Fitting now requires MCMC-MLE

fit1 <- ergm(m1)
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Warning: 'glpk' selected as the solver, but package 'Rglpk' is not available;
## falling back to 'lpSolveAPI'. This should be fine unless the sample size and/or
## the number of parameters is very big.
## 1
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0163.
## Convergence test p-value: 0.0036. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## 
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(fit1)
## Call:
## ergm(formula = m1)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges   -2.6059     0.1569      0 -16.605   <1e-04 ***
## mutual   2.6288     0.3523      0   7.462   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  550.2  on 810  degrees of freedom
##  
## AIC: 554.2  BIC: 563.6  (Smaller is better. MC Std. Err. = 0.76)

Markov model

m2 <- update.formula(m1, . ~ . + twopath + gwesp(decay = .5, fixed = TRUE))

## alternative specification
#m2 <- support_net ~ edges + mutual + gwesp(.5, fixed = TRUE) + gwdsp(.5, fixed = TRUE)

fit2 <- ergm(m2)
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 1.1058.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0099.
## Convergence test p-value: 0.2068. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0180.
## Convergence test p-value: 0.1926. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0306.
## Convergence test p-value: 0.0179. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0034.
## Convergence test p-value: 0.0043. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## 
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(fit2)
## Call:
## ergm(formula = m2)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -2.23342    0.34409      0  -6.491   <1e-04 ***
## mutual               2.00211    0.35443      0   5.649   <1e-04 ***
## twopath             -0.24523    0.05625      0  -4.359   <1e-04 ***
## gwesp.OTP.fixed.0.5  1.07669    0.15227      0   7.071   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  493.2  on 808  degrees of freedom
##  
## AIC: 501.2  BIC: 520  (Smaller is better. MC Std. Err. = 1.373)

Let’s model degree distribution

m3 <- update.formula(m2, . ~ . + gwidegree(.5, fixed = TRUE) + gwodegree(.5, fixed = TRUE))
fit3 <- ergm(m3)
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 1.0304.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0957.
## Convergence test p-value: 0.6142. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0135.
## Convergence test p-value: 0.1400. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0499.
## Convergence test p-value: 0.3290. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0269.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## 
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(fit3)
## Call:
## ergm(formula = m3)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -2.07979    0.44223      0  -4.703   <1e-04 ***
## mutual               2.14735    0.44296      0   4.848   <1e-04 ***
## twopath             -0.25111    0.05879      0  -4.271   <1e-04 ***
## gwesp.OTP.fixed.0.5  1.02806    0.17033      0   6.036   <1e-04 ***
## gwideg.fixed.0.5    -0.64257    0.65093      0  -0.987    0.324    
## gwodeg.fixed.0.5    -0.05031    0.77698      0  -0.065    0.948    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  494.1  on 806  degrees of freedom
##  
## AIC: 506.1  BIC: 534.3  (Smaller is better. MC Std. Err. = 0.5511)

Associations between node-level attributes and ties in our network

m4 <- update.formula(m2, . ~ . + absdiff("age") + nodeicov("age") + nodeocov("age"))
fit4 <- ergm(m4)
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.7926.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0737.
## Convergence test p-value: 0.4819. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1581.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1072.
## Estimating equations are not within tolerance region.
## Iteration 5 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0320.
## Convergence test p-value: 0.3373. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0204.
## Convergence test p-value: 0.0360. Not converged with 99% confidence; increasing sample size.
## Iteration 7 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0305.
## Convergence test p-value: 0.0016. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## 
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(fit4)
## Call:
## ergm(formula = m4)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -3.23295    0.73258      0  -4.413   <1e-04 ***
## mutual               2.12735    0.35956      0   5.916   <1e-04 ***
## twopath             -0.22760    0.05456      0  -4.172   <1e-04 ***
## gwesp.OTP.fixed.0.5  1.04272    0.15128      0   6.893   <1e-04 ***
## absdiff.age         -0.03129    0.01861      0  -1.681   0.0927 .  
## nodeicov.age         0.04393    0.01780      0   2.468   0.0136 *  
## nodeocov.age        -0.01038    0.01871      0  -0.555   0.5791    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  488.1  on 805  degrees of freedom
##  
## AIC: 502.1  BIC: 535  (Smaller is better. MC Std. Err. = 0.7257)
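Coefficients on the log-odds scale can be turned into odds ratios by exponentiating; for instance, for the `nodeicov.age` estimate printed above (a base-R sketch using the printed value):

```r
# Odds ratio for a one-year increase in the receiver's age,
# from the nodeicov.age estimate printed above
exp(0.04393)
# ~1.045: each extra year of age raises the odds of receiving a tie by ~4.5%
```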

Multiplexity

m5 <- update.formula(m4, . ~ . + dyadcov(intergraph::asNetwork(collab_net)))
summary(ergm(m5))
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 1.1698.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1338.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0801.
## Convergence test p-value: 0.9820. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1487.
## Estimating equations are not within tolerance region.
## Iteration 5 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1207.
## Estimating equations are not within tolerance region.
## Iteration 6 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0914.
## Convergence test p-value: 0.5585. Not converged with 99% confidence; increasing sample size.
## Iteration 7 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1062.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 8 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0648.
## Convergence test p-value: 0.0235. Not converged with 99% confidence; increasing sample size.
## Iteration 9 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0606.
## Convergence test p-value: 0.0151. Not converged with 99% confidence; increasing sample size.
## Iteration 10 of at most 60:
## 1 Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0172.
## Convergence test p-value: 0.0020. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## 
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
## Call:
## ergm(formula = m5)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                                  Estimate Std. Error MCMC %
## edges                                            -4.87029    0.95494      0
## mutual                                            2.18463    0.61228      0
## twopath                                          -0.29619    0.05483      0
## gwesp.OTP.fixed.0.5                               0.87276    0.15730      0
## absdiff.age                                      -0.03253    0.02339      0
## nodeicov.age                                      0.06743    0.02097      0
## nodeocov.age                                      0.01545    0.02087      0
## dyadcov.intergraph::asNetwork(collab_net).mutual  3.48254    0.54774      0
## dyadcov.intergraph::asNetwork(collab_net).utri    2.25068    0.36573      0
## dyadcov.intergraph::asNetwork(collab_net).ltri    2.15149    0.37502      0
##                                                  z value Pr(>|z|)    
## edges                                             -5.100  < 1e-04 ***
## mutual                                             3.568  0.00036 ***
## twopath                                           -5.402  < 1e-04 ***
## gwesp.OTP.fixed.0.5                                5.548  < 1e-04 ***
## absdiff.age                                       -1.391  0.16435    
## nodeicov.age                                       3.216  0.00130 ** 
## nodeocov.age                                       0.740  0.45930    
## dyadcov.intergraph::asNetwork(collab_net).mutual   6.358  < 1e-04 ***
## dyadcov.intergraph::asNetwork(collab_net).utri     6.154  < 1e-04 ***
## dyadcov.intergraph::asNetwork(collab_net).ltri     5.737  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  393.6  on 802  degrees of freedom
##  
## AIC: 413.6  BIC: 460.6  (Smaller is better. MC Std. Err. = 0.753)
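Both models were fit via MCMC, so before interpreting the estimates we should inspect the sampler diagnostics, as the output itself suggests. A sketch, assuming we store the fit in an object (here hypothetically named `fit5`) instead of passing `ergm(m5)` straight to `summary`:

```r
# Store the fit so we can reuse it for diagnostics
fit5 <- ergm(m5)
summary(fit5)

# Trace plots and convergence diagnostics for the MCMC sample;
# look for well-mixed chains centred on zero
mcmc.diagnostics(fit5)
```

Storing the fitted object also lets us compare models directly, e.g. with `AIC(fit4, fit5)`.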