1 Introduction

This is a notebook for hands-on sessions of the Social Network Analysis course for Ph.D. students of the NASP doctoral school, University of Milan (Italy).
Analysed data are taken from

Bianchi, F., Casnici, N., & Squazzoni, F. (2018). Solidarity as a byproduct of professional collaboration. Social support and trust in a coworking space. Social Networks, 54: 61-72. doi: https://doi.org/10.1016/j.socnet.2017.12.002.

2 Packages

It is good practice to upload the needed packages at the beginning of your notebook.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(here)

## here() starts at /Users/federico/Desktop/sna

library(igraph)

## 
## Caricamento pacchetto: 'igraph'
## 
## I seguenti oggetti sono mascherati da 'package:lubridate':
## 
##     %--%, union
## 
## I seguenti oggetti sono mascherati da 'package:dplyr':
## 
##     as_data_frame, groups, union
## 
## I seguenti oggetti sono mascherati da 'package:purrr':
## 
##     compose, simplify
## 
## Il seguente oggetto è mascherato da 'package:tidyr':
## 
##     crossing
## 
## Il seguente oggetto è mascherato da 'package:tibble':
## 
##     as_data_frame
## 
## I seguenti oggetti sono mascherati da 'package:stats':
## 
##     decompose, spectrum
## 
## Il seguente oggetto è mascherato da 'package:base':
## 
##     union

library(ggraph)
library(tidygraph)

## 
## Caricamento pacchetto: 'tidygraph'
## 
## Il seguente oggetto è mascherato da 'package:igraph':
## 
##     groups
## 
## Il seguente oggetto è mascherato da 'package:stats':
## 
##     filter

library(purrr)

3 Collaboration

3.1 Loading data

Now, we need to import data on collaboration relationships.
In order to let R find the data, we need to move the data file to the same directory where our Rstudio project is located (i.e., the working directory)

If you don’t remember what working directory you’re in:

getwd()

## [1] "/Users/federico/Desktop/sna"

Let’s inspect the data structure: we could either open the file outside Rstudio or even from within
It’s a .txt file.
It’s an adjacency matrix (see slides)
Values are 0 and 1 and are separated by a tab (\t)
In order to load it, we need a function to read a table data structure.
read.table is classic R, while readr::read_delim is faster. It’s included in the tidyverse, which we have already uploaded.

Now we can inspect the read_delim function in the console, by hitting

?read_delim

What information do we need? * the file name: collaboration.txt * the column separator: "\t" * whether we have column names: no

Where is my data file?

dir()

## [1] "data"                     "sna-notebook_teacher.Rmd"
## [3] "sna-notebook.html"        "sna-notebook.nb.html"    
## [5] "sna-notebook.Rmd"         "sna.Rproj"

In order to maximise reproducibility, we need the path to be available to all possible colleagues. We can use the here::here function, which builds a new path regardless of the original path.

The function here() finds my working directory and builds a plausible path towards the specified arguments.

here("data", "collaboration.txt")

## [1] "/Users/federico/Desktop/sna/data/collaboration.txt"

Now we’re ready to load collaboration data as a new object. We can use read_tsv, which takes "\t" as a default value of the delim parameter

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE)

What kind of object have we stored our data in?

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  class()

## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

It’s a tibble, the tidyverse version of a data.frame.

Let’s inspect our data

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  str()

## spc_tbl_ [29 × 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ X1 : num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X2 : num [1:29] 0 0 0 1 0 0 1 1 1 1 ...
##  $ X3 : num [1:29] 0 0 0 0 0 1 0 1 1 1 ...
##  $ X4 : num [1:29] 0 1 0 0 0 0 0 0 1 1 ...
##  $ X5 : num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X6 : num [1:29] 0 0 1 0 0 0 0 0 1 1 ...
##  $ X7 : num [1:29] 0 1 0 0 0 0 0 0 0 0 ...
##  $ X8 : num [1:29] 0 1 1 0 0 0 0 0 0 1 ...
##  $ X9 : num [1:29] 0 1 1 1 0 1 0 0 0 0 ...
##  $ X10: num [1:29] 0 1 1 1 0 1 0 1 0 0 ...
##  $ X11: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X12: num [1:29] 1 0 0 0 0 0 0 0 0 0 ...
##  $ X13: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X14: num [1:29] 0 1 1 1 0 0 0 0 1 0 ...
##  $ X15: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X16: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X17: num [1:29] 0 0 0 0 0 1 0 0 0 0 ...
##  $ X18: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X19: num [1:29] 0 0 1 0 0 1 0 1 1 1 ...
##  $ X20: num [1:29] 0 0 1 0 0 1 0 0 1 1 ...
##  $ X21: num [1:29] 0 0 0 0 0 0 1 1 0 1 ...
##  $ X22: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X23: num [1:29] 0 0 1 0 0 1 0 0 1 1 ...
##  $ X24: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X25: num [1:29] 0 0 1 0 0 0 0 0 0 0 ...
##  $ X26: num [1:29] 0 0 1 0 0 0 0 0 0 1 ...
##  $ X27: num [1:29] 0 0 1 0 0 0 0 1 0 1 ...
##  $ X28: num [1:29] 0 0 1 0 0 0 0 1 0 1 ...
##  $ X29: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   X1 = col_double(),
##   ..   X2 = col_double(),
##   ..   X3 = col_double(),
##   ..   X4 = col_double(),
##   ..   X5 = col_double(),
##   ..   X6 = col_double(),
##   ..   X7 = col_double(),
##   ..   X8 = col_double(),
##   ..   X9 = col_double(),
##   ..   X10 = col_double(),
##   ..   X11 = col_double(),
##   ..   X12 = col_double(),
##   ..   X13 = col_double(),
##   ..   X14 = col_double(),
##   ..   X15 = col_double(),
##   ..   X16 = col_double(),
##   ..   X17 = col_double(),
##   ..   X18 = col_double(),
##   ..   X19 = col_double(),
##   ..   X20 = col_double(),
##   ..   X21 = col_double(),
##   ..   X22 = col_double(),
##   ..   X23 = col_double(),
##   ..   X24 = col_double(),
##   ..   X25 = col_double(),
##   ..   X26 = col_double(),
##   ..   X27 = col_double(),
##   ..   X28 = col_double(),
##   ..   X29 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

It’s interpreted as a case-by-variable matrix, with 29 variables.
Instead, what we need is an adjacency matrix.

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix()

##       X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
##  [1,]  0  0  0  0  0  0  0  0  0   0   0   1   0   0   0   0   0   0   0   0
##  [2,]  0  0  0  1  0  0  1  1  1   1   0   0   1   1   1   1   0   0   0   0
##  [3,]  0  0  0  0  0  1  0  1  1   1   0   0   0   1   0   0   0   0   1   1
##  [4,]  0  1  0  0  0  0  0  0  1   1   0   0   0   1   0   0   0   0   0   0
##  [5,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [6,]  0  0  1  0  0  0  0  0  1   1   0   0   1   0   1   1   1   0   1   1
##  [7,]  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [8,]  0  1  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
##  [9,]  0  1  1  1  0  1  0  0  0   0   0   0   0   1   0   0   0   0   1   1
## [10,]  0  1  1  1  0  1  0  1  0   0   0   0   1   0   1   1   0   0   1   1
## [11,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [12,]  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [13,]  0  1  0  0  0  1  0  0  0   1   0   0   0   0   1   1   0   0   0   1
## [14,]  0  1  1  1  0  0  0  0  1   0   0   0   0   0   0   0   0   0   0   0
## [15,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   0   1   0   0   0   1
## [16,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   1   0   0   0   0   1
## [17,]  0  0  0  0  0  1  0  0  0   0   0   0   0   0   0   0   0   0   0   1
## [18,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [19,]  0  0  1  0  0  1  0  1  1   1   0   0   0   0   0   0   0   0   0   1
## [20,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   1   1   0   1   0
## [21,]  0  0  0  0  0  0  1  1  0   1   0   0   0   0   0   0   0   0   0   0
## [22,]  0  0  0  0  0  0  0  0  0   0   0   0   1   0   1   1   0   0   0   0
## [23,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   0   1   0   1   1
## [24,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   1   0
## [25,]  0  0  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   1   1   0
## [26,]  0  0  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
## [27,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   1
## [28,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   0
## [29,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##       X21 X22 X23 X24 X25 X26 X27 X28 X29
##  [1,]   0   0   0   0   0   0   0   0   0
##  [2,]   0   0   0   0   0   0   0   0   0
##  [3,]   0   0   1   0   1   1   1   1   0
##  [4,]   0   0   0   0   0   0   0   0   0
##  [5,]   0   0   0   0   0   0   0   0   0
##  [6,]   0   0   1   0   0   0   0   0   0
##  [7,]   1   0   0   0   0   0   0   0   0
##  [8,]   1   0   0   0   0   0   1   1   0
##  [9,]   0   0   1   0   0   0   0   0   0
## [10,]   1   0   1   0   0   1   1   1   0
## [11,]   0   0   0   0   0   0   0   0   0
## [12,]   0   0   0   0   0   0   0   0   0
## [13,]   0   1   1   0   0   0   0   0   0
## [14,]   0   0   0   0   0   0   0   0   0
## [15,]   0   1   1   0   0   0   0   0   0
## [16,]   0   1   0   0   0   0   0   0   0
## [17,]   0   0   1   0   0   0   0   0   0
## [18,]   0   0   0   0   1   0   0   0   0
## [19,]   0   0   1   1   1   1   1   1   0
## [20,]   0   0   1   0   0   0   1   0   0
## [21,]   0   0   0   0   0   0   0   0   0
## [22,]   0   0   0   0   0   0   0   0   0
## [23,]   0   0   0   0   0   0   1   0   0
## [24,]   0   0   0   0   0   1   0   0   0
## [25,]   0   0   0   0   0   1   1   0   0
## [26,]   0   0   0   1   1   0   1   0   0
## [27,]   0   0   1   0   1   1   0   0   0
## [28,]   0   0   0   0   0   0   0   0   0
## [29,]   0   0   0   0   0   0   0   0   0

Now our data are interpreted by R as a matrix.

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix() %>% 
  class()

## [1] "matrix" "array"

3.2 From data to graphs

In order to apply some SNA functions, we need to use the igraph package, which handles our data as igraph objects, to which its functions can be applied
We have to turn our data into an igraph object.
There’s a series of primitives starting with graph_from that can be useful for us
We need to specify that the matrix should be interpreted as undirected graph

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix() %>% 
  graph_from_adjacency_matrix(mode = "undirected")

## IGRAPH 974219b UN-- 29 82 -- 
## + attr: name (v/c)
## + edges from 974219b (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## + ... omitted several edges

The summary shows us: * that the network is undirected (UN) * 29 nodes * 82 links * the list of the edges

An igraph object is a complex R object made of lists

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix() %>% 
  graph_from_adjacency_matrix(mode = "undirected")

## IGRAPH 81b9fe0 UN-- 29 82 -- 
## + attr: name (v/c)
## + edges from 81b9fe0 (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## + ... omitted several edges

Because we will then use the graph to perform analyses, we need it to be stored in an object.

(collaboration <- 
  here("data", "collaboration.txt") %>% 
  read_delim(col_names = FALSE) %>% 
  as.matrix() %>% 
  graph_from_adjacency_matrix(mode = "undirected"))

## IGRAPH 2fe0827 UN-- 29 82 -- 
## + attr: name (v/c)
## + edges from 2fe0827 (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## + ... omitted several edges

3.3 Visualization

collaboration %>% 
  ggraph(layout = "fr") +
  geom_edge_link(
    colour = "grey30",
    width = .5,
    alpha = .5
  ) +
  geom_node_point(
    size = 1
  )

## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

3.4 Connectivity

A basic graph-level network property we could be interested in is how connected a network is

Let’s start from the number of edges

ecount(collaboration)

## [1] 82

We can also retrieve the whole list of edges

E(collaboration)

## + 82/82 edges from 2fe0827 (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## [65] X16--X22 X17--X20 X17--X23 X18--X25 X19--X20 X19--X23 X19--X24 X19--X25
## [73] X19--X26 X19--X27 X19--X28 X20--X23 X20--X27 X23--X27 X24--X26 X25--X26
## + ... omitted several edges

The number of edges depends on the number of vertices

vcount(collaboration)

## [1] 29

We can also retrieve the list of vertices

V(collaboration)

## + 29/29 vertices, named, from 2fe0827:
##  [1] X1  X2  X3  X4  X5  X6  X7  X8  X9  X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
## [20] X20 X21 X22 X23 X24 X25 X26 X27 X28 X29

3.4.1 Density

In order to infer the level of connectivity from the number of edges, we need to compare it to the theoretically maximum value of possible links
We combine each vertex with all other possible vertices
Since our network is undirected, we divide by 2, so that we don’t count pairs twice
We’ll store the calculation output in a new object because we’re going to use it later again and we want to stay D.R.Y.

(max_links <- vcount(collaboration) * (vcount(collaboration) - 1) / 2)

## [1] 406

Now, we can calculate density as the ratio between the number of actual links and the number of possible links.

(density <- ecount(collaboration) / max_links)

## [1] 0.2019704

Actually, we don’t need to perform this calculation every time. There is a specific function in igraph

graph.density(collaboration)

## [1] 0.2019704

3.4.2 Discussion

Is the density value high or low? Is our collaboration network dense or sparse?
Let’s compare with another network.

4 Support

4.1 Loading data

Data on support relationships has a different structure
It’s an edgelist (see slides)
We still need to assign data to a matrix object (not an adjacency matrix, though)

here("data", "support.csv") %>% 
  read_delim(
    delim = " ", 
    col_names = FALSE
  ) %>% 
  as.matrix() %>% 
  graph_from_edgelist()

## IGRAPH 3d4a922 DN-- 28 99 -- 
## + attr: name (v/c)
## + edges from 3d4a922 (vertex names):
##  [1] V1 ->V29 V2 ->V4  V2 ->V14 V2 ->V15 V2 ->V28 V3 ->V10 V3 ->V13 V3 ->V15
##  [9] V3 ->V16 V3 ->V25 V3 ->V26 V3 ->V27 V4 ->V2  V5 ->V24 V6 ->V12 V6 ->V13
## [17] V6 ->V15 V6 ->V16 V6 ->V17 V6 ->V20 V6 ->V23 V7 ->V1  V7 ->V12 V8 ->V2 
## [25] V8 ->V4  V8 ->V10 V8 ->V13 V8 ->V15 V8 ->V25 V8 ->V27 V8 ->V28 V9 ->V2 
## [33] V9 ->V14 V10->V3  V10->V13 V10->V15 V10->V19 V10->V27 V12->V1  V12->V2 
## [41] V12->V10 V12->V20 V13->V16 V14->V2  V14->V8  V14->V9  V14->V25 V15->V6 
## [49] V15->V10 V15->V13 V15->V16 V15->V22 V16->V13 V16->V15 V16->V22 V17->V6 
## [57] V17->V20 V18->V25 V19->V3  V19->V25 V19->V27 V20->V6  V20->V12 V20->V17
## + ... omitted several edges

Why are nodes 28? We were expecting 29
One of the nodes might be an isolate
It’s always good practice to store the list of edges and the list of nodes in two different objects, then integrate them in a graph object
The nodes are coded in the edge list file as “V1”, “V2”, …, “V29”
So, we’ll generate a character vector by combining “V” with numbers between 1-29

support_edges <- 
  here("data", "support.csv") %>% 
  read_delim(
    delim = " ", 
    col_names = FALSE
  ) %>% 
  as.matrix()

## Rows: 99 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: " "
## chr (2): X1, X2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now we have a vector of nodes (node list) and a matrix of edges (edge list). We can then generate a graph object in a safer way through graph_from_data_frame, which takes a node list (vector) and and edge list (matrix) as inputs.

(support <- 
   graph_from_data_frame(
     d = support_edges, 
     vertices = paste0("V", 1:29)
   ))

## IGRAPH cae113e DN-- 29 99 -- 
## + attr: name (v/c)
## + edges from cae113e (vertex names):
##  [1] V1 ->V29 V2 ->V4  V2 ->V14 V2 ->V15 V2 ->V28 V3 ->V10 V3 ->V13 V3 ->V15
##  [9] V3 ->V16 V3 ->V25 V3 ->V26 V3 ->V27 V4 ->V2  V5 ->V24 V6 ->V12 V6 ->V13
## [17] V6 ->V15 V6 ->V16 V6 ->V17 V6 ->V20 V6 ->V23 V7 ->V1  V7 ->V12 V8 ->V2 
## [25] V8 ->V4  V8 ->V10 V8 ->V13 V8 ->V15 V8 ->V25 V8 ->V27 V8 ->V28 V9 ->V2 
## [33] V9 ->V14 V10->V3  V10->V13 V10->V15 V10->V19 V10->V27 V12->V1  V12->V2 
## [41] V12->V10 V12->V20 V13->V16 V14->V2  V14->V8  V14->V9  V14->V25 V15->V6 
## [49] V15->V10 V15->V13 V15->V16 V15->V22 V16->V13 V16->V15 V16->V22 V17->V6 
## [57] V17->V20 V18->V25 V19->V3  V19->V25 V19->V27 V20->V6  V20->V12 V20->V17
## + ... omitted several edges

4.2 Connectivity

Calculating density:

graph.density(support)

## [1] 0.1219212

4.2.1 Discussion

Is the density value high or low?
Are social networks capable of being fully connected?
Are all relationships equal in terms of possible density?

5 Centrality

How (heterogeneously) connected are nodes?

Which nodes are the most important ones?
Centrality is a (rather vague) concept of (structural) importance within a network
In a wider sense, a node’s degree can be seen as its degree centrality.

5.1 Degree centrality

5.1.1 Undirected network

Degree of a graph vertex is its number of edges (= size of its personal network)

Let’s look at the degree in collaboration of our nodes

degree(collaboration)

##  X1  X2  X3  X4  X5  X6  X7  X8  X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 
##   1   9  12   4   0  10   2   7   8  15   0   1   8   4   8   7   3   1  12  11 
## X21 X22 X23 X24 X25 X26 X27 X28 X29 
##   3   3  10   2   5   6   8   4   0

This is the degree distribution
It’s a numeric vector (each value comes with a label), so various operations can be applied on it

collaboration %>% 
  degree() %>% 
  sort(decreasing = TRUE)

## X10  X3 X19 X20  X6 X23  X2  X9 X13 X15 X27  X8 X16 X26 X25  X4 X14 X28 X17 X21 
##  15  12  12  11  10  10   9   8   8   8   8   7   7   6   5   4   4   4   3   3 
## X22  X7 X24  X1 X12 X18  X5 X11 X29 
##   3   2   2   1   1   1   0   0   0

If we are interested in the relative degree:

degree_distribution(collaboration)

##  [1] 0.10344828 0.10344828 0.06896552 0.10344828 0.10344828 0.03448276
##  [7] 0.03448276 0.06896552 0.13793103 0.03448276 0.06896552 0.03448276
## [13] 0.06896552 0.00000000 0.00000000 0.03448276

It’s easier to visualize it by using basic R plotting functions

collaboration %>% 
  degree() %>% 
  hist(
     col = "firebrick",
     xlab = "Degree",
     ylab = "Frequency",
     main = "Professional collaboration",
     breaks = min(.):max(.)
  )

A more elegant graph can be plotted with ggplot2

collaboration %>% 
  degree() %>% 
  as.tibble() %>% 
  ggplot() +
  geom_histogram(
    aes(value),
    binwidth = 1,
    color = "red",
    fill = "white"
  ) +
  theme_bw() +
  xlab("Degree") +
  ylab("Frequency")

## Warning: `as.tibble()` was deprecated in tibble 2.0.0.
## ℹ Please use `as_tibble()` instead.
## ℹ The signature and semantics have changed, see `?as_tibble`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

5.1.2 In- and out-degree (directed)

If the network is directed: * in-degree (popularity): number of arcs directed to a vertex * out-degree (activity): number of arcs directed away from a vertex

Let’s look at the support network

degree(support, mode = "in")

##  V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 
##   4   6   3   4   1   5   1   2   2   7   0   5   7   2   7   5   3   0   5   5 
## V21 V22 V23 V24 V25 V26 V27 V28 V29 
##   1   2   2   2   7   1   6   2   2

Visualization of the in-degree

degree(support, mode = "in") %>% 
  hist(,
     col = "red",
     xlab = "In-degree",
     ylab = "Frequency",
     main = "Support",
     breaks = min(.):max(.)
  )

Visualization of the out-degree

degree(support, mode = "out") %>% 
  hist(
     col = "blue",
     xlab = "Out-degree",
     ylab = "Frequency",
     main = "Support",
     breaks = min(.):max(.)
  )

Let’s plot both degree directions in the same chart: * We need ggplot(), which applies to data.frames or tibbles * Create a tibble with two columns (in- and out-degree) * In order to plot two variables in the same chart, we need to make it a tidy dataset: 1. Each variable must have its own column. 2. Each observation must have its own row. 3. Each value must have its own cell. * change dataset form to a longer form with degree in one variable and direction as the other variable
* now we can apply ggplot

tibble(
  indegree = support %>% degree(mode = "in"),
  outdegree = support %>% degree(mode = "out")
) %>%
  pivot_longer(
    cols = everything(),
    names_to = "direction", 
    values_to = "degree"
  ) %>%
  ggplot(aes(x = degree, fill = direction)) +
  geom_bar(position = "identity", alpha = .6) +
  theme_minimal() +
  ggtitle("Support: both degree directions")

5.1.3 Node attributes

We might want to analyse whether different degree values are linked to the subjects’ individual properties.
We need to analyse node attributes

First, we need to load data on nodal attributes.

(node_attributes <- 
  here("data", "node-attributes.csv") %>% 
  read_csv())

## New names:
## Rows: 29 Columns: 19
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): id dbl (18): ...1, education, age, age3, gender, seniority, seniority_rec,
## cont...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

Now we can integrate our dataset of the nodes’ attributes to the graph objects.

(vertex_attr(support) <- vertex_attr(collaboration) <- node_attributes)

Node attributes can be inspected

collaboration %>% 
  vertex_attr() %>% 
  summarise(
    mean = mean(seniority),
    sd = sd(seniority)
  )

… and plotted

collaboration %>% 
  vertex_attr() %>% 
  ggplot() +
  geom_histogram(aes(age))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plot support w/ attributes

support %>%
  ggraph(layout = "auto") +
  geom_node_point(
    aes(
      size = support %>% degree(),
      color = gender
    ),
    show.legend = FALSE
  ) +
  # geom_edge_link() +
  geom_edge_fan(
    arrow = arrow(length = unit(2, "mm")),
    end_cap = circle(3, "mm"),
    color = "grey30",
    width = .5,
    alpha = .5
  ) +
  theme_void()

Adding degree centrality as a node attribute

collaboration %>% 
  vertex_attr() %>% 
  mutate(
    degree = collaboration %>% degree()
  ) %>% 
  as_tibble() %>% 
  ggplot() +
  geom_boxplot(aes(factor(age3), degree))

5.2 Eigenvector centrality

Example data

(net1 <- 
  here("data", "cent_example_1.rds") %>% 
  read_rds())

## IGRAPH e0b5591 U--- 11 17 -- 
## + attr: cent (v/n)
## + edges from e0b5591:
##  [1] 1--11 2-- 4 3-- 5 3--11 4-- 8 5-- 9 5--11 6-- 7 6-- 8 6--10 6--11 7-- 9
## [13] 7--10 7--11 8-- 9 8--10 9--10

If we plot the network by visualizing degree centrality, who is the most central node?

net1 %<>% 
  as_tbl_graph() %>% 
  activate(nodes) %>% 
  mutate(name = 1:vcount(.))

net1_plot <- net1 %>% 
  # as_tbl_graph() %>% 
  ggraph(layout = "auto") +
  geom_edge_link(
    color = "grey",
    width = .5
  ) +
  geom_node_text(aes(label = name))

## Using "stress" as default layout

net1_plot +
  geom_node_point(
    aes(size = degree(net1)),
    alpha = .4,
    show.legend = FALSE
  )

Who would you choose for a policy intervention to let something diffuse more rapidly?

We need to calculate eigenvector centrality (see slides)

eigen_centrality(net1)

## $vector
##         1         2         3         4         5         6         7         8 
## 0.2259630 0.0645825 0.3786244 0.2415182 0.5709057 0.9846544 1.0000000 0.8386195 
##         9        10        11 
## 0.9113529 0.9986474 0.8450304 
## 
## $value
## [1] 3.739685
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 11
## 
## $options$which
## [1] "LA"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 3000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 8
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 26
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 11

composite object
we need to retrieve a vector
scaled

net1_plot +
  geom_node_point(
    aes(
      size = (eigen_centrality(net1)$vector)^2,
      color = "red",
      alpha = .5
    ),
    show.legend = FALSE
  )

Compare degree centrality and eigenvector centrality

tibble(
  vertex = 1:vcount(net1),
  deg = degree(net1),
  eigen = eigen_centrality(net1)$vector
)

5.3 Betweenness centrality

Calculate

betweenness(net1, normalized = TRUE)

##          1          2          3          4          5          6          7 
## 0.00000000 0.00000000 0.00000000 0.20000000 0.08518519 0.21851852 0.05925926 
##          8          9         10         11 
## 0.36296296 0.16296296 0.02962963 0.32592593

Plot

net1_plot +
  geom_node_point(
    aes(
      size = betweenness(net1, normalized = TRUE),
      color = "red",
      alpha = .5
    ),
    show.legend = FALSE
  )

Compare degree centrality, eigenvector centrality, and betweenness centrality

tibble(
  vertex = 1:vcount(net1),
  deg = degree(net1),
  eigen = eigen_centrality(net1)$vector,
  btw = betweenness(net1)
)

5.4 Centrality in advice

Loading data

(advice <- 
  here("data", "advice-coworking.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix() %>% 
  graph_from_adjacency_matrix())

## IGRAPH 6012415 DN-- 29 120 -- 
## + attr: name (v/c)
## + edges from 6012415 (vertex names):
##  [1] X1 ->X2  X1 ->X12 X2 ->X4  X2 ->X8  X2 ->X13 X2 ->X15 X2 ->X16 X2 ->X28
##  [9] X3 ->X6  X3 ->X10 X3 ->X15 X3 ->X16 X3 ->X19 X3 ->X20 X3 ->X23 X3 ->X25
## [17] X3 ->X26 X4 ->X1  X4 ->X2  X4 ->X15 X4 ->X27 X4 ->X28 X5 ->X9  X5 ->X24
## [25] X6 ->X13 X6 ->X15 X6 ->X16 X6 ->X17 X6 ->X20 X6 ->X21 X6 ->X23 X7 ->X2 
## [33] X8 ->X2  X8 ->X4  X8 ->X13 X8 ->X15 X8 ->X28 X9 ->X2  X9 ->X4  X9 ->X6 
## [41] X9 ->X14 X9 ->X15 X9 ->X23 X10->X3  X10->X13 X10->X15 X10->X16 X10->X19
## [49] X10->X21 X10->X25 X11->X15 X12->X2  X12->X6  X12->X7  X13->X15 X13->X16
## [57] X14->X2  X14->X4  X14->X9  X14->X25 X15->X6  X15->X13 X15->X16 X15->X21
## + ... omitted several edges

Add attributes

vertex_attr(advice) <- node_attributes

Plot degree centrality vs. betweenness centrality

advice_plot <- advice %>% 
  ggraph(layout = "kk") +
  geom_edge_fan(
    arrow = arrow(length = unit(2, "mm")),
    end_cap = circle(3, "mm"),
    color = "grey30",
    width = .5,
    alpha = .5
  ) +
  geom_node_text(aes(label = id))

advice_plot +
  geom_node_point(
    aes(
      # size = degree(advice, mode = "in"),
      size = betweenness(advice, directed = TRUE),
      color = "red",
      alpha = .5
    ),
    show.legend = FALSE
  )

Clustering and components
For small networks, Girvan-Newman algorithm is one of the most used and fastest
it maximises modularity (the degree to which the network is divided into clusters with possible maximal density within and possible minimal connection between)
it uses edge betweenness to find and remove central edges

cluster_edge_betweenness(support)

## IGRAPH clustering edge betweenness, groups: 11, mod: 0.29
## + groups:
##   $`1`
##   [1]  1  2  4  9 14 21 28 29
##   
##   $`2`
##   [1] 3
##   
##   $`3`
##   [1] 5
##   
##   $`4`
##   + ... omitted several groups/vertices

plot

support_plot <- 
  support %>% 
  ggraph(layout = "auto") +
  geom_edge_fan(
    arrow = arrow(length = unit(2, "mm")),
    end_cap = circle(3, "mm"),
    color = "grey30",
    width = .5,
    alpha = .5
  )

## Using "stress" as default layout

support_plot +
  geom_node_point(
    aes(
      color = factor(group_edge_betweenness()),
      size = 5,
      alpha = .5
    ),
    show.legend = FALSE
  ) +
  geom_node_text(aes(label = id))

6 Network processes

6.1 Reciprocity

Loading a suite of packages including sna, network, and ergm which we’ll use the make statistical inference. We load it here and not at the beginning of the notebook because it can rise conflicts with some igraph functions that we have used so far.

library(statnet)

Creating a network object, which is needed by functions included in the sna and ergm packages

(support_net <- 
  network(
    x = as_adjacency_matrix(support), 
    node_attributes
  ))

##  Network attributes:
##   vertices = 29 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 99 
##     missing edges= 0 
##     non-missing edges= 99 
## 
##  Vertex attribute names: 
##     ...1 age age3 children contract education employment family field gen_trust gender id identity motivation rev satisfaction seniority seniority_rec soc_capital vertex.names 
## 
## No edge attributes

Inspecting attributes in network

support_net %v% "age"

##  [1] 40 41 25 28 28 31 39 28 31 28 33 52 31 27 31 31 32 31 34 40 28 31 24 27 31
## [26] 24 34 27 36

Let’s calculate the number of complete dyads

mutuality(support_net) # number of complete dyads (pairs with reciprocal ties)

## Mut 
##  25

sna::dyad.census(support_net) # 812 dyads, 406 pairs of nodes (29 * 28 / 2), 99 dyads with at least one tie, of which 25 are symmetrical, which means 50 ties are in symmetric dyads + 49 which are not in symmetric dyads = 99 dyads with a tie

##      Mut Asym Null
## [1,]  25   49  332

What should we compare it to?

fix number of nodes
assume a data-generating process

Let’s compare it with a random network with 29 nodes. We need to specify a probability for ties to occur. let’s pick random chance

rnd_net <- 
  rgraph(
    n = 29, 
    m = 1, 
    tprob = .5
  )

0.5 is clearly unrealistic, so let’s take the density of our empirical network as the baseline probability of ties to occurr
Let’s simulate a whole distribution of random graphs conditioned to n = 29 and probability = .12

rnd_net <- 
  rgraph(
    n = 29, 
    m = 1000, 
    tprob = gden(support_net)
  )

Our empirically observed statistic doesn’t even lie within the stochastic distribution!

hist(mutuality(rnd_net), xlim = c(0, 30))
abline(v = mutuality(support_net), col="blue")

7 ERGM

Let’s start by fitting an ERGM with just probability of edges in it (kind of intercept in a regression model)

m0 <- support_net ~ edges
summary(m0)

## edges 
##    99

We’ll look for a distribution of random graphs with 29 nodes that has 99 as expected number of edges.
In order to do it, we search the right parameter estimate

fit0 <- ergm(m0)

## Starting maximum pseudolikelihood estimation (MPLE):

## Evaluating the predictor and response matrix.

## Maximizing the pseudolikelihood.

## Finished MPLE.

## Stopping at the initial estimate.

## Evaluating log-likelihood at the estimate.

summary(fit0)

## Call:
## ergm(formula = m0)
## 
## Maximum Likelihood Results:
## 
##       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges  -1.9744     0.1073      0  -18.41   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  602.1  on 811  degrees of freedom
##  
## AIC: 604.1  BIC: 608.8  (Smaller is better. MC Std. Err. = 0)

It’s very quick because there are no dependencies!
No need for MCMC MLE. We used Maximum Pseudo-Likelihood Estimation (no simulation) (almost a logistic regression)

From log-odds to odds-ratio:

coef(fit0) # log-odds

##     edges 
## -1.974362

exp(coef(fit0)) / (1 + exp(coef(fit0)))

##     edges 
## 0.1219212

The log-odds (ln(p/(i-p))) of a tie to be present in the network (according to our model) is -1.97 * the change-statistic (delta) in the number of ties, which is 1 because every tie addition increases the statistic of the edge configuration by 1

coef(fit0) * 1

##     edges 
## -1.974362

To obtain the probability, we take the inverse of the logit

exp(coef(fit0)) / (1 + exp(coef(fit0)))

##     edges 
## 0.1219212

Model with reciprocity (dyadic independence model)

m1 <- update.formula(m0, . ~ . + mutual)
summary(m1)

##  edges mutual 
##     99     25

Fitting the model to the data

fit1 <- ergm(m1)

## Starting maximum pseudolikelihood estimation (MPLE):

## Evaluating the predictor and response matrix.

## Maximizing the pseudolikelihood.

## Finished MPLE.

## Starting Monte Carlo maximum likelihood estimation (MCMLE):

## Iteration 1 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.0011.

## Convergence test p-value: 0.0019. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(fit1)

## Call:
## ergm(formula = m1)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges   -2.5994     0.1764      0 -14.736   <1e-04 ***
## mutual   2.6162     0.3658      0   7.151   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1126  on 812  degrees of freedom
##  Residual Deviance:  548  on 810  degrees of freedom
##  
## AIC: 552  BIC: 561.4  (Smaller is better. MC Std. Err. = 0.6832)

More reciprocal dyads in our network than in random graphs with 29 nodes and .12 density

We want to control for transitive closure, which might confound the effect of reciprocity.
We’re going to add gwesp to the model (Geometrically Weighted Edgewise Shared Partners) with .5 as a fixed smoothing constant

m2 <- update.formula(m1, . ~ . + gwesp(.5, fixed = T))
summary(m2)

##           edges          mutual gwesp.fixed.0.5 
##        99.00000        25.00000        81.46703

Estimates of this model rarely converge. If we fail fitting this model, it’s because the model needs more information on the network.
Let’s add open triads as a configuration

m3 <- update.formula(m2, . ~ . + twopath)
fit3 <- ergm(m3)

## Starting maximum pseudolikelihood estimation (MPLE):

## Evaluating the predictor and response matrix.

## Maximizing the pseudolikelihood.

## Finished MPLE.

## Starting Monte Carlo maximum likelihood estimation (MCMLE):

## Iteration 1 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.8947.

## Estimating equations are not within tolerance region.

## Iteration 2 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.0361.

## Convergence test p-value: 0.2302. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0031.
## Convergence test p-value: 0.3890. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0182.
## Convergence test p-value: 0.0148. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0106.
## Convergence test p-value: 0.0132. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0276.
## Convergence test p-value: 0.0521. Not converged with 99% confidence; increasing sample size.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0162.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(fit3)

## Call:
## ergm(formula = m3)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                 Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges           -2.25307    0.32703      0  -6.889   <1e-04 ***
## mutual           2.01452    0.36874      0   5.463   <1e-04 ***
## gwesp.fixed.0.5  1.06762    0.15828      0   6.745   <1e-04 ***
## twopath         -0.24091    0.05413      0  -4.450   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  494.2  on 808  degrees of freedom
##  
## AIC: 502.2  BIC: 521  (Smaller is better. MC Std. Err. = 0.5544)

Let’s now control for social selection along age

m4 <- update.formula(m3, . ~ . + nodeicov("age") + nodeocov("age") + absdiff("age"))
fit4 <- ergm(m4)

## Starting maximum pseudolikelihood estimation (MPLE):

## Evaluating the predictor and response matrix.

## Maximizing the pseudolikelihood.

## Finished MPLE.

## Starting Monte Carlo maximum likelihood estimation (MCMLE):

## Iteration 1 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 1.0064.

## Estimating equations are not within tolerance region.

## Iteration 2 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.1868.

## Estimating equations are not within tolerance region.

## Iteration 3 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.4920.

## Estimating equations are not within tolerance region.

## Iteration 4 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.9067.

## Estimating equations are not within tolerance region.

## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.

## Iteration 5 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.4773.

## Estimating equations are not within tolerance region.

## Iteration 6 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.1749.

## Estimating equations are not within tolerance region.

## Iteration 7 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.0669.

## Convergence test p-value: 0.7537. Not converged with 99% confidence; increasing sample size.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0396.
## Convergence test p-value: 0.1643. Not converged with 99% confidence; increasing sample size.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0421.
## Convergence test p-value: 0.0162. Not converged with 99% confidence; increasing sample size.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0296.
## Convergence test p-value: 0.0029. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(fit4)

## Call:
## ergm(formula = m4)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                 Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges           -3.15391    0.76549      0  -4.120   <1e-04 ***
## mutual           2.15869    0.36711      0   5.880   <1e-04 ***
## gwesp.fixed.0.5  1.03748    0.15600      0   6.651   <1e-04 ***
## twopath         -0.23186    0.05660      0  -4.097   <1e-04 ***
## nodeicov.age     0.04514    0.01901      0   2.374   0.0176 *  
## nodeocov.age    -0.01351    0.01911      0  -0.707   0.4796    
## absdiff.age     -0.03189    0.02073      0  -1.539   0.1239    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  487.9  on 805  degrees of freedom
##  
## AIC: 501.9  BIC: 534.8  (Smaller is better. MC Std. Err. = 0.8173)

Based on the estimated models, we have good evidence of reciprocation driving the formation of the support network.

Social Network Analysis

Analysis

Federico Bianchi