1 Introduction

Bianchi, F., Casnici, N., & Squazzoni, F. (2018). Solidarity as a byproduct of professional collaboration. Social support and trust in a coworking space. Social Networks, 54: 61-72. doi: https://doi.org/10.1016/j.socnet.2017.12.002.

2 Packages

It is good practice to load the packages you need at the beginning of your notebook.

library(tidyverse)
library(here)
library(igraph)
library(ggraph)
library(tidygraph)
library(magrittr)
library(egor)
library(statnet)

3 Loading data

3.1 Collaboration

  • Now, we need to import data on collaboration relationships.
  • To let R find the data, we move the data file into the directory where our RStudio project is located (i.e., the working directory)

If you don’t remember what working directory you’re in:

getwd()
## [1] "/Users/federico/Desktop/sna-class-24"
  • Let’s inspect the data structure: we can open the file either outside RStudio or from within it

  • It’s a .txt file.

  • It’s an adjacency matrix (see slides)

  • Values are 0 and 1 and are separated by a tab (\t)

  • In order to load it, we need a function to read a table data structure.

  • read.table is base R, while readr::read_delim is faster. It’s included in the tidyverse, which we have already loaded.

Now we can inspect the read_delim function from the console by running

?read_delim

What information do we need?

  • the file name: collaboration.txt
  • the column separator: "\t"
  • whether we have column names: no

Where is my data file?

dir()
## [1] "data"                  "sna-class-24.Rproj"    "sna-notebook_files"   
## [4] "sna-notebook.html"     "sna-notebook.nb.html"  "sna-notebook.Rmd"     
## [7] "wp-sna-notebook_files" "wp-sna-notebook.Rmd"

In order to maximise reproducibility, we need the path to work on any colleague’s machine. We can use the here::here function, which builds the path from the project root, regardless of where the project is stored.

Let’s load the collaboration data

The function here() finds the project root and builds a path from it to the specified components.

here("data", "collaboration.txt")
## [1] "/Users/federico/Desktop/sna-class-24/data/collaboration.txt"

Now we’re ready to load the collaboration data as a new object. We can use read_tsv, a version of read_delim with the delimiter fixed to "\t"

read_tsv(
  file = here("data", "collaboration.txt"),
  col_names = FALSE
)

What kind of object have we stored our data in?

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  class()
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

It’s a tibble, the tidyverse version of a data.frame.

Let’s inspect our data

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  str()
## spc_tbl_ [29 × 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ X1 : num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X2 : num [1:29] 0 0 0 1 0 0 1 1 1 1 ...
##  $ X3 : num [1:29] 0 0 0 0 0 1 0 1 1 1 ...
##  $ X4 : num [1:29] 0 1 0 0 0 0 0 0 1 1 ...
##  $ X5 : num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X6 : num [1:29] 0 0 1 0 0 0 0 0 1 1 ...
##  $ X7 : num [1:29] 0 1 0 0 0 0 0 0 0 0 ...
##  $ X8 : num [1:29] 0 1 1 0 0 0 0 0 0 1 ...
##  $ X9 : num [1:29] 0 1 1 1 0 1 0 0 0 0 ...
##  $ X10: num [1:29] 0 1 1 1 0 1 0 1 0 0 ...
##  $ X11: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X12: num [1:29] 1 0 0 0 0 0 0 0 0 0 ...
##  $ X13: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X14: num [1:29] 0 1 1 1 0 0 0 0 1 0 ...
##  $ X15: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X16: num [1:29] 0 1 0 0 0 1 0 0 0 1 ...
##  $ X17: num [1:29] 0 0 0 0 0 1 0 0 0 0 ...
##  $ X18: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X19: num [1:29] 0 0 1 0 0 1 0 1 1 1 ...
##  $ X20: num [1:29] 0 0 1 0 0 1 0 0 1 1 ...
##  $ X21: num [1:29] 0 0 0 0 0 0 1 1 0 1 ...
##  $ X22: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X23: num [1:29] 0 0 1 0 0 1 0 0 1 1 ...
##  $ X24: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  $ X25: num [1:29] 0 0 1 0 0 0 0 0 0 0 ...
##  $ X26: num [1:29] 0 0 1 0 0 0 0 0 0 1 ...
##  $ X27: num [1:29] 0 0 1 0 0 0 0 1 0 1 ...
##  $ X28: num [1:29] 0 0 1 0 0 0 0 1 0 1 ...
##  $ X29: num [1:29] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   X1 = col_double(),
##   ..   X2 = col_double(),
##   ..   X3 = col_double(),
##   ..   X4 = col_double(),
##   ..   X5 = col_double(),
##   ..   X6 = col_double(),
##   ..   X7 = col_double(),
##   ..   X8 = col_double(),
##   ..   X9 = col_double(),
##   ..   X10 = col_double(),
##   ..   X11 = col_double(),
##   ..   X12 = col_double(),
##   ..   X13 = col_double(),
##   ..   X14 = col_double(),
##   ..   X15 = col_double(),
##   ..   X16 = col_double(),
##   ..   X17 = col_double(),
##   ..   X18 = col_double(),
##   ..   X19 = col_double(),
##   ..   X20 = col_double(),
##   ..   X21 = col_double(),
##   ..   X22 = col_double(),
##   ..   X23 = col_double(),
##   ..   X24 = col_double(),
##   ..   X25 = col_double(),
##   ..   X26 = col_double(),
##   ..   X27 = col_double(),
##   ..   X28 = col_double(),
##   ..   X29 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
  • It’s interpreted as a case-by-variable dataset, with 29 variables.
  • Instead, what we need is an adjacency matrix.
here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix()
##       X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
##  [1,]  0  0  0  0  0  0  0  0  0   0   0   1   0   0   0   0   0   0   0   0
##  [2,]  0  0  0  1  0  0  1  1  1   1   0   0   1   1   1   1   0   0   0   0
##  [3,]  0  0  0  0  0  1  0  1  1   1   0   0   0   1   0   0   0   0   1   1
##  [4,]  0  1  0  0  0  0  0  0  1   1   0   0   0   1   0   0   0   0   0   0
##  [5,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [6,]  0  0  1  0  0  0  0  0  1   1   0   0   1   0   1   1   1   0   1   1
##  [7,]  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##  [8,]  0  1  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
##  [9,]  0  1  1  1  0  1  0  0  0   0   0   0   0   1   0   0   0   0   1   1
## [10,]  0  1  1  1  0  1  0  1  0   0   0   0   1   0   1   1   0   0   1   1
## [11,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [12,]  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [13,]  0  1  0  0  0  1  0  0  0   1   0   0   0   0   1   1   0   0   0   1
## [14,]  0  1  1  1  0  0  0  0  1   0   0   0   0   0   0   0   0   0   0   0
## [15,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   0   1   0   0   0   1
## [16,]  0  1  0  0  0  1  0  0  0   1   0   0   1   0   1   0   0   0   0   1
## [17,]  0  0  0  0  0  1  0  0  0   0   0   0   0   0   0   0   0   0   0   1
## [18,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## [19,]  0  0  1  0  0  1  0  1  1   1   0   0   0   0   0   0   0   0   0   1
## [20,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   1   1   0   1   0
## [21,]  0  0  0  0  0  0  1  1  0   1   0   0   0   0   0   0   0   0   0   0
## [22,]  0  0  0  0  0  0  0  0  0   0   0   0   1   0   1   1   0   0   0   0
## [23,]  0  0  1  0  0  1  0  0  1   1   0   0   1   0   1   0   1   0   1   1
## [24,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   1   0
## [25,]  0  0  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   1   1   0
## [26,]  0  0  1  0  0  0  0  0  0   1   0   0   0   0   0   0   0   0   1   0
## [27,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   1
## [28,]  0  0  1  0  0  0  0  1  0   1   0   0   0   0   0   0   0   0   1   0
## [29,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
##       X21 X22 X23 X24 X25 X26 X27 X28 X29
##  [1,]   0   0   0   0   0   0   0   0   0
##  [2,]   0   0   0   0   0   0   0   0   0
##  [3,]   0   0   1   0   1   1   1   1   0
##  [4,]   0   0   0   0   0   0   0   0   0
##  [5,]   0   0   0   0   0   0   0   0   0
##  [6,]   0   0   1   0   0   0   0   0   0
##  [7,]   1   0   0   0   0   0   0   0   0
##  [8,]   1   0   0   0   0   0   1   1   0
##  [9,]   0   0   1   0   0   0   0   0   0
## [10,]   1   0   1   0   0   1   1   1   0
## [11,]   0   0   0   0   0   0   0   0   0
## [12,]   0   0   0   0   0   0   0   0   0
## [13,]   0   1   1   0   0   0   0   0   0
## [14,]   0   0   0   0   0   0   0   0   0
## [15,]   0   1   1   0   0   0   0   0   0
## [16,]   0   1   0   0   0   0   0   0   0
## [17,]   0   0   1   0   0   0   0   0   0
## [18,]   0   0   0   0   1   0   0   0   0
## [19,]   0   0   1   1   1   1   1   1   0
## [20,]   0   0   1   0   0   0   1   0   0
## [21,]   0   0   0   0   0   0   0   0   0
## [22,]   0   0   0   0   0   0   0   0   0
## [23,]   0   0   0   0   0   0   1   0   0
## [24,]   0   0   0   0   0   1   0   0   0
## [25,]   0   0   0   0   0   1   1   0   0
## [26,]   0   0   0   1   1   0   1   0   0
## [27,]   0   0   1   0   1   1   0   0   0
## [28,]   0   0   0   0   0   0   0   0   0
## [29,]   0   0   0   0   0   0   0   0   0

Now our data are interpreted by R as a matrix.

here("data", "collaboration.txt") %>% 
  read_tsv(col_names = FALSE) %>% 
  as.matrix() %>% 
  class()
## [1] "matrix" "array"

It’s better to store the collaboration data in an object, because we’ll need to use it again.

(
collab_df <- 
  read_tsv(
    file = here("data", "collaboration.txt"),
    col_names = FALSE
  )
)
## Rows: 29 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (29): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3.1.1 From data to graphs

  • In order to apply SNA functions, we need the igraph package, which handles our data as igraph objects
  • We have to turn our data into an igraph object
  • A family of functions whose names start with graph_from_ does exactly this
  • We need to specify that the matrix should be interpreted as an undirected graph
  • We’ll store the result in a separate object
(
  collab_net <- 
    graph_from_adjacency_matrix(
      adjmatrix = as.matrix(collab_df), 
      mode = "undirected"
    )
)
## IGRAPH c303b09 UN-- 29 82 -- 
## + attr: name (v/c)
## + edges from c303b09 (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## + ... omitted several edges

The summary shows us:

  • that the network is undirected (U)
  • 29 nodes
  • 82 links
  • the list of the edges

An igraph object is a complex R object made of lists
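
We can peek at that structure with igraph’s accessor functions (a quick sketch using the collab_net object from above):

# vertex and edge attributes are stored as named lists
str(vertex_attr(collab_net))
str(edge_attr(collab_net))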

3.1.2 Visualization

collab_net %>% 
  ggraph(layout = "fr") +
  geom_edge_link(
    colour = "grey30",
    width = .5,
    alpha = .5
  ) +
  geom_node_point(
    size = 1
  )
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

3.2 Support

3.2.1 Loading data

  • Data on support relationships has a different structure
  • It’s an edgelist (see slides)
  • We still need to assign data to a matrix object (not an adjacency matrix, though)

Let’s first load the data

(
support_df <- 
  read_delim(
    file = here("data", "support.csv"),
    delim = " ",
    col_names = FALSE
  )
)
(support_net <- graph_from_edgelist(el = as.matrix(support_df)))
## IGRAPH 3f49f1c DN-- 28 99 -- 
## + attr: name (v/c)
## + edges from 3f49f1c (vertex names):
##  [1] V1 ->V29 V2 ->V4  V2 ->V14 V2 ->V15 V2 ->V28 V3 ->V10 V3 ->V13 V3 ->V15
##  [9] V3 ->V16 V3 ->V25 V3 ->V26 V3 ->V27 V4 ->V2  V5 ->V24 V6 ->V12 V6 ->V13
## [17] V6 ->V15 V6 ->V16 V6 ->V17 V6 ->V20 V6 ->V23 V7 ->V1  V7 ->V12 V8 ->V2 
## [25] V8 ->V4  V8 ->V10 V8 ->V13 V8 ->V15 V8 ->V25 V8 ->V27 V8 ->V28 V9 ->V2 
## [33] V9 ->V14 V10->V3  V10->V13 V10->V15 V10->V19 V10->V27 V12->V1  V12->V2 
## [41] V12->V10 V12->V20 V13->V16 V14->V2  V14->V8  V14->V9  V14->V25 V15->V6 
## [49] V15->V10 V15->V13 V15->V16 V15->V22 V16->V13 V16->V15 V16->V22 V17->V6 
## [57] V17->V20 V18->V25 V19->V3  V19->V25 V19->V27 V20->V6  V20->V12 V20->V17
## + ... omitted several edges
  • Why do we get 28 nodes? We were expecting 29
  • One of the nodes might be an isolate
  • It’s always a good practice to store the list of edges and the list of nodes in two different objects, then integrate them in a graph object, unless you’re working on a connected component of a larger network
  • The nodes are coded in the edge list file as “V1”, “V2”, …, “V29”
  • So, we’ll generate a character vector by combining “V” with numbers between 1-29

We can then generate a graph object in a safer way through graph_from_data_frame, which takes an edge list (data frame) and a node list (vector) as inputs.

(
support_net <- 
  graph_from_data_frame(
     d = support_df, 
     vertices = paste0("V", 1:29)
  )
)
## IGRAPH c9cd376 DN-- 29 99 -- 
## + attr: name (v/c)
## + edges from c9cd376 (vertex names):
##  [1] V1 ->V29 V2 ->V4  V2 ->V14 V2 ->V15 V2 ->V28 V3 ->V10 V3 ->V13 V3 ->V15
##  [9] V3 ->V16 V3 ->V25 V3 ->V26 V3 ->V27 V4 ->V2  V5 ->V24 V6 ->V12 V6 ->V13
## [17] V6 ->V15 V6 ->V16 V6 ->V17 V6 ->V20 V6 ->V23 V7 ->V1  V7 ->V12 V8 ->V2 
## [25] V8 ->V4  V8 ->V10 V8 ->V13 V8 ->V15 V8 ->V25 V8 ->V27 V8 ->V28 V9 ->V2 
## [33] V9 ->V14 V10->V3  V10->V13 V10->V15 V10->V19 V10->V27 V12->V1  V12->V2 
## [41] V12->V10 V12->V20 V13->V16 V14->V2  V14->V8  V14->V9  V14->V25 V15->V6 
## [49] V15->V10 V15->V13 V15->V16 V15->V22 V16->V13 V16->V15 V16->V22 V17->V6 
## [57] V17->V20 V18->V25 V19->V3  V19->V25 V19->V27 V20->V6  V20->V12 V20->V17
## + ... omitted several edges

4 Connectivity

A basic graph-level network property we could be interested in is how connected a network is.

Let’s start from the number of edges

ecount(collab_net)
## [1] 82

We can also retrieve the whole list of edges

E(collab_net)
## + 82/82 edges from c303b09 (vertex names):
##  [1] X1 --X12 X2 --X4  X2 --X7  X2 --X8  X2 --X9  X2 --X10 X2 --X13 X2 --X14
##  [9] X2 --X15 X2 --X16 X3 --X6  X3 --X8  X3 --X9  X3 --X10 X3 --X14 X3 --X19
## [17] X3 --X20 X3 --X23 X3 --X25 X3 --X26 X3 --X27 X3 --X28 X4 --X9  X4 --X10
## [25] X4 --X14 X6 --X9  X6 --X10 X6 --X13 X6 --X15 X6 --X16 X6 --X17 X6 --X19
## [33] X6 --X20 X6 --X23 X7 --X21 X8 --X10 X8 --X19 X8 --X21 X8 --X27 X8 --X28
## [41] X9 --X14 X9 --X19 X9 --X20 X9 --X23 X10--X13 X10--X15 X10--X16 X10--X19
## [49] X10--X20 X10--X21 X10--X23 X10--X26 X10--X27 X10--X28 X13--X15 X13--X16
## [57] X13--X20 X13--X22 X13--X23 X15--X16 X15--X20 X15--X22 X15--X23 X16--X20
## [65] X16--X22 X17--X20 X17--X23 X18--X25 X19--X20 X19--X23 X19--X24 X19--X25
## [73] X19--X26 X19--X27 X19--X28 X20--X23 X20--X27 X23--X27 X24--X26 X25--X26
## + ... omitted several edges

The number of edges depends on the number of vertices

vcount(collab_net)
## [1] 29

We can also retrieve the list of vertices

V(collab_net)
## + 29/29 vertices, named, from c303b09:
##  [1] X1  X2  X3  X4  X5  X6  X7  X8  X9  X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
## [20] X20 X21 X22 X23 X24 X25 X26 X27 X28 X29

4.1 Density

4.1.1 Collaboration

  • In order to infer the level of connectivity from the number of edges, we need to compare it to the theoretical maximum number of possible links
  • We combine each vertex with all other possible vertices
  • Since our network is undirected, we divide by 2, so we don’t count pairs twice
  • We’ll store the calculation output in a new object because we’re going to use it later again and we want to stay D.R.Y.
(max_links <- vcount(collab_net) * (vcount(collab_net) - 1) / 2)
## [1] 406

Now, we can calculate the graph density as the ratio between the number of actual links and the number of possible links.

ecount(collab_net) / max_links
## [1] 0.2019704

Actually, we don’t need to perform this calculation every time. There is a specific function in igraph

edge_density(collab_net)
## [1] 0.2019704
  • Is the density value high or low? Is our collaboration network dense or sparse?
  • Let’s compare it to another network.

4.1.2 Support

Calculating density:

edge_density(support_net)
## [1] 0.1219212
  • Is the density value high or low?
  • Are social networks capable of being fully connected?
  • Are all relationships equal in terms of possible density?

5 Centrality

How (heterogeneously) connected are nodes?

5.1 Degree centrality

5.1.1 Collaboration

The degree of a graph vertex is its number of edges (= size of its personal network)

Let’s look at the degree of our collaboration network

igraph::degree(collab_net)
##  X1  X2  X3  X4  X5  X6  X7  X8  X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 
##   1   9  12   4   0  10   2   7   8  15   0   1   8   4   8   7   3   1  12  11 
## X21 X22 X23 X24 X25 X26 X27 X28 X29 
##   3   3  10   2   5   6   8   4   0
  • This is the degree of each node (the degree sequence)
  • It’s a named numeric vector (each value comes with a label), so various operations can be applied to it
sort(igraph::degree(collab_net), decreasing = TRUE)
## X10  X3 X19 X20  X6 X23  X2  X9 X13 X15 X27  X8 X16 X26 X25  X4 X14 X28 X17 X21 
##  15  12  12  11  10  10   9   8   8   8   8   7   7   6   5   4   4   4   3   3 
## X22  X7 X24  X1 X12 X18  X5 X11 X29 
##   3   2   2   1   1   1   0   0   0

If we are interested in the relative frequency of each degree value (the degree distribution, starting from degree 0):

degree_distribution(collab_net)
##  [1] 0.10344828 0.10344828 0.06896552 0.10344828 0.10344828 0.03448276
##  [7] 0.03448276 0.06896552 0.13793103 0.03448276 0.06896552 0.03448276
## [13] 0.06896552 0.00000000 0.00000000 0.03448276

It’s easier to visualize by using base R plotting functions

collab_net %>% 
  igraph::degree() %>% 
  hist(
     col = "firebrick",
     xlab = "Degree",
     ylab = "Frequency",
     main = "Professional collaboration",
     breaks = min(.):max(.)
  )

A more elegant graph can be plotted with ggplot2

collab_net %>% 
  igraph::degree() %>% 
  as_tibble() %>% 
  ggplot() +
  geom_histogram(
    aes(value),
    binwidth = 1,
    color = "red",
    fill = "white"
  ) +
  theme_bw() +
  xlab("Degree") +
  ylab("Frequency")

5.1.2 Support

If the network is directed, we distinguish:

  • in-degree (popularity): number of arcs directed to a vertex
  • out-degree (activity): number of arcs directed away from a vertex

Let’s look at the support network

igraph::degree(support_net, mode = "in")
##  V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 
##   4   6   3   4   1   5   1   2   2   7   0   5   7   2   7   5   3   0   5   5 
## V21 V22 V23 V24 V25 V26 V27 V28 V29 
##   1   2   2   2   7   1   6   2   2

Visualization of the in-degree

support_net %>% 
  igraph::degree(mode = "in") %>% 
  hist(
     col = "red",
     xlab = "In-degree",
     ylab = "Frequency",
     main = "Support",
     breaks = min(.):max(.)
  )

Visualization of the out-degree

support_net %>% 
  igraph::degree(mode = "out") %>% 
  hist(
     col = "blue",
     xlab = "Out-degree",
     ylab = "Frequency",
     main = "Support",
     breaks = min(.):max(.)
  )

Let’s plot both degree directions in the same chart:

  • We need ggplot(), which applies to data.frames or tibbles
  • Create a tibble with two columns (in- and out-degree)
  • To plot two variables in the same chart, we need to make it a tidy, longer dataset:
    1. Each variable must have its own column.
    2. Each observation must have its own row.
    3. Each value must have its own cell.
  • Change the dataset to a longer form, with degree in one variable and direction in the other
  • Now we can apply ggplot

tibble(
  indegree = igraph::degree(support_net, mode = "in"),
  outdegree = igraph::degree(support_net, mode = "out")
) %>%
  pivot_longer(
    cols = everything(),
    names_to = "direction", 
    values_to = "degree"
  ) %>%
  ggplot(aes(x = degree, fill = direction)) +
  geom_bar(position = "identity", alpha = .6) +
  theme_bw() +
  ggtitle("Support: both degree directions")

5.2 Node attributes

  • We might want to analyse whether different degree values are linked to the subjects’ individual properties.
  • We need to analyse node attributes

First, we need to load data on nodal attributes.

(node_attributes <- read_csv2(file = here("data", "node-attributes.csv")))

Now we can integrate the dataset with node attributes to the graph objects.

vertex_attr(support_net) <- vertex_attr(collab_net) <- node_attributes
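
We can check that the attributes were attached by listing their names:

vertex_attr_names(collab_net)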

Since we’re going to use tbl_graph objects again, we’ll define new objects

collab_tblgr <- as_tbl_graph(collab_net)
support_tblgr <- as_tbl_graph(support_net)

Node attributes can be inspected

collab_tblgr %>% 
  as_tibble() %>% 
  summarise(
    mean_age = mean(age),
    sd_age = sd(age)
  )

… and plotted

collab_tblgr %>% 
  as_tibble() %>% 
  ggplot() +
  geom_histogram(
    mapping = aes(age),
    bins = 50
  )

Plot support w/ attributes

support_net %>%
  ggraph(layout = "auto") +
  geom_node_point(
    aes(
      size = igraph::degree(support_net),
      color = gender
    ),
    show.legend = FALSE
  ) +
  # geom_edge_link() +
  geom_edge_fan(
    arrow = arrow(length = unit(2, "mm")),
    end_cap = circle(3, "mm"),
    color = "grey30",
    width = .5,
    alpha = .5
  ) +
  theme_void()

Adding degree centrality as a node attribute. We’re going to use a new operator from the magrittr package: the compound assignment pipe %<>%, where x %<>% f() is shorthand for x <- x %>% f()

collab_tblgr %<>% 
  mutate(degree = centrality_degree())

support_tblgr %<>% 
  mutate(degree = centrality_degree())

We can explore correlations

support_tblgr %>% 
  as_tibble() %>% 
  ggplot(aes(as_factor(age), degree)) +
  geom_tile(color = "black", fill = "red")

collab_tblgr %>% 
  as_tibble() %>% 
  ggplot() +
  geom_boxplot(aes(factor(gender), degree))

5.3 Eigenvector centrality

Let’s load a new network dataset

(
net1 <- 
  here("data", "cent_example_1.rds") %>% 
  read_rds()
)
## IGRAPH e0b5591 U--- 11 17 -- 
## + attr: cent (v/n)
## + edges from e0b5591:
##  [1] 1--11 2-- 4 3-- 5 3--11 4-- 8 5-- 9 5--11 6-- 7 6-- 8 6--10 6--11 7-- 9
## [13] 7--10 7--11 8-- 9 8--10 9--10
net1 %<>% 
  as_tbl_graph() %>% 
  mutate(name = 1:vcount(.))

net1_plot <- 
  net1 %>% 
  ggraph(layout = "auto") +
  geom_edge_link(
    color = "grey",
    width = .5
  ) +
  geom_node_text(aes(label = name))
## Using "stress" as default layout
net1_plot +
  geom_node_point(
    aes(size = igraph::degree(net1)),
    alpha = .4,
    show.legend = F
  )

Degree distribution:

sort(igraph::degree(net1), decreasing = T)
## 11  6  7  8  9 10  5  3  4  1  2 
##  5  4  4  4  4  4  3  2  2  1  1

Who would you choose for a policy intervention to let something diffuse more rapidly?

We need to calculate eigenvector centrality (see slides)

eigen_centrality(net1)
## $vector
##         1         2         3         4         5         6         7         8 
## 0.2259630 0.0645825 0.3786244 0.2415182 0.5709057 0.9846544 1.0000000 0.8386195 
##         9        10        11 
## 0.9113529 0.9986474 0.8450304 
## 
## $value
## [1] 3.739685
## 
## $options
## $options$bmat
## [1] "I"
## 
## $options$n
## [1] 11
## 
## $options$which
## [1] "LA"
## 
## $options$nev
## [1] 1
## 
## $options$tol
## [1] 0
## 
## $options$ncv
## [1] 0
## 
## $options$ldv
## [1] 0
## 
## $options$ishift
## [1] 1
## 
## $options$maxiter
## [1] 3000
## 
## $options$nb
## [1] 1
## 
## $options$mode
## [1] 1
## 
## $options$start
## [1] 1
## 
## $options$sigma
## [1] 0
## 
## $options$sigmai
## [1] 0
## 
## $options$info
## [1] 0
## 
## $options$iter
## [1] 8
## 
## $options$nconv
## [1] 1
## 
## $options$numop
## [1] 26
## 
## $options$numopb
## [1] 0
## 
## $options$numreo
## [1] 11

In case of a directed network, the in-degree direction is considered

eigen_centrality(support_net, directed = T)$vector
##  [1] 2.613812e-01 1.879366e-01 3.558680e-01 9.851569e-02 7.909163e-03
##  [6] 5.092964e-01 5.809195e-03 3.359210e-02 3.817411e-02 5.997657e-01
## [11] 1.836642e-17 3.931440e-01 1.000000e+00 6.194062e-02 8.253794e-01
## [16] 8.638070e-01 3.272524e-01 1.836642e-17 6.018240e-01 4.284375e-01
## [21] 2.120614e-02 4.627346e-01 2.568822e-01 2.887195e-02 4.777923e-01
## [26] 9.748625e-02 5.934425e-01 6.068543e-02 7.741179e-02

Let’s plot the network by eigenvector centrality

net1_plot +
  geom_node_point(
    aes(size = eigen_centrality(net1)$vector),
    color = "red",
    alpha = .5,
    show.legend = FALSE
  )

Let’s compare degree and eigenvector centrality

net1 %>% 
  as_tbl_graph() %>% 
  mutate(
    deg = centrality_degree(),
    eigen = centrality_eigen()
  ) %>% 
  as_tibble() %>%
  select(name, deg, eigen) %>% 
  arrange(-deg)

5.4 Betweenness centrality

Let’s calculate betweenness centrality (see slides)

igraph::betweenness(net1, directed = FALSE, normalized = TRUE)
##          1          2          3          4          5          6          7 
## 0.00000000 0.00000000 0.00000000 0.20000000 0.08518519 0.21851852 0.05925926 
##          8          9         10         11 
## 0.36296296 0.16296296 0.02962963 0.32592593

Let’s plot the network by betweenness centrality

net1_plot +
  geom_node_point(
    aes(size = igraph::betweenness(net1, directed = FALSE, normalized = TRUE)),
    color = "red",
    alpha = .5,
    show.legend = FALSE
  )

Eigenvector and betweenness centrality are not necessarily associated

net1 %>% 
  as_tbl_graph() %>% 
  mutate(
    deg = centrality_degree(),
    eigen = centrality_eigen(),
    btw = centrality_betweenness(directed = FALSE)
  ) %>% 
  as_tibble() %>%
  # select(name, deg, eigen, btw) %>% 
  # arrange(-deg)
  ggplot(aes(eigen, btw)) +
  geom_point(color = "black", fill = "transparent")
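
To make the comparison explicit, we can compute the correlation between the two measures (a small sketch re-using the pipeline above):

net1 %>% 
  as_tbl_graph() %>% 
  mutate(
    eigen = centrality_eigen(),
    btw = centrality_betweenness(directed = FALSE)
  ) %>% 
  as_tibble() %>% 
  # Pearson correlation between the two centrality scores
  summarise(cor_eigen_btw = cor(eigen, btw))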

5.5 Exercise

Loading data

(
advice_df <- 
  here("data", "advice-coworking.txt") %>% 
  read_tsv(col_names = FALSE)
)

Turning the data frame into a graph

(
advice_net <- 
  advice_df %>% 
  as.matrix() %>% 
  graph_from_adjacency_matrix()
)
## IGRAPH 531c8b5 DN-- 29 120 -- 
## + attr: name (v/c)
## + edges from 531c8b5 (vertex names):
##  [1] X1 ->X2  X1 ->X12 X2 ->X4  X2 ->X8  X2 ->X13 X2 ->X15 X2 ->X16 X2 ->X28
##  [9] X3 ->X6  X3 ->X10 X3 ->X15 X3 ->X16 X3 ->X19 X3 ->X20 X3 ->X23 X3 ->X25
## [17] X3 ->X26 X4 ->X1  X4 ->X2  X4 ->X15 X4 ->X27 X4 ->X28 X5 ->X9  X5 ->X24
## [25] X6 ->X13 X6 ->X15 X6 ->X16 X6 ->X17 X6 ->X20 X6 ->X21 X6 ->X23 X7 ->X2 
## [33] X8 ->X2  X8 ->X4  X8 ->X13 X8 ->X15 X8 ->X28 X9 ->X2  X9 ->X4  X9 ->X6 
## [41] X9 ->X14 X9 ->X15 X9 ->X23 X10->X3  X10->X13 X10->X15 X10->X16 X10->X19
## [49] X10->X21 X10->X25 X11->X15 X12->X2  X12->X6  X12->X7  X13->X15 X13->X16
## [57] X14->X2  X14->X4  X14->X9  X14->X25 X15->X6  X15->X13 X15->X16 X15->X21
## + ... omitted several edges

Adding node attributes

vertex_attr(advice_net) <- node_attributes

Tibble graph object

(advice_tblgr <- as_tbl_graph(advice_net))
## # A tbl_graph: 29 nodes and 120 edges
## #
## # A directed simple graph with 1 component
## #
## # A tibble: 29 × 12
##    ...1 id    education         age  age3 gender seniority seniority_rec family 
##   <dbl> <chr> <chr>           <dbl> <dbl> <chr>      <dbl>         <dbl> <chr>  
## 1     1 V1    lower secondary    40     2 M             17             1 single 
## 2     2 V2    upper secondary    41     2 M             46             3 cohabi…
## 3     3 V3    tertiary           25     0 F             39             3 single 
## 4     4 V4    tertiary           28     0 M             25             2 in a s…
## 5     5 V5    tertiary           28     0 M              7             0 in a s…
## 6     6 V6    tertiary           31     1 M             43             3 married
## # ℹ 23 more rows
## # ℹ 3 more variables: children <chr>, soc_capital <dbl>, satisfaction <dbl>
## #
## # A tibble: 120 × 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1    12
## 3     2     4
## # ℹ 117 more rows

Possible descriptive analysis:

(
advice_tblgr %<>% 
  mutate(
    in_deg_cen = centrality_degree(mode = "in"),
    out_deg_cen = centrality_degree(mode = "out"),
    eigen_cen = centrality_eigen(directed = TRUE),
    btw_cen = centrality_betweenness()
  )
)
## # A tbl_graph: 29 nodes and 120 edges
## #
## # A directed simple graph with 1 component
## #
## # A tibble: 29 × 16
##    ...1 id    education         age  age3 gender seniority seniority_rec family 
##   <dbl> <chr> <chr>           <dbl> <dbl> <chr>      <dbl>         <dbl> <chr>  
## 1     1 V1    lower secondary    40     2 M             17             1 single 
## 2     2 V2    upper secondary    41     2 M             46             3 cohabi…
## 3     3 V3    tertiary           25     0 F             39             3 single 
## 4     4 V4    tertiary           28     0 M             25             2 in a s…
## 5     5 V5    tertiary           28     0 M              7             0 in a s…
## 6     6 V6    tertiary           31     1 M             43             3 married
## # ℹ 23 more rows
## # ℹ 7 more variables: children <chr>, soc_capital <dbl>, satisfaction <dbl>,
## #   in_deg_cen <dbl>, out_deg_cen <dbl>, eigen_cen <dbl>, btw_cen <dbl>
## #
## # A tibble: 120 × 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1    12
## 3     2     4
## # ℹ 117 more rows
  • Note that eigenvector centrality is calculated on the left eigenvector (the in-degree direction, which makes sense because it’s advice-seeking). If you want the out-degree direction, you need the right eigenvector, i.e. the scores computed on the transpose of the adjacency matrix, as in the sketch below.
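
A working sketch of the transpose trick (as_adjacency_matrix() is the current igraph name for get.adjacency(); out_eigen is just an illustrative name):

out_eigen <- advice_net %>% 
  as_adjacency_matrix(sparse = FALSE) %>%  # adjacency matrix of the advice network
  t() %>%                                  # transpose: reverse the direction of all ties
  graph_from_adjacency_matrix(mode = "directed") %>% 
  eigen_centrality(directed = TRUE)
out_eigen$vector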

Measurements can be rescaled to the [0, 1] range

# min-max rescaling; named rescale01 to avoid masking base::scale()
rescale01 <- function(cent) {
  min_cent <- min(cent)
  max_cent <- max(cent)
  (cent - min_cent) / (max_cent - min_cent)
}

advice_tblgr %<>% 
  mutate(
    across(ends_with("cen") & !starts_with("eigen"), rescale01)
  )

advice_tblgr %>% 
  as_tibble() %>% 
  select(id, ends_with("cen")) %>% 
  arrange(-in_deg_cen)

The same measures can be computed on the support network

(
support_tblgr %<>% 
  mutate(
    in_deg_cen = centrality_degree(mode = "in"),
    out_deg_cen = centrality_degree(mode = "out"),
    eigen_cen = centrality_eigen(directed = TRUE),
    btw_cen = centrality_betweenness()
  )
)
support_tblgr %>% 
  as_tibble() %>% 
  select(ends_with("cen"))

Note: soc_capital refers to social capital outside the coworking space

Various possible bivariate associations:

  • seniority is associated with both support and advice eigenvector centrality (the more senior, the more I am considered a source of support and advice)
  • same for education
  • social capital is associated with advice, less with support
  • same for satisfaction
  • no such pattern for age
  • no clear picture on betweenness centrality

advice_tblgr %>% 
  as_tibble() %>% 
  ggplot(aes(satisfaction, btw_cen)) +
  # geom_boxplot()
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 5
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 1
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 0
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at 5
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 1
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 0

Betweenness centrality on the advice network is quite uniform for values > 0 (the network is not very centralized). Let’s also look at the distribution of eigenvector centrality

advice_tblgr %>% 
  as_tibble() %>% 
  ggplot(aes(eigen_cen)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

6 Analysing ego-centric data

Node attributes

(ego_attr <- read_csv2(file = here("data", "ego-class.csv")))
## ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
## Rows: 229 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (12): random, gender, marital, educ, occ.typ, tel.int, dis.phys.count.ca...
## dbl  (3): ego_id, age, mmse
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Alters’ attributes

(alter_attr <- read_csv(file = here("data", "alter-data.csv")))
## Rows: 2729 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): freq.rec, week.daily, alter.sex, alter.cores, role, role.rec, intgen
## dbl (2): ego_id, alter_id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Alter-alter ties

(alter_alter <- read_csv(here("data", "alter-ties.csv")))
## Rows: 8303 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): ego_id, altsource_id, alttarg_id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

More efficient: creating an egor object and generating a list of igraph networks

(
whole_df <- 
  threefiles_to_egor(
    egos = ego_attr,
    alters.df = alter_attr,
    edges = alter_alter,
    ID.vars = list(
      ego = "ego_id",
      alter = "alter_id",
      source = "altsource_id",
      target = "alttarg_id"
    )
  )
)
## # EGO data (active): 229 × 15
##   .egoID random   age gender marital    educ  occ.typ tel.int dis.phys.count.cat
## *  <dbl> <chr>  <dbl> <chr>  <chr>      <chr> <chr>   <chr>   <chr>             
## 1      1 No        84 Male   Not marri… midd… 3. Ret… No      One               
## 2      2 No        81 Male   Married    prim… 1. Ele… No      One               
## 3      3 No        83 Male   Married    prim… 3. Ret… No      Multiple          
## 4      4 No        77 Female Married    prim… 1. Ele… No      One               
## 5      5 No        93 Male   Married    prim… 1. Ele… No      Multiple          
## # ℹ 224 more rows
## # ℹ 6 more variables: hcare.count.cat <chr>, adl.rec <chr>, gds.rec <chr>,
## #   demen <chr>, rsa <chr>, mmse <dbl>
## # ALTER data: 2,729 × 9
##   .altID .egoID freq.rec week.daily alter.sex alter.cores role  role.rec intgen
## *  <dbl>  <dbl> <chr>    <chr>      <chr>     <chr>       <chr> <chr>    <chr> 
## 1    101      1 yearly   No         Male      No          child imm.fam  Yes   
## 2    102      1 yearly   No         Male      No          child imm.fam  Yes   
## 3    103      1 yearly   No         Male      No          child imm.fam  Yes   
## # ℹ 2,726 more rows
## # AATIE data: 8,303 × 3
##   .egoID .srcID .tgtID
## *  <dbl>  <dbl>  <dbl>
## 1      1    101    102
## 2      1    101    103
## 3      1    101    104
## # ℹ 8,300 more rows

What kind of object is it?
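
A quick check with class():

class(whole_df)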

Different parts can be inspected

whole_df[[3]]

Now we can easily convert all ego-networks into igraph objects

graph_list <- as_igraph(whole_df)

Since it’s a list of igraph objects, we can inspect each single ego network

graph_list[[203]]
## IGRAPH e51eb76 UN-- 7 10 -- 
## + attr: .egoID (g/n), name (v/c), freq.rec (v/c), week.daily (v/c),
## | alter.sex (v/c), alter.cores (v/c), role (v/c), role.rec (v/c),
## | intgen (v/c)
## + edges from e51eb76 (vertex names):
##  [1] 20601--20602 20601--20603 20601--20604 20601--20605 20602--20603
##  [6] 20602--20604 20602--20605 20603--20604 20603--20605 20604--20605

Calculating size of ego-networks

whole_df[[1]] %<>% mutate(net_size = map_int(graph_list, vcount))

whole_df[[1]] %>% 
  summarise(
    mean_size = mean(net_size),
    sd_size = sd(net_size)
  )

6.1 Clustering

  • clustering is one of the most important processes in network evolution
  • the outcome of clustering is the formation of cohesive subgroups

How many clusters? Discuss

net1_plot

Cliques are defined on undirected networks (in a directed network, only reciprocated ties would count, so a clique would require complete mutuality)

Enumerating every clique doesn’t make much sense on its own; let’s set a minimum clique size (at least 3 or 4)

cliques(net1, min = 3)
## [[1]]
## + 3/11 vertices, named, from e0b5591:
## [1] 6  8  10
## 
## [[2]]
## + 3/11 vertices, named, from e0b5591:
## [1] 8  9  10
## 
## [[3]]
## + 3/11 vertices, named, from e0b5591:
## [1] 6  7  10
## 
## [[4]]
## + 3/11 vertices, named, from e0b5591:
## [1] 6  7  11
## 
## [[5]]
## + 3/11 vertices, named, from e0b5591:
## [1] 7  9  10
## 
## [[6]]
## + 3/11 vertices, named, from e0b5591:
## [1] 3  5  11
cliques(collab_net, min = 6)
## [[1]]
## + 6/29 vertices, from c303b09:
## [1]  3 10 19 20 23 27
## 
## [[2]]
## + 6/29 vertices, from c303b09:
## [1]  6 10 13 15 20 23
## 
## [[3]]
## + 6/29 vertices, from c303b09:
## [1]  3  6  9 19 20 23
## 
## [[4]]
## + 6/29 vertices, from c303b09:
## [1]  3  6 10 19 20 23
## 
## [[5]]
## + 6/29 vertices, from c303b09:
## [1]  6 10 13 15 16 20
  • Another way is to partition the network into factions, a predetermined number of groups
  • It’s more useful to avoid a priori definitions
  • Bottom-up classification of social/political reality
    • categories are concepts that we analysts (or lay people) apply to single objects by deduction
    • however, they emerge from induction-based inferences by analogy, abstracting away from idiosyncratic elements (Harrison C. White)
    • this is what community detection algorithms do: partitioning a graph into dense network regions loosely connected to each other
  • Several algorithms can do that: some of them aim to optimize the partition’s modularity, Q (i.e., the share of within-group edges compared to what a random assignment of groups would yield)
  • The most popular one is the Girvan-Newman algorithm (Girvan & Newman, 2002), based on edge betweenness, i.e. the number of pairs’ geodesics an edge falls along
    1. set the maximum number of cohesive subsets required, k
    2. find the edge with the highest edge betweenness
    3. delete it and count the number of components
    4. if the number of components has reached k, stop; otherwise go back to 2.
  • For directed networks, the algorithm treats ties as undirected (reciprocated ties)
cluster_edge_betweenness(collab_net, directed = FALSE)
## IGRAPH clustering edge betweenness, groups: 16, mod: 0.058
## + groups:
##   $`1`
##   [1]  1 12
##   
##   $`2`
##   [1] 2
##   
##   $`3`
##    [1]  3  6  9 10 13 15 16 19 20 23 27
##   
##   $`4`
##   + ... omitted several groups/vertices

From the clustering object we can retrieve the number of clusters, the vector of nodes in each single cluster (as a list item), and the modularity
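
For example (a short sketch storing the partition first):

gn_comm <- cluster_edge_betweenness(collab_net, directed = FALSE)
length(gn_comm)       # number of clusters
membership(gn_comm)   # cluster membership of each node
modularity(gn_comm)   # modularity Q of the partition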

Fast & Greedy doesn’t search the network so thoroughly: it partitions the network, then re-shuffles until it reaches a high modularity

cluster_fast_greedy(collab_net)
## IGRAPH clustering fast greedy, groups: 7, mod: 0.3
## + groups:
##   $`1`
##   [1]  2  4  7  9 14 21
##   
##   $`2`
##    [1]  3  8 10 18 19 24 25 26 27 28
##   
##   $`3`
##   [1]  6 13 15 16 17 20 22 23
##   
##   $`4`
##   + ... omitted several groups/vertices

For directed networks, we can use Infomap, which is based on random walks

cluster_infomap(support_net)
## IGRAPH clustering infomap, groups: 7, mod: 0.47
## + groups:
##   $`1`
##   [1]  1  7 21 29
##   
##   $`2`
##   [1]  2  4  8  9 14 28
##   
##   $`3`
##   [1]  3 18 19 25 26 27
##   
##   $`4`
##   + ... omitted several groups/vertices

For larger undirected networks, consider Louvain or Leiden

cluster_louvain(collab_net)
## IGRAPH clustering multi level, groups: 7, mod: 0.3
## + groups:
##   $`1`
##   [1]  1 12
##   
##   $`2`
##   [1]  2  4  7  9 14
##   
##   $`3`
##    [1]  3  8 10 18 19 21 24 25 26 27 28
##   
##   $`4`
##   + ... omitted several groups/vertices
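
Leiden works similarly; a sketch (note that cluster_leiden() optimizes the CPM criterion by default, so we ask for modularity explicitly; this requires a recent igraph version):

cluster_leiden(collab_net, objective_function = "modularity")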

Let’s plot the cohesive subgroups detected on the main component

collab_tblgr %>% 
  mutate(comp = group_components()) %>% 
  filter(comp == 1) %>% 
  mutate(subgroups = group_fast_greedy()) %>% 
  ggraph(layout = "fr") +
  geom_edge_fan(
    width = .5,
    color = "grey30"
  ) +
  geom_node_point(
    aes(color = factor(subgroups)),
    size = 5
  )

Calculating number of cohesive subgroups in each ego-network through Girvan-Newman algorithm + integrating it into the egor object

whole_df[[1]] %<>%
  mutate(
    clusters = graph_list %>% 
      map_int(\(egonet) cluster_edge_betweenness(egonet) %>% length)
  )

count(whole_df[[1]], clusters)

Testing the hypothesized association between the number of cohesive sub-groups and cognitive abilities on the Sociable dataset

whole_df[[1]] %>%
  ggplot(aes(mmse)) +
#  scale_fill_grey() +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  geom_vline(xintercept = median(whole_df[[1]]$mmse, na.rm = T), color = "black", linetype = "dashed") +
  theme_bw() +
  labs(x = "Cognitive functioning (MMSE)", y = "Frequency")
## Warning: Removed 3 rows containing non-finite values (`stat_bin()`).

whole_df[[1]] %>%
  ggplot(aes(clusters)) +
  geom_bar() + 
  geom_vline(xintercept = median(whole_df[[1]]$clusters, na.rm = TRUE), color = "red", linetype = "dashed") +
  scale_x_continuous(breaks = 0:max(whole_df[[1]]$clusters)) +
  labs(y = "")

Association of MMSE with number of cohesive subgroups

whole_df[[1]] %>% 
  ggplot(aes(clusters, mmse)) +
  geom_point() +
  geom_smooth() +
  geom_jitter() +
  scale_x_continuous(breaks = 0:16) +
  geom_hline(yintercept = 27, color = "red") +
  geom_hline(yintercept = median(whole_df[[1]]$mmse, na.rm = T), linetype = "dashed") +
  geom_vline(xintercept = median(whole_df[[1]]$clusters), linetype = "dashed") +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Removed 3 rows containing missing values (`geom_point()`).

Final model: an OLS regression of MMSE on the number of cohesive subgroups, adjusting for network size, age, education, and gender

m <- mmse ~ clusters + net_size + age + educ + gender
ols <- lm(m, data = whole_df[[1]])
sjPlot::tab_model(ols, vcov.fun = "HC", show.se = TRUE)
## Registered S3 methods overwritten by 'lme4':
##   method               from
##   simulate.formula     ergm
##   simulate.formula_lhs ergm
                       mmse
Predictors             Estimates  std. Error  CI              p
(Intercept)            37.01      8.02        21.21 – 52.80   <0.001
clusters                0.64      0.20         0.24 – 1.04     0.002
net size                0.11      0.07        -0.03 – 0.25     0.116
age                    -0.18      0.09        -0.36 – -0.01    0.042
educ [middle]          -2.40      1.34        -5.05 – 0.25     0.075
educ [primary]         -1.83      1.09        -3.98 – 0.32     0.094
educ [uni or higher]    1.08      1.19        -1.25 – 3.42     0.361
gender [Male]           0.02      0.86        -1.68 – 1.71     0.985
Observations           226
R2 / R2 adjusted       0.145 / 0.117

Visualize estimates

p <- jtools::plot_summs(
  ols,
  coefs = c(
    "cohesive subgroups" = "clusters",
    "network size" = "net_size"
    # ,
    # "age" = "age",
    # "gender: male" = "gendermale",
    # "education: lower secondary" = "education2",
    # "education: higher secondary" = "education3",
    # "education: tertiary" = "education4",
    # "assisted living facility" = "rsa1",
    # "GDS > 1" = "gds.2yes"
  ),
  colors = "black",
  scale = F,
  robust = "HC"
  # ,
  # inner_ci_level = .9
  # ,
  # colors = "rainbow"
)
## Loading required namespace: broom.mixed
p + 
  theme_bw() +
  labs(x = "", y = "")

7 Network processes and statistical inference

7.1 Reciprocity

We will use a suite of packages including sna, network, and ergm (bundled in statnet) to make statistical inference. They raise conflicts with some igraph functions used so far, which is why we have been calling those with the igraph:: prefix.

Creating a network object, which is needed by functions included in the sna and ergm packages

(
sup_net <- 
  network(
    x = as_adjacency_matrix(support_net), 
    vertex.attr = node_attributes
  )
)
##  Network attributes:
##   vertices = 29 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 99 
##     missing edges= 0 
##     non-missing edges= 99 
## 
##  Vertex attribute names: 
##     ...1 age age3 children education family gender id satisfaction seniority seniority_rec soc_capital vertex.names 
## 
## No edge attributes

Inspecting attributes in network

sup_net %v% "age"
##  [1] 40 41 25 28 28 31 39 28 31 28 33 52 31 27 31 31 32 31 34 40 28 31 24 27 31
## [26] 24 34 27 36

Let’s calculate the number of mutual dyads

mutuality(sup_net) # number of mutual dyads (pairs with reciprocated ties)
## Mut 
##  25
sna::dyad.census(sup_net) # 406 dyads in total (29 * 28 / 2): 25 mutual (50 ties) + 49 asymmetric (49 ties) + 332 null; 50 + 49 = 99 ties overall
##      Mut Asym Null
## [1,]  25   49  332

What should we compare it to?

  • fix number of nodes
  • assume a data-generating process

Let’s compare it with a random graph with 29 nodes. We need to specify a probability for ties to occur: let’s pick random chance (.5)

(
rnd_net <- 
  rgraph(
    n = 29, 
    m = 1, 
    tprob = .5
  )
)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    0    0    0    1    1    0    0    1    1     0     0     1     0
##  [2,]    1    0    0    0    1    0    0    0    0     1     0     1     0
##  [3,]    0    0    0    0    1    0    1    0    0     0     1     0     0
##  [4,]    1    1    1    0    1    0    0    0    0     1     0     1     1
##  [5,]    1    1    1    1    0    1    0    0    0     1     0     0     1
##  [6,]    1    0    0    1    0    0    0    1    1     0     1     1     1
##  [7,]    1    0    0    1    0    1    0    1    1     1     0     1     1
##  [8,]    0    1    0    1    0    0    1    0    0     0     0     1     0
##  [9,]    1    0    1    1    0    1    0    1    0     0     0     0     0
## [10,]    0    0    0    0    0    1    1    1    1     0     1     0     0
## [11,]    0    0    1    1    1    0    1    0    1     0     0     0     1
## [12,]    0    0    1    0    1    1    1    1    1     1     1     0     1
## [13,]    0    0    0    0    1    1    1    0    1     0     1     0     0
## [14,]    0    1    0    0    1    1    0    1    0     1     0     0     1
## [15,]    1    0    1    1    1    0    0    1    1     0     1     1     1
## [16,]    0    1    0    0    1    1    1    1    0     1     1     0     1
## [17,]    0    1    0    1    0    1    0    0    0     0     0     1     1
## [18,]    1    1    1    1    1    1    0    0    0     0     0     1     1
## [19,]    0    1    1    0    0    1    1    1    1     1     1     0     1
## [20,]    1    1    0    0    0    1    1    1    1     1     1     0     0
## [21,]    0    0    1    1    0    0    0    1    1     0     0     1     0
## [22,]    1    1    0    1    0    0    1    1    0     0     0     1     1
## [23,]    0    1    0    0    0    0    1    0    0     1     1     0     1
## [24,]    1    0    1    1    0    0    1    0    1     1     1     1     1
## [25,]    0    0    0    1    0    0    0    1    1     1     0     0     1
## [26,]    1    1    0    1    1    0    0    0    0     0     1     0     1
## [27,]    1    1    0    1    0    1    1    1    0     0     1     1     1
## [28,]    0    0    1    0    1    1    0    0    0     1     0     0     0
## [29,]    1    0    1    0    0    0    1    0    1     0     1     1     1
##       [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
##  [1,]     0     0     0     0     1     1     1     1     1     0     1     1
##  [2,]     0     1     0     0     0     1     0     1     1     0     0     0
##  [3,]     1     0     0     0     0     0     0     1     1     1     1     1
##  [4,]     1     1     0     0     0     0     1     0     1     0     0     1
##  [5,]     0     0     1     1     1     1     0     0     0     0     0     0
##  [6,]     0     0     0     1     0     1     0     1     0     0     0     0
##  [7,]     0     1     0     1     1     0     0     0     0     1     1     1
##  [8,]     0     1     1     0     1     1     0     1     1     1     0     1
##  [9,]     1     1     1     0     1     0     0     0     1     0     1     0
## [10,]     1     1     0     0     0     0     0     0     1     1     0     0
## [11,]     0     1     0     0     1     0     1     0     0     0     0     0
## [12,]     1     1     1     1     0     1     1     1     0     0     1     1
## [13,]     0     1     1     1     0     0     1     1     0     0     0     0
## [14,]     0     1     0     1     1     1     0     1     0     0     1     0
## [15,]     1     0     1     0     1     1     1     0     0     0     0     0
## [16,]     1     0     0     0     0     1     1     1     1     0     0     1
## [17,]     0     0     1     0     1     0     0     1     1     0     0     0
## [18,]     0     0     0     0     0     0     1     1     1     1     1     0
## [19,]     0     1     1     1     0     0     0     0     1     1     1     0
## [20,]     0     1     0     1     1     1     0     0     0     0     1     1
## [21,]     1     0     1     0     1     1     1     0     1     0     1     1
## [22,]     1     1     0     1     0     0     1     1     0     1     1     0
## [23,]     0     0     1     1     0     1     0     1     1     0     1     0
## [24,]     0     0     0     0     1     1     1     1     0     1     0     0
## [25,]     0     0     1     1     1     1     1     1     1     0     0     0
## [26,]     1     1     1     0     0     1     1     0     0     0     1     0
## [27,]     0     0     0     0     1     0     0     0     1     1     0     1
## [28,]     1     0     0     0     1     0     0     1     1     0     1     1
## [29,]     1     0     0     1     0     1     1     0     1     0     0     1
##       [,26] [,27] [,28] [,29]
##  [1,]     0     0     1     0
##  [2,]     1     0     0     0
##  [3,]     0     0     1     0
##  [4,]     1     1     1     1
##  [5,]     0     1     1     0
##  [6,]     0     0     0     1
##  [7,]     0     1     1     0
##  [8,]     1     0     0     1
##  [9,]     1     0     0     1
## [10,]     1     1     0     0
## [11,]     1     0     1     1
## [12,]     0     1     0     1
## [13,]     1     0     1     1
## [14,]     0     1     1     1
## [15,]     1     0     0     0
## [16,]     1     0     0     1
## [17,]     1     1     0     1
## [18,]     1     0     0     0
## [19,]     1     0     0     1
## [20,]     1     0     0     1
## [21,]     0     1     0     0
## [22,]     1     0     0     0
## [23,]     1     1     1     0
## [24,]     0     1     1     0
## [25,]     0     1     0     0
## [26,]     0     1     0     1
## [27,]     0     0     1     1
## [28,]     0     0     0     1
## [29,]     1     1     1     0

The number of reciprocal dyads in our empirical network (25) is clearly lower than what we would expect by random chance (here, 101)

sna::dyad.census(rnd_net)
##      Mut Asym Null
## [1,] 101  202  103

Problem: 0.5 is clearly unrealistic, so let’s take the density of our empirical network as the baseline probability for ties to occur

Let’s simulate a whole distribution of random graphs conditioned on n = 29 and tie probability ≈ .12

rnd_net <- 
  rgraph(
    n = 29, 
    m = 1000, 
    tprob = gden(sup_net)
  )

Our empirically observed statistic doesn’t even lie within the stochastic distribution!

hist(mutuality(rnd_net), xlim = c(0, 30))
abline(v = mean(mutuality(rnd_net)), col="red")
abline(v = mutuality(sup_net), col="blue")
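
We can make the comparison explicit with an empirical p-value: the share of simulated graphs showing at least as many mutual dyads as observed (a sketch; with these data it should be essentially zero):

mean(mutuality(rnd_net) >= mutuality(sup_net))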

What did we do?

  • We tested our scientific hypothesis through a binary statistical hypothesis test (is reciprocity greater than chance?)

  • We tested it against a random graph model with a fixed tie probability

  • This is called the conditional uniform model: instead of just generating completely random networks (unrealistic), we generate a uniform distribution of random graphs conditioned on the number of edges (U|L)

  • We conditioned the probability of a tie on a value much smaller than random chance

  • Two limitations:

    1. we need to assume a more complex form of stochastic dependency: it is not realistic to assume that \(P(x_{ij})\) is independent of \(P(x_{ji})\)
    • (if \(x_{ij}\) were independent of \(x_{ji}\), then \(P(x_{ij} \cap x_{ji}) = P(x_{ij}) \cdot P(x_{ji})\))

    • (if not, \(P(x_{ij} \cap x_{ji}) = P(x_{ij} \mid x_{ji}) \cdot P(x_{ji})\))

    2. it is just a binary hypothesis test, with no opportunity for multivariate analysis

8 Exponential Random Graph Modelling (ERGM)

(See slides and reading material)

We’ll set a seed to allow for reproducibility

set.seed(123)

Let’s start by specifying an ERGM of the support network simply as a function of the likelihood of edges to occur (similar to the role of the intercept in a regression model)

m0 <- sup_net ~ edges

All the terms that we can specify an ERGM with can be inspected by calling ?ergm.terms. Calling summary() on the model formula returns the observed value of each statistic:

summary(m0)
## edges 
##    99

Fitting the model estimates the parameters and computes standard errors as uncertainty measures

fit0 <- ergm(m0)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Stopping at the initial estimate.
## Evaluating log-likelihood at the estimate.

Since our model is very simple (no stochastic dependencies), maximum pseudo-likelihood estimation is enough (similar to a logistic regression).

Let’s inspect the fitted model:

summary(fit0)
## Call:
## ergm(formula = m0)
## 
## Maximum Likelihood Results:
## 
##       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges  -1.9744     0.1073      0  -18.41   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  602.1  on 811  degrees of freedom
##  
## AIC: 604.1  BIC: 608.8  (Smaller is better. MC Std. Err. = 0)

Estimated parameters can be inspected as follows:

coef(fit0)
##     edges 
## -1.974362

The log-odds (natural logarithm of the ratio between a probability and its complement) of a tie being present in the network (according to our model) is -1.97 times the change statistic (\(\delta\)) in the number of ties, which is equal to 1 in this case, as every tie addition increases the statistic of the edges term by 1

coef(fit0) * 1
##     edges 
## -1.974362

To compute \(P(x_{ij})\), we calculate the inverse of the logit

exp(coef(fit0)) / (1 + exp(coef(fit0))) * 1
##     edges 
## 0.1219212

Same as the density! In fact, the model we estimated is equivalent to the simple U|L model shown above.
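
Base R’s plogis() computes the same inverse-logit transformation directly:

plogis(coef(fit0))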

Let’s now test reciprocity. We do this by specifying the model with a further term, mutual, which counts the number of complete (mutual) dyads in the data

m1 <- update.formula(m0, . ~ . + mutual)
summary(m1)
##  edges mutual 
##     99     25

Fitting the model to the data

fit1 <- ergm(m1)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0233.
## Convergence test p-value: 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

MPLE is not enough now, because of the dependency among observations: we need to simulate a stochastic network formation process via Markov Chain Monte Carlo to compute maximum likelihood estimates (see slides and reading)

summary(fit1)
## Call:
## ergm(formula = m1)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges   -2.6076     0.1542      0 -16.913   <1e-04 ***
## mutual   2.6480     0.3425      0   7.731   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  548.1  on 810  degrees of freedom
##  
## AIC: 552.1  BIC: 561.5  (Smaller is better. MC Std. Err. = 0.7841)
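
As the fitting log suggests, we should inspect the MCMC diagnostics before trusting the estimates (output omitted here):

mcmc.diagnostics(fit1)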

Let’s now also control for transitive closure (the gwesp term) and for the number of two-paths

m2 <- update.formula(m1, . ~ . + gwesp(decay = .5, fixed = T))
m3 <- update.formula(m2, . ~ . + twopath)
fit3 <- ergm(m3)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.9883.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0520.
## Convergence test p-value: 0.4609. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0133.
## Convergence test p-value: 0.0632. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0083.
## Convergence test p-value: 0.2497. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0006.
## Convergence test p-value: 0.0004. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(fit3)
## Call:
## ergm(formula = m3)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                 Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges           -2.25234    0.35384      0  -6.365   <1e-04 ***
## mutual           1.98174    0.35145      0   5.639   <1e-04 ***
## gwesp.fixed.0.5  1.07622    0.16372      0   6.574   <1e-04 ***
## twopath         -0.24153    0.05689      0  -4.245   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  493.5  on 808  degrees of freedom
##  
## AIC: 501.5  BIC: 520.3  (Smaller is better. MC Std. Err. = 0.8871)

We still have strong evidence of reciprocity, even when controlling for transitive closure.

We might want to control for the effect of age, as a social selection process based on node attributes

m4 <- update.formula(m3, . ~ . + nodeicov("age") + nodeocov("age") + nodematch("age"))
fit4 <- ergm(m4)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.2442.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0580.
## Convergence test p-value: 0.5357. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0314.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(fit4)
## Call:
## ergm(formula = m4)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                 Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges           -3.00414    0.79149      0  -3.796 0.000147 ***
## mutual           2.08797    0.35956      0   5.807  < 1e-04 ***
## gwesp.fixed.0.5  1.03909    0.16469      0   6.309  < 1e-04 ***
## twopath         -0.23517    0.05704      0  -4.123  < 1e-04 ***
## nodeicov.age     0.03261    0.01615      0   2.020 0.043424 *  
## nodeocov.age    -0.01266    0.01890      0  -0.670 0.502804    
## nodematch.age    0.52107    0.20087      0   2.594 0.009485 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1125.7  on 812  degrees of freedom
##  Residual Deviance:  484.3  on 805  degrees of freedom
##  
## AIC: 498.3  BIC: 531.2  (Smaller is better. MC Std. Err. = 0.5778)

Still strong evidence of reciprocity, even when controlling for the confounding effects of transitive closure and age-based selection.
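
A natural final check (not run here) is goodness of fit: simulate networks from the fitted model and compare their statistics to the observed ones.

gof4 <- gof(fit4)  # gof() is part of the ergm package
plot(gof4)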