# Final Exam (R)

# Instructions
- This is an open internet exam. You can use any materials you like, but you are not allowed to communicate with other people during the exam.
- The cell below will load the network data you will use for the exam. You can run the cell to generate the data, but do not modify it.
- If you are using magic commands, you need to run `%load_ext rpy2.ipython` first, and then add `%%R` at the top of each R code cell.

In [16]:
# Load the rpy2 extension
try:
  %load_ext rpy2.ipython
except:
  %pip install rpy2

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [3]:
%%R 
options(repos="https://cloud.r-project.org")
packages <- c("igraph", "ergm", "intergraph", "network", "latentnet")
install.packages(setdiff(packages, rownames(installed.packages())))  

library(igraph)
library(intergraph)
library(ergm)
library(network)
library(latentnet)


Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

Loading required package: network

'network' 1.19.0 (2024-12-08), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information


Attaching package: 'network'

The following objects are masked from 'package:igraph':

    %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
    get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
    is.directed, list.edge.attributes, list.vertex.attributes,
    set.edge.attribute, set.vertex.attribute


'ergm' 4.10.1 (2025-08-26), part of the Statnet Project
* 'news(package="ergm")' for changes since last version
* 'citation("ergm")' for citation information
* 'https://statnet.org' for help, support, and other informa

In [4]:
%%R
# Load data and build a 'network' object with vertex attributes
library(network)
library(ergm)
library(latentnet)

In [5]:
%%R
# Read CSV files (relative to this notebook/script)
nodes_df <- read.csv("C:\\Users\\Li Yuxin\\Downloads\\graph_nodes.csv", stringsAsFactors = FALSE)
edges_df <- read.csv("C:\\Users\\Li Yuxin\\Downloads\\graph_edges.csv", stringsAsFactors = FALSE)

# Initialize a directed network with all nodes (including nodes with no edges)
n <- nrow(nodes_df)
G <- network.initialize(n, directed = TRUE, loops = FALSE, multiple = FALSE)

# Set vertex names to the node IDs (so labels match your IDs)
set.vertex.attribute(G, "vertex.names", nodes_df$ID)

# Attach all node attributes from the nodes data frame
for (col in setdiff(names(nodes_df), "ID")) {
  set.vertex.attribute(G, col, nodes_df[[col]])
}

# Map edge endpoints (IDs) to vertex indices and add edges
tail_idx <- match(edges_df$ID1, nodes_df$ID)
head_idx <- match(edges_df$ID2, nodes_df$ID)
add.edges(G, tail_idx, head_idx)

# Minimal confirmation output
cat(network.size(G), " nodes, ", network.edgecount(G), " edges\n")

# Print first few nodes and their attributes
for (i in 1:5) {
  cat("Node", get.vertex.attribute(G, "vertex.names")[i], "details: ")
  for (attr in names(nodes_df)[-1]) {
    v_attr <- get.vertex.attribute(G, attr)[i]
    cat(attr, "=", v_attr, ", ", sep="")
  }
  cat("\n")
}

# You should be seeing the following output:
# 120 nodes, 3583 edges
# Node 1 details: gender=female, class=4, age=17, GPA=2.5, 
# Node 2 details: gender=female, class=6, age=17, GPA=3.31, 
# Node 3 details: gender=female, class=2, age=17, GPA=2.59, 
# Node 4 details: gender=male, class=1, age=17, GPA=3.85, 
# Node 5 details: gender=female, class=1, age=16, GPA=3.45, 

120  nodes,  3583  edges
Node 1 details: gender=female, class=4, age=17, GPA=2.5, 
Node 2 details: gender=female, class=6, age=17, GPA=3.31, 
Node 3 details: gender=female, class=2, age=17, GPA=2.59, 
Node 4 details: gender=male, class=1, age=17, GPA=3.85, 
Node 5 details: gender=female, class=1, age=16, GPA=3.45, 


**7. Fit an Exponential Random Graph Model (ERGM) to the directed network with the term 'edges' and with the term 'nodematch' for the gender attribute. Which of the following statements is/are true?**
 
a. The probability of an edge existing between two nodes of the same gender is less than 0.30.

b. The coefficient for the gender nodematch is positive and statistically significant (p < 0.05).

c. The model indicates that connections between individuals of the same gender are more likely to occur than between individuals of different genders.

d. The probability of an edge existing between two nodes of different gender is less than 0.15.

In [8]:
%%R
# You can add your code to fit the model here

model_ergm <- ergm(G ~ edges + nodematch("gender"))

Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
1: In ergm(G ~ edges + nodematch("gender")) :
  strings not representable in native encoding will be translated to UTF-8
2: In ergm(G ~ edges + nodematch("gender")) : input string '
<f5><dc>
' cannot be translated to UTF-8, is it valid in 'UTF-8'?
3: In ergm(G ~ edges + nodematch("gender")) : input string '
<f5><dc>
' cannot be translated to UTF-8, is it valid in 'UTF-8'?
4: In ergm(G ~ edges + nodematch("gender")) : input string '
<f5><dc>
' cannot be translated to UTF-8, is it valid in 'UTF-8'?
5: In ergm(G ~ edges + nodematch("gender")) : input string '
<f5><dc>
' cannot be translated to UTF-8, is it valid in 'UTF-8'?


In [9]:
%%R
# You can add your code to analyse the results here

#############################################
# 2. Extract coefficients and p-values
#############################################

coef_table <- summary(model_ergm)$coefficients

edges_coef      <- coef_table["edges", "Estimate"]
gender_coef     <- coef_table["nodematch.gender", "Estimate"]
gender_p_value  <- coef_table["nodematch.gender", "Pr(>|z|)"]

#############################################
# 3. Probabilities under the ERGM
#    logit P(edge_ij = 1 | rest) =
#        edges_coef + gender_coef * I(gender_i == gender_j)
#############################################

# Probability of an edge between two nodes of the SAME gender:
logit_same_gender <- edges_coef + gender_coef
p_same_gender     <- plogis(logit_same_gender)  # logistic transform

# Probability of an edge between two nodes of DIFFERENT gender:
logit_diff_gender <- edges_coef
p_diff_gender     <- plogis(logit_diff_gender)

#############################################
# 4. Logical checks / interpretations
#############################################

# Is the gender nodematch coefficient positive and significant (p < 0.05)?
gender_positive_and_sig <- (gender_coef > 0) && (gender_p_value < 0.05)

# Does the model indicate same-gender edges are more likely than different-gender edges?
same_more_likely <- p_same_gender > p_diff_gender

#############################################
# 5. Print results
#############################################

cat("Coefficient for nodematch(gender):", gender_coef, "\n")
cat("p-value for nodematch(gender):", gender_p_value, "\n\n")

cat("Probability(edge | same gender):    ", p_same_gender, "\n")
cat("Probability(edge | different gender):", p_diff_gender, "\n\n")

cat("Gender nodematch coefficient is positive and significant (p < 0.05)? ",
    gender_positive_and_sig, "\n")

cat("Model indicates connections between same-gender individuals are more likely than different-gender? ",
    same_more_likely, "\n")


Coefficient for nodematch(gender): 0.8322372 
p-value for nodematch(gender): 5.173612e-96 

Probability(edge | same gender):     0.3279661 
Probability(edge | different gender): 0.1751389 

Gender nodematch coefficient is positive and significant (p < 0.05)?  TRUE 
Model indicates connections between same-gender individuals are more likely than different-gender?  TRUE 


**8. Fit an ERGM to the directed network with the term 'edges', a nodematch term for gender, and a nodematch term for the class. Which of the following statements is/are true?**

a. There is significant gender homophily in the network.

b. The homophily based on class is stronger than that based on gender.

c. Based on the BIC, the model with the class nodematch term is a better model than the model without it.

d. The probability of two students of different gender from the same class having an edge is greater than 0.60.

In [11]:
%%R
# You can add your code to fit the model here

library(ergm)
library(network)

#############################################
# 1. Fit ERGM with edges + nodematch(gender)
#############################################

fit_gender <- ergm(G ~ edges + nodematch("gender"))

#############################################
# 2. Fit ERGM with edges + nodematch(gender) + nodematch(class)
#############################################

fit_gender_class <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))

s_gender       <- summary(fit_gender)
s_gender_class <- summary(fit_gender_class)

#############################################
# 3. Extract coefficients and p-values
#############################################

coef_table2 <- s_gender_class$coefficients

edges_coef     <- coef_table2["edges", "Estimate"]
gender_coef    <- coef_table2["nodematch.gender", "Estimate"]
gender_p_value <- coef_table2["nodematch.gender", "Pr(>|z|)"]
class_coef     <- coef_table2["nodematch.class", "Estimate"]
class_p_value  <- coef_table2["nodematch.class", "Pr(>|z|)"]

#############################################
# 4. Significant gender homophily?
#############################################

gender_homophily_sig <- (gender_coef > 0) && (gender_p_value < 0.05)

#############################################
# 5. Is class homophily stronger than gender homophily?
#############################################

class_homophily_stronger <- (class_coef > 0) && (class_coef > gender_coef)

#############################################
# 6. BIC comparison (FIXED)
#############################################

# summary.ergm() returns a scalar 'bic', not a list
bic_gender       <- s_gender$bic
bic_gender_class <- s_gender_class$bic

# Lower BIC => better model
model_with_class_better <- (bic_gender_class < bic_gender)

#############################################
# 7. Probability of edge:
#    logit P(edge_ij = 1 | attrs) =
#        edges_coef
#        + gender_coef * I(gender_i == gender_j)
#        + class_coef  * I(class_i == class_j)
#
# For different gender, same class:
#    I(gender_i == gender_j) = 0
#    I(class_i == class_j)  = 1
#############################################

logit_diff_gender_same_class <- edges_coef + class_coef
p_diff_gender_same_class     <- plogis(logit_diff_gender_same_class)

#############################################
# 8. Print results
#############################################

cat("=== Coefficients (gender + class model) ===\n")
cat("edges coefficient:          ", edges_coef, "\n")
cat("nodematch(gender) coeff:    ", gender_coef, " (p =", gender_p_value, ")\n")
cat("nodematch(class) coeff:     ", class_coef,  " (p =", class_p_value,  ")\n\n")

cat("Significant gender homophily (positive & p < 0.05)? ",
    gender_homophily_sig, "\n")

cat("Is class-based homophily stronger than gender-based homophily? ",
    class_homophily_stronger, "\n\n")

cat("BIC (edges + gender):             ", bic_gender, "\n")
cat("BIC (edges + gender + class):     ", bic_gender_class, "\n")
cat("Model with class nodematch better by BIC? ",
    model_with_class_better, "\n\n")

cat("P(edge | different gender, SAME class): ",
    p_diff_gender_same_class, "\n")


=== Coefficients (gender + class model) ===
edges coefficient:           -2.338471 
nodematch(gender) coeff:     1.132209  (p = 8.899517e-124 )
nodematch(class) coeff:      2.743148  (p = 0 )

Significant gender homophily (positive & p < 0.05)?  TRUE 
Is class-based homophily stronger than gender-based homophily?  TRUE 

BIC (edges + gender):              15659.34 
BIC (edges + gender + class):      12710.19 
Model with class nodematch better by BIC?  TRUE 

P(edge | different gender, SAME class):  0.599811 


Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 


In [None]:
# You can add your code to analyse the results here

**9. Fit an ERGM to the directed network with the term 'edges', a nodematch term for gender, and a nodematch term for the class. Moroever, add pairwise-covariate terms that equal the difference in the GPA and difference age between two nodes. Which of the following statements is/are true?**

a. The difference in GPA between two students significantly decreases the probabilty of an edge forming

b. The difference in age between two students significantly decreases the probabilty of an edge forming

c. Based on the BIC, the model with the two pairwise covariates included is better than without them.

d. Assumung two students are are from the same class, same gender, same age, and have the same GPA, the probabilty of connecting in the full model is greater than 0.80.

In [12]:
%%R
# You can add your code to fit the model here

############################################################
# Setup
############################################################

# install.packages("ergm")
# install.packages("network")
library(ergm)
library(network)

# G: directed network with vertex attributes:
#  - "gender" (factor or character)
#  - "class"  (factor or character)
#  - "GPA"    (numeric)
#  - "age"    (numeric)

############################################################
# 1. Base model: edges + nodematch(gender) + nodematch(class)
############################################################

fit_base <- ergm(
  G ~ edges +
    nodematch("gender") +
    nodematch("class")
)

summary_base <- summary(fit_base)

############################################################
# 2. Full model: add pairwise covariates (abs differences)
############################################################

fit_full <- ergm(
  G ~ edges +
    nodematch("gender") +
    nodematch("class") +
    absdiff("GPA") +
    absdiff("age")
)

summary_full <- summary(fit_full)

coef_table <- summary_full$coefficients

# For safety, find the correct row names via grep
edges_name      <- "edges"
gender_name     <- "nodematch.gender"
class_name      <- "nodematch.class"
gpa_absdiff_row <- grep("absdiff.GPA", rownames(coef_table), value = TRUE)
age_absdiff_row <- grep("absdiff.age", rownames(coef_table), value = TRUE)

edges_coef       <- coef_table[edges_name,      "Estimate"]
gender_coef      <- coef_table[gender_name,     "Estimate"]
gender_p_value   <- coef_table[gender_name,     "Pr(>|z|)"]
class_coef       <- coef_table[class_name,      "Estimate"]
class_p_value    <- coef_table[class_name,      "Pr(>|z|)"]
gpa_diff_coef    <- coef_table[gpa_absdiff_row, "Estimate"]
gpa_diff_p_value <- coef_table[gpa_absdiff_row, "Pr(>|z|)"]
age_diff_coef    <- coef_table[age_absdiff_row, "Estimate"]
age_diff_p_value <- coef_table[age_absdiff_row, "Pr(>|z|)"]

############################################################
# 3. Does GPA difference significantly DECREASE edge probability?
############################################################

# In an ERGM, a negative coefficient on absdiff(GPA) means:
# larger GPA differences -> lower log-odds of an edge.
gpa_diff_decreases_prob <- (gpa_diff_coef < 0) && (gpa_diff_p_value < 0.05)

############################################################
# 4. Does age difference significantly DECREASE edge probability?
############################################################

age_diff_decreases_prob <- (age_diff_coef < 0) && (age_diff_p_value < 0.05)

############################################################
# 5. BIC: is the full model (with absdiff terms) better?
############################################################

bic_base <- summary_base$bic  # scalar
bic_full <- summary_full$bic  # scalar

# Lower BIC -> better
full_model_better_by_BIC <- (bic_full < bic_base)

############################################################
# 6. Probability of an edge in the full model
#    when:
#      - same class
#      - same gender
#      - same age
#      - same GPA
#
# Model (full):
#   logit P(edge_ij = 1 | x) =
#       edges_coef
#       + gender_coef    * I(gender_i == gender_j)
#       + class_coef     * I(class_i == class_j)
#       + gpa_diff_coef  * |GPA_i - GPA_j|
#       + age_diff_coef  * |age_i - age_j|
#
# Under the given assumptions:
#   I(gender_i == gender_j) = 1
#   I(class_i == class_j)   = 1
#   |GPA_i - GPA_j|         = 0
#   |age_i - age_j|         = 0
#
# => logit = edges_coef + gender_coef + class_coef
############################################################

logit_same_all <- edges_coef + gender_coef + class_coef
p_same_all     <- plogis(logit_same_all)

############################################################
# 7. Print results
############################################################

cat("=== Full model coefficients ===\n")
cat("edges:             ", edges_coef,      "\n")
cat("nodematch(gender): ", gender_coef,     " (p =", gender_p_value, ")\n")
cat("nodematch(class):  ", class_coef,      " (p =", class_p_value,  ")\n")
cat("absdiff(GPA):      ", gpa_diff_coef,   " (p =", gpa_diff_p_value, ")\n")
cat("absdiff(age):      ", age_diff_coef,   " (p =", age_diff_p_value, ")\n\n")

cat("Does GPA difference significantly DECREASE edge probability? ",
    gpa_diff_decreases_prob, "\n")
cat("Does age difference significantly DECREASE edge probability? ",
    age_diff_decreases_prob, "\n\n")

cat("BIC (base: edges + nodematch(gender) + nodematch(class)): ", bic_base, "\n")
cat("BIC (full: + absdiff(GPA) + absdiff(age)):                ", bic_full, "\n")
cat("Is full model better by BIC? ", full_model_better_by_BIC, "\n\n")

cat("P(edge | same class, same gender, same age, same GPA, full model): ",
    p_same_all, "\n")


=== Full model coefficients ===
edges:              -2.219937 
nodematch(gender):  1.129785  (p = 5.380961e-123 )
nodematch(class):   2.743326  (p = 0 )
absdiff(GPA):       -0.0740834  (p = 0.2235931 )
absdiff(age):       -0.08455469  (p = 0.003495692 )

Does GPA difference significantly DECREASE edge probability?  FALSE 
Does age difference significantly DECREASE edge probability?  TRUE 

BIC (base: edges + nodematch(gender) + nodematch(class)):  12710.19 
BIC (full: + absdiff(GPA) + absdiff(age)):                 12719.06 
Is full model better by BIC?  FALSE 

P(edge | same class, same gender, same age, same GPA, full model):  0.8393195 


Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 


In [None]:
# You can add your code to analyse the results here

**10. Fit a Latent Space Model using 'ergmm' from the 'latentnet' package to the directed network with the term 'edges', a nodematch term for gender, a nodematch term for class, and a 2-dimensional latent space with Euclidean distance. Which of the following statements is/are true?**

a. Based on the BIC, the model with latent space provides a better fit than the ERGM without latent space (model from Question 8).

b. The edges term is not statistically significant (p > 0.05) in this model.

c. The latent space model indicates that students from the same class are substantially closer in the latent space than students from different classes (at least 5% smaller mean distance).

d. The latent space model indicates that students of the same gender are substantially closer in the latent space than students of different genders (at least 5% smaller mean distance).


In [15]:
%%R
# You can add your code to fit the model here

library(network)
library(ergm)
library(latentnet)

## Assume:
##   - G is a directed 'network' object
##   - vertex attributes: "gender", "class"

########################################################
## 1. ERGM (no latent space) – for BIC comparison
########################################################

fit_ergm <- ergm(
  G ~ edges + nodematch("gender") + nodematch("class")
)

s_ergm   <- summary(fit_ergm)
bic_ergm <- as.numeric(s_ergm$bic)   # for ERGM this is usually a scalar

########################################################
## 2. Latent space model (2D Euclidean)
########################################################

fit_latent <- ergmm(
  G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2),
  # keep default control settings or shorten for speed if needed:
  control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
)

## IMPORTANT: do NOT request "mle" here (this caused your error).
## Use posterior means only:
s_latent <- summary(fit_latent, point.est = "pmean")

########################################################
## 3. BIC: Is latent-space model better than ERGM?
########################################################
## summary.ergmm() returns a list with component "bic",
## typically with elements c("Y","Z","overall").
bic_latent_overall <- as.numeric(s_latent$bic["overall"])

latent_better_than_ergm <- bic_latent_overall < bic_ergm

cat("BIC(ERGM)             :", bic_ergm, "\n")
cat("BIC(Latent overall)   :", bic_latent_overall, "\n")
cat("Latent-space better?  :", latent_better_than_ergm, "\n\n")

########################################################
## 4. Edges term: “not statistically significant”?
##    Use 95% credible interval for posterior mean.
########################################################

## Inspect structure once (uncomment this to see exact names in your session):
## str(s_latent)

## In current latentnet, the posterior-mean coefficient table for covariates
## is typically in s_latent$coef$pmean (matrix with columns:
##   "Estimate", "2.5%", "97.5%", "Quantile of 0").
coef_pmean <- s_latent$coef$pmean   # adjust if your object names differ

edges_row <- coef_pmean["edges", ]  # row for the edges term

edges_est    <- edges_row["Estimate"]
edges_ci_low <- edges_row["2.5%"]
edges_ci_hi  <- edges_row["97.5%"]

## Treat “not significant” as: 0 inside the 95% credible interval
edges_not_sig <- (edges_ci_low <= 0) && (edges_ci_hi >= 0)

cat("Edges term (posterior mean):\n")
print(edges_row)
cat("Edges term NOT significant (0 in 95% CI)?", edges_not_sig, "\n\n")

########################################################
## 5. Latent distances: same vs different class / gender
########################################################

## Use posterior mean latent positions:
## (from ergmm help: Z.pmean is n x d matrix of positions)
Z <- fit_latent$Z.pmean   # rows = nodes, cols = 2D coordinates

# Extract nodal attributes
g_gender <- get.vertex.attribute(G, "gender")
g_class  <- get.vertex.attribute(G, "class")

# Pairwise Euclidean distances
Dmat <- as.matrix(dist(Z))   # symmetric n x n

# Indicator matrices for same class / same gender
same_class  <- outer(g_class,  g_class,  "==")
same_gender <- outer(g_gender, g_gender, "==")

# Use only upper triangle to avoid double-counting/self-pairs
U <- upper.tri(Dmat)

## Class distances
mean_dist_same_class  <- mean(Dmat[U &  same_class[U]])
mean_dist_diff_class  <- mean(Dmat[U & !same_class[U]])

## Gender distances
mean_dist_same_gender <- mean(Dmat[U &  same_gender[U]])
mean_dist_diff_gender <- mean(Dmat[U & !same_gender[U]])

## “Substantially closer” = at least 5% smaller mean distance
same_class_5pct_closer <- mean_dist_same_class <= 0.95 * mean_dist_diff_class
same_gender_5pct_closer <- mean_dist_same_gender <= 0.95 * mean_dist_diff_gender

cat("Mean distance SAME class   :", mean_dist_same_class,  "\n")
cat("Mean distance DIFF class   :", mean_dist_diff_class,  "\n")
cat("Same class ≥5% closer?     :", same_class_5pct_closer, "\n\n")

cat("Mean distance SAME gender  :", mean_dist_same_gender, "\n")
cat("Mean distance DIFF gender  :", mean_dist_diff_gender, "\n")
cat("Same gender ≥5% closer?    :", same_gender_5pct_closer, "\n")


BIC(ERGM)             : 12710.19 
BIC(Latent overall)   : 9808.597 
Latent-space better?  : TRUE 

Edges term (posterior mean):
NULL
Edges term NOT significant (0 in 95% CI)? NA 

Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
NOTE: It is not certain whether it is appropriate to use latentnet's BIC to select latent space dimension, whether or not to include actor-specific random effects, and to compare clustered models with the unclustered model.
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'dat

RInterpreterError: Failed to parse and evaluate line '# You can add your code to fit the model here\n\nlibrary(network)\nlibrary(ergm)\nlibrary(latentnet)\n\n## Assume:\n##   - G is a directed \'network\' object\n##   - vertex attributes: "gender", "class"\n\n########################################################\n## 1. ERGM (no latent space) – for BIC comparison\n########################################################\n\nfit_ergm <- ergm(\n  G ~ edges + nodematch("gender") + nodematch("class")\n)\n\ns_ergm   <- summary(fit_ergm)\nbic_ergm <- as.numeric(s_ergm$bic)   # for ERGM this is usually a scalar\n\n########################################################\n## 2. Latent space model (2D Euclidean)\n########################################################\n\nfit_latent <- ergmm(\n  G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2),\n  # keep default control settings or shorten for speed if needed:\n  control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)\n)\n\n## IMPORTANT: do NOT request "mle" here (this caused your error).\n## Use posterior means only:\ns_latent <- summary(fit_latent, point.est = "pmean")\n\n########################################################\n## 3. BIC: Is latent-space model better than ERGM?\n########################################################\n## summary.ergmm() returns a list with component "bic",\n## typically with elements c("Y","Z","overall").\nbic_latent_overall <- as.numeric(s_latent$bic["overall"])\n\nlatent_better_than_ergm <- bic_latent_overall < bic_ergm\n\ncat("BIC(ERGM)             :", bic_ergm, "\\n")\ncat("BIC(Latent overall)   :", bic_latent_overall, "\\n")\ncat("Latent-space better?  :", latent_better_than_ergm, "\\n\\n")\n\n########################################################\n## 4. Edges term: “not statistically significant”?\n##    Use 95% credible interval for posterior mean.\n########################################################\n\n## Inspect structure once (uncomment this to see exact names in your session):\n## str(s_latent)\n\n## In current latentnet, the posterior-mean coefficient table for covariates\n## is typically in s_latent$coef$pmean (matrix with columns:\n##   "Estimate", "2.5%", "97.5%", "Quantile of 0").\ncoef_pmean <- s_latent$coef$pmean   # adjust if your object names differ\n\nedges_row <- coef_pmean["edges", ]  # row for the edges term\n\nedges_est    <- edges_row["Estimate"]\nedges_ci_low <- edges_row["2.5%"]\nedges_ci_hi  <- edges_row["97.5%"]\n\n## Treat “not significant” as: 0 inside the 95% credible interval\nedges_not_sig <- (edges_ci_low <= 0) && (edges_ci_hi >= 0)\n\ncat("Edges term (posterior mean):\\n")\nprint(edges_row)\ncat("Edges term NOT significant (0 in 95% CI)?", edges_not_sig, "\\n\\n")\n\n########################################################\n## 5. Latent distances: same vs different class / gender\n########################################################\n\n## Use posterior mean latent positions:\n## (from ergmm help: Z.pmean is n x d matrix of positions)\nZ <- fit_latent$Z.pmean   # rows = nodes, cols = 2D coordinates\n\n# Extract nodal attributes\ng_gender <- get.vertex.attribute(G, "gender")\ng_class  <- get.vertex.attribute(G, "class")\n\n# Pairwise Euclidean distances\nDmat <- as.matrix(dist(Z))   # symmetric n x n\n\n# Indicator matrices for same class / same gender\nsame_class  <- outer(g_class,  g_class,  "==")\nsame_gender <- outer(g_gender, g_gender, "==")\n\n# Use only upper triangle to avoid double-counting/self-pairs\nU <- upper.tri(Dmat)\n\n## Class distances\nmean_dist_same_class  <- mean(Dmat[U &  same_class[U]])\nmean_dist_diff_class  <- mean(Dmat[U & !same_class[U]])\n\n## Gender distances\nmean_dist_same_gender <- mean(Dmat[U &  same_gender[U]])\nmean_dist_diff_gender <- mean(Dmat[U & !same_gender[U]])\n\n## “Substantially closer” = at least 5% smaller mean distance\nsame_class_5pct_closer <- mean_dist_same_class <= 0.95 * mean_dist_diff_class\nsame_gender_5pct_closer <- mean_dist_same_gender <= 0.95 * mean_dist_diff_gender\n\ncat("Mean distance SAME class   :", mean_dist_same_class,  "\\n")\ncat("Mean distance DIFF class   :", mean_dist_diff_class,  "\\n")\ncat("Same class ≥5% closer?     :", same_class_5pct_closer, "\\n\\n")\n\ncat("Mean distance SAME gender  :", mean_dist_same_gender, "\\n")\ncat("Mean distance DIFF gender  :", mean_dist_diff_gender, "\\n")\ncat("Same gender ≥5% closer?    :", same_gender_5pct_closer, "\\n")\n'.
R error message: "Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : \n  'data' must be of a vector type, was 'NULL'"
R stdout:
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
NOTE: It is not certain whether it is appropriate to use latentnet's BIC to select latent space dimension, whether or not to include actor-specific random effects, and to compare clustered models with the unclustered model.
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'
In addition: Warning message:
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.

In [None]:
# You can add your code to analyse the results here

**11. Fit a Latent Space Model using 'ergmm' from the 'latentnet' package to the directed network with the term 'edges', a nodematch term for gender, a nodematch term for class, and a 2-dimensional latent space with Euclidean distance and 3 latent clusters. Which of the following statements is/are true?**

a. Based on the BIC, the model with latent space and clusters provides a better fit than the latent space model without clusters (model from Question 10).

b. Visually, the latent positions show clear clustering.

c. Based on the BIC, this model is the best among all models fitted in Questions 7-11.

d. The estimated latent positions indicate that students of the same class are closer together in the latent space compared to students of different classes (by at least 5% smaller mean distance).

In [17]:
%%R
# You can add your code to fit the model here

library(network)
library(ergm)
library(latentnet)

## G: directed 'network' object with vertex attributes "gender" and "class"

########################################################
## 0. (Optional) refit previous models if missing
########################################################

if (!exists("fit_ergm")) {
  fit_ergm <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))
}

if (!exists("fit_latent")) {
  fit_latent <- ergmm(
    G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2),
    control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
  )
}

########################################################
## 1. Latent space model with 3 clusters (2D, Euclidean)
########################################################

fit_latent_clustered <- ergmm(
  G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2, G = 3),
  control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
)

## IMPORTANT: do NOT ask for "mle" (this caused your error).
## Use posterior means only:
s_latent_clustered <- summary(fit_latent_clustered, point.est = "pmean")

########################################################
## 2. BIC comparison: with clusters vs without clusters
########################################################

## ERGM BIC
bic_ergm <- as.numeric(summary(fit_ergm)$bic)

## Latent (no clusters) BIC
bic_latent <- bic.ergmm(fit_latent)
bic_latent_overall <- as.numeric(bic_latent["overall"])

## Latent + 3 clusters BIC
bic_latent_cluster <- bic.ergmm(fit_latent_clustered)
bic_latent_cluster_overall <- as.numeric(bic_latent_cluster["overall"])

## Is latent+clusters better than latent without clusters?
latent_clusters_better <- (bic_latent_cluster_overall < bic_latent_overall)

cat("BIC (ERGM):                   ", bic_ergm, "\n")
cat("BIC (Latent, no clusters):    ", bic_latent_overall, "\n")
cat("BIC (Latent, 3 clusters):     ", bic_latent_cluster_overall, "\n")
cat("Latent+clusters better than latent-no-clusters (BIC)? ",
    latent_clusters_better, "\n\n")

########################################################
## 3. Is this clustered latent model best among ALL now?
##    (ERGM, latent, latent+clusters, plus others if exist)
########################################################

bics_all <- c(
  ERGM          = bic_ergm,
  Latent_noClust = bic_latent_overall,
  Latent_G3      = bic_latent_cluster_overall
)

## If you also fitted full ERGM with absdiffs, include it:
if (exists("fit_full")) {
  bic_full <- as.numeric(summary(fit_full)$bic)
  bics_all <- c(bics_all, ERGM_full = bic_full)
}

best_model_name <- names(bics_all)[which.min(bics_all)]
cluster_model_best <- (best_model_name == "Latent_G3")

cat("BIC for all models:\n")
print(bics_all)
cat("Best model by BIC:            ", best_model_name, "\n")
cat("Is Latent_G3 the best overall?", cluster_model_best, "\n\n")

########################################################
## 4. Visual: do latent positions show clear clustering?
########################################################

## Posterior mean latent positions (n x 2)
Z_cluster <- fit_latent_clustered$Z.pmean

## Posterior cluster memberships (if available)
cluster_assign <- rep(1, nrow(Z_cluster))
if ("Z.K" %in% names(fit_latent_clustered)) {
  ## Often Z.K is an n x G matrix of cluster posterior probabilities
  ZK <- fit_latent_clustered$Z.K
  if (is.matrix(ZK) && nrow(ZK) == nrow(Z_cluster)) {
    cluster_assign <- apply(ZK, 1, which.max)
  }
}

## Plot latent positions colored by cluster
plot(
  Z_cluster,
  col = cluster_assign,
  pch = 19,
  xlab = "Latent dim 1",
  ylab = "Latent dim 2",
  main = "Latent positions with 3 clusters"
)
## Optionally label nodes:
# text(Z_cluster, labels = 1:nrow(Z_cluster), pos = 3, cex = 0.7)

## YOU visually decide if clustering is clear:
## e.g., after viewing plot, set:
## visually_clear_clustering <- TRUE  # or FALSE
visually_clear_clustering <- NA  # placeholder

########################################################
## 5. Same-class vs different-class distances (5% rule)
########################################################

## Use latent positions from clustered model
Z <- Z_cluster

## Attributes
class_vec <- get.vertex.attribute(G, "class")

## Pairwise Euclidean distances
Dmat <- as.matrix(dist(Z))
U <- upper.tri(Dmat)

same_class_mat <- outer(class_vec, class_vec, "==")

d_same_class <- Dmat[U & same_class_mat]
d_diff_class <- Dmat[U & !same_class_mat]

mean_same_class <- mean(d_same_class, na.rm = TRUE)
mean_diff_class <- mean(d_diff_class, na.rm = TRUE)

same_class_5pct_closer <- (
  length(d_same_class) > 0 &&
  length(d_diff_class) > 0 &&
  mean_same_class <= 0.95 * mean_diff_class
)

cat("Mean dist (same class):       ", mean_same_class, "\n")
cat("Mean dist (diff class):       ", mean_diff_class, "\n")
cat("Same-class ≥5% closer?        ", same_class_5pct_closer, "\n\n")

cat("Visual clustering assessed?   visually_clear_clustering = ",
    visually_clear_clustering, "\n")


BIC (ERGM):                    12710.19 
BIC (Latent, no clusters):     9808.597 
BIC (Latent, 3 clusters):      9625.002 
Latent+clusters better than latent-no-clusters (BIC)?  TRUE 

BIC for all models:
          ERGM Latent_noClust      Latent_G3      ERGM_full 
     12710.194       9808.597       9625.002      12719.057 
Best model by BIC:             Latent_G3 
Is Latent_G3 the best overall? TRUE 

Error in rep(1, nrow(Z_cluster)) : invalid 'times' argument
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.
Error in rep(1, nrow(Z_cluster)) : invalid 'times' argument


RInterpreterError: Failed to parse and evaluate line '# You can add your code to fit the model here\n\nlibrary(network)\nlibrary(ergm)\nlibrary(latentnet)\n\n## G: directed \'network\' object with vertex attributes "gender" and "class"\n\n########################################################\n## 0. (Optional) refit previous models if missing\n########################################################\n\nif (!exists("fit_ergm")) {\n  fit_ergm <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))\n}\n\nif (!exists("fit_latent")) {\n  fit_latent <- ergmm(\n    G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2),\n    control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)\n  )\n}\n\n########################################################\n## 1. Latent space model with 3 clusters (2D, Euclidean)\n########################################################\n\nfit_latent_clustered <- ergmm(\n  G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2, G = 3),\n  control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)\n)\n\n## IMPORTANT: do NOT ask for "mle" (this caused your error).\n## Use posterior means only:\ns_latent_clustered <- summary(fit_latent_clustered, point.est = "pmean")\n\n########################################################\n## 2. BIC comparison: with clusters vs without clusters\n########################################################\n\n## ERGM BIC\nbic_ergm <- as.numeric(summary(fit_ergm)$bic)\n\n## Latent (no clusters) BIC\nbic_latent <- bic.ergmm(fit_latent)\nbic_latent_overall <- as.numeric(bic_latent["overall"])\n\n## Latent + 3 clusters BIC\nbic_latent_cluster <- bic.ergmm(fit_latent_clustered)\nbic_latent_cluster_overall <- as.numeric(bic_latent_cluster["overall"])\n\n## Is latent+clusters better than latent without clusters?\nlatent_clusters_better <- (bic_latent_cluster_overall < bic_latent_overall)\n\ncat("BIC (ERGM):                   ", bic_ergm, "\\n")\ncat("BIC (Latent, no clusters):    ", bic_latent_overall, "\\n")\ncat("BIC (Latent, 3 clusters):     ", bic_latent_cluster_overall, "\\n")\ncat("Latent+clusters better than latent-no-clusters (BIC)? ",\n    latent_clusters_better, "\\n\\n")\n\n########################################################\n## 3. Is this clustered latent model best among ALL now?\n##    (ERGM, latent, latent+clusters, plus others if exist)\n########################################################\n\nbics_all <- c(\n  ERGM          = bic_ergm,\n  Latent_noClust = bic_latent_overall,\n  Latent_G3      = bic_latent_cluster_overall\n)\n\n## If you also fitted full ERGM with absdiffs, include it:\nif (exists("fit_full")) {\n  bic_full <- as.numeric(summary(fit_full)$bic)\n  bics_all <- c(bics_all, ERGM_full = bic_full)\n}\n\nbest_model_name <- names(bics_all)[which.min(bics_all)]\ncluster_model_best <- (best_model_name == "Latent_G3")\n\ncat("BIC for all models:\\n")\nprint(bics_all)\ncat("Best model by BIC:            ", best_model_name, "\\n")\ncat("Is Latent_G3 the best overall?", cluster_model_best, "\\n\\n")\n\n########################################################\n## 4. Visual: do latent positions show clear clustering?\n########################################################\n\n## Posterior mean latent positions (n x 2)\nZ_cluster <- fit_latent_clustered$Z.pmean\n\n## Posterior cluster memberships (if available)\ncluster_assign <- rep(1, nrow(Z_cluster))\nif ("Z.K" %in% names(fit_latent_clustered)) {\n  ## Often Z.K is an n x G matrix of cluster posterior probabilities\n  ZK <- fit_latent_clustered$Z.K\n  if (is.matrix(ZK) && nrow(ZK) == nrow(Z_cluster)) {\n    cluster_assign <- apply(ZK, 1, which.max)\n  }\n}\n\n## Plot latent positions colored by cluster\nplot(\n  Z_cluster,\n  col = cluster_assign,\n  pch = 19,\n  xlab = "Latent dim 1",\n  ylab = "Latent dim 2",\n  main = "Latent positions with 3 clusters"\n)\n## Optionally label nodes:\n# text(Z_cluster, labels = 1:nrow(Z_cluster), pos = 3, cex = 0.7)\n\n## YOU visually decide if clustering is clear:\n## e.g., after viewing plot, set:\n## visually_clear_clustering <- TRUE  # or FALSE\nvisually_clear_clustering <- NA  # placeholder\n\n########################################################\n## 5. Same-class vs different-class distances (5% rule)\n########################################################\n\n## Use latent positions from clustered model\nZ <- Z_cluster\n\n## Attributes\nclass_vec <- get.vertex.attribute(G, "class")\n\n## Pairwise Euclidean distances\nDmat <- as.matrix(dist(Z))\nU <- upper.tri(Dmat)\n\nsame_class_mat <- outer(class_vec, class_vec, "==")\n\nd_same_class <- Dmat[U & same_class_mat]\nd_diff_class <- Dmat[U & !same_class_mat]\n\nmean_same_class <- mean(d_same_class, na.rm = TRUE)\nmean_diff_class <- mean(d_diff_class, na.rm = TRUE)\n\nsame_class_5pct_closer <- (\n  length(d_same_class) > 0 &&\n  length(d_diff_class) > 0 &&\n  mean_same_class <= 0.95 * mean_diff_class\n)\n\ncat("Mean dist (same class):       ", mean_same_class, "\\n")\ncat("Mean dist (diff class):       ", mean_diff_class, "\\n")\ncat("Same-class ≥5% closer?        ", same_class_5pct_closer, "\\n\\n")\n\ncat("Visual clustering assessed?   visually_clear_clustering = ",\n    visually_clear_clustering, "\\n")\n'.
R error message: "Error in rep(1, nrow(Z_cluster)) : invalid 'times' argument"
R stdout:
Error in rep(1, nrow(Z_cluster)) : invalid 'times' argument
In addition: Warning message:
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.

In [None]:
%%R
# You can add your code to fit the model here

library(network)
library(ergm)
library(latentnet)

## G: directed 'network' object with vertex attributes "gender" and "class"

########################################################
## 0. (Optional) refit previous models if missing
########################################################

if (!exists("fit_ergm")) {
  fit_ergm <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))
}

if (!exists("fit_latent")) {
  fit_latent <- ergmm(
    G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2),
    control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
  )
}

########################################################
## 1. Latent space model with 3 clusters (2D, Euclidean)
########################################################

fit_latent_clustered <- ergmm(
  G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2, G = 3),
  control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
)

## IMPORTANT: do NOT ask for "mle" (this caused your error).
## Use posterior means only:
s_latent_clustered <- summary(fit_latent_clustered, point.est = "pmean")

########################################################
## 2. BIC comparison: with clusters vs without clusters
########################################################

## ERGM BIC
bic_ergm <- as.numeric(summary(fit_ergm)$bic)

## Latent (no clusters) BIC
bic_latent <- bic.ergmm(fit_latent)
bic_latent_overall <- as.numeric(bic_latent["overall"])

## Latent + 3 clusters BIC
bic_latent_cluster <- bic.ergmm(fit_latent_clustered)
bic_latent_cluster_overall <- as.numeric(bic_latent_cluster["overall"])

## Is latent+clusters better than latent without clusters?
latent_clusters_better <- (bic_latent_cluster_overall < bic_latent_overall)

cat("BIC (ERGM):                   ", bic_ergm, "\n")
cat("BIC (Latent, no clusters):    ", bic_latent_overall, "\n")
cat("BIC (Latent, 3 clusters):     ", bic_latent_cluster_overall, "\n")
cat("Latent+clusters better than latent-no-clusters (BIC)? ",
    latent_clusters_better, "\n\n")

########################################################
## 3. Is this clustered latent model best among ALL now?
##    (ERGM, latent, latent+clusters, plus others if exist)
########################################################

bics_all <- c(
  ERGM          = bic_ergm,
  Latent_noClust = bic_latent_overall,
  Latent_G3      = bic_latent_cluster_overall
)

## If you also fitted full ERGM with absdiffs, include it:
if (exists("fit_full")) {
  bic_full <- as.numeric(summary(fit_full)$bic)
  bics_all <- c(bics_all, ERGM_full = bic_full)
}

best_model_name <- names(bics_all)[which.min(bics_all)]
cluster_model_best <- (best_model_name == "Latent_G3")

cat("BIC for all models:\n")
print(bics_all)
cat("Best model by BIC:            ", best_model_name, "\n")
cat("Is Latent_G3 the best overall?", cluster_model_best, "\n\n")

########################################################
## 4. Visual: do latent positions show clear clustering?
########################################################

## Posterior mean latent positions (n x 2)
Z_cluster <- fit_latent_clustered$Z.pmean

## Posterior cluster memberships (if available)
cluster_assign <- rep(1, nrow(Z_cluster))
if ("Z.K" %in% names(fit_latent_clustered)) {
  ## Often Z.K is an n x G matrix of cluster posterior probabilities
  ZK <- fit_latent_clustered$Z.K
  if (is.matrix(ZK) && nrow(ZK) == nrow(Z_cluster)) {
    cluster_assign <- apply(ZK, 1, which.max)
  }
}

## Plot latent positions colored by cluster
plot(
  Z_cluster,
  col = cluster_assign,
  pch = 19,
  xlab = "Latent dim 1",
  ylab = "Latent dim 2",
  main = "Latent positions with 3 clusters"
)
## Optionally label nodes:
# text(Z_cluster, labels = 1:nrow(Z_cluster), pos = 3, cex = 0.7)

## YOU visually decide if clustering is clear:
## e.g., after viewing plot, set:
## visually_clear_clustering <- TRUE  # or FALSE
visually_clear_clustering <- NA  # placeholder

########################################################
## 5. Same-class vs different-class distances (5% rule)
########################################################

## Use latent positions from clustered model
Z <- Z_cluster

## Attributes
class_vec <- get.vertex.attribute(G, "class")

## Pairwise Euclidean distances
Dmat <- as.matrix(dist(Z))
U <- upper.tri(Dmat)

same_class_mat <- outer(class_vec, class_vec, "==")

d_same_class <- Dmat[U & same_class_mat]
d_diff_class <- Dmat[U & !same_class_mat]

mean_same_class <- mean(d_same_class, na.rm = TRUE)
mean_diff_class <- mean(d_diff_class, na.rm = TRUE)

same_class_5pct_closer <- (
  length(d_same_class) > 0 &&
  length(d_diff_class) > 0 &&
  mean_same_class <= 0.95 * mean_diff_class
)

cat("Mean dist (same class):       ", mean_same_class, "\n")
cat("Mean dist (diff class):       ", mean_diff_class, "\n")
cat("Same-class ≥5% closer?        ", same_class_5pct_closer, "\n\n")

cat("Visual clustering assessed?   visually_clear_clustering = ",
    visually_clear_clustering, "\n")


Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : 
  MLE was not computed for this fit.
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.
Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : 
  MLE was not computed for this fit.


RInterpreterError: Failed to parse and evaluate line '# You can add your code to fit the model here\n\n############################################################\n# Packages\n############################################################\n\nlibrary(ergm)\nlibrary(latentnet)\nlibrary(network)\n\n# G is a directed \'network\' object with vertex attributes:\n#   "gender", "class", "age", "GPA"\n\n############################################################\n# 0. (Optional) Refit earlier models if not already in env\n############################################################\n\n# Base ERGM (no absdiffs)\nif (!exists("fit_ergm")) {\n  fit_ergm <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))\n}\n# Full ERGM (with absdiffs)\nif (!exists("fit_full")) {\n  fit_full <- ergm(\n    G ~ edges + nodematch("gender") + nodematch("class") +\n      absdiff("GPA") + absdiff("age")\n  )\n}\n# Latent space (no clusters)\nif (!exists("fit_latent")) {\n  fit_latent <- ergmm(\n    G ~ euclidean(d = 2) + nodematch("gender") + nodematch("class")\n  )\n}\n\n############################################################\n# 1. Latent space model WITH 3 clusters\n############################################################\n\nfit_latent_clustered <- ergmm(\n  G ~ euclidean(d = 2, G = 3) + nodematch("gender") + nodematch("class")\n)\n\n# Summary for MLE (coeffs/p-values) + pmean (positions)\nsum_latent_clustered <- summary(\n  fit_latent_clustered,\n  point.est = c("mle", "pmean"),\n  se = TRUE\n)\n\n############################################################\n# 2. BIC comparison: clustered latent vs non-cluster latent\n############################################################\n\n# BIC for ERGM and full ERGM\nbic_ergm <- summary(fit_ergm)$bic\nbic_full <- summary(fit_full)$bic\n\n# BIC for latent (no clusters)\nbic_latent_all <- unlist(bic.ergmm(fit_latent))\nbic_latent_overall <- bic_latent_all["overall"]\n\n# BIC for latent + 3 clusters\nbic_latent_cluster_all <- unlist(bic.ergmm(fit_latent_clustered))\nbic_latent_cluster_overall <- bic_latent_cluster_all["overall"]\n\n# Is clustered latent model better than non-cluster latent?\ncluster_latent_better_than_latent <-\n  (bic_latent_cluster_overall < bic_latent_overall)\n\n############################################################\n# 3. Is the clustered latent model the BEST among all?\n############################################################\n\nbics_all <- c(\n  ERGM_base      = bic_ergm,\n  ERGM_full      = bic_full,\n  Latent_noClust = bic_latent_overall,\n  Latent_G3      = bic_latent_cluster_overall\n)\n\nbest_model_name <- names(bics_all)[which.min(bics_all)]\ncluster_model_best <- (best_model_name == "Latent_G3")\n\n############################################################\n# 4. Visual clustering of latent positions (2D)\n############################################################\n\n# Posterior mean latent positions (n x 2)\nZ_cluster <- sum_latent_clustered$pmean$Z\n\n# Estimated cluster memberships:\n# In ergmm, mixture component posterior probs are in Z.K (n x G)\n# We assign each node to the component with highest posterior prob.\nif (!is.null(fit_latent_clustered$Z.K)) {\n  cluster_probs <- fit_latent_clustered$Z.K\n  cluster_assign <- apply(cluster_probs, 1, which.max)\n} else {\n  # fallback: if not present, treat as 1 cluster\n  cluster_assign <- rep(1, nrow(Z_cluster))\n}\n\n# A simple scatter plot to visually inspect clustering\n# (You, the analyst, visually decide if there is clear clustering)\nplot(\n  Z_cluster,\n  col = cluster_assign,\n  pch = 19,\n  xlab = "Latent dimension 1",\n  ylab = "Latent dimension 2",\n  main = "Latent positions with 3 clusters (ergmm)"\n)\n# Optionally, add labels:\n# text(Z_cluster, labels = 1:nrow(Z_cluster), pos = 3, cex = 0.7)\n\n# After looking at the plot, you can manually set:\n# visually_clear_clustering <- TRUE/FALSE\n# For automated pipelines, you might leave this as NA:\nvisually_clear_clustering <- NA  # set by human after inspection\n\n############################################################\n# 5. Same-class vs different-class distances in clustered latent model\n############################################################\n\n# Get attributes\nclass_vec <- get.vertex.attribute(G, "class")\n\n# Pairwise distances\ndist_mat <- as.matrix(dist(Z_cluster))  # Euclidean\n\nupper <- upper.tri(dist_mat)\nsame_class_mat <- outer(class_vec, class_vec, "==")\n\nd_same_class <- dist_mat[upper & same_class_mat]\nd_diff_class <- dist_mat[upper & !same_class_mat]\n\nmean_same_class <- mean(d_same_class, na.rm = TRUE)\nmean_diff_class <- mean(d_diff_class, na.rm = TRUE)\n\n# "Closer" by at least 5%:\nsame_class_closer_5pct <- (\n  length(d_same_class) > 0 &&\n  length(d_diff_class) > 0 &&\n  mean_same_class <= 0.95 * mean_diff_class\n)\n\n############################################################\n# 6. Print the key checks\n############################################################\n\ncat("=== BIC comparison ===\\n")\ncat("BIC (ERGM base):             ", bic_ergm, "\\n")\ncat("BIC (ERGM full):             ", bic_full, "\\n")\ncat("BIC (Latent, no clusters):   ", bic_latent_overall, "\\n")\ncat("BIC (Latent, G = 3 clusters):", bic_latent_cluster_overall, "\\n\\n")\n\ncat("Latent + clusters better than latent without clusters (by BIC)? ",\n    cluster_latent_better_than_latent, "\\n\\n")\n\ncat("Best model among all (by BIC): ", best_model_name, "\\n")\ncat("Is latent + clusters model the best of all? ",\n    cluster_model_best, "\\n\\n")\n\ncat("Mean distance (same class):   ", mean_same_class, "\\n")\ncat("Mean distance (diff class):   ", mean_diff_class, "\\n")\ncat("Same-class at least 5% closer in latent space? ",\n    same_class_closer_5pct, "\\n\\n")\n\ncat("Visual clustering inspected? (set manually): visually_clear_clustering = ",\n    visually_clear_clustering, "\\n")\n'.
R error message: 'Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : \n  MLE was not computed for this fit.'
R stdout:
Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : 
  MLE was not computed for this fit.
In addition: Warning message:
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.

In [None]:
%%R
# You can add your code to fit the model here

library(network)
library(ergm)
library(latentnet)

## G: directed 'network' object with vertex attributes "gender" and "class"

########################################################
## 0. (Optional) refit previous models if missing
########################################################

if (!exists("fit_ergm")) {
  fit_ergm <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))
}

if (!exists("fit_latent")) {
  fit_latent <- ergmm(
    G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2),
    control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
  )
}

########################################################
## 1. Latent space model with 3 clusters (2D, Euclidean)
########################################################

fit_latent_clustered <- ergmm(
  G ~ edges + nodematch("gender") + nodematch("class") + euclidean(d = 2, G = 3),
  control = control.ergmm(burnin = 10000, sample.size = 4000, interval = 10)
)

## IMPORTANT: do NOT ask for "mle" (this caused your error).
## Use posterior means only:
s_latent_clustered <- summary(fit_latent_clustered, point.est = "pmean")

########################################################
## 2. BIC comparison: with clusters vs without clusters
########################################################

## ERGM BIC
bic_ergm <- as.numeric(summary(fit_ergm)$bic)

## Latent (no clusters) BIC
bic_latent <- bic.ergmm(fit_latent)
bic_latent_overall <- as.numeric(bic_latent["overall"])

## Latent + 3 clusters BIC
bic_latent_cluster <- bic.ergmm(fit_latent_clustered)
bic_latent_cluster_overall <- as.numeric(bic_latent_cluster["overall"])

## Is latent+clusters better than latent without clusters?
latent_clusters_better <- (bic_latent_cluster_overall < bic_latent_overall)

cat("BIC (ERGM):                   ", bic_ergm, "\n")
cat("BIC (Latent, no clusters):    ", bic_latent_overall, "\n")
cat("BIC (Latent, 3 clusters):     ", bic_latent_cluster_overall, "\n")
cat("Latent+clusters better than latent-no-clusters (BIC)? ",
    latent_clusters_better, "\n\n")

########################################################
## 3. Is this clustered latent model best among ALL now?
##    (ERGM, latent, latent+clusters, plus others if exist)
########################################################

bics_all <- c(
  ERGM          = bic_ergm,
  Latent_noClust = bic_latent_overall,
  Latent_G3      = bic_latent_cluster_overall
)

## If you also fitted full ERGM with absdiffs, include it:
if (exists("fit_full")) {
  bic_full <- as.numeric(summary(fit_full)$bic)
  bics_all <- c(bics_all, ERGM_full = bic_full)
}

best_model_name <- names(bics_all)[which.min(bics_all)]
cluster_model_best <- (best_model_name == "Latent_G3")

cat("BIC for all models:\n")
print(bics_all)
cat("Best model by BIC:            ", best_model_name, "\n")
cat("Is Latent_G3 the best overall?", cluster_model_best, "\n\n")

########################################################
## 4. Visual: do latent positions show clear clustering?
########################################################

## Posterior mean latent positions (n x 2)
Z_cluster <- fit_latent_clustered$Z.pmean

## Posterior cluster memberships (if available)
cluster_assign <- rep(1, nrow(Z_cluster))
if ("Z.K" %in% names(fit_latent_clustered)) {
  ## Often Z.K is an n x G matrix of cluster posterior probabilities
  ZK <- fit_latent_clustered$Z.K
  if (is.matrix(ZK) && nrow(ZK) == nrow(Z_cluster)) {
    cluster_assign <- apply(ZK, 1, which.max)
  }
}

## Plot latent positions colored by cluster
plot(
  Z_cluster,
  col = cluster_assign,
  pch = 19,
  xlab = "Latent dim 1",
  ylab = "Latent dim 2",
  main = "Latent positions with 3 clusters"
)
## Optionally label nodes:
# text(Z_cluster, labels = 1:nrow(Z_cluster), pos = 3, cex = 0.7)

## YOU visually decide if clustering is clear:
## e.g., after viewing plot, set:
## visually_clear_clustering <- TRUE  # or FALSE
visually_clear_clustering <- NA  # placeholder

########################################################
## 5. Same-class vs different-class distances (5% rule)
########################################################

## Use latent positions from clustered model
Z <- Z_cluster

## Attributes
class_vec <- get.vertex.attribute(G, "class")

## Pairwise Euclidean distances
Dmat <- as.matrix(dist(Z))
U <- upper.tri(Dmat)

same_class_mat <- outer(class_vec, class_vec, "==")

d_same_class <- Dmat[U & same_class_mat]
d_diff_class <- Dmat[U & !same_class_mat]

mean_same_class <- mean(d_same_class, na.rm = TRUE)
mean_diff_class <- mean(d_diff_class, na.rm = TRUE)

same_class_5pct_closer <- (
  length(d_same_class) > 0 &&
  length(d_diff_class) > 0 &&
  mean_same_class <= 0.95 * mean_diff_class
)

cat("Mean dist (same class):       ", mean_same_class, "\n")
cat("Mean dist (diff class):       ", mean_diff_class, "\n")
cat("Same-class ≥5% closer?        ", same_class_5pct_closer, "\n\n")

cat("Visual clustering assessed?   visually_clear_clustering = ",
    visually_clear_clustering, "\n")


Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : 
  MLE was not computed for this fit.
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.
Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : 
  MLE was not computed for this fit.


RInterpreterError: Failed to parse and evaluate line '# You can add your code to fit the model here\n\n############################################################\n# Packages\n############################################################\n\nlibrary(ergm)\nlibrary(latentnet)\nlibrary(network)\n\n# G is a directed \'network\' object with vertex attributes:\n#   "gender", "class", "age", "GPA"\n\n############################################################\n# 0. (Optional) Refit earlier models if not already in env\n############################################################\n\n# Base ERGM (no absdiffs)\nif (!exists("fit_ergm")) {\n  fit_ergm <- ergm(G ~ edges + nodematch("gender") + nodematch("class"))\n}\n# Full ERGM (with absdiffs)\nif (!exists("fit_full")) {\n  fit_full <- ergm(\n    G ~ edges + nodematch("gender") + nodematch("class") +\n      absdiff("GPA") + absdiff("age")\n  )\n}\n# Latent space (no clusters)\nif (!exists("fit_latent")) {\n  fit_latent <- ergmm(\n    G ~ euclidean(d = 2) + nodematch("gender") + nodematch("class")\n  )\n}\n\n############################################################\n# 1. Latent space model WITH 3 clusters\n############################################################\n\nfit_latent_clustered <- ergmm(\n  G ~ euclidean(d = 2, G = 3) + nodematch("gender") + nodematch("class")\n)\n\n# Summary for MLE (coeffs/p-values) + pmean (positions)\nsum_latent_clustered <- summary(\n  fit_latent_clustered,\n  point.est = c("mle", "pmean"),\n  se = TRUE\n)\n\n############################################################\n# 2. BIC comparison: clustered latent vs non-cluster latent\n############################################################\n\n# BIC for ERGM and full ERGM\nbic_ergm <- summary(fit_ergm)$bic\nbic_full <- summary(fit_full)$bic\n\n# BIC for latent (no clusters)\nbic_latent_all <- unlist(bic.ergmm(fit_latent))\nbic_latent_overall <- bic_latent_all["overall"]\n\n# BIC for latent + 3 clusters\nbic_latent_cluster_all <- unlist(bic.ergmm(fit_latent_clustered))\nbic_latent_cluster_overall <- bic_latent_cluster_all["overall"]\n\n# Is clustered latent model better than non-cluster latent?\ncluster_latent_better_than_latent <-\n  (bic_latent_cluster_overall < bic_latent_overall)\n\n############################################################\n# 3. Is the clustered latent model the BEST among all?\n############################################################\n\nbics_all <- c(\n  ERGM_base      = bic_ergm,\n  ERGM_full      = bic_full,\n  Latent_noClust = bic_latent_overall,\n  Latent_G3      = bic_latent_cluster_overall\n)\n\nbest_model_name <- names(bics_all)[which.min(bics_all)]\ncluster_model_best <- (best_model_name == "Latent_G3")\n\n############################################################\n# 4. Visual clustering of latent positions (2D)\n############################################################\n\n# Posterior mean latent positions (n x 2)\nZ_cluster <- sum_latent_clustered$pmean$Z\n\n# Estimated cluster memberships:\n# In ergmm, mixture component posterior probs are in Z.K (n x G)\n# We assign each node to the component with highest posterior prob.\nif (!is.null(fit_latent_clustered$Z.K)) {\n  cluster_probs <- fit_latent_clustered$Z.K\n  cluster_assign <- apply(cluster_probs, 1, which.max)\n} else {\n  # fallback: if not present, treat as 1 cluster\n  cluster_assign <- rep(1, nrow(Z_cluster))\n}\n\n# A simple scatter plot to visually inspect clustering\n# (You, the analyst, visually decide if there is clear clustering)\nplot(\n  Z_cluster,\n  col = cluster_assign,\n  pch = 19,\n  xlab = "Latent dimension 1",\n  ylab = "Latent dimension 2",\n  main = "Latent positions with 3 clusters (ergmm)"\n)\n# Optionally, add labels:\n# text(Z_cluster, labels = 1:nrow(Z_cluster), pos = 3, cex = 0.7)\n\n# After looking at the plot, you can manually set:\n# visually_clear_clustering <- TRUE/FALSE\n# For automated pipelines, you might leave this as NA:\nvisually_clear_clustering <- NA  # set by human after inspection\n\n############################################################\n# 5. Same-class vs different-class distances in clustered latent model\n############################################################\n\n# Get attributes\nclass_vec <- get.vertex.attribute(G, "class")\n\n# Pairwise distances\ndist_mat <- as.matrix(dist(Z_cluster))  # Euclidean\n\nupper <- upper.tri(dist_mat)\nsame_class_mat <- outer(class_vec, class_vec, "==")\n\nd_same_class <- dist_mat[upper & same_class_mat]\nd_diff_class <- dist_mat[upper & !same_class_mat]\n\nmean_same_class <- mean(d_same_class, na.rm = TRUE)\nmean_diff_class <- mean(d_diff_class, na.rm = TRUE)\n\n# "Closer" by at least 5%:\nsame_class_closer_5pct <- (\n  length(d_same_class) > 0 &&\n  length(d_diff_class) > 0 &&\n  mean_same_class <= 0.95 * mean_diff_class\n)\n\n############################################################\n# 6. Print the key checks\n############################################################\n\ncat("=== BIC comparison ===\\n")\ncat("BIC (ERGM base):             ", bic_ergm, "\\n")\ncat("BIC (ERGM full):             ", bic_full, "\\n")\ncat("BIC (Latent, no clusters):   ", bic_latent_overall, "\\n")\ncat("BIC (Latent, G = 3 clusters):", bic_latent_cluster_overall, "\\n\\n")\n\ncat("Latent + clusters better than latent without clusters (by BIC)? ",\n    cluster_latent_better_than_latent, "\\n\\n")\n\ncat("Best model among all (by BIC): ", best_model_name, "\\n")\ncat("Is latent + clusters model the best of all? ",\n    cluster_model_best, "\\n\\n")\n\ncat("Mean distance (same class):   ", mean_same_class, "\\n")\ncat("Mean distance (diff class):   ", mean_diff_class, "\\n")\ncat("Same-class at least 5% closer in latent space? ",\n    same_class_closer_5pct, "\\n\\n")\n\ncat("Visual clustering inspected? (set manually): visually_clear_clustering = ",\n    visually_clear_clustering, "\\n")\n'.
R error message: 'Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : \n  MLE was not computed for this fit.'
R stdout:
Error in summary.ergmm(fit_latent_clustered, point.est = c("mle", "pmean"),  : 
  MLE was not computed for this fit.
In addition: Warning message:
In backoff.check(model, burnin.sample, burnin.control) :
  Backing off: too few acceptances. If you see this message several times in a row, use a longer burnin.

In [None]:
# You can add your code to analyse the results here