
BRCANetworks
Nagashree Avabhrath, Mikhail Ukrainetz, Madison Moffett, Grant Smith, Lucia Williams, Mark Grimes
2026-04-10
BRCANetworks.RmdNote that we will use the complete PPI and BioPlanet edgefiles available in “/Data_repository_path/” for this vignette. Lucy todo: these can come with the package
This tutorial will load and create networks from the breast cancer cohort (BRCA, N = 122), produced by
Proteogenomic landscape of breast cancer tumorigenesis and targeted therapy Krug, K., Jaehnig, E. J., Satpathy, S., Blumenberg, L., Karpova, A., Anurag, M., et al. Cell 183, 1436PTMsToPathways::name6.e31
And downloaded from Supplemental data S2 from
PhosphoDisco: A Toolkit for Co-regulated Phosphorylation Module Discovery in Phosphoproteomic Data Schraink, T., Blumenberg, L., Hussey, G., George, S., Miller, B., Mathew, N., Gonzalez-Robles, T.J., Sviderskiy, V., Papagiannakopoulos, T., Possemato, R., et al. Mol Cell Proteomics 22, 100596. 10.1016/j.mcpro.2023.100596
Data preprocessing
First, let’s load the PTMsToPathways package so its functions are available:
The BRCA data table described above is provided with the PTMsToPathways package and and can be read in as follows:
file_path <- system.file("extdata", "PhosphoDiscoData_mmc9.txt", package = "PTMsToPathways")
newphos <- utils::read.table(file_path, header = TRUE,
stringsAsFactors = FALSE, sep = "\t", comment.char = "#",
na.strings = "", quote = "", fill = TRUE)It has 4237 rows and 124 columns representing 4237 phosphosites and 122 samples with phosphoproteomic data.
The first two columns are gene_symbol and
variable_sites_names which we will use to create row names
for the PTM table to match the expected input format for the
PTMsToPathways functions. The Raw Data
Processing vignette gives another example of how to process raw data
tables to create the expected input format for the PTMsToPathways
functions. Here are the first few rows and columns of the data
frame:
head(newphos[, 1:5])
>> gene_symbol variable_sites_names X11BR047 X11BR043 X11BR049
>> 1 ACTN1 S6s 0.2936 -1.9643 -0.9256
>> 2 BAZ2A S1783s -1.2947 1.5820 -0.4770
>> 3 BRCA2 S93s -1.5739 0.5702 -1.1614
>> 4 C12orf45 S178s -3.6418 0.3764 -1.4522
>> 5 C17orf49 S96s -1.1685 0.2572 0.1649
>> 6 C6orf106 S215s -2.7001 0.2895 -2.7325Now we will process the first two columns to create row names for the
PTM table. We extract the amino acid and the site number from the
variable_sites_names column.
newphos$Amino.Acid <- sapply(newphos$variable_sites_names, function(x) substring (x, 1, 1))
newphos$Site <- trimws(substring(newphos$variable_sites_names, 2))
newphos$Site <- sub("[a-z]$", "", newphos$Site)
head(newphos[, c("gene_symbol", "variable_sites_names", "Amino.Acid", "Site")])
>> gene_symbol variable_sites_names Amino.Acid Site
>> 1 ACTN1 S6s S 6
>> 2 BAZ2A S1783s S 1783
>> 3 BRCA2 S93s S 93
>> 4 C12orf45 S178s S 178
>> 5 C17orf49 S96s S 96
>> 6 C6orf106 S215s S 215Now use the PTMsToPathways function name.peptide to
create peptide names for the row names of the PTM table.
newphos$Peptide.Name <- mapply(
name.peptide, genes = newphos$gene_symbol,
sites = newphos$Site, aa = newphos$Amino.Acid)Create ptmtable with PTMs as rows and samples as columns
for use in the next steps, and remove the columns we used to create the
row names.
phosdata <- newphos[, 3:ncol(newphos), ]
rownames(phosdata) <- newphos$Peptide.Name
phosdata <- phosdata[, !(names(phosdata) %in% c("gene_symbol", "variable_sites_names", "Amino.Acid", "Site", "Peptide.Name"))]
ptmtable <- phosdata
head(ptmtable[, 1:5])
>> X11BR047 X11BR043 X11BR049 X11BR023 X18BR010
>> ACTN1 p S6 0.2936 -1.9643 -0.9256 -2.6409 -0.8071
>> BAZ2A p S1783 -1.2947 1.5820 -0.4770 1.5900 -1.0283
>> BRCA2 p S93 -1.5739 0.5702 -1.1614 0.5241 -0.3049
>> C12orf45 p S178 -3.6418 0.3764 -1.4522 0.1373 -0.2151
>> C17orf49 p S96 -1.1685 0.2572 0.1649 1.8283 -1.0399
>> C6orf106 p S215 -2.7001 0.2895 -2.7325 1.6096 -1.7117Next, we create clusters and networks from those clusters as in the Creating Networks vignette.
Create Clusters and Co-Cluster Correlation Networks (CCCNs)
This takes about 10 minutes on a laptop, so we provide both the code and the pre-computed results for this step. To re-run the analysis, run, the following:
set.seed(88)
clusterlist.data <- MakeClusterList(ptmtable,
keeplength = 3, toolong = 3.5)
CCCN.data <- MakeCorrelationNetwork(adj.consensus.matrix,
ptm.correlation.matrix)Or load in pre-computed results from within the PTMsToPathways package:
clusterlist.data <- brca_clusterlist_data
CCCN.data <- brca_CCCN_dataWhether computed or loaded, the results are lists that contain the following elements:
# Separate out elements of results list
common.clusters <- clusterlist.data[[1]]
adj.consensus.matrix <- clusterlist.data[[2]]
ptm.correlation.matrix <- clusterlist.data[[3]]
# Separate out elements of results list
ptm.cccn.edges <- CCCN.data[[1]]
gene.cccn.edges <- CCCN.data[[2]]
gene.cccn.nodes <- CCCN.data[[3]]We expect 231 common clusters:
If desired, the clusters can be trimmed to those > 4.
cclength <- sapply(common.clusters, length)
common.clusters4 <- common.clusters[which(cclength>3)]
length(common.clusters4)
# should be 202
>> [1] 204We can visualize these in a large heatmap that is saved to the disk. To demonstrate, let’s look at the output for the first 3 clusters.
dir.create("plots", recursive = TRUE, showWarnings = FALSE)
res <- graph.ptm.by.cluster(
ptmtable = ptmtable,
common.clusters = common.clusters4[1:3], # use all clusters > 4, only first 3
filename = "plots/ptm_all_clusters_l4.pdf",
legend.filename = "plots/ptm_legend4.pdf",
heatkey.filename = "plots/ptm_heatkey4.pdf",
order.rows = "slope",
zlim = 3,
show.row.labels = FALSE,
show.col.labels = TRUE,
col_cex = 0.7
)
>> Heatmap written to: plots/ptm_all_clusters_l4.pdf
>> Cluster legend written to: plots/ptm_legend4.pdf
>> Heat key written to: plots/ptm_heatkey4.pdf
knitr::include_graphics("plots/ptm_all_clusters_l4.pdf")We can evaluate the clusters.
# The EvaluateClusters function evaluates clusters based on
# `intensity` = total signal after removing the NA fraction
# `realsamples` = samples that are not single-gene/PTM samples
# `cleargenes` = genes/PTMs that fit a pattern that ranks by decreasing total signal
# `percent.NA` = percentage of missing values in the cluster sub-table
eval_brca <- EvaluateClusters(
common.clusters4, ptmtable,
data.type = "ratio",
use.slope = FALSE,
index.mode = "density",
verbose = FALSE
)
head(eval_brca)
>> Group no.genes culled.by.slope percent.singlesamplegenes no.samples
>> 201 201 5 0 0 122
>> 204 204 5 0 0 122
>> 130 130 11 0 0 122
>> 13 13 4 0 0 105
>> 194 194 4 0 0 105
>> 145 145 13 0 0 122
>> percent.singlegenesamples total.signal percent.NA intensity Index
>> 201 0 407.1629 0 407.1629 147.6000
>> 204 0 630.5927 0 630.5927 147.6000
>> 130 0 785.0943 0 785.0943 134.1818
>> 13 0 565.5595 0 565.5595 132.5000
>> 194 0 532.2606 0 532.2606 132.5000
>> 145 0 1453.0472 0 1453.0472 132.4615Build Cluster Filtered Networks (CFNs) and Pathway Crosstalk Networks (PCNs)
For PPI edges, here we will demonstrate how to get the STRING-db and GeneMANIA edges from the static downloaded networks provided in “Data_repository_path”.
string_db_filename <- "string_hs_hugo.tsv"
string.all.edges.path <- paste0(Data_repository_path, string_db_filename)
# Use local = TRUE in the following function
stringdb.edges <- GetSTRINGdb.edges (gene.cccn.edges,
gene.cccn.nodes,
local = TRUE,
string.local.path = string.all.edges.path,
combined.score.threshold = 0)
# Similary for GeneMANIA edges
genemania_filename <- "hs_interactions_hugo.tsv"
gm.all.edges.path <- paste0(Data_repository_path, genemania_filename)
# This file contains the following types of interactions:
# "Genetic Interactions" "Pathway" "Physical Interactions" "Predicted"
# We choose all but "Genetic Interactions" to include in the following function.
genemania.edges <- GetGeneMANIA.edges (gm.all.edges.path,
gene.cccn.nodes,
local = TRUE,
genemania.local.path = gm.all.edges.path,
gm.interaction.types = c("Pathway", "Physical Interactions", "Predicted"))
# Kinsub edges obtained as previously using the Kinase Substrate data from PhosphoSitePlus.
input.filename <- "Kinase_Substrate_Dataset.txt"
kinsub.all.edges.path <- paste0(Data_repository_path, input.filename)
kinsub.edges <- GetKinsub.edges(input.filename, gene.cccn.nodes)
# Now we can build the CFN.
network.list <- BuildClusterFilteredNetwork(gene.cccn.edges,
stringdb.edges,
genemania.edges,
kinsub.edges,
db.filepaths = c())
combined.PPIs <- network.list[[1]]
cfn <- network.list[[2]]
# 67376 edges
cfn.merged <- mergeEdges(cfn)
# 27048 edgesWe build the PCN from the BioPlanet pathways as done previously.
bioplanet_filename <- "bioplanet_pathway_June2025.csv"
bioplanet.pathways.path <- paste0(Data_repository_path, bioplanet_filename)
PCN.data <- BuildPathwayCrosstalkNetwork(common.clusters, bioplanet.file,
createfile = FALSE)
pathway.crosstalk.network <- PCN.data[[1]] # 679707 edges
PCNedgelist <- PCN.data[[2]]
pathways.list <- PCN.data[[3]]Now we can explore these networks.
We will examine the modules from Schraink, et al., 2023, Supplemental Table S4
# Read in PhosphoDisco Modules file.
PD_module.file <- paste0(paperpath, "PhosDiscoModules_mmc11.txt")
PD_module.df <- utils::read.table(PD_module.file, header = TRUE,
stringsAsFactors = FALSE, sep = "\t", comment.char = "#",
na.strings = "", quote = "", fill = TRUE)
# Note these have trailing spaces as do the rownames(ptmtable)
# Column names: geneSymbol, variableSites, HDBSCAN.min_cluster_size.4
# 1017 rows
# Note differences from the PTM table imported above.
length(intersect(newphos$variable_sites_names, PD_module.df$variableSites)) # 530
length(intersect(newphos$gene_symbol, PD_module.df$geneSymbol)) # 161
# Make peptide names as above
PD_module.df$Amino.Acid <- sapply(PD_module.df$variableSites, function(x) substring (x, 1, 1))
PD_module.df$Site <- trimws(substring(PD_module.df$variableSites, 2))
PD_module.df$Site <- sub("[a-z]$", "", PD_module.df$Site)
PD_module.df$Peptide.Name <- mapply(
name.peptide, genes = PD_module.df$geneSymbol,
sites = PD_module.df$Site, aa = PD_module.df$Amino.Acid)
# Treat modules like our clusters
PD_module.df2 <- PD_module.df[,c("Peptide.Name", "HDBSCAN.min_cluster_size.4")]
PD_module.list <- dlply(
PD_module.df2,
.(HDBSCAN.min_cluster_size.4),
function(x) x$Peptide.Name
)
PD_module.genes.unique <- lapply(
PD_module.list,
function(x) unique(sub(" .*", "", x)))
# P2P identified different clusters than these modules.
mod63.interesct <- sapply(common.clusters, function(x) intersect(x, PD_module.list$`63`))
# Interesctions were found in common.clusters$ConsensusCluster1; common.clusters$ConsensusCluster3; common.clusters$ConsensusCluster4; common.clusters$ConsensusCluster5; common.clusters$ConsensusCluster6
# First, consider all ptms from major clusters that intersect with module 63
mod63.clust.ptms <- unlist(c(common.clusters$ConsensusCluster1, common.clusters$ConsensusCluster3, common.clusters$ConsensusCluster4, common.clusters$ConsensusCluster5, common.clusters$ConsensusCluster6))
# This code graphs the CFN/CCCN from all these PTMs:
funckey <- getFuncKey(paste0(Data_repository_path, "FunctionKey.txt"))
cfn.cccn <- ptms_to_cfn(mod63.clust.ptms, cfn = cfn.merged, pepsep = ";")
cfn_cccn.nodes <- make.cytoscape.node.file(cfn.cccn, funckey, ptmtable,
include.gene.data = TRUE,
include.coclustered.PTMs = TRUE)
g1 <- GraphCfn(cfn.edges = cfn.cccn, cfn.nodes = cfn_cccn.nodes,
Network.title = "CFN/CCCN All PTMs 1", Network.collection = "PTMsToPathways")
This is a complex graph that shows PTMs clusters as cliques connected by yellow correlation edges surrounding a CFN of interconnected gene nodes.
Let’s focus on CDK1 substrates to complement the work done in Schraink, et al., 2023.
cdk1.kinsub <- filter.edges.1("CDK1", kinsub.edges) # 82 edges
cdk1.substrates <- cdk1.kinsub [which(cdk1.kinsub$source=="CDK1"), "target"]
cfn_cccn.nodes.cdksubs <- cfn_cccn.nodes[cfn_cccn.nodes$Gene.Name %in% cdk1.substrates, ]Two ways to select for CDK1 substrates are presented.
# Method 1: Using RCy3.
selectNodes(cfn_cccn.nodes.cdksubs$id, by = "id", preserve=FALSE)
selectEdgesConnectingSelectedNodes()
createSubnetwork(nodes = getSelectedNodes(), edges = getSelectedEdges(), nodes.by.col = "id", edges.by.col = "name")
# Method 2: Using P2P functions.
cfn_cccn.nodes.cdksubs.edges <- filter.edges.0(nodenames = cfn_cccn.nodes.cdksubs$id, edge.file = cfn.cccn)
g2 <- GraphCfn(cfn.edges = cfn_cccn.nodes.cdksubs.edges, cfn.nodes = cfn_cccn.nodes.cdksubs,
Network.title = "CFN/CCCN All PTMs 5", Network.collection = "PTMsToPathways")
# Gives the exact same network.Now further simplify the graph to show only CCCN PTMs.
cfn_cccn.nodes.cdksubs <- cfn_cccn.nodes[cfn_cccn.nodes$Gene.Name %in% cdk1.substrates, ] # repeat of above to make this stand alone
cdksubs.genes <- unique(cfn_cccn.nodes.cdksubs$Gene.Name)
cdksubs.cfn <- filter.edges.0(cfn_cccn.nodes.cdksubs.genes, cfn.merged)
cdksubs.cfn.cccn <- get.co.clustered.ptms(cdksubs.cfn, ptm.cccn.edges=ptm.cccn.edges)
cdksubs.cfn.cccn.nodes <- make.cytoscape.node.file(cdksubs.cfn.cccn, funckey, ptmtable,
include.gene.data = TRUE,
include.coclustered.PTMs = TRUE)
g3 <- GraphCfn(cfn.edges = cdksubs.cfn.cccn, cfn.nodes = cdksubs.cfn.cccn.nodes,
Network.title = "CFN/CCCN All PTMs 6", Network.collection = "PTMsToPathways")Each of these graphs can be modified to set node size and shape using the following P2P functions that act on the front window in Cytoscape. Note the differences that reflect activation of different cell signaling pathways in tumors with different mutations: X02BR011 (AKT1 missense mutant); X21BR010 (PIK3CA missense mutant); and X05BR045 (TP53 nonsense and MLLT4 frameshift mutants).
setNodeColorToRatios(plotcol="X03BR011")
setNodeColorToRowz(plotcol="X03BR011") # This exaggerates the node size and shape somewhat.
setNodeColorToRatios(plotcol="X21BR010")
setNodeColorToRowz(plotcol="X21BR010")
setNodeColorToRatios(plotcol="X05BR045")
setNodeColorToRowz(plotcol="X05BR045")
# Note that different samples have dramatically different differences in PTMs that are up or down, which is reflected also in total in gene nodes.