How do you create a table with each segment of the Venn diagram #21

Open
jane-murphy opened this issue Jul 13, 2021 · 7 comments

@jane-murphy

Hello

Thank you for your code - it is very nice. I am using the code to look at 3 gene sets with the overlapping and unique genes represented by the Venn diagram. I have used show_elements to look at these genes directly on the Venn diagram but I am wondering whether there is a convenient way to have them in a table?

I am especially interested in a way to view the unique genes in each set (i.e. not overlapping with either of the other 2 sets).

Best wishes,
Jane

@mj163163

I hope the author can add a function to export the data.

@lucaskbobadilla

You can just use dplyr::setdiff on the list of elements you used to create the plot.
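As a minimal sketch of the setdiff suggestion above, the elements unique to each set can be collected with base R's setdiff (which is enough for plain vectors). The gene names and the list `x` here are made up for illustration; substitute the named list you passed to the plotting function.

```r
# Hypothetical gene sets, for illustration only
x <- list(
  a = c("TP53", "BRCA1", "EGFR", "MYC"),
  b = c("BRCA1", "KRAS", "MYC"),
  c = c("EGFR", "MYC", "PTEN")
)

# For each set, keep the elements that appear in none of the other sets
unique_genes <- lapply(names(x), function(n) {
  others <- unlist(x[names(x) != n], use.names = FALSE)
  setdiff(x[[n]], others)
})
names(unique_genes) <- names(x)

unique_genes
```

The result is a named list with one vector of unique elements per set.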

@Dragonmasterx87

Dragonmasterx87 commented Apr 18, 2023

I don't understand why the authors didn't outline how to do this, as the whole purpose of the Venn diagram is to find associations across gene sets. Anyway, here's a more appropriate solution than arbitrarily looking at intersections with something such as dplyr::setdiff:

# Venn() and process_data() come from the ggVennDiagram package, not ggvenn
library(ggVennDiagram)

# You have made a list of associations called x
x <- list(
 a = data_a,
 b = data_b,
 c = data_c
)

venn <- Venn(x)
data <- process_data(venn) # data is the plot data object that ggVennDiagram builds from venn

# Look at all sets of genes forming overlaps
# (the @region slot worked in the package version in use when this was posted; other releases may differ)
mylist <- data@region[["item"]]

# This results in a list indexed only by number. This is stupid, because it's difficult to match a list element to the
# Venn diagram unless you count the values, which is also stupid. So the more appropriate way of doing this is:
names(mylist) # this will return NULL because no names exist
names(mylist) <- data@region[["name"]]
names(mylist)
mylist  # named list of association type and genes

Enjoy!
🐉
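Once the regions are in a named list like `mylist` above, they can be flattened into a two-column table. The following is only a sketch: `regions` is a hypothetical stand-in with made-up region names and genes, so it runs without ggVennDiagram installed.

```r
# Hypothetical named list standing in for the named `mylist`
regions <- list(
  a     = c("TP53"),
  b     = c("KRAS"),
  a_b   = c("BRCA1"),
  a_b_c = c("MYC", "GAPDH")
)

# One row per gene, labelled with the region it belongs to
region_table <- data.frame(
  region = rep(names(regions), lengths(regions)),
  gene   = unlist(regions, use.names = FALSE),
  stringsAsFactors = FALSE
)

region_table
```

A long table like this can be passed to write.csv() or filtered by region name.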

@yanlinlin82
Owner

Hi, @Dragonmasterx87

Thank you for the workaround code. I haven't fixed this issue because I didn't quite understand the problem (or had no idea of a better design) and had insufficient time to code and test. If you don't mind, please send me a pull request, and I will merge it into the package.

@xiasijian

xiasijian commented Jun 3, 2023

I gave it a try; it may not be perfect. My R code is below.

## ggvenn data: export pairwise differences, unique elements and the common overlap
export_ggvenn_data=function(venn_list,save_name){
  exclude_list=list()
  for(i in names(venn_list)){
    set1=venn_list[[i]]
    other_value=c()
    for(j in setdiff(names(venn_list),i)){
      set2=venn_list[[j]]
      other_value=c(other_value,set2)

      ## complementary set: elements of set i that are not in set j
      exclude_name=paste0(i,"_setdiff_",j)
      exclude_list[[exclude_name]]=setdiff(set1,set2)
    }
    ## elements found only in set i
    unique_name=paste0(i,"_unique")
    exclude_list[[unique_name]]=setdiff(set1,other_value)
  }

  ## intersect: elements shared by all sets
  overlap_set=Reduce(intersect,venn_list)

  ## if the intersection is empty, store a placeholder element
  if(length(overlap_set)==0){
    exclude_list[["overlap"]]="zero"
  }else{
    exclude_list[["overlap"]]=overlap_set
  }

  ## pad every vector with NA to the longest length (including the overlap,
  ## so it is never truncated) in order to cbind
  max_length=max(lengths(exclude_list))
  padded=lapply(exclude_list,function(v){
    length(v)=max_length
    v
  })
  vec1=as.data.frame(do.call(cbind,padded))
  write.csv(vec1,file=paste0(save_name,"_ggvenn_overlap_data.csv"),row.names=FALSE)

}

Here venn_list is the same named list that you pass to the ggvenn function.
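For a table that covers every segment of the diagram rather than only the pairwise differences, one possible approach is to label each element with the combination of sets that contain it. This is a sketch, not part of ggvenn: `venn_regions` is a hypothetical helper name, the "&" separator is an arbitrary choice, and the gene sets are made up.

```r
# Hypothetical helper: label every element with the exact combination
# of sets it belongs to (one row per element)
venn_regions <- function(x) {
  all_items <- unique(unlist(x, use.names = FALSE))
  region <- vapply(all_items, function(item) {
    in_set <- vapply(x, function(s) item %in% s, logical(1))
    paste(names(x)[in_set], collapse = "&")
  }, character(1), USE.NAMES = FALSE)
  data.frame(item = all_items, region = region, stringsAsFactors = FALSE)
}

# Made-up gene sets, for illustration only
x <- list(
  a = c("TP53", "BRCA1", "EGFR", "MYC"),
  b = c("BRCA1", "KRAS", "MYC"),
  c = c("EGFR", "MYC", "PTEN")
)

region_df <- venn_regions(x)
region_df

# Group the items by region to get one vector per segment
split(region_df$item, region_df$region)
```

Rows whose region is a single set name (for example "a") are the elements unique to that set, which is the case the original question asks about.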

@xiasijian

xiasijian commented Feb 20, 2024

Recently I wrote a simple function for this:

venny_table = function(set_list,list_name){
  names(set_list) = list_name
  len_list = length(list_name)
  
  ## all overlap items----------------------------------
  ## (items shared by every set in set_list)
  overlap_items = Reduce(intersect,set_list)
  
  ## for two sets: items only in set1, only in set2 and in both,
  ## each collapsed into one ";"-separated string
  intersect_setdiff_for_two_sets = function(set1, set2){
    common = intersect(set1,set2)
    set12 = setdiff(set1,set2)
    set21 = setdiff(set2,set1)
    
    common_save = paste(common,collapse = ";")
    set12_save = paste(set12,collapse = ";")
    set21_save = paste(set21, collapse = ";")
    out = c(set12_save,set21_save,common_save)
    return(out)
  }
  
  ## names of all pairs whose first member is the j-th set
  get_pair_type = function(j,len_list){
    pairs = c()
    for(i in (j+1):len_list){
      
      pair = c(list_name[j],list_name[i])
      pair_name = paste(pair,collapse = "_and_")
      pairs = c(pairs,pair_name)
    }
    return(pairs)
    
  }
  
  ## create comparison-----------------------------
  all_pairs = c()
  for(j in 1:(len_list-1)){
    test = get_pair_type(j = j,len_list=len_list)
    all_pairs = c(all_pairs,test)
  }
  
  ## make comparison table------------------------
  out_list = list()
  for(i in all_pairs){
    name_split = strsplit(i,"_and_",fixed = TRUE)
    set1_name = name_split[[1]][1]
    set2_name =  name_split[[1]][2]
    out = intersect_setdiff_for_two_sets(set1 = set_list[[set1_name]],
                                         set2 = set_list[[set2_name]])
    
    out_list[[i]] = out
  }
  
  
  out_df = do.call(cbind,out_list)
  rownames(out_df) = c("left","right","left_and_right")
  
  overlap_save = paste(overlap_items,collapse = ";")
  export_df = rbind(out_df,rep(overlap_save,length(all_pairs)))
  rownames(export_df)[nrow(export_df)] = "overlap"
  
  write.csv(export_df,file = "groups_intersect_setdiff_details.csv",
            row.names = TRUE)
  
} 



df1 = data.frame(gene_name = c(1,2,3,4,5),
                 biotype = rep("protein_coding",5))

df2 = data.frame(gene_name = c(10,11,12,5,7),
                 biotype = rep("protein_coding",5))

df3 = data.frame(gene_name = c(5,8,9,2,1),
                 biotype = rep("protein_coding",5))

df4 = data.frame(gene_name = c(2,1,5,7,9),
                 biotype = rep("protein_coding",5))

check_col = "gene_name"
set1 = df1[,check_col]
set2 = df2[,check_col]

set3 = df3[,check_col]
set4 = df4[,check_col]
list_name = c("set1","set2","set3","set4")
set_list = list(set1,set2,set3,set4)

venny_table(set_list = set_list ,list_name = list_name )
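The pairwise part of the function above can also be sketched with base R's combn(), which enumerates the pairs directly and avoids splitting pair names back apart. This variant is an illustration under my own assumptions: it returns a data frame instead of writing a CSV file, and the column names are arbitrary.

```r
# Same example sets as in the comment above, as a named list
set_list <- list(
  set1 = c(1, 2, 3, 4, 5),
  set2 = c(10, 11, 12, 5, 7),
  set3 = c(5, 8, 9, 2, 1),
  set4 = c(2, 1, 5, 7, 9)
)

# Every unordered pair of set names
pairs <- combn(names(set_list), 2, simplify = FALSE)

# One row per pair: items only in the left set, only in the right set, and in both
pair_table <- do.call(rbind, lapply(pairs, function(p) {
  left  <- set_list[[p[1]]]
  right <- set_list[[p[2]]]
  data.frame(
    left       = p[1],
    right      = p[2],
    left_only  = paste(setdiff(left, right), collapse = ";"),
    right_only = paste(setdiff(right, left), collapse = ";"),
    both       = paste(intersect(left, right), collapse = ";"),
    stringsAsFactors = FALSE
  )
}))

pair_table
```

With four sets this gives six rows, one per pair.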


@xiasijian

image
