Inquiry about A and B compartment identification at different resolutions #67

Open
hbandukw opened this issue Aug 2, 2021 · 27 comments

Comments

@hbandukw

hbandukw commented Aug 2, 2021

Hello,

I identified A and B compartments in my data at three resolutions (10Kb, 100Kb and 1Mb):

```
fanc compartments -g $CHROMS_PATH -d $DOMAIN_OUTPUT $FILE -x $CHROMS2EXCLUDE
```

I am a bit confused about the results.

siControl_1_ABcompartments

At this locus, the compartments at 10 kb are called B, A, B, A, B, A, but the whole region is called just "A" at 100 kb and 1 Mb. At 10 kb resolution most of the region looks like "B", so why is it called "A" at the lower resolutions? This happens at most loci. For example, in the figure below, FAN-C calls the region A at 10 kb but B at 100 kb and 1 Mb.

siControl_1_TADs_Smarca4

Is this normal or not?

@kaukrise
Collaborator

kaukrise commented Aug 2, 2021

Hi, thanks for the question.

There are several factors at play here when working with AB calls at different resolutions (which span three orders of magnitude in your case):

  • At high resolutions noise plays a large role: compartment calls are generally much less robust at 10 kb than at 1 Mb. Something we have observed in a lot of datasets is that entries in the correlation matrix tend to be positive in very noisy matrices, which may lead to the different compartment calls observed in your data
  • Even if your data were of sufficiently high resolution that 10kb compartment calls would be reasonably robust, it is likely that you will observe substantial local differences in compartment calls, simply because you integrate a lot more data per region at lower resolutions than at high ones
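
To illustrate the second point, here is a minimal sketch (generic NumPy, not FAN-C code; the function name and the toy matrix are hypothetical) of how re-binning pools counts. Each 100 kb bin pair sums a 10 x 10 block of 10 kb bin pairs, so a coarse entry integrates up to 100 times as many raw entries as a fine one.

```python
import numpy as np

def coarsen(matrix, factor):
    """Sum factor x factor blocks of a square contact matrix."""
    n = matrix.shape[0] - matrix.shape[0] % factor  # drop a trailing partial bin
    trimmed = matrix[:n, :n]
    m = n // factor
    # Index [i, a, j, b] corresponds to trimmed[i*factor + a, j*factor + b]
    return trimmed.reshape(m, factor, m, factor).sum(axis=(1, 3))

rng = np.random.default_rng(0)
# Hypothetical sparse "10 kb" matrix: Poisson counts with a low mean
fine = rng.poisson(0.2, size=(200, 200))
fine = np.triu(fine) + np.triu(fine, 1).T  # make it symmetric
coarse = coarsen(fine, 10)                 # "100 kb" matrix

print("fraction of empty entries at fine resolution:  ", (fine == 0).mean())
print("fraction of empty entries at coarse resolution:", (coarse == 0).mean())
```

The total number of contacts is unchanged; only the number of entries they are spread over differs, which is why the coarse matrix is far less sparse.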

In your case, the plots you show give the strong impression that your data does not support 10kb resolution AB calls, and I would be careful with 100kb calls, too. Keep in mind that AB calls are calculated on the whole chromosome matrix, and not just the entries close to the diagonal where the signal looks sufficient. Off the diagonal the signal can appear almost random in high resolution matrices.

My recommendation is to focus on plots of the AB correlation matrix and its eigenvector (EV), instead of just the high-level AB calls. The EV will give you a much better idea of fluctuations in and strength of your AB calls.
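
For readers unfamiliar with what the EV is, the following is a generic sketch of the usual procedure (correlation matrix of an observed/expected matrix, then its leading eigenvector). It is an illustration on hypothetical toy data and not FAN-C's implementation, which may differ in normalisation and filtering.

```python
import numpy as np

def first_eigenvector(oe):
    """Correlation matrix of an O/E matrix and its leading eigenvector."""
    corr = np.corrcoef(oe)             # Pearson correlation between rows
    vals, vecs = np.linalg.eigh(corr)  # symmetric input, eigenvalues ascending
    return corr, vecs[:, -1]           # eigenvector of the largest eigenvalue

# Toy O/E matrix with two alternating compartments (hypothetical data)
labels = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1])
rng = np.random.default_rng(1)
oe = 1.0 + 0.5 * np.outer(labels, labels) + rng.normal(0, 0.05, (10, 10))
oe = (oe + oe.T) / 2                   # keep the matrix symmetric

corr, ev = first_eigenvector(oe)
print(np.round(ev, 2))
```

Note that the overall sign of `ev` is arbitrary: both `ev` and `-ev` are valid eigenvectors, which is why the sign has to be oriented with external data such as GC content.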

@hbandukw
Author

hbandukw commented Aug 2, 2021

So basically, I was hoping to track compartment switches between my Control and KO samples. But when I plot the AB correlation matrix for my samples (see below), it is really hard to track any differences visually.

KO:
[image]

vs.

Ctrl:
[image]

Can you suggest what my options are?

@kaukrise
Collaborator

kaukrise commented Aug 2, 2021

I agree that these are difficult to quantify and also to assess visually. This, and the above noise considerations, have kept me from using them in my own research.

What I have seen people do is to calculate the difference of the eigenvectors of the two samples and plot that in addition to the data you plotted above. But honestly I am not sure whether that is a mathematically valid approach or makes sense for your samples.
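
A minimal sketch of that eigenvector-difference idea, with the same caveat about validity. It assumes both eigenvectors are on identical bins and already have the same sign orientation; the function name and the values are hypothetical.

```python
import numpy as np

def ev_difference(ev_a, ev_b):
    """Per-bin difference of two eigenvectors plus bins whose sign differs."""
    a = np.asarray(ev_a, dtype=float)
    b = np.asarray(ev_b, dtype=float)
    valid = ~(np.isnan(a) | np.isnan(b))   # ignore bins missing in either sample
    diff = np.full(a.shape, np.nan)
    diff[valid] = a[valid] - b[valid]
    # Candidate compartment switches: valid bins where the sign changes
    switched = np.flatnonzero(valid & (np.sign(a) != np.sign(b)))
    return diff, switched

ctrl = [0.5, 0.2, -0.3, -0.4, float("nan")]
ko = [0.4, -0.1, -0.3, 0.2, 0.1]
diff, switched = ev_difference(ctrl, ko)
print(diff, switched)
```

Bins with values close to zero will change sign from noise alone, so a sign change on its own is weak evidence of a real switch.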

@hbandukw
Author

hbandukw commented Aug 2, 2021

hmm I see. Thank you for the advice!

@liz-is
Contributor

liz-is commented Aug 2, 2021

Chiming in because I've spent a bunch of time thinking about compartments - I completely agree with Kai that compartment calls are much less robust at high resolutions and that inspecting the actual eigenvectors is important.

One approach that I've found helpful is to convert the eigenvector BED-format file from FAN-C into a bigwig file, so you can then inspect the eigenvectors in your favourite genome browser where you can zoom in/out, load multiple samples to compare, etc. You can also plot smaller regions with fancplot (in the same way as you plotted whole chromosomes above) if you want to then cross-reference with the correlation matrix.
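
One possible way to do the conversion is to go through bedGraph first. The sketch below assumes the eigenvector value sits in the fifth (score) column of the BED file; that column position, the function name, and the example lines are assumptions, so check them against your own file.

```python
def bed_to_bedgraph(bed_lines, value_column=4):
    """Yield bedGraph lines (chrom, start, end, value) from BED lines.

    value_column is the 0-based column holding the eigenvector value
    (assumed here to be the BED score column).
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        value = fields[value_column]
        if value in (".", "nan", "NaN"):
            continue  # skip bins without an eigenvector value
        yield "\t".join([fields[0], fields[1], fields[2], value])

# Hypothetical eigenvector BED lines
example = [
    "chr12\t0\t1000000\t.\t0.031\t.\n",
    "chr12\t1000000\t2000000\t.\t-0.052\t.\n",
    "chr12\t2000000\t3000000\t.\tnan\t.\n",
]
bedgraph = list(bed_to_bedgraph(example))
print("\n".join(bedgraph))
```

The resulting bedGraph, once sorted by chromosome and start, can be converted with the UCSC `bedGraphToBigWig` utility, which takes the bedGraph, a chromosome sizes file, and an output path.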

However I would definitely first check that the eigenvectors really seem to reflect compartmentalisation, and not chromosomal position / chromosomal arm. Depending on species and resolution, it may be that the second eigenvector better reflects compartmentalisation. I've also found that assigning the sign of the eigenvector according to GC content can sometimes not give consistent assignments across resolutions / samples, so it's worth checking this too. You can then re-assign the sign of the eigenvector using other data if necessary (e.g. histone modifications, gene density, etc).

@kaukrise
Collaborator

kaukrise commented Aug 2, 2021

Hi @liz-is, thanks for chiming in! 100% agree, especially with the point about GC content.

@hbandukw
Author

hbandukw commented Aug 2, 2021

Hi @liz-is, thanks for all the useful info.

So just one more thing, when checking the eigenvectors, should I even bother to look at any resolutions other than 1Mb?

@liz-is
Contributor

liz-is commented Aug 2, 2021

Impossible to say without knowing more about your data. 10 kb resolution is unlikely to give sensible compartments unless you have extremely deep sequencing, IMO, but 100 kb could be fine. If you have already calculated the eigenvectors I'd say you might as well look at them!

@hbandukw
Author

hbandukw commented Aug 3, 2021

Hi @liz-is and @kaukrise , I was looking to get some advice on whether the assigned eigenvectors (@ 100kb and 1Mb) are correctly reflecting compartmentalization.

  1. Chr12 @ 1Mb
    12_siControl-C2C12_1_1mb ab_and_ev

  2. Chr12 @ 100kb
    12_siControl-C2C12_1_100kb_ab_and_ev

Am I correct to think that the eigenvectors are corresponding to compartmentalization?

@liz-is
Contributor

liz-is commented Aug 3, 2021

Yeah, they look good to me!

@hbandukw
Author

hbandukw commented Aug 6, 2021

Hello,

I converted the domain BED files to bigWigs as suggested and viewed them alongside ChIP-seq tracks of relevant histone marks in mouse. I am seeing something strange:

  1. My replicates display mirror-image eigenvectors at some loci (e.g. Plot 1) but not at others (e.g. Plots 2 and 3)

Plot 1
Screen Shot 2021-08-06 at 8 20 37 AM

Plot 2
Screen Shot 2021-08-06 at 9 16 33 AM

Plot 3
Screen Shot 2021-08-06 at 9 53 44 AM

Do you know why this is happening?

@liz-is
Contributor

liz-is commented Aug 6, 2021

As I mentioned above, assigning the sign of the eigenvector using GC content doesn't always give robust results across samples. Compartment identification is done per-chromosome, so it's possible for the assignments to be consistent on one chromosome but not on another. I suspect that's what's happening here. Also, in Plot 3, even though the replicates are consistent with each other, it seems unlikely that a KO would cause almost a complete switch in compartments, so one of the conditions there is probably also assigned incorrectly.

You can use your histone ChIP-seq data (or gene density, but histone data is probably better) to reassign the sign of the eigenvector for each chromosome and I expect you'll see much more consistent profiles.
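
A quick way to see which chromosomes are affected is to correlate the replicate eigenvectors chromosome by chromosome. This is a hedged sketch with hypothetical names and toy values; it only flags mirror-image chromosomes and does not say which replicate has the correct sign.

```python
import numpy as np

def flipped_chromosomes(ev_rep1, ev_rep2):
    """Chromosomes whose eigenvectors are anti-correlated between replicates.

    Both arguments map chromosome name -> per-bin eigenvector values
    on identical bins.
    """
    flipped = []
    for chrom in ev_rep1:
        a = np.asarray(ev_rep1[chrom], dtype=float)
        b = np.asarray(ev_rep2[chrom], dtype=float)
        valid = ~(np.isnan(a) | np.isnan(b))
        r = np.corrcoef(a[valid], b[valid])[0, 1]
        if r < 0:  # mirror image: sign assigned differently in the replicates
            flipped.append(chrom)
    return flipped

rep1 = {"chr1": [0.3, 0.1, -0.2, -0.4], "chr2": [0.2, -0.1, -0.3, 0.4]}
rep2 = {"chr1": [0.25, 0.15, -0.1, -0.5], "chr2": [-0.25, 0.05, 0.35, -0.3]}
print(flipped_chromosomes(rep1, rep2))
```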

@kaukrise
Collaborator

kaukrise commented Aug 6, 2021

In the time it took me to boot up my computer and sign in @liz-is beat me to it. :)

I'll just quote my previous post then, which still fits perfectly:

Hi @liz-is, thanks for chiming in! 100% agree, especially with the point about GC content.

@hbandukw
Author

hbandukw commented Aug 6, 2021

Hi @liz-is and @kaukrise,

Ah ok! So I have access to public histone ChIP-seq data for my control samples (i.e. similar cells and condition), but not for cells with the specific KO that I have. I assume many people have this problem. What do they do? If I don't have appropriate histone data for my KO, can I do anything else to orient the eigenvector signs?

Thanks again for all your great advice and help!!

@liz-is
Contributor

liz-is commented Aug 6, 2021

I'm not sure what other people do to be honest, it seems no one talks much about this issue! What I've done in the past when I had a lot of conditions to compare was to take a control condition that had strong compartmentalisation and good sequencing depth, assign the sign of that (based on gene density in this case, then validated by comparison to chromatin and gene expression data), and assign all the others based on what correlated best to that reference eigenvector (code here if you are interested). This is all based on the assumption that the majority of regions won't have changes in compartmentalisation, of course. This approach usually works pretty well for me, but if you have reason to think that your KO will cause large portions of the genome to change compartment, then that gets very tricky.
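
The reference-based idea can be sketched roughly as follows. This is an illustrative Python re-expression under the stated assumption that most regions keep their compartment; it is not the linked code, and all names and values are hypothetical.

```python
import numpy as np

def orient_to_reference(ev, reference):
    """Flip each chromosome's eigenvector so it correlates positively
    with an already-oriented reference eigenvector on the same bins."""
    oriented = {}
    for chrom, values in ev.items():
        a = np.asarray(values, dtype=float)
        ref = np.asarray(reference[chrom], dtype=float)
        valid = ~(np.isnan(a) | np.isnan(ref))
        r = np.corrcoef(a[valid], ref[valid])[0, 1]
        oriented[chrom] = -a if r < 0 else a
    return oriented

reference = {"chr1": [0.4, 0.2, -0.3, -0.1], "chr2": [-0.2, 0.3, 0.1, -0.4]}
sample = {"chr1": [-0.5, -0.1, 0.2, 0.2], "chr2": [-0.1, 0.2, 0.2, -0.3]}
oriented = orient_to_reference(sample, reference)
print(oriented)
```

A correlation close to zero means the data cannot decide the orientation, so it is worth inspecting the correlation values themselves rather than only the resulting sign.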

@hbandukw
Author

Hi @liz-is, sorry about the late reply. I will give what you said a shot. Thanks again for your help.

@hbandukw
Author

Hi @liz-is, I am attempting to assign the eigenvector sign based on some histone data. I was wondering if I need to compare the eigenvectors per chromosome or not?

@liz-is
Contributor

liz-is commented Aug 13, 2021

I would definitely recommend doing the assignment per chromosome, as the eigenvectors are calculated per-chromosome (FAN-C also assigns the eigenvector sign per-chromosome internally). Whether the assigned sign matches the histone data etc can therefore vary across different chromosomes, so you need to assess each one individually to see if it needs to be flipped.

@hbandukw
Author

Sorry I need a little clarification:
When you say "flip", do you mean that I would be flipping the sign of the eigenvector?

So basically, let's say I am going through Chromosome 1 of my control file and comparing it to permissive and repressive histone marks. I see that the negative eigenvector values correlate with the permissive mark, so I say that negative eigenvector == A/active compartment and positive eigenvector == B compartment. Then I move on to Chromosome 2 and do the same check, but this time I see that positive eigenvector == A compartment, so for Chromosome 2 I choose this assignment, and so on.

Is that how it would work? Thanks again for all your help and feedback!

@liz-is
Contributor

liz-is commented Aug 13, 2021

Yes, exactly! If you see that regions with a negative eigenvector value have high levels of permissive histone marks and low levels of repressive histone marks, that would indicate that the sign of the eigenvector has been assigned incorrectly. I would then multiply the eigenvector values for that chromosome by -1 to "flip" the eigenvector and use these "corrected" values for downstream analysis such as defining A and B compartment regions.
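
As a rough sketch of that per-chromosome check and flip, assuming you have a per-bin signal for a permissive mark on the same bins as the eigenvector (function name, dictionaries, and numbers are all hypothetical):

```python
from statistics import mean

def flip_if_needed(ev, active_signal):
    """Return the eigenvector oriented so that positive bins carry
    more active-mark signal than negative bins."""
    pos = [s for v, s in zip(ev, active_signal) if v > 0]
    neg = [s for v, s in zip(ev, active_signal) if v < 0]
    if pos and neg and mean(neg) > mean(pos):
        return [-v for v in ev]  # multiply by -1 to "flip" the eigenvector
    return list(ev)

ev_by_chrom = {
    "chr1": [-0.3, -0.2, 0.1, 0.4],  # negative bins carry the active mark
    "chr2": [0.2, 0.3, -0.1, -0.2],  # already follows the convention
}
active_by_chrom = {
    "chr1": [12.0, 9.0, 1.0, 2.0],
    "chr2": [8.0, 10.0, 1.5, 0.5],
}
corrected = {
    chrom: flip_if_needed(ev, active_by_chrom[chrom])
    for chrom, ev in ev_by_chrom.items()
}
print(corrected)
```

The same function could be run with a repressive mark and the comparison reversed as a cross-check.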

Does that make sense?

@hbandukw
Author

hbandukw commented Aug 13, 2021

Ok so basically we are defining this before analysis: "Active (A) compartments = positive eigenvector values" and "Inactive (B) compartments = negative eigenvector values"
...and then as I assess regions per chromosome, if I see that negative eigenvector values have high levels of permissive histone marks and low levels of repressive histone marks, then I will flip the sign of the eigenvector for that chromosome so it corresponds to our definition, correct?

@liz-is
Contributor

liz-is commented Aug 13, 2021

Yes, the convention established in the first Hi-C papers is that "positive eigenvector values = A compartment" and "negative eigenvector values = B compartment". I find it convenient to flip the sign of the eigenvector for each chromosome to align with this convention, as it makes it easier to then use them downstream for plotting as genomic tracks, making saddle plots, etc. Of course if you're not using the eigenvector values themselves downstream, only the A and B compartment regions, then you can assign the A / B regions directly and not flip the eigenvector values. What you describe above is how I would do it, though.
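
Once the eigenvector follows this convention, defining A and B regions is a matter of merging consecutive bins with the same sign. A minimal sketch, assuming fixed-size bins starting at position 0 (the function name and values are hypothetical):

```python
from itertools import groupby

def call_compartments(chrom, bin_size, ev):
    """Merge consecutive bins with the same eigenvector sign into A/B regions.

    Bins with a zero or NaN value get no label and break a region.
    """
    def label(v):
        return "A" if v > 0 else "B" if v < 0 else None

    regions = []
    start = 0
    for lab, group in groupby(ev, key=label):
        end = start + len(list(group)) * bin_size
        if lab is not None:
            regions.append((chrom, start, end, lab))
        start = end
    return regions

ev = [0.2, 0.1, -0.3, float("nan"), -0.1, 0.4]
regions = call_compartments("chr12", 100_000, ev)
for region in regions:
    print(*region, sep="\t")
```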

@hbandukw
Author

Ah ok! I get it now! Thanks!!!

@hbandukw
Author

Hello @liz-is, in your code, when you compare the fraction of A/B compartments that overlap with "active chromatin" and "inactive chromatin" (overlap / total_size) in your function "check_chromatin_colours"
(https://github.com/vaquerizaslab/IngSimmons_et_al_dorsoventral_3D_genome/blob/main/scripts/compartment_analysis.Rmd), what do you do if these proportions are very similar?

E.g. For Chr2, when I compare the fraction of A/B compartments overlapping with H3K4me3 (active) and H3K27me3 (repressive) peaks, the proportions come out to be:

  1. A_H3K4me3 vs A_H3K27me3
    3822735 / 76113205 vs 4935985 / 76113205
    0.05022433360939143 vs 0.06485057356341781

  2. B_H3K4me3 vs B_H3K27me3
    1448829 / 105999982 vs 2717943 / 105999982
    0.013668200434222715 vs 0.025640976052241218

What do you do when the proportions are this similar?

@liz-is
Contributor

liz-is commented Aug 31, 2021

I was on holiday so maybe you have already decided what to do, but I would say that you really have to interpret these based on the context and other data that you have. For example, the chromatin colours data for Drosophila contains information on >2 chromatin states, so a chromosome may not have much H3K27me3-type heterochromatin but the B compartment may be enriched for H3K9me3-type heterochromatin. If looking at one data type doesn't resolve which compartment is A and which is B, I would try to make a consensus across multiple data types (gene density, GC content, H3K27me3, H3K4me3, H3K9me3, etc).
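
One simple way to build such a consensus is a vote across tracks. The sketch below is only an illustration: the expected direction of each track (+1 if it should be higher in A, -1 if higher in B) is an assumption you supply for your own system, and the names and values are hypothetical.

```python
import numpy as np

def consensus_orientation(ev, tracks):
    """Vote on the eigenvector orientation of one chromosome.

    tracks maps a track name to (per-bin values, expected sign of its
    correlation with a correctly oriented eigenvector).
    Returns the per-track votes and their sum.
    """
    ev = np.asarray(ev, dtype=float)
    votes = {}
    for name, (values, expected) in tracks.items():
        r = np.corrcoef(ev, np.asarray(values, dtype=float))[0, 1]
        votes[name] = int(np.sign(r)) * expected  # +1 supports the current sign
    return votes, sum(votes.values())

ev = [0.3, 0.2, -0.1, -0.4]
tracks = {
    "gene_density": ([10, 8, 3, 1], +1),
    "H3K4me3": ([5, 6, 2, 1], +1),
    "H3K9me3": ([1, 1, 4, 6], -1),
    "H3K27me3": ([3, 4, 2, 2], -1),
}
votes, total = consensus_orientation(ev, tracks)
print(votes, total)
```

A positive total supports keeping the current sign, a negative total suggests flipping, and a total near zero means the tracks disagree and the chromosome needs a closer look.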

@DittmanC

DittmanC commented Sep 9, 2022

Similar situation here. Does the fanc tool, or any other tool, let me flip the compartments so that I can visualise them with fanc compartments? I saw that the command only allows me to use --genome, which is based on average GC content, and does not allow me to use my histone marks as the reference.

@liz-is
Contributor

liz-is commented Oct 22, 2022

In my code that's linked above there's code to flip based on correlation with any data of interest. I asked once about incorporating something similar in FAN-C directly (#52), but it sounded like it would be tricky to make this computationally efficient.
