Example of a correlation map #1945

Open
wants to merge 4 commits into
base: main

@firasm firasm commented Feb 3, 2020

Here's a PR of a correlation map that my students (@vcuspinera and @AndresPitta) created.

The output of this example is:

visualization-15

Not sure if this is something worth adding to the examples and admittedly this is similar to the Layered heat map with text example.

I think it would be worth adding if I could show only half of the correlation matrix like this example from here
ggplot2-correlation-matrix-heatmap-add-correlation-coefficients-1

heatmap = alt.Chart(corrMatrix_line).encode(
alt.Y('Var1:N', title = ''),
alt.X('Var2:N', title = '', axis=alt.Axis(labelAngle=20))
).mark_rect().encode(

I would move mark_rect() to directly after alt.Chart() and only have a single call to encode() like many of the other examples.

@@ -0,0 +1,52 @@
"""
Correlation matrix
--------------

I think it's important that the length of the underline matches the length of the title when the docs are compiled in Sphinx. You just need to add a few more dashes.
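
One way to avoid counting dashes by hand is to generate the underline from the title. This is only an illustrative helper (the function names are made up for this sketch and are not part of the PR or of Sphinx):

```python
def rst_title(title: str, char: str = "-") -> str:
    """Return the title followed by an underline of exactly the same length."""
    return f"{title}\n{char * len(title)}"


def underline_is_long_enough(title: str, underline: str) -> bool:
    """docutils warns when the underline is shorter than the title text."""
    return len(underline) >= len(title)


# Prints the title with a matching underline, ready to paste into the docstring.
print(rst_title("Correlation matrix"))
```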

@firasm
Author

firasm commented Feb 7, 2020

Thanks @eitanlees I'll address your comments soon!

@firasm
Author

firasm commented Feb 8, 2020

In the commit above, I now have examples of a full correlation matrix, as well as a less redundant one with the diagonal and upper triangle removed:

visualization-18

@jakevdp
Collaborator

jakevdp commented Mar 29, 2020

Sorry - this fell off my radar.

Looking at it, it seems like a fairly immense amount of code to create a relatively straightforward chart, so I'm hesitant to add this example as-is to the main example gallery.

@jakevdp
Collaborator

jakevdp commented Mar 29, 2020

Maybe simplify it to something like this?

import altair as alt
from vega_datasets import data

df_iris = data.iris()
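# Drop the non-numeric 'species' column first; pandas >= 2.0 no longer skips it silently in corr()
corrMatrix = df_iris.drop(columns='species').corr().reset_index().melt('index')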
corrMatrix.columns = ['var1', 'var2', 'correlation']

base = alt.Chart(corrMatrix).transform_filter(
    alt.datum.var1 < alt.datum.var2
).encode(
    x='var1',
    y='var2',
).properties(
    width=alt.Step(100),
    height=alt.Step(100)
)

rects = base.mark_rect().encode(
    color='correlation'
)

text = base.mark_text(
    size=30
).encode(
    text=alt.Text('correlation', format=".2f"),
    color=alt.condition(
        "datum.correlation > 0.5",
        alt.value('white'),
        alt.value('black')
    )
)

rects + text

visualization - 2020-03-29T075750 860
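
The filter in the snippet above compares the variable names as strings, so it keeps each unordered pair once and drops the diagonal. A plain-Python sketch of the same comparison, with no Altair involved (the names are the four numeric iris columns):

```python
from itertools import product

names = ["petalLength", "petalWidth", "sepalLength", "sepalWidth"]

# Same comparison as the chart filter, applied to every (var1, var2) pair.
kept = [(a, b) for a, b in product(names, repeat=2) if a < b]

for pair in kept:
    print(pair)
```

Because the comparison is alphabetical, the kept cells only form a clean triangle when the axes are also sorted alphabetically, which is Altair's default for nominal fields.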

@jakevdp
Collaborator

jakevdp commented Mar 29, 2020

Or, if you want both versions of the chart together:

import altair as alt
from vega_datasets import data

df_iris = data.iris()
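# Drop the non-numeric 'species' column first; pandas >= 2.0 no longer skips it silently in corr()
corrMatrix = df_iris.drop(columns='species').corr().reset_index().melt('index')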
corrMatrix.columns = ['var1', 'var2', 'correlation']

chart = alt.Chart(corrMatrix).mark_rect().encode(
    x=alt.X('var1', title=None),
    y=alt.Y('var2', title=None),
    color=alt.Color('correlation', legend=None),
).properties(
    width=alt.Step(80),
    height=alt.Step(80)
)

chart += chart.mark_text(size=25).encode(
    text=alt.Text('correlation', format=".2f"),
    color=alt.condition(
        "datum.correlation > 0.5",
        alt.value('white'),
        alt.value('black')
    )
)

chart | chart.transform_filter("datum.var1 < datum.var2")

visualization - 2020-03-29T080703 162

@firasm
Author

firasm commented Apr 8, 2020

Thanks that is indeed much cleaner! I'm happy with the above and can submit a commit once the term is over...

@harabat
Contributor

harabat commented Mar 10, 2021

@jakevdp @firasm Assuming that wanting to sort the labels of a heatmap in non-alphabetical order is not rare (spent a lot of time on this personally), would it make sense to modify this example to allow for a custom sort?

For example, if I want to have the rows and columns sorted in this order: 'petalWidth', 'petalLength', 'sepalWidth', 'sepalLength'

import altair as alt
from vega_datasets import data

# create corr map
source = data.iris()
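# drop the non-numeric 'species' column; pandas >= 2.0 no longer skips it silently in corr()
source_corr = source.drop(columns='species').corr().reset_index().melt(id_vars='index')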

# create dummy ordinal var
sort = {'petalWidth': 0, 'petalLength': 1, 'sepalWidth': 2, 'sepalLength': 3}

heatmap = alt.Chart(source_corr)\
.mark_rect()\
.transform_calculate(
    order_rows='%s [datum.index]' % sort,
    order_cols='%s [datum.variable]' % sort
)\
.transform_filter(alt.datum.order_rows <= alt.datum.order_cols)\
.encode(
    alt.X('index:N', title=None, sort=list(sort.keys())),
    alt.Y('variable:N', title=None, sort=list(sort.keys())),
    alt.Color('value:Q', legend=None)
)\
.properties(width=300, height=300)

text = heatmap\
.mark_text(size=25)\
.encode(
    alt.Text('value:Q', format='.2f'),
    color=alt.condition(
        'datum.value > 0.5',
        alt.value('white'),
        alt.value('black')
    )
)

heatmap + text

heatmap_with_custom_sort

Adapted from this StackOverflow question.
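
The `'%s [datum.index]' % sort` trick above works because the `repr` of a Python dict with string keys and integer values happens to read as an object literal in a Vega expression. As a hedged alternative, the lookup expression could be built with `json.dumps`, which always emits double-quoted keys. The helper name `order_expr` is made up for this sketch:

```python
import json

sort = {'petalWidth': 0, 'petalLength': 1, 'sepalWidth': 2, 'sepalLength': 3}


def order_expr(mapping: dict, field: str) -> str:
    """Build an expression string that looks up datum[field] in a literal mapping."""
    return f"{json.dumps(mapping)}[datum.{field}]"


# What the original %-formatting produces (Python repr, single quotes):
print('%s [datum.index]' % sort)

# What the json.dumps variant produces (JSON, double quotes):
print(order_expr(sort, "index"))
```

Either string can be passed to `transform_calculate` in the same way as in the snippet above.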

@joelostblom
Contributor

I started working on a package to facilitate creating these plots that might be too complex for the gallery, and that you would want to have easily accessible when doing EDA etc. I included correlation plots, even if they look somewhat different from what is suggested here:

image

You can see some more examples here. I haven't created a release on PyPI yet and I still need to fix some things, but am happily accepting suggestions for what to include. Also @jakevdp, let me know if you want me to name it something else, in case altair_ally sounds too official and you want that pattern reserved for packages in the altair-viz repo.

@pedromorais007

Is it possible to change the color scheme of jakevdp's chart from blue to red?

@mattijn
Contributor

mattijn commented Jan 7, 2023

@pedromorais007 see this answer: #2779

@pedromorais007

Thanks mattijn for your suggestion.
I tried adding this line:
color=alt.Color('z:Q', scale=alt.Scale(scheme="reds"))
to my Altair correlation matrix, but the colors are still blue. Nothing has changed.

    base = alt.Chart(corrMatrix).transform_filter(alt.datum.var1 < alt.datum.var2).encode(  
        x='var1', 
        y='var2', 
        color=alt.Color('z:Q', scale=alt.Scale(scheme="reds"))
        ).properties(
            width=alt.Step(100), height=alt.Step(100), )   
    rects = base.mark_rect().encode(color='correlation')    
    text = base.mark_text(size=20).encode(
        text=alt.Text('correlation', format=".2f"),
        color=alt.condition("datum.correlation > 0.5", alt.value('white'),alt.value('black'),)
        ) 
    st.altair_chart(rects + text) 

@mattijn
Contributor

mattijn commented Jan 7, 2023

with a normal heatmap this works:

import altair as alt
import numpy as np
import pandas as pd

# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x**2 + y**2

# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z.ravel()})

c = alt.Chart(source, height=alt.Step(12), width=alt.Step(12)).mark_rect().encode(
    x="x:O",
    y="y:O", 
    color=alt.Color("z:Q", scale=alt.Scale(scheme='reds'))
)
c + c.mark_text(size=7).encode(text=alt.Text("z"), color=alt.value("white"))

image

I suspect something in Streamlit, which you seem to be using (st.altair_chart()), is overriding the color scheme.

@ChristopherDavisUCI
Contributor

@pedromorais007 Does it work the way you want if you remove color='correlation' from rects? Like @mattijn said, I believe that is overruling the color definition in base.

8 participants