# DataFrames from D-Wave: Calculating Average Spin Values

This notebook examines `.csv` files containing response data from a D-Wave computer that was asked to solve specific Ising models.

On D-Wave, an Ising model can be represented as a binary quadratic model (`bqm`). The `bqm` used for this system is defined by:
```python
import dimod

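# No local fields, and a unit coupling between each pair of neighbouring spins
h = {k: 0 for k in range(50)}
J = {(k, k + 1): 1 for k in range(49)}
offset = 0.0

# Pass the offset explicitly so the variable defined above is actually used
bqm = dimod.BinaryQuadraticModel(h, J, offset, 'SPIN')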
```
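
As a rough check on what this model rewards, the Ising energy $E(s) = \sum_i h_i s_i + \sum_{(i,j)} J_{ij} s_i s_j$ can be evaluated by hand for a couple of states. The sketch below uses plain Python rather than `dimod`; the helper `ising_energy` and the position of the domain wall are illustrative choices, not part of the original notebook.

```python
def ising_energy(spins, h, J):
    """Return sum_i h[i]*s_i + sum_(i,j) J[i,j]*s_i*s_j for spins in {-1, +1}."""
    linear = sum(h[i] * spins[i] for i in h)
    quadratic = sum(coupling * spins[i] * spins[j] for (i, j), coupling in J.items())
    return linear + quadratic

# Same coefficients as the model above
h = {k: 0 for k in range(50)}
J = {(k, k + 1): 1 for k in range(49)}

# Perfectly alternating chain: all 49 bonds contribute -1
alternating = [1 if k % 2 == 0 else -1 for k in range(50)]

# One domain wall: flip every spin from site 25 onward (arbitrary position)
one_wall = alternating[:25] + [-s for s in alternating[25:]]

print(ising_energy(alternating, h, J))  # -49
print(ising_energy(one_wall, h, J))     # -47
```

With positive couplings, neighbouring spins prefer to be anti-aligned, so the two alternating chains are the ground states at energy -49, and each domain wall costs 2 units of energy.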

## Ordering the DataFrames
The DataFrames are loaded in and then ordered by `num_occurrences`, such that the high-probability states come first. The most probable states should coincide with the ground state(s) of the system. 

In [16]:
import pandas as pd

df_1 = pd.read_csv('dwave-50spins-1.csv')
df_2 = pd.read_csv('dwave-50spins-2.csv')
df_3 = pd.read_csv('dwave-50spins-3.csv')


# Each df has an 'Unnamed: 0' column which won't be
# used at all, so it gets dropped from all three
for df in [df_1, df_2, df_3]:
    df.drop(columns=['Unnamed: 0'], inplace=True)

df_1.sort_values(by=['num_occurrences'], ascending=False, inplace=True)
df_1.head()

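| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 43 | 44 | 45 | 46 | 47 | 48 | 49 | chain_break_fraction | energy | num_occurrences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -49.0 | 2449 |
| 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -49.0 | 905 |
| 96 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 51 |
| 42 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 47 |
| 13 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 44 |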


In [17]:
df_2.sort_values(by=['num_occurrences'], ascending=False, inplace=True)
df_2.head()

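| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 43 | 44 | 45 | 46 | 47 | 48 | 49 | chain_break_fraction | energy | num_occurrences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -49.0 | 1991 |
| 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -49.0 | 1093 |
| 19 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -47.0 | 58 |
| 59 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -47.0 | 54 |
| 11 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -47.0 | 50 |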


In [18]:
df_3.sort_values(by=['num_occurrences'], ascending=False, inplace=True)
df_3.head()

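| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 43 | 44 | 45 | 46 | 47 | 48 | 49 | chain_break_fraction | energy | num_occurrences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -49.0 | 2547 |
| 1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -49.0 | 969 |
| 77 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 42 |
| 8 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 39 |
| 75 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 34 |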


In [19]:
# Reset the index so that low indices correspond to the most
# frequently observed (and here lowest-energy) states
df_1.reset_index(drop=True, inplace=True)
df_2.reset_index(drop=True, inplace=True)
df_3.reset_index(drop=True, inplace=True)

In [20]:
df_3.head()

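| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 43 | 44 | 45 | 46 | 47 | 48 | 49 | chain_break_fraction | energy | num_occurrences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -49.0 | 2547 |
| 1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 0.0 | -49.0 | 969 |
| 2 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 42 |
| 3 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 39 |
| 4 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | ... | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 0.0 | -47.0 | 34 |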


In all three DataFrames, the first two rows are the ground-state solutions: both have the lowest energy (-49) and the highest numbers of occurrences, as expected if the ground state is the most likely outcome. This does not mean the rest of the data can be dismissed. Summing the counts shown above, the two ground states account for 9,954 of the 15,000 returned samples, or roughly two thirds.
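
The share of reads that landed in a ground state can be computed directly from the `energy` and `num_occurrences` columns. The following is a minimal sketch on a toy table with made-up counts; `ground_state_fraction` is a hypothetical helper, and it assumes the lowest energy present in the table really is the ground-state energy.

```python
import pandas as pd

def ground_state_fraction(df):
    """Share of all reads whose energy equals the lowest energy in df."""
    is_ground = df['energy'] == df['energy'].min()
    ground_reads = df.loc[is_ground, 'num_occurrences'].sum()
    return ground_reads / df['num_occurrences'].sum()

# Toy response table (made-up counts, not the notebook's data)
toy = pd.DataFrame({
    'energy': [-49.0, -49.0, -47.0, -45.0],
    'num_occurrences': [6, 2, 1, 1],
})

print(ground_state_fraction(toy))  # 0.8
```

The same function could be applied to `df_1`, `df_2`, and `df_3` individually to see how the ground-state share varies between runs.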

## Comparisons

A quick call to `pandas.DataFrame.compare()` shows where two DataFrames differ: only columns containing at least one difference are kept, and cells that match are left as `NaN`. The method is used here to take a glance at how the states in the response DataFrames differ.

In [21]:
# "self" is the value from df_1 here,
# and "other" is the value from df_2
df_1[:5].compare(df_2[:5])

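| | 0 (self) | 0 (other) | 1 (self) | 1 (other) | 2 (self) | 2 (other) | 3 (self) | 3 (other) | 4 (self) | 4 (other) | ... | 46 (self) | 46 (other) | 47 (self) | 47 (other) | 48 (self) | 48 (other) | 49 (self) | 49 (other) | num_occurrences (self) | num_occurrences (other) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2449 | 1991 |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 905 | 1093 |
| 2 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | ... | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 51 | 58 |
| 3 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | ... | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 47 | 54 |
| 4 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | ... | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 44 | 50 |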


In [22]:
df_1[:5].compare(df_3[:5])

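| | 0 (self) | 0 (other) | 1 (self) | 1 (other) | 2 (self) | 2 (other) | 3 (self) | 3 (other) | 4 (self) | 4 (other) | ... | 46 (self) | 46 (other) | 47 (self) | 47 (other) | 48 (self) | 48 (other) | 49 (self) | 49 (other) | num_occurrences (self) | num_occurrences (other) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 2449 | 2547 |
| 1 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | ... | -1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 905 | 969 |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 51 | 42 |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 47 | 39 |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 44 | 34 |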


In [23]:
df_2[:5].compare(df_3[:5])

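| | 0 (self) | 0 (other) | 1 (self) | 1 (other) | 2 (self) | 2 (other) | 3 (self) | 3 (other) | 4 (self) | 4 (other) | ... | 46 (self) | 46 (other) | 47 (self) | 47 (other) | 48 (self) | 48 (other) | 49 (self) | 49 (other) | num_occurrences (self) | num_occurrences (other) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | ... | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 1991 | 2547 |
| 1 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | ... | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | 1093 | 969 |
| 2 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | ... | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 58 | 42 |
| 3 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | ... | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 54 | 39 |
| 4 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | ... | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 50 | 34 |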
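
In several of the comparisons above, every displayed spin of one row is the negative of its counterpart. Because the model has no local fields, flipping all spins at once leaves the energy unchanged, so such pairs would be equivalent solutions. The truncated output hides columns 10 to 42, so this cannot be confirmed from the display alone. The sketch below shows, on toy data, how `compare()` reports differences and how a global flip could be tested; `is_global_flip` and the three-spin frames are hypothetical.

```python
import pandas as pd

spin_cols = ['0', '1', '2']

# Toy frames: row 0 has identical spins, row 1 is a global spin flip
a = pd.DataFrame({'0': [1, -1], '1': [-1, 1], '2': [1, -1], 'num_occurrences': [7, 3]})
b = pd.DataFrame({'0': [1, 1], '1': [-1, -1], '2': [1, 1], 'num_occurrences': [6, 4]})

# Matching cells come back as NaN; differing cells show both values
diff = a.compare(b)
print(diff)

def is_global_flip(row_a, row_b, cols):
    """True if every spin in row_a is the negative of the same spin in row_b."""
    return bool((row_a[cols] == -row_b[cols]).all())

print(is_global_flip(a.loc[0], b.loc[0], spin_cols))  # False
print(is_global_flip(a.loc[1], b.loc[1], spin_cols))  # True
```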


## Summary Statistics

A helpful way to look at the summary statistics of the specified Ising model is to gather the samples from all three DataFrames into a single DataFrame. In the new DataFrame, each state is repeated `num_occurrences` times, so that every row stands for one sample and ordinary column statistics are weighted correctly.

In [24]:
# Stack all three response tables in a single call
concat_df = pd.concat([df_1, df_2, df_3])

len(concat_df)

1068

In [25]:
df = pd.DataFrame(
        concat_df.values.repeat(
                concat_df['num_occurrences'], 
                axis=0
            ),
        columns=concat_df.columns
    )
df.drop(columns=['num_occurrences'], inplace=True)

df.head()

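| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | chain_break_fraction | energy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 0.0 | -49.0 |
| 1 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 0.0 | -49.0 |
| 2 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 0.0 | -49.0 |
| 3 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 0.0 | -49.0 |
| 4 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 0.0 | -49.0 |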
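
The spins now print as `1.0` and `-1.0` because `.values` packs integer and float columns into a single float array before repeating. One possible alternative, sketched below on made-up data, repeats index labels instead and so keeps each column's dtype. It assumes a unique index (for example after `reset_index(drop=True)` or `pd.concat(..., ignore_index=True)`), which the concatenated frame above does not have as written.

```python
import pandas as pd

# Toy aggregated table (made-up values): two spin columns, an energy, and a count
agg = pd.DataFrame({
    '0': [1, -1],
    '1': [-1, 1],
    'energy': [-1.0, -1.0],
    'num_occurrences': [3, 2],
})

# Repeat each index label num_occurrences times, then select those rows.
# This relies on the index being unique.
expanded = (
    agg.loc[agg.index.repeat(agg['num_occurrences'])]
       .drop(columns='num_occurrences')
       .reset_index(drop=True)
)

print(len(expanded))            # 5
print(expanded['0'].tolist())   # [1, 1, 1, -1, -1]
print(expanded.dtypes)          # spin columns stay integer, energy stays float
```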


In [26]:
df.describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | chain_break_fraction | energy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | ... | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 | 15000.0 |
| mean | 0.088933 | -0.088667 | 0.089333 | -0.092 | 0.093067 | -0.092667 | 0.091467 | -0.0904 | 0.090667 | -0.092 | ... | 0.042 | -0.0448 | 0.0452 | -0.0472 | 0.044133 | -0.043467 | 0.0436 | -0.042933 | 0.0 | -48.198267 |
| std | 0.996071 | 0.996095 | 0.996035 | 0.995792 | 0.995693 | 0.99573 | 0.995841 | 0.995939 | 0.995914 | 0.995792 | ... | 0.999151 | 0.999029 | 0.999011 | 0.998919 | 0.999059 | 0.999088 | 0.999082 | 0.999111 | 0.0 | 1.232339 |
| min | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -49.0 |
| 25% | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -49.0 |
| 50% | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 0.0 | -49.0 |
| 75% | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | -47.0 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | -41.0 |
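
The `mean` row above is the average spin value at each site. The same averages could be obtained from the aggregated table without expanding it, by weighting each state with its `num_occurrences`. This is a minimal sketch on toy data using `numpy.average`; the column names and counts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy aggregated table (made-up values): two spins and a count per state
agg = pd.DataFrame({
    '0': [1, -1, 1],
    '1': [-1, 1, 1],
    'num_occurrences': [5, 3, 2],
})
spin_cols = ['0', '1']

# Weighted mean over states; the weights are the read counts
weighted_mean = np.average(
    agg[spin_cols].to_numpy(),
    axis=0,
    weights=agg['num_occurrences'].to_numpy(),
)
avg_spin = pd.Series(weighted_mean, index=spin_cols)

print(avg_spin)  # spin '0' -> 0.4, spin '1' -> 0.0
```

For spin `'0'` this is (5 - 3 + 2) / 10 = 0.4, which is what the mean of the ten expanded rows would give.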


In [27]:
# Export full sheet
df.to_csv('dwave-full-sheet.csv')
# Export summary statistics
df.describe().to_csv('dwave-stats-sheet.csv')
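
Note that `to_csv()` writes the index as an unnamed first column by default. When such a file is read back without `index_col`, pandas labels that column `Unnamed: 0`, which is presumably where the column dropped at the start of this notebook came from. The sketch below demonstrates the round trip with an in-memory buffer and a made-up frame.

```python
import io
import pandas as pd

toy = pd.DataFrame({'0': [1, -1], 'energy': [-1.0, -1.0]})

# Write with the default index=True, as in the export cell above
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading without index_col picks up the index as a data column
buffer.seek(0)
naive = pd.read_csv(buffer)

# Reading with index_col=0 restores it as the index instead
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))  # ['Unnamed: 0', '0', 'energy']
print(list(clean.columns))  # ['0', 'energy']
```

Either `index_col=0` when reading or `index=False` when writing avoids the extra column.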