In [1]:
from abc import ABC
from enum import Enum

import pandas as pd
import numpy as np

# Virtual Subclasses
The code below shows how we can use virtual subclassing to create separate, possibly unrelated Enums and still make sure that they satisfy the same supertype. The key is that we simply register each Enum with the same abstract base class.

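Note that the danger of virtual subclassing is that it makes it easy to violate the [Liskov Substitution Principle](https://en.wikipedia.org/wiki/Liskov_substitution_principle), since registration does not check that the registered class actually implements the interface of the base class. Violating the principle is usually a bad idea. Thus, we should be careful to restrict the use of virtual subclasses to cases such as this one, where we don't care about subclasses satisfying the same interface. Instead, making the separate classes count as the same type serves the purpose of telling the type checker that we are indeed passing an object of the right type (even though what counts as the right type may depend on the specifics of the subclass, and is thus not defined by a common interface). One caveat: `register` affects runtime `isinstance`/`issubclass` checks. As far as I know, static type checkers such as mypy do not take `register` calls into account, so this is worth verifying with the checker in use.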

In [2]:
class BaseDataFormat(ABC):
    pass

class StructuredDataFormat(Enum):
    PD_DATAFRAME = pd.DataFrame
    NP_ARRAY = np.ndarray

BaseDataFormat.register(StructuredDataFormat);


This confirms that registering the enum as a virtual subclass works (for concrete members of the enum, which is what we want):

In [3]:
assert isinstance(
    StructuredDataFormat.PD_DATAFRAME, 
    BaseDataFormat
)

This doesn't apply to the enum class itself, which is not an *instance* of `BaseDataFormat`, but we don't need that anyway.

In [4]:
isinstance(
    StructuredDataFormat, 
    BaseDataFormat
)

False

Let's check the type of the two, just out of curiosity.

In [5]:
type(StructuredDataFormat)

enum.EnumType

In [6]:
type(StructuredDataFormat.NP_ARRAY)

<enum 'StructuredDataFormat'>
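
To illustrate the "separate, possibly unrelated Enums" part, here is a minimal sketch that registers two enums with the same base class. `UnstructuredDataFormat`, its members, and the `describe` function are hypothetical names made up for this example, and the member values are placeholder strings so that the sketch runs without pandas or numpy. It only demonstrates the runtime behavior of `register`.

```python
from abc import ABC
from enum import Enum


class BaseDataFormat(ABC):
    """Common supertype; deliberately defines no interface."""


class StructuredDataFormat(Enum):
    # Placeholder strings instead of pd.DataFrame / np.ndarray
    PD_DATAFRAME = "pd.DataFrame"
    NP_ARRAY = "np.ndarray"


class UnstructuredDataFormat(Enum):  # hypothetical second enum
    RAW_TEXT = "str"
    RAW_BYTES = "bytes"


# Register both (otherwise unrelated) enums with the same base class
for enum_cls in (StructuredDataFormat, UnstructuredDataFormat):
    BaseDataFormat.register(enum_cls)


def describe(data_format: BaseDataFormat) -> str:
    """Hypothetical consumer accepting a member of any registered enum."""
    if not isinstance(data_format, BaseDataFormat):
        raise TypeError(f"Expected a registered data format, got {data_format!r}")
    return f"{type(data_format).__name__}.{data_format.name}"


print(describe(StructuredDataFormat.NP_ARRAY))
print(describe(UnstructuredDataFormat.RAW_TEXT))

# The enum classes are virtual *subclasses*, not instances, of the base
print(issubclass(UnstructuredDataFormat, BaseDataFormat))
print(isinstance(UnstructuredDataFormat, BaseDataFormat))
```

The last two lines show the distinction from the cells above from the other side: `issubclass` holds for the enum class, while `isinstance` only holds for its members.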

## Side Note: How to achieve this without virtual subclassing
Before I settled on virtual subclassing as the preferred solution, I implemented the same behavior in the following way:

In [9]:
class BaseDataFormat(Enum):
    """
    The fact that this enum does not have any attributes/members makes it
    abstract, since it is thus not possible to instantiate it.
    While more direct ways of designating this class as an abstract class, e.g.
    using abc.ABC or abc.abstractmethod would be preferable, this doesn't seem
    easily possible due to some differences between enums and standard classes.
    See for example the following discussion: https://stackoverflow.com/questions/56131308/create-an-abstract-enum-class
    """
    pass

class StructuredDataFormat(BaseDataFormat):
    PD_DATAFRAME = pd.DataFrame
    NP_ARRAY = np.ndarray

In [10]:
assert isinstance(
    StructuredDataFormat.PD_DATAFRAME, 
    BaseDataFormat
)

While this works, and may seem to be of similar complexity if you look at the final version of the code, I did encounter a number of errors with hard-to-decipher error messages when trying slightly different versions. Thus, I settled on virtual subclassing because it is less error-prone, and also because it generalizes easily to other use cases (e.g., for configs we may want to use unrelated data classes, but still make sure that they count as the same type).
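
The config use case mentioned above could look roughly like the following sketch. All names here (`BaseConfig`, `LoaderConfig`, `PlotConfig`, and their fields) are hypothetical and only serve to show that the same `register` approach carries over from enums to plain data classes.

```python
from abc import ABC
from dataclasses import dataclass


class BaseConfig(ABC):  # hypothetical common supertype
    """Defines no interface; only used for isinstance checks."""


@dataclass(frozen=True)
class LoaderConfig:  # hypothetical config
    path: str
    chunk_size: int = 1024


@dataclass(frozen=True)
class PlotConfig:  # hypothetical, unrelated config
    title: str
    dpi: int = 100


# Neither data class inherits from BaseConfig; they are only registered
BaseConfig.register(LoaderConfig)
BaseConfig.register(PlotConfig)

configs = [LoaderConfig(path="data.csv"), PlotConfig(title="Overview")]
all_registered = all(isinstance(cfg, BaseConfig) for cfg in configs)
print(all_registered)
```

As with the enums, the two data classes share no fields or methods, so code receiving a `BaseConfig` still has to know which concrete class it is dealing with.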

# Handling of duplicate values

Let's see what happens if we use the same *value* for another member of the enum.

In [7]:
class StructuredDataFormat(Enum):
    PD_DATAFRAME = pd.DataFrame
    FIRST_MEMBER = 'duplicate value'
    SECOND_MEMBER = 'duplicate value'  # Different member has same value!
    
StructuredDataFormat.FIRST_MEMBER


<StructuredDataFormat.FIRST_MEMBER: 'duplicate value'>

In [8]:
# Looking up duplicate value returns first member
StructuredDataFormat('duplicate value')

<StructuredDataFormat.FIRST_MEMBER: 'duplicate value'>

We see that the duplicate value doesn't cause any visible problems: Python treats the second name as an alias of the first member rather than raising an error. We should still try to avoid this, since we don't know what other problems it may cause. Thus, if we do encounter a case where two members would map to the same data format, we should instead consider whether we can create a more abstract member that encompasses both cases. If that's not possible or desirable, we may want to dig deeper to make sure it is safe to use duplicate values. (While our example does show that the lookup of a member by value is incomplete, as it only returns the first member, I think this limitation shouldn't matter, as I can't foresee any use case where we would want to do this reverse lookup anyway.)

Overall, **it's reassuring to know we don't have to be overly careful here, and I think this justifies not complicating our code by adding custom logic to check for duplicate values.** 
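
For reference, here is a small sketch of what the duplicate value does under the hood, and of the standard library's `enum.unique` decorator, which could be used if we ever did want to forbid duplicates without writing custom logic. `Example` and `StrictExample` are hypothetical enums that mirror the cell above.

```python
from enum import Enum, unique


class Example(Enum):  # hypothetical enum mirroring the cell above
    FIRST_MEMBER = "duplicate value"
    SECOND_MEMBER = "duplicate value"  # becomes an alias of FIRST_MEMBER


# The second name refers to the very same member object
is_alias = Example.SECOND_MEMBER is Example.FIRST_MEMBER
# Iteration skips aliases, while __members__ lists every name
iterated_names = [member.name for member in Example]
all_names = list(Example.__members__)
print(is_alias, iterated_names, all_names)

# enum.unique turns duplicate values into an error at class creation
try:
    @unique
    class StrictExample(Enum):
        FIRST_MEMBER = "duplicate value"
        SECOND_MEMBER = "duplicate value"
except ValueError as err:
    unique_error = str(err)
else:
    unique_error = None
print(unique_error)
```

The one behavior worth keeping in mind is that iterating over the enum yields only `FIRST_MEMBER`, so code that loops over all members would never see the alias.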
