Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #31 support deserialization of IEnumerable with Add methods. #33

Closed
wants to merge 5 commits into from

Conversation

nikolaygekht
Copy link

Motivation:

The current version of the library serialized any class that supports IEnumerable BUT expected it to be ICollection at deserialization. So, the classes that satisfy C# collection initialization pattern (IEnumerable + Add method) aren't properly deserialized.

Unfortunately, the existence of the ICollection interface is checked at the very last moment, in GenericReader.Read*Enumerable method. So, it is practically impossible to introduce a new object type for its own way to deserialize. I've also tried to replace ICollection with IEnumerable with implementing Add extension method to avoid massive code replication, but it works 2x slower for the ICollections.

So, the final implementation is:

  1. There is a method Add for IEnumerable implemented via extension

  2. At every switch branch for each object type in the collection a new branch for IEnumerable is added, in case the class does not implement the collection.

It added quite a few lines of code (~600 new lines of code) to GenericReader, but I see no option to avoid it without refactoring the whole approach that forces to replicate the code now.

The implementation for ICollection works as fast as before (as nothing changed in this branch of execution)

Implementation of IEnumerable + Add method call works ~8x slower than pure ICollection. There are options for how to optimize that further, however, the current implementations seem to be MVP for me and I would rather play first with code emitting which will help to avoid even more massive code replication.

Changes:
In the main library:

  • A new small class IEnumerableExtension that provides Add method for IEnumerable

  • Changes in GenericReader, methods ReadHEnumerable and ReadTypedEnumerable to add IEnumerable option at every switch branch where ICollection is used

In tests

  • A test collection class is added
  • Three tests (int, string, and object) are added to ListSerializationTest

p.s. "remove local development infrastructure files" commit removes files I have added to debug solution. Unfortunately, debugging via unit tests is almost not possible, as thousands of exceptions for the test case generator class prevent the successful start of the unit tests under the debugger (another point to think of, by the way). So, I've added a simple console application locally to debug, test performance, and other needs. This commit removes simply this test application from the project.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Add<T>(this IEnumerable<T> enumerable, T value)
{
if (!gAdders.TryGetValue(enumerable.GetType(), out Action<object, object> action))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use GetOrAdd here to stay similar with the other parts of the codebase?

Also, need to think about the unnecessary penalty of doing a concurrent dictionary lookup each time you need to Add an item - we should be able to hoist this out of the Add loop (i.e. try get the action just once and also skip the Add loop if an appropriate Add method isn't found).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem with the change to GetOrAdd. Will do.

What about penalty... well, an alternative option is not to use the extension method at all.
We may replicate the reflection code into every loop that currently uses this extension method. We will lose a little bit because we will search for the method at every deserialization, but won't use a dictionary at every add option. However, that means that we would have 51 copies of this code, if I am not mistaken. :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every deserialization means to call of FindMethod once per object instead of calling once for the type. However, all subsequent add calls would use a local variable. I am not 100% confident how much performance we would gain, as the slowest part here AFAIK is reflection call anyway, it would be interesting to try on one type code and compare.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean rather than an extension method, let's do something like this:

    var valueType = Reader.ReadSerializedType(reader);
    var add = TryGetAddAction(...);
    if (add != null)
        while (Reader.ReadEnumerableType(reader) == EnumerableType.HasItem)
        {
            add(...);
        }

You still want that dictionary to prevent reflection cost. But the cost of lookup is just once for the whole loop, not once per loop element.

Copy link
Author

@nikolaygekht nikolaygekht Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I meant the same.

However, the performance appears to be equal. :-(

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, from one side I like your option better, but even after replacing the invoke with code emitting, the best I get is a 10% improvement.

Well, it would take quite a few minutes to replace every branch (I tested at one of them), I'll resolve the conversation as I finish.

{
internal static class IEnumerableExtension
{
private static ConcurrentDictionary<Type, Action<object, object>> gAdders = new ConcurrentDictionary<Type, Action<object, object>>();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps follow the existing convention, e.g.

    private static readonly ConcurrentDictionary<Type, Func<object>> Activators = 
        new ConcurrentDictionary<Type, Func<object>>();

@zachsaw
Copy link
Owner

zachsaw commented Feb 11, 2021

This looks good performance wise but can be improved a bit more - I've added comments inline.

Oh also can you squash the commits?

…ugh for var x = new T() { e1, e2, e3 } syntax.

- add debug code

- refactor the method of restoring IEnumerable with Add method (zachsaw#31)
- support of nullable in collections and as properties (zachsaw#34, zachsaw#35, zachsaw#36)
@nikolaygekht
Copy link
Author

Ok, seems like I couldn't squash commits for the existing pull request, let's try creating another one. :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants