De-duplication of entries during item creation? #50

Closed
markerikson opened this Issue Oct 21, 2016 · 11 comments

@markerikson
Collaborator

markerikson commented Oct 21, 2016

I'm doing research for a blog post on some techniques for using Redux-ORM. One example that I want to talk about is using Redux-ORM as a replacement for Normalizr. It looks like Redux-ORM only partly handles de-duplicating entries with the same ID.

The Normalizr readme gives the following example data:

[{
  id: 1,
  title: 'Some Article',
  author: {
    id: 1,
    name: 'Dan'
  }
}, {
  id: 2,
  title: 'Other Article',
  author: {
    id: 1,
    name: 'Dan'
  }
}]

I wrote a couple quick models that know how to parse that data:

class Author extends Model {
    static parse(data) {
        return this.create(data);
    }
}

Author.fields = {
    books : many("Book")
};
Author.modelName = "Author";

class Book extends Model {

    static parse(data) {
        const {Author} = this.session;

        const parsedData = {
            ...data,
            author : Author.parse(data.author),
        };

        return this.create(parsedData);
    }
}

Book.fields = {
    author : oneToOne("Author")
};
Book.modelName = "Book";

I then ran the following sequence:

const defaultState = schema.getDefaultState();
const session = schema.from(defaultState);
const {Author, Book} = session;

books.map(book => Book.parse(book));

const data = session.reduce();
console.log(data);

const session2 = schema.from(data);

session2.Author.all().withRefs.forEach(author => console.log(author));

The Author table definitely only had a single entry, {1 : {id : 1, name : "Dan"}}. However, the items array had the ID in there twice: [1, 1]. As a result, the Author.all() line logged out two entries instead of one.

At a quick glance, it looks like this is because Backend.insert() simply pushes the new item's ID into the array, without any checks. For that matter, it looks as though the second author object replaced the first: when I gave the two author object definitions different fields, both of the logged values matched the second definition.

Any suggestions on ideas for better merge handling in cases like this?
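For concreteness, the merge behavior being asked about can be sketched in plain JS over the same table shape the library uses. This is a hypothetical helper, not part of Redux-ORM: the id is pushed at most once, and props for an existing id are shallow-merged.

```javascript
// Hypothetical plain-JS helper (not Redux-ORM code) illustrating
// insert-with-merge over the { items, itemsById } table shape:
// the id is pushed at most once, and an existing row is shallow-merged
// with the incoming entry instead of being replaced.
function insertOrMerge(table, entry) {
  const exists = table.items.includes(entry.id);
  return {
    ...table,
    items: exists ? table.items : [...table.items, entry.id],
    itemsById: {
      ...table.itemsById,
      [entry.id]: exists
        ? { ...table.itemsById[entry.id], ...entry }
        : entry,
    },
  };
}
```

With this, parsing the two example books would leave a single `[1]` in the Author items array and a merged author row.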

@tommikaikkonen

Owner

tommikaikkonen commented Oct 25, 2016

Good catch! This definitely needs to be handled.

One possibility would be to make creating an already existing object raise an error. This is what the ORM I work with on the backend does (Django), which I think is reasonable. If that were implemented, for the use case here, you'd use a method like updateOrCreate which would check if an entity with those attributes exists, and choose to update or create based on that. You can already add that method to your own model since it doesn't require anything to change internally.

Implementing this is straightforward so I'll be putting out a patch with a couple of simple test cases. Let me know if you have a different opinion on what the library should do when trying to create an already existing entity.
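As a rough illustration of the user-land updateOrCreate described above, here is a sketch written against a minimal in-memory stand-in, since the exact Redux-ORM method names (hasId, withId, create) are assumptions drawn from this discussion rather than the library's confirmed surface:

```javascript
// Minimal in-memory stand-in for a model class, just so the sketch runs
// without Redux-ORM. hasId/withId/create mirror the API discussed in
// this thread and are assumptions, not the library's actual surface.
function makeModel() {
  const rows = new Map();
  return {
    hasId: (id) => rows.has(id),
    withId: (id) => ({
      get ref() { return rows.get(id); },
      update: (props) => rows.set(id, { ...rows.get(id), ...props }),
    }),
    create: (props) => { rows.set(props.id, { ...props }); },
  };
}

// The updateOrCreate pattern: look up by id, then choose update or create.
function updateOrCreate(model, props) {
  if (model.hasId(props.id)) {
    const { id, ...rest } = props;
    model.withId(id).update(rest);
  } else {
    model.create(props);
  }
}
```

Calling this twice with the same id merges the second set of props into the existing row instead of creating a duplicate.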

@markerikson

Collaborator

markerikson commented Oct 25, 2016

The use case I have here would be iterating over a batch of incoming data, and trying to call create for multiple items that may be duplicates (basically, the Normalizr scenario). In that scenario, I believe there would be multiple CREATE actions queued up for the same item ID, but since the item wouldn't actually be "created" yet, you wouldn't be able to execute a straightforward check to determine whether a "create" or "update" call is needed.

@tommikaikkonen

Owner

tommikaikkonen commented Nov 1, 2016

I've been thinking about your problem, and I realized that with batched mutations implemented, it wouldn't be too hard to apply each edit directly without waiting for the getNextState call. The batched mutations won't mutate any existing state, but when new objects are created during a session, further updates to them are applied with mutations. Any queries during an updating session would see the updated state. I know you've implemented some logic to hook into the action/update log inside Redux-ORM to work around that; this might actually help with those cases too.

It's a big breaking change, though, and I might be forgetting some edge cases as well...

@markerikson

Collaborator

markerikson commented Nov 1, 2016

Yeah, there are definitely some questions about exactly how that should behave. Come to think of it, it would be worth running the same test with Normalizr to see exactly how it handles things.

@markerikson

Collaborator

markerikson commented Mar 15, 2017

Hi! I'm playing around with 0.9 for the first time, and wanted to see where this stood.

I ran the following test code, based off my previous example and the current docs:

import { ORM, fk, many, attr, Model } from 'redux-orm';

class Book extends Model {
  static get fields() {
    return {
      id: attr(),
      name: attr(),
      releaseYear: attr(),
      author: fk('Author', 'books')
    };
  }
    
  static parse(data) {
    const {Author} = this.session;

    const parsedData = {
      ...data,
      author : Author.parse(data.author)
    };

    return this.create(parsedData);
  }
}

Book.modelName = 'Book';

class Author extends Model {
  static get fields() {
    return {
      id: attr(),
      name: attr(),
      age : attr(),
      location : attr()
    };
  }
  
  static parse(data) {
    return this.create(data);
  }
}
Author.modelName = 'Author';

const books = [{
  id: 1,
  title: 'Some Article',
  author: {
    id: 1,
    name: 'Dan #1',
    age : 24
  }
}, {
  id: 2,
  title: 'Other Article',
  author: {
    id: 1,
    name: 'Dan #2',
    location : "London"
  }
}];

const orm = new ORM();
orm.register(Book, Author);

const initialState = orm.getEmptyState(); 
const session = orm.session(initialState); 

const {Book : BookModel, Author : AuthorModel} = session;

const bookModels = books.map(book => BookModel.parse(book));

bookModels.forEach(bm => console.log(bm.ref))

AuthorModel.all().toRefArray().forEach(am => console.log(am));

console.log(session.state)

This gave me the following output in WebpackBin:

{
    id: 1,
    title: "Some Article",
    author: 1
}
{
    id: 2,
    title: "Other Article",
    author: 1
}


{
    id: 1,
    name: "Dan #2",
    location: "London"
}
{
    id: 1,
    name: "Dan #2",
    location: "London"
}


{
    Book: {
        items: [
            1,
            2
        ],
        itemsById: {
            1: {
                id: 1,
                title: "Some Article",
                author: 1
            },
            2: {
                id: 2,
                title: "Other Article",
                author: 1
            }
        },
        meta: {
            maxId: 2
        }
    },
    Author: {
        items: [
            1,
            1
        ],
        itemsById: {
            1: {
                id: 1,
                name: "Dan #2",
                location : "London"
            }
        },
        meta: {
            maxId: 2
        }
    }
}

Based on that, it doesn't look like the de-dupe behavior has changed. Was #62 supposed to fix this, or just make it possible to improve the behavior?

I'll try to take a look through the code to get a better idea of what's going on.

@markerikson

Collaborator

markerikson commented Mar 15, 2017

After a quick trace through, looks like the key logic is Table.insert(), which currently looks like:

    insert(tx, branch, entry) {
        const { batchToken, withMutations } = tx;

        const hasId = entry.hasOwnProperty(this.idAttribute);

        let workingState = branch;

        // This will not affect string id's.
        const [newMaxId, id] = idSequencer(this.getMaxId(branch), entry[this.idAttribute]);
        workingState = this.setMaxId(tx, branch, newMaxId);

        const finalEntry = hasId
            ? entry
            : ops.batch.set(batchToken, this.idAttribute, id, entry);

        if (withMutations) {
            ops.mutable.push(id, workingState[this.arrName]);
            ops.mutable.set(id, finalEntry, workingState[this.mapName]);
            return {
                state: workingState,
                created: finalEntry,
            };
        }

        const nextState = ops.batch.merge(batchToken, {
            [this.arrName]: ops.batch.push(batchToken, id, workingState[this.arrName]),
            [this.mapName]: ops.batch.merge(batchToken, { [id]: finalEntry }, workingState[this.mapName]),
        }, workingState);

        return {
            state: nextState,
            created: finalEntry,
        };
    }

Based on that, yeah, it does unconditionally push the ID into the items array, and it still replaces the existing entry rather than merging.

@tommikaikkonen, I know you said you're really busy. Is this something you still hope to work on before releasing 0.9 final, or is it later in the task list?
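A minimal sketch of the missing guard (plain JS, illustrative only, not a proposed patch): the push into the id array would need to be conditional on the id not already being present.

```javascript
// Hypothetical guard for the unconditional push described above: return
// the array unchanged when the id is already present, otherwise append.
// Illustrative only; not actual redux-orm code.
function pushIdOnce(idArr, id) {
  return idArr.includes(id) ? idArr : [...idArr, id];
}
```

A similar decision would still be needed on the itemsById side (merge vs. replace), which is the harder design question.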

@markerikson

Collaborator

markerikson commented Mar 15, 2017

One other side suggestion because I don't want to file a whole new issue :)

Perhaps consider renaming createReducer to createOrmReducer? There are already a million utils out there named createReducer, which generally implement "case reducer lookup table" functionality. It'd be nice to differentiate things a bit.

@tommikaikkonen

Owner

tommikaikkonen commented May 2, 2017

Thanks for submitting the PR related to this issue! Some thoughts about how to best solve the problem below.

Before 0.9 and the eager updates, solving this with an updateOrCreate function wouldn't have worked. If a row with the same ID had been created during the same session, it wouldn't show up on the query that checks whether that row already exists to choose between an update or a create. You'd see the changes only after ending a session and starting a new one.

With eager updates, the situation is different. Doing this in one session works as expected:

// inside a session

const props1 = { id: 1, name: 'Name 1' };
const props2 = { id: 1, name: 'Name 2' };

Author.create(props1);

// A basic update or create operation
if (Author.hasId(props2.id)) {
    const { id: props2id, ...nonIdProps } = props2;
    Author.withId(props2id).update(nonIdProps);
} else {
    Author.create(props2);
}

If Redux-ORM had an updateOrCreate similar to the one in the above snippet, would you be able to handle the parsing without duplicates if you replace Model.create calls with calls to Model.updateOrCreate, where you can specify the lookup props (in this case, the row id) and the props values to update/create with?

Merging props where the id already exists is convenient, but it could lead to silent bugs where people expect an error. In relational databases, trying to insert a row with an already existing id is a failure unless you specify what should happen in the existing case. There's also the problem of how to merge: putting this logic in the insert locks us into either merging the props or replacing the record with the newer props. updateOrCreate, or your own logic, would allow for both approaches.
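The merge-versus-replace choice can be made concrete with two plain-JS strategies over an id-keyed map. The names upsertMerge and upsertReplace are hypothetical, purely for illustration:

```javascript
// Two resolution strategies for an id collision, over a plain object map
// keyed by id. Illustrative only; not Redux-ORM internals, and the
// function names are made up for this sketch.

// Shallow-merge the new props into the existing row.
function upsertMerge(map, props) {
  return { ...map, [props.id]: { ...map[props.id], ...props } };
}

// Replace the existing row with the new props entirely.
function upsertReplace(map, props) {
  return { ...map, [props.id]: { ...props } };
}
```

Baking one of these into insert would lock the library into that choice, whereas an explicit updateOrCreate leaves it to the caller.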

Related comment: #124 (comment)

@markerikson

Collaborator

markerikson commented May 2, 2017

Yeah, I think an updateOrCreate function should work well. Good idea.

@lamoglia

lamoglia commented Jul 8, 2017

Looks like this upsert addition solved the issue for me #124 (comment)

@markerikson

Collaborator

markerikson commented Jul 8, 2017

Yep, upsert seems to cover it. I'll go ahead and close this now.

@markerikson markerikson closed this Jul 8, 2017
