Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INTERLIS Import: Data in re_tables is ducplicated when re-importing #365

Closed
2 tasks
sjib opened this issue Aug 7, 2024 · 6 comments
Closed
2 tasks

INTERLIS Import: Data in re_tables is ducplicated when re-importing #365

sjib opened this issue Aug 7, 2024 · 6 comments
Labels
bug Something isn't working datamodel Concerns the datamodel enhancement New feature or request INTERLIS About INTERLIS exchange format (import / export)

Comments

@sjib
Copy link
Contributor

sjib commented Aug 7, 2024

Describe the bug
A clear and concise description of what the bug is.

When re-importing the same dataset or another dataset in a existing database with data it does not check if there is already the same data in the re_tables. This leads to duplicates in re_maintenance_event_wastewater_structure and re_building_group_disposal.

To Reproduce
Re-import a dataset with re_data a second time and check your re_table in postgres

Expected behavior
A clear and concise description of what you expected to happen.

On import it should be checked as with all other tables if there is already a dataset with the same combination fk_1 / fk_2.
The function create_or_update should be enhanced or an new function should be written that checks on the two fk_* attributes of a n:m re_table instead of the obj_id if there is already the same dataset in there.

Screenshots / data
If applicable, add screenshots or data to help explain your problem.

Desktop (please complete the following information):

Additional context
Add any other context about the problem here.

Tests with create_re_class_entry do not work as expected - moved to this new issue

  • How to get the classname out of variable cls?
    teksi_wastewater.utils:Not supported n:m relation class: - please open an issue to add this to interlis_importer_to_intermediate_schema configuration

So it does not find the cls

  • How to make this function more flexible in not having to hardcode the supported re_* classes?
    def create_re_class_entry(self, cls, **kwargs):
        """
        Checks if an existing instance (if fk_1, fk_2 combination is found) or creates an instance of the provided re_class
        with given kwargs, and returns it.
        """
        instance = None

        # We try to get the instance from the session/database
        # does not work as we can then not run instance = cls(**kwargs) to create object at the end
        # fk_1_value = kwargs.get("fk_1", None)
        # fk_2_value = kwargs.get("fk_2", None)

        logger.info(
                    f"Not supported n:m relation class: {cls} - please open an issue to add this to interlis_importer_to_intermediate_schema configuration"
                )

        if cls == "re_maintenance_event_wastewater_structure":
            fk_1_value = kwargs.get("fk_wastewater_structure", None)
            fk_2_value = kwargs.get("fk_maintenance_event", None)
        elif cls == "re_building_group_disposal":
            fk_1_value = kwargs.get("fk_disposal", None)
            fk_2_value = kwargs.get("fk_building_group", None)
        else:
            fk_1_value = None
            fk_2_value = None

            logger.warning(
                f"Not supported n:m relation class: {cls} - please open an issue to add this to interlis_importer_to_intermediate_schema configuration"
            )

        if fk_1_value and fk_2_value:
            # instance = self.session_tww.query(cls).get(kwargs.get("obj_id", None))
            # instance = self.session_tww.query(cls).get(kwargs.get("fk_1", "fk_2", None))

            # try with filter
            # filter(or_(db.users.name=='Ryan', db.users.country=='England'))
            # filter should be and not or see https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.filter_by

            if cls == "re_maintenance_event_wastewater_structure":
                instance = self.session_tww.query(cls).filter_by(
                    fk_wastewater_structure=fk_1_value, fk_maintenance_event=fk_2_value
                )

            elif cls == "re_building_group_disposal":
                instance = self.session_tww.query(cls).filter_by(
                    fk_disposal=fk_1_value, fk_building_group=fk_2_value
                )

            else:
                logger.warning(
                    f"Not supported n:m relation class: {cls} - please open an issue to add this to interlis_importer_to_intermediate_schema configuration"
                )

        if instance is None:

            # We found it -> skip
            # instance.__dict__.update(kwargs)
            # flag_dirty(instance)  # we flag it as dirty so it stays in the session
            # else:
            # We didn't find it -> create
            instance = cls(**kwargs)

        return instance

and function call

    def _import_erhaltungsereignis_abwasserbauwerkassoc(self):
        for row in self.session_interlis.query(
            self.model_classes_interlis.erhaltungsereignis_abwasserbauwerkassoc
        ):
            # test with create_re_class_entry

            re_maintenance_event_wastewater_structure = self.create_re_class_entry(
                self.model_classes_tww_od.re_maintenance_event_wastewater_structure,
                # this class does not inherit base_commmon
                # **self.base_common(row),
                # --- re_maintenance_event_wastewater_structure ---
                # does not work
                # fk_1=self.get_pk(row.erhaltungsereignis_abwasserbauwerkassocref__REL),
                # fk_2=self.get_pk(row.abwasserbauwerkref__REL),
                fk_maintenance_event=self.get_pk(
                    row.erhaltungsereignis_abwasserbauwerkassocref__REL
                ),
                fk_wastewater_structure=self.get_pk(row.abwasserbauwerkref__REL),
            )

            self.session_tww.add(re_maintenance_event_wastewater_structure)
            print(".", end="")

Originally posted by @sjib in #357 (comment)

@sjib sjib added the bug Something isn't working label Aug 7, 2024
@sjib sjib added this to the TEKSI Wastewater 2024.1 milestone Aug 7, 2024
@sjib sjib added enhancement New feature or request INTERLIS About INTERLIS exchange format (import / export) labels Aug 7, 2024
@sjib
Copy link
Contributor Author

sjib commented Aug 7, 2024

eg. Line 2 and line 8 are the same data
20240807_re_me_wws_duplicate_data

@sjib
Copy link
Contributor Author

sjib commented Aug 7, 2024

  • Add unique constraint fk_1 / fk_2 to re_ tables to improve data integrity

@sjib sjib added the datamodel Concerns the datamodel label Aug 7, 2024
@sjib
Copy link
Contributor Author

sjib commented Aug 7, 2024

@ponceta Can we ask opengis to code this new create_re_class_entry ?

sjib added a commit that referenced this issue Aug 7, 2024
Datacheck in re_classes moved to new issue #365
@sjib
Copy link
Contributor Author

sjib commented Aug 7, 2024

  • unique constraint fk_1 / fk_2 to re_ tables

@cymed

Should we do for tww_od.re_building_group_disposal

  • create an index as with identifier, fk_dataowner: CREATE UNIQUE INDEX in_od_organisation_identifier ON tww_od.organisation USING btree (identifier ASC NULLS LAST, fk_dataowner ASC NULL LAST);

this would then be

CREATE UNIQUE INDEX in_re_building_group_disposal_fks ON tww_od.re_building_group_disposal USING btree (fk_building_group ASC NULLS LAST, fk_disposal ASC NULLS LAST);

20240807_UNIQUE INDEX in_re_building_group_disposal_fks

or

  • create a UNIQUE constraint: ALTER TABLE tww_od.re_building_group_disposal ADD CONSTRAINT unique_fk_combination UNIQUE (fk_building_group, fk_disposal);

20240707_re_building_group_disposal_unique_fk_combination

@cymed @ponceta Do you know what is the difference? Pro / Con?

@sjib
Copy link
Contributor Author

sjib commented Aug 7, 2024

Do we also want to implement the UNIQUE constraint of class organisation?


    UNIQUE 
        !!@comment = "!! Neben UNIQUE OID zusätzlich auch Kombination Bezeichnung, Organisationstyp, UID (Wegleitung GEP-Daten 2020)"
        Bezeichnung, Organisationstyp, UID; 
    END Organisation;

domi4484 pushed a commit that referenced this issue Aug 21, 2024
Datacheck in re_classes moved to new issue #365
Link table don't exists for KEK
@domi4484
Copy link
Contributor

domi4484 commented Oct 5, 2024

Data in re_tables is ducplicated when re-importing

This should not happen anymore since #414 was merged.

@ponceta ponceta closed this as completed Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datamodel Concerns the datamodel enhancement New feature or request INTERLIS About INTERLIS exchange format (import / export)
Projects
None yet
Development

No branches or pull requests

3 participants