Skip to content

Fixes#36

Merged
itamargiv merged 4 commits into
masterfrom
fixes
May 11, 2020
Merged

Fixes#36
itamargiv merged 4 commits into
masterfrom
fixes

Conversation

@Ladsgroup
Copy link
Copy Markdown
Contributor

@Ladsgroup Ladsgroup commented May 6, 2020

  • Fixed Makefile dependency

    modified:   Makefile
    
  • Fixed issue with missing resource urls

    modified:   wikidatarefisland/data_model/filters.py
    modified:   tests/data_model/test_filters.py
    
  • Fix issue with null statements breaking the scraper

    modified:   wikidatarefisland/pipes/item_extractor_pipe.py
    modified:   tests/pipes/test_item_extractor_pipe.py
    
  • Fixed initial results seem to only include P2888

    modified:   wikidatarefisland/services/schemaorg_property_mapper.py
    modified:   wikidatarefisland/data_model/schemaorg_normalizer.py
    
  • Fixed issue with request headers

    AND

  • Fixed issue with normalized data structure

    modified:   wikidatarefisland/pipes/scraper.py
    modified:   tests/pipes/test_scraper.py
    
  • Fixed issue with recursive comparisons

    modified:   wikidatarefisland/data_model/wikibase/value_types.py
    
  • Fixed issue with char encoding

    modified:   wikidatarefisland/data_access/storage.py
    
  • Isort imports

    modified:   wikidatarefisland/data_model/__init__.py
    modified:   wikidatarefisland/pipes/__init__.py
    modified:   wikidatarefisland/pumps/pump.py
    modified:   wikidatarefisland/services/__init__.py
    

Ladsgroup added 4 commits May 6, 2020 15:52
	modified:   Makefile
	modified:   tests/data_model/test_filters.py
	modified:   tests/pipes/test_item_extractor_pipe.py
	modified:   tests/pipes/test_scraper.py
	modified:   wikidatarefisland/data_model/__init__.py
	modified:   wikidatarefisland/data_model/filters.py
	modified:   wikidatarefisland/data_model/schemaorg_normalizer.py
	modified:   wikidatarefisland/pipes/__init__.py
	modified:   wikidatarefisland/pipes/item_extractor_pipe.py
	modified:   wikidatarefisland/pipes/scraper.py
	modified:   wikidatarefisland/pumps/pump.py
	modified:   wikidatarefisland/services/__init__.py
	modified:   wikidatarefisland/services/schemaorg_property_mapper.py
This way it would output non-latin text properly
Copy link
Copy Markdown
Contributor

@tarrow tarrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good. I wondered if there is some way I should be "manually" testing that these work before I merge though?

filtered_statements = potentially_ref_statement_filterer.filter_statements(
all_statements)
result_statements = map(_extract_statement, filtered_statements)
result_statements = reduce(_extract_statement, filtered_statements, [])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what using a reduce here does vs using a map? Can you help me understand what is the benefit?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this and the answer is that now for n filtered_statements results_statements will be n-m long where m>=0. Whereas with map the length of filtered_statements equalsresults_statements

that are any of the passed property ids
"""
return lambda statement: statement.get('pid', '') not in excluded_properties
return lambda statement: statement.get('mainsnak', {}).get('property', '') \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a good change; I'm really searching for which flavour of the dumps I found that had a "pid" key because clearly the prod dumps aren't having this. Or did I totally make it up?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never saw "pid" in any serialization.

))


@pytest.mark.skip
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally get the idea of just skipping this test if it isn't useful but I'm a tiny bit concerned that we have to do this. I'd love to fix up the test rather than skip. To me this seems like the only test that checks the pipe obeys "the contract" from README.md

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task to track re-enabling this test is T252036 (already picked up in the sprint). We marked it as skipped because it exploded when we fixed the first pipe. Fixing it should not be too hard TBH.

@itamargiv itamargiv merged commit 1409bea into master May 11, 2020
@itamargiv itamargiv deleted the fixes branch May 11, 2020 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants