Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Long Description text parsing #78

Closed
wants to merge 6 commits into from

Conversation

Hunter275
Copy link

See #77

Copy link
Owner

@tomasbedrich tomasbedrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Hunter, thanks for your contribution.

I am sad to say it, but it contains multiple issues. I have tried to explain most of them in other comments. Next time, please focus on fixing one issue at time, which would allow me to merge your fix immediately and discuss other changes separately.

Two more comments, which cannot be added to specific line of code:

  1. Please revert the mode change for setup.py file, it has no sense.
  2. Please cleanup a commit history before resubmitting your PR (use a git rebase command and squash your commits - help).

But thank you for your effort! Looking forward to merge your fix. Tomas

@@ -624,6 +695,16 @@ def load(self):

self.location = Point.from_string(root.find(id="uxLatLon").text)

self.lat = str(root.find(id="ctl00_ContentBody_Location").find("a")).split("=")[2].split("&")[0]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is problematic. The a sub-element seems to be optional – only presented for caches inside US. If you look at the ctl00_ContentBody_Location element for any cache outside the US, it has no a sub-element, so this code cannot work. Please see the picture:

ctl00_contentbody_location

The other thing is, that if you need lat and lon (I assume you mean cache latitude and longitude), they are already parsed in cache.locationread the docs. 😉

Therefore, I would suggest deleting the whole lat+lon code you added (the lines in load() method + in load_quick() method and the properties+setters).


self.lon = str(root.find(id="ctl00_ContentBody_Location").find("a")).split("=")[3].split("&")[0]

self.id = str(root.find(id="ctl00_ContentBody_GeoNav_logButton")).split("=")[3].split("&")[0]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely not a good idea to parse an URL. Please take a look at urllib.parse to know how do this correctly. 😉

On the other hand – I don't see any reason, why to parse this cache ID. We already have 2 different IDs (waypoint + guid) which are used elsewhere. So why should we want another ID, which wouldn't be used anywhere?


self.id = str(root.find(id="ctl00_ContentBody_GeoNav_logButton")).split("=")[3].split("&")[0]

self.stateprovince = str(root.find(id="ctl00_ContentBody_Location").text.split(",")[0].split(" ")[1])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your effort to parse a state and a country, but the second split(" ")[1] is problematic. On the website, there is a text "In STATE, COUNTRY". What if the STATE part consists of multiple words (for example – see the screenshot attached to another comment)?

Notwithstanding that, I would prefer not to solve this in pycaching, but if you need detailed address info in your application, try something like Google Geocoding API instead. The benefit would be, that you could geocode any point, not only the cache (for example multicache waypoints, parking places, etc.).

Therefore, I would also suggest deleting this + the country parsing (including the properties and setters of course).

@@ -637,7 +718,7 @@ def load(self):

user_content = root.find_all("div", "UserSuppliedContent")
self.summary = user_content[0].text
self.description = str(user_content[1])
self.description = str(user_content[1].text)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simple fix. Thank you!

@@ -729,6 +810,16 @@ def load_by_guid(self):
self.location = Point.from_string(
content.find("p", "LatLong Meta").text)

self.lat = str(content.find(id="ctl00_ContentBody_Location").find("a").find("a")).split("=")[2].split("&")[0]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the bellow can work, because the content is loaded from cache print-page (example), which doesn't have these elements. So I would like to kindly ask you to delete these lines, irrespectively of what has been written in other comments.

@tomasbedrich tomasbedrich added this to the 3.6.1 milestone Jul 20, 2017
@Hunter275
Copy link
Author

Hunter275 commented Jul 24, 2017

Thanks for the input, I'll take what you said and go back to the drawing board. I wrote a majority of these changes to use with my Garmin device so a lot of these probably are out of scope of this project.

@Hunter275 Hunter275 closed this Jul 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants