-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed Long Description text parsing #78
Conversation
Fixed parsing issue with Long Description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Hunter, thanks for your contribution.
I am sad to say it, but it contains multiple issues. I have tried to explain most of them in other comments. Next time, please focus on fixing one issue at time, which would allow me to merge your fix immediately and discuss other changes separately.
Two more comments, which cannot be added to specific line of code:
- Please revert the mode change for
setup.py
file, it has no sense. - Please cleanup a commit history before resubmitting your PR (use a
git rebase
command and squash your commits - help).
But thank you for your effort! Looking forward to merge your fix. Tomas
@@ -624,6 +695,16 @@ def load(self): | |||
|
|||
self.location = Point.from_string(root.find(id="uxLatLon").text) | |||
|
|||
self.lat = str(root.find(id="ctl00_ContentBody_Location").find("a")).split("=")[2].split("&")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is problematic. The a
sub-element seems to be optional – only presented for caches inside US. If you look at the ctl00_ContentBody_Location
element for any cache outside the US, it has no a
sub-element, so this code cannot work. Please see the picture:
The other thing is, that if you need lat and lon (I assume you mean cache latitude and longitude), they are already parsed in cache.location
– read the docs. 😉
Therefore, I would suggest deleting the whole lat+lon code you added (the lines in load()
method + in load_quick()
method and the properties+setters).
|
||
self.lon = str(root.find(id="ctl00_ContentBody_Location").find("a")).split("=")[3].split("&")[0] | ||
|
||
self.id = str(root.find(id="ctl00_ContentBody_GeoNav_logButton")).split("=")[3].split("&")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely not a good idea to parse an URL. Please take a look at urllib.parse to know how do this correctly. 😉
On the other hand – I don't see any reason, why to parse this cache ID. We already have 2 different IDs (waypoint + guid) which are used elsewhere. So why should we want another ID, which wouldn't be used anywhere?
|
||
self.id = str(root.find(id="ctl00_ContentBody_GeoNav_logButton")).split("=")[3].split("&")[0] | ||
|
||
self.stateprovince = str(root.find(id="ctl00_ContentBody_Location").text.split(",")[0].split(" ")[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your effort to parse a state and a country, but the second split(" ")[1]
is problematic. On the website, there is a text "In STATE, COUNTRY". What if the STATE part consists of multiple words (for example – see the screenshot attached to another comment)?
Notwithstanding that, I would prefer not to solve this in pycaching, but if you need detailed address info in your application, try something like Google Geocoding API instead. The benefit would be, that you could geocode any point, not only the cache (for example multicache waypoints, parking places, etc.).
Therefore, I would also suggest deleting this + the country parsing (including the properties and setters of course).
@@ -637,7 +718,7 @@ def load(self): | |||
|
|||
user_content = root.find_all("div", "UserSuppliedContent") | |||
self.summary = user_content[0].text | |||
self.description = str(user_content[1]) | |||
self.description = str(user_content[1].text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simple fix. Thank you!
@@ -729,6 +810,16 @@ def load_by_guid(self): | |||
self.location = Point.from_string( | |||
content.find("p", "LatLong Meta").text) | |||
|
|||
self.lat = str(content.find(id="ctl00_ContentBody_Location").find("a").find("a")).split("=")[2].split("&")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the bellow can work, because the content is loaded from cache print-page (example), which doesn't have these elements. So I would like to kindly ask you to delete these lines, irrespectively of what has been written in other comments.
Thanks for the input, I'll take what you said and go back to the drawing board. I wrote a majority of these changes to use with my Garmin device so a lot of these probably are out of scope of this project. |
See #77