Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve geo encode #74

Merged
merged 20 commits into from
Nov 8, 2016
Merged

Improve geo encode #74

merged 20 commits into from
Nov 8, 2016

Conversation

lexman
Copy link

@lexman lexman commented Oct 28, 2016

Hello,

Here's a PR to improve the _geo_encode() function. It should improve

  • readability by removing skipping coordinates and the use of _handle_skipped_last() function
  • performance (20 %). In our test case, before the PR :
     9901    0.044    0.000    4.632    0.000 encoder.py:219(addFeature)
     9901    0.961    0.000    4.381    0.000 encoder.py:370(_geo_encode)
      999    0.002    0.000    0.007    0.000 encoder.py:45(__init__)
   639778    0.136    0.000    0.211    0.000 encoder.py:53(_round)
      999    0.038    0.000    7.247    0.007 encoder.py:65(addFeatures)

After the PR :

     9901    0.042    0.000    3.680    0.000 encoder.py:208(addFeature)
     9901    0.033    0.000    3.431    0.000 encoder.py:293(_geo_encode)
      999    0.002    0.000    0.007    0.000 encoder.py:34(__init__)
   639778    0.138    0.000    0.216    0.000 encoder.py:42(_round)
      999    0.038    0.000    6.247    0.006 encoder.py:54(addFeatures)

My major concern is that I've removed the tolerance logic, but according to the test suite, and test on our website everything is Ok. I don't know if anyone would change mapbox_vector_tile.encoder.tolerance when using the library.

I've also changed some tests because encoding a geometry with 3 same points [[-71, 42], [-71, 42], [-71, 42]] should not be valid. According to the spec this is not a valid geometry. I haven't done anything for this case, but I would suggest either to raise an exception or to omit the geometry. For the moment, the new encoder produces an invalid lienstring (without MOVETO comands)

If your need the benchmack script, I can provide it.

@coveralls
Copy link

coveralls commented Oct 28, 2016

Coverage Status

Coverage decreased (-2.2%) to 92.797% when pulling f0e7229 on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

@lexman lexman mentioned this pull request Oct 28, 2016
2 tasks
@nvkelso
Copy link
Member

nvkelso commented Oct 28, 2016

Yes, please include benchmark script.

Copy link
Member

@rmarianski rmarianski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it's fine that you removed the tolerance and pending globals. Those might have been used at some point, but don't think they are any longer.

I only noticed that some of the tests were now not run, but besides that LGTM 👍

I also think that checking in the benchmark script would be useful.

@@ -12,7 +12,7 @@ def test_suite():
except:
import unittest

suite = unittest.TestLoader().discover("tests")
suite = unittest.TestLoader().discover("tests", pattern='test_enc*.py')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to leave this in? I assume that it was just to speed up running the tests to exercise the geom encoding. If so, can we revert it back? The tests all pass for me locally.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that's a mistake. I've reverted.

properties = {
'floatval': 3.14159
}
self.assertRoundTrip(
input_geometry=geometry,
expected_geometry=[[-71, 42], [-71, 42], [-71, 42]],
expected_geometry=[[-71, 42], [-71, 43], [-71, 42]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, good catch correcting these here.

@nvkelso
Copy link
Member

nvkelso commented Oct 31, 2016

No description provided.

@nvkelso nvkelso assigned rmarianski and lexman and unassigned rmarianski Oct 31, 2016
@coveralls
Copy link

coveralls commented Nov 2, 2016

Coverage Status

Coverage increased (+1.7%) to 96.763% when pulling 8580927 on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

@lexman
Copy link
Author

lexman commented Nov 2, 2016

@rmarianski
Copy link
Member

Thanks for sharing the benchmark script! All looks good to me, and my results were a little bit faster with your branch. @lexman is this at a good point to merge in? @zerebubuth any objections or thoughts?

@lexman
Copy link
Author

lexman commented Nov 2, 2016

Please hold !

I've found a bug with linestrings when all MOVE_TO commands are too small (smaller than 1 pixel). I'm finishing a refactor to handle this case (with a test). Sorry for that.

@rmarianski
Copy link
Member

Sounds good. Nice catch!

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.008%) to 95.025% when pulling ea859d1 on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.008%) to 95.025% when pulling ea859d1 on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

@coveralls
Copy link

coveralls commented Nov 3, 2016

Coverage Status

Coverage increased (+0.5%) to 95.492% when pulling 279702f on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

@lexman
Copy link
Author

lexman commented Nov 3, 2016

At last :)

I've refactored the previous code in order not to include in a tile any shape (linestring or polygon) that will be too small (ie less than one unit of the extent). In that case, neither the shape nor the attributes are added to the tile. Sorry for the delay.

Here are the new perf results. Before :

     7000    0.005    0.000    0.387    0.000 encoder.py:141(handle_shape_validity)
     9901    0.045    0.000    4.650    0.000 encoder.py:219(addFeature)
     9901    0.016    0.000    0.079    0.000 encoder.py:235(_get_feature_type)
     9901    0.005    0.000    0.007    0.000 encoder.py:273(_handle_attr)
      999    0.002    0.000    0.007    0.000 encoder.py:45(__init__)
   639778    0.137    0.000    0.210    0.000 encoder.py:53(_round)
     9991    0.019    0.000    2.454    0.000 encoder.py:102(enforce_winding_order)
      999    0.039    0.000    7.282    0.007 encoder.py:65(addFeatures)

After :

     7000    0.005    0.000    0.365    0.000 encoder.py:130(handle_shape_validity)
     9901    0.057    0.000    3.103    0.000 encoder.py:208(addFeature)
     9901    0.014    0.000    0.072    0.000 encoder.py:232(_get_feature_type)
     9900    0.004    0.000    0.006    0.000 encoder.py:267(_handle_attr)
      999    0.002    0.000    0.005    0.000 encoder.py:34(__init__)
   639778    0.131    0.000    0.204    0.000 encoder.py:42(_round)
     9991    0.017    0.000    2.322    0.000 encoder.py:91(enforce_winding_order)
      999    0.040    0.000    5.590    0.006 encoder.py:54(addFeatures)

A bit more than 20 % on addFeatures

@lexman
Copy link
Author

lexman commented Nov 7, 2016

Hello,

just to mention that this pull request also brings improvements tovards V2 spec ( #42 ) by ensuring that a LineString geometry can't be only a moveto command and a polygon can't have less than 3 points...

Copy link
Member

@rmarianski rmarianski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a spurious print was left in, but otherwise looks good to me.

features = result['foo']['features']
import sys
sys.stderr.write(str(features))
sys.stderr.write("\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you meant to check in the print?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Fixed !

@coveralls
Copy link

Coverage Status

Coverage increased (+0.5%) to 95.492% when pulling a5692f4 on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.5%) to 95.492% when pulling a5692f4 on Mappy:improve_geo_encode into 04b03c4 on tilezen:master.

@rmarianski rmarianski merged commit e7b8a20 into tilezen:master Nov 8, 2016
@rmarianski
Copy link
Member

Thanks!

@iandees
Copy link
Member

iandees commented Jan 6, 2017

Hi @lexman, I'm trying to use the change here to improve Mapzen's MVT tile performance and seem to be running into some dependency problems. When I try to generate an MVT tile I get the following exception:

# $ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION=2 python profile_tilequeue.py /etc/tilequeue/config.yaml
Traceback (most recent call last):
  File "profile_tilequeue.py", line 3, in <module>
    from tilequeue.command import lookup_formats
  File "/usr/local/lib/python2.7/dist-packages/tilequeue/command.py", line 14, in <module>
    from tilequeue.format import lookup_format_by_extension
  File "/usr/local/lib/python2.7/dist-packages/tilequeue/format/__init__.py", line 3, in <module>
    from tilequeue.format.mvt import encode as mvt_encode
  File "/usr/local/lib/python2.7/dist-packages/tilequeue/format/mvt.py", line 1, in <module>
    from mapbox_vector_tile.encoder import on_invalid_geometry_make_valid
  File "/usr/local/lib/python2.7/dist-packages/mapbox_vector_tile/__init__.py", line 1, in <module>
    from . import encoder
  File "/usr/local/lib/python2.7/dist-packages/mapbox_vector_tile/encoder.py", line 14, in <module>
    from .compat import PY3, vector_tile, apply_map
  File "/usr/local/lib/python2.7/dist-packages/mapbox_vector_tile/compat.py", line 10, in <module>
    from .Mapbox import vector_tile_pb2
  File "/usr/local/lib/python2.7/dist-packages/mapbox_vector_tile/Mapbox/vector_tile_pb2.py", line 4, in <module>
    from google.protobuf import descriptor as _descriptor
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 46, in <module>
    from google.protobuf.pyext import _message
ImportError: cannot import name _message

I'm running on Ubuntu 14.04.5 and have the protobuf 3.1.0 pip package installed:

$ pip show protobuf
Name: protobuf
Version: 3.1.0.post1
Summary: Protocol Buffers
Home-page: https://developers.google.com/protocol-buffers/
Author: protobuf@googlegroups.com
Author-email: protobuf@googlegroups.com
License: New BSD License
Location: /usr/local/lib/python2.7/dist-packages
Requires: six, setuptools

Have you been able to get this running on Ubuntu?

@jalessio
Copy link
Contributor

jalessio commented Jan 7, 2017

@iandees looks like this won't work for you exactly (python 2 vs 3), but for reference here's a known-working Dockerfile for using this (along with the "Use native protobuf library for performance" optimizations). The python:3.5.2 base image is Debian Jessie, so similar should be achievable on Ubuntu.

FROM python:3.5.2

RUN mkdir -p /provisioning
WORKDIR /provisioning

# Install OS dependencies
RUN apt-get update && apt-get install -y \
      build-essential \
      curl \
      unzip && \
    rm -rf /var/lib/apt/lists/*

# Get libgeos for Python Shapely package
RUN mkdir -p /provisioning/geos && \
    cd /provisioning/geos && \
    curl -# -O http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 && \
    tar -xjf geos-3.5.0.tar.bz2 && \
    cd geos-3.5.0 && \
    ./configure && \
    make -j$(python -c 'import multiprocessing; print(multiprocessing.cpu_count())') && \
    make -j$(python -c 'import multiprocessing; print(multiprocessing.cpu_count())') install && \
    ldconfig -v && \
    rm -rf /provisioning/geos

# Get the protobuf package and build from source
# https://github.com/google/protobuf/releases
RUN mkdir -p /provisioning/protobuf && \
    cd /provisioning/protobuf && \
    curl -# -L -O https://github.com/google/protobuf/archive/v3.1.0-alpha-1.tar.gz && \
    tar -xvf v3.1.0-alpha-1.tar.gz && \
    cd protobuf-3.1.0-alpha-1 && \
    ./autogen.sh && \
    ./configure && \
    make -j$(python -c 'import multiprocessing; print(multiprocessing.cpu_count())') && \
    make -j$(python -c 'import multiprocessing; print(multiprocessing.cpu_count())') install && \
    cd python && \
    ldconfig && \
    python setup.py install --cpp_implementation && \
    rm -rf /provisioning/protobuf

# Install mapbox-vector-tile directly from the git repo
RUN mkdir -p cd /provisioning/mapbox-vector-tile && \
    cd /provisioning/mapbox-vector-tile && \
    curl -# -L -O https://github.com/tilezen/mapbox-vector-tile/archive/v1.0.0.tar.gz && \
    curl -# -L -o 6e3a0c7.tar.gz https://github.com/tilezen/mapbox-vector-tile/tarball/6e3a0c751c515bef66bd72a36dcd8cf41fa63fe8 && \
    tar -xf 6e3a0c7.tar.gz && \
    cd tilezen-mapbox-vector-tile-6e3a0c7 && \
    python setup.py install && \
    rm -rf /provisioning/mapbox-vector-tile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants