Add python tests and travis integration
mforsyth authored and icexelloss committed Jul 14, 2017
1 parent 6d49fb3 commit 3d932b4
Showing 19 changed files with 1,710 additions and 1,098 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -2,4 +2,8 @@ project/project
project/target
target
.idea

.vscode
metastore_db
derby.log
python/spark
**/.cache
39 changes: 39 additions & 0 deletions .travis.yml
@@ -3,3 +3,42 @@ scala:
- 2.11.8
jdk:
- oraclejdk8
install:
# Install and configure Conda on Travis; based on https://conda.io/docs/travis.html
- sudo apt-get update
# We do this conditionally because it saves us some downloading if the
# version is the same.
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh;
else
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
fi
- bash miniconda.sh -b -p $HOME/miniconda
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
# Useful for debugging any issues with conda
- conda info -a
# The parameter is the number of batches to split the Scala tests into.
# The env matrix below must have the same number of entries,
# e.g. 8 batches produce the suffixes aa, ab, ac, ..., ah.
- bash ./scripts/divide_scala_tests.sh 8
env:
matrix:
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh aa'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh ab'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh ac'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh ad'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh ae'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh af'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh ag'
- TEST_DIR=. DEPS_CMD=':' TEST_CMD='./scripts/run_scala_test.sh ah'
- TEST_DIR=python DEPS_CMD='./travis/prepare_python_tests.sh' TEST_CMD='./travis/run_python_tests.sh'
before_script:
# This chmod command is a workaround for a current bug in the Travis image;
# see https://github.com/travis-ci/travis-ci/issues/7703
- sudo chmod +x /usr/local/bin/sbt
- (cd $TEST_DIR && $DEPS_CMD)
script: cd $TEST_DIR && $TEST_CMD
22 changes: 22 additions & 0 deletions python/README.md
@@ -1,3 +1,20 @@
<!--
#
# Copyright 2017 TWO SIGMA OPEN SOURCE, LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
-->
ts-flint - Time Series Library for PySpark
==========================================

@@ -50,6 +67,11 @@ Documentation

The Flint Python bindings are documented at https://ts-flint.readthedocs.io/en/latest

Run tests
---------

To run tests for the Python code, see the separate [README](tests/README.md) in the tests directory.

Examples
--------

56 changes: 56 additions & 0 deletions python/tests/README.md
@@ -0,0 +1,56 @@
<!--
#
# Copyright 2017 TWO SIGMA OPEN SOURCE, LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
-->
# Python tests

## Overview
This directory contains the tests for the Python code, written with the `unittest` module.

## Prerequisites
The tests need a local Spark distribution in order to run. An easy way to get one is to visit the
[Apache Spark download page](https://spark.apache.org/downloads.html) and select version 2.1.1 (May 02 2017), pre-built for Apache Hadoop 2.7 and later.

Extract the tarball in a local directory and set the following environment variable:
```
export SPARK_HOME=<local-spark-directory>
```
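
As a quick sanity check (a sketch for illustration, not a script in this repository), the following confirms that `SPARK_HOME` points at a usable distribution and that the bundled PySpark can be imported; the `py4j-*.zip` glob assumes the standard Spark directory layout:

```python
import glob
import os
import sys

# Assumes SPARK_HOME points at the extracted Spark 2.1.1 distribution.
spark_home = os.environ.get('SPARK_HOME')
if not spark_home:
    raise RuntimeError('SPARK_HOME is not set; see the prerequisites above')

# Make the bundled PySpark and its py4j dependency importable.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.extend(glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*.zip')))

import pyspark
print('Found PySpark', pyspark.__version__, 'under', spark_home)
```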
One-time preparation for the Python tests can be done by running the following
from the root Flint directory:
```
scripts/prepare_python_tests.sh
```

## Running tests
To run the tests, issue the following command from the root Flint directory:
```
scripts/run_python_tests.sh
```
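
Assuming the environment prepared above, a single test module can also be run directly with the standard `unittest` runner from the `python` directory (shown for orientation; the script above remains the supported entry point):
```
python -m unittest tests.ts.test_dataframe
```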

## Code
The code for the tests is found in this `tests` directory. The contents of the files are as follows (a minimal usage sketch follows the list):

* `base_test_case.py` Contains the `BaseTestCase` abstract class, the ancestor of all the test cases.
* `spark_test_case.py` Contains a concrete class, `SparkTestCase`, that inherits from `BaseTestCase` and sets up a local `SparkContext`. This is the default class to inherit test cases from.
* `test_dataframe.py` Contains about 50 test cases for the `TimeSeriesDataFrame`.
* `test_data.py` Contains constant data for the tests.
* `utils.py` Contains specialized assert functions and Pandas DataFrame creation.
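
For orientation, a minimal test case built on these pieces might look like the sketch below; `VolumeSmokeTest` is hypothetical, and it assumes the fixtures count rows like ordinary Spark DataFrames and that `VOL_DATA` is a plain list of rows:

```python
from tests.ts.spark_test_case import SparkTestCase
from tests.ts.test_data import VOL_DATA


class VolumeSmokeTest(SparkTestCase):
    '''A hypothetical smoke test; the real suites live in test_dataframe.py.'''

    def test_volume_row_count(self):
        # self.vol() is one of the cached fixtures defined in BaseTestCase.
        vol = self.vol()
        self.assertEqual(vol.count(), len(VOL_DATA))
```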

## Extending
If the test setup done in the default class, `SparkTestCase`, does not fit the needs of a particular environment, a new class can be written. The name of the new class, say `MyTestCase`, is then exported in the `FLINT_BASE_TESTCASE` environment variable before the tests are run:
```
export FLINT_BASE_TESTCASE=<Name of new class>
```
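
For illustration only, such a replacement class might look like the sketch below; the class name, the master URL, and the `FlintContext` wiring are assumptions here, not code from this commit:

```python
# my_test_case.py -- a hypothetical replacement for SparkTestCase.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

from ts.flint import FlintContext
from tests.ts.base_test_case import BaseTestCase


class MyTestCase(BaseTestCase):
    @classmethod
    def setUpClass(cls):
        # Hypothetical setup: a two-core local master instead of the default.
        conf = SparkConf().setMaster('local[2]').setAppName('flint-tests')
        cls.sc = SparkContext(conf=conf)
        cls.sqlContext = SQLContext(cls.sc)
        # BaseTestCase's fixtures read through self.flintContext.
        cls.flintContext = FlintContext(cls.sqlContext)

    @classmethod
    def tearDownClass(cls):
        cls.sc.stop()
```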
Empty file added python/tests/ts/__init__.py
72 changes: 72 additions & 0 deletions python/tests/ts/base_test_case.py
@@ -0,0 +1,72 @@
#
# Copyright 2017 TWO SIGMA OPEN SOURCE, LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
'''
The base class code for all Flint unit tests
'''
import unittest
from abc import ABCMeta, abstractclassmethod
import tests.utils as test_utils
from tests.ts.test_data import (FORECAST_DATA, PRICE_DATA, VOL_DATA, VOL2_DATA,
VOL3_DATA, INTERVALS_DATA)
from functools import lru_cache


class BaseTestCase(unittest.TestCase, metaclass=ABCMeta):
''' Abstract base class for all Flint tests
'''
@abstractclassmethod
def setUpClass(cls):
''' The automatic setup method for subclasses '''
return

@abstractclassmethod
def tearDownClass(cls):
''' The automatic tear down method for subclasses '''
return
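
    # Each fixture below converts constant data from test_data.py into a
    # TimeSeriesDataFrame via the Flint pandas reader; lru_cache caches the
    # result so the Spark conversion runs at most once per test instance.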

@lru_cache(maxsize=None)
def forecast(self):
return self.flintContext.read.pandas(
test_utils.make_pdf(FORECAST_DATA, ["time", "id", "forecast"]))

@lru_cache(maxsize=None)
def vol(self):
return self.flintContext.read.pandas(
test_utils.make_pdf(VOL_DATA, ["time", "id", "volume"]))

@lru_cache(maxsize=None)
def vol2(self):
return self.flintContext.read.pandas(
test_utils.make_pdf(VOL2_DATA, ["time", "id", "volume"]))

@lru_cache(maxsize=None)
def vol3(self):
return self.flintContext.read.pandas(
test_utils.make_pdf(VOL3_DATA, ["time", "id", "volume"]))

@lru_cache(maxsize=None)
def price(self):
return self.flintContext.read.pandas(
test_utils.make_pdf(PRICE_DATA, ["time", "id", "price"]))

@lru_cache(maxsize=None)
def intervals(self):
return self.flintContext.read.pandas(
test_utils.make_pdf(INTERVALS_DATA, ['time']))

def clocks(self):
from ts.flint import clocks
return clocks