Build system revamp
utdemir committed Jul 13, 2019
1 parent a0c05a8 commit 30d1843
Showing 37 changed files with 2,070 additions and 1,633 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.envrc
*.lock

# nix
result
69 changes: 40 additions & 29 deletions README.md
@@ -11,15 +11,24 @@ A distributed data processing framework in pure Haskell. Inspired by [Apache Spa

### distributed-dataset

This package provides a `Dataset` type which lets you express and execute transformations on a distributed multiset. Its API is heavily inspired by Apache Spark.
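
For a feel of the API, here is a minimal sketch of a `Dataset` pipeline. It is illustrative only: the combinator names `dFilter` and `dMap` and their closure-taking signatures are assumptions (this commit only touches the build system), so check the haddocks for the real API.

```haskell
{-# LANGUAGE StaticPointers #-}

import Control.Distributed.Dataset -- assumed to export 'Dataset', 'dFilter', 'dMap'
import Data.Function ((&))
import qualified Data.Text as T

-- Keep the non-empty lines of a distributed multiset and measure their
-- lengths. Transformations take static closures so that they can be
-- shipped to remote executors.
lineLengths :: Dataset T.Text -> Dataset Int
lineLengths ds =
  ds
    & dFilter (static (not . T.null))
    & dMap (static T.length)
```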

It uses pluggable `Backend`s for spawning executors and `ShuffleStore`s for exchanging intermediate data between them. See 'distributed-dataset-aws' for an implementation using AWS Lambda and S3.
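
Concretely, a `ShuffleStore` boils down to two static closures for writing and reading numbered chunks of bytes; the `s3ShuffleStore` further down in this commit fills in exactly these two fields. Here is a paraphrased sketch of the shape, where only the `ssGet`/`ssPut` field names and the `Range` constructors are taken from the diff below and the concrete types are guesses:

```haskell
import Conduit (ConduitT, ResourceT)
import Control.Distributed.Closure (Closure)
import Data.ByteString (ByteString)
import Data.Void (Void)

-- Which part of a stored chunk to read back.
data Range = RangeAll | RangeOnly Integer Integer

-- Paraphrased shape of the record defined in
-- Control.Distributed.Dataset.ShuffleStore; types are approximate.
data ShuffleStore = ShuffleStore
  { ssGet :: Closure (Int -> Range -> ConduitT () ByteString (ResourceT IO) ())
  , ssPut :: Closure (Int -> ConduitT ByteString Void (ResourceT IO) ())
  }
```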

It also exposes a more primitive `Control.Distributed.Fork`
module which lets you run `IO` actions remotely. It
is especially useful when your task is [embarrassingly
parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel).
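
A minimal sketch of that primitive, assuming an already-configured `Backend`; the names `initDistributedFork`, `fork` and `await` and their signatures are assumptions for illustration, not something this commit defines:

```haskell
{-# LANGUAGE StaticPointers #-}

import Control.Distributed.Closure (Dict (Dict))
import Control.Distributed.Fork -- assumed to export 'Backend', 'fork', 'await'

-- Run an IO action on a remote executor and block until its serialised
-- result comes back; 'static Dict' captures the serialisation dictionary.
remoteAnswer :: Backend -> IO Int
remoteAnswer backend = do
  initDistributedFork -- assumed one-time initialisation at program start
  handle <- fork backend (static Dict) (static (pure 42 :: IO Int))
  await handle
```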

### distributed-dataset-aws

This package provides a backend for 'distributed-dataset' using AWS
services. Currently it supports running functions on AWS Lambda and
using an S3 bucket as a shuffle store.
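
The shuffle-store half is just the `s3ShuffleStore` function reformatted later in this commit; a small usage sketch, where the bucket name and key prefix are placeholders and the `ShuffleStore` type is assumed to be exported from `Control.Distributed.Dataset.ShuffleStore`:

```haskell
import Control.Distributed.Dataset.AWS (s3ShuffleStore)
import Control.Distributed.Dataset.ShuffleStore (ShuffleStore)
import qualified Data.Text as T

-- Shuffle chunks end up at s3://my-s3-bucket/dd-shuffle/<chunk-number>.
-- The bucket is a placeholder and must already exist; the Lambda side of
-- the backend lives behind the re-exported Control.Distributed.Fork.AWS.
myShuffleStore :: ShuffleStore
myShuffleStore = s3ShuffleStore (T.pack "my-s3-bucket") (T.pack "dd-shuffle/")
```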

### distributed-dataset-opendatasets

@@ -34,13 +43,16 @@ Provides `Dataset`'s reading from public open datasets. Currently it can fetch G
$ cd distributed-dataset
```

* Make sure that you have AWS credentials set up. The easiest way is to install the [AWS command line interface](https://aws.amazon.com/cli/) and run:

```sh
$ aws configure
```

* Create an S3 bucket to put the deployment artifact in. You can use
the console or the CLI:

```sh
$ aws s3api create-bucket --bucket my-s3-bucket
@@ -70,37 +82,36 @@

## Stability

Experimental. Expect lots of missing features, bugs,
instability and API changes. You will probably need to
modify the source if you want to do anything serious. See
[issues](https://github.com/utdemir/distributed-dataset/issues).

## Contributing

I am open to contributions; any issue, PR or opinion is more than welcome.

## Hacking
-* You can use `Nix`, `cabal-install` or `stack`.
+* In order to develop `distributed-dataset`, you can use:
+  * On Linux: `Nix`, `cabal-install` or `stack`.
+  * On MacOS: `stack` with `docker`.
+* Use [ormolu](https://github.com/tweag/ormolu) to format source code.

-If you use Nix:
-
-* You can use [my binary cache on cachix](https://utdemir.cachix.org/) so that you don't recompile half of the Hackage.
-* 'nix-shell' gives you a development shell with required Haskell dependencies alongside with `cabal-install`, `ghcid` and `stylish-haskell`. Example:
-```
-$ nix-shell --pure --run 'ghcid -c "cabal new-repl distributed-dataset-opendatasets"'
-```
-* Use stylish-haskell and hlint:
-```
-$ nix-shell --run 'find -name "*.hs" -exec stylish-haskell -i {} \;'
-$ nix-shell --run 'hlint .'
-```
-* You can generate the Haddocks using
-```
-$ nix-build -A docs
-```
+### Nix
+
+* You can use [my binary cache on cachix](https://utdemir.cachix.org/)
+  so that you don't recompile half of the Hackage.
+* `nix-shell` will drop you into a shell with `ormolu`, `cabal-install` and
+  `ghcid`, alongside all required Haskell and system dependencies.
+  You can use `cabal new-*` commands there.
+* There is a `./make.sh` at the root folder with some utilities like
+  formatting the source code or running `ghcid`; run `./make.sh --help`
+  to see the usage.
+
+### Stack
+
+* Make sure that you have `Docker` installed.
+* Use `stack` as usual; it will automatically use a Docker image.
+* Run `./make.sh stack-build` before you send a PR to test different resolvers.
## Related Work
@@ -109,7 +120,7 @@
* [Towards Haskell in Cloud](https://www.microsoft.com/en-us/research/publication/towards-haskell-cloud/) by Jeff Epstein, Andrew P. Black, Simon L. Peyton Jones
* [Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing](https://cs.stanford.edu/~matei/papers/2012/nsdi_spark.pdf) by Matei Zaharia, et al.
### Projects
* [Apache Spark](https://spark.apache.org/).
* [Sparkle](https://github.com/tweag/sparkle): Run Haskell on top of Apache Spark.
37 changes: 0 additions & 37 deletions ci.sh

This file was deleted.

27 changes: 5 additions & 22 deletions default.nix
@@ -1,5 +1,4 @@
{ compiler ? "ghc865"
, pkgs ? import ./pkgs.nix
{ pkgs ? import ./pkgs.nix
}:

let
@@ -45,22 +44,7 @@ overlays = se: su: {
});

# Use newer version
# Haddocks does not work with ghc 8.4
stratosphere = pkgs.haskell.lib.dontHaddock se.stratosphere_0_40_0;

# Pulls in a broken dependency on 1.8.1, fixed in master but no new release yet.
# https://github.com/yesodweb/Shelly.hs/commit/8288d27b93b57574135014d0888cf33f325f7c80
shelly =
se.callCabal2nix
"shelly"
(builtins.fetchGit {
url = "https://github.com/yesodweb/Shelly.hs";
rev = "8288d27b93b57574135014d0888cf33f325f7c80";
})
{};

# Always use the new Cabal
Cabal = se.Cabal_2_4_1_0;
stratosphere = se.stratosphere_0_40_0;

# not on Hackage yet
ormolu =
@@ -73,7 +57,7 @@ overlays = se: su: {
{};
};

haskellPackages = pkgs.haskell.packages.${compiler}.override {
haskellPackages = pkgs.haskell.packages.ghc865.override {
overrides = overlays;
};

@@ -99,7 +83,7 @@ in rec
${distributed-dataset-aws.src} \
${distributed-dataset-opendatasets.src}
'';
} // (if compiler > "ghc86" then {

shell = haskellPackages.shellFor {
packages = p: with p; [
distributed-dataset
@@ -110,9 +94,8 @@ in rec
buildInputs = with haskellPackages; [
cabal-install
ghcid
stylish-haskell
ormolu
];
withHoogle = true;
};
} else {})
}
90 changes: 49 additions & 41 deletions distributed-dataset-aws/src/Control/Distributed/Dataset/AWS.hs
@@ -1,27 +1,29 @@
{-# LANGUAGE StaticPointers #-}
{-# LANGUAGE StaticPointers #-}
{-# LANGUAGE TypeApplications #-}

module Control.Distributed.Dataset.AWS
( s3ShuffleStore
-- Re-exports
, module Control.Distributed.Fork.AWS
) where
, -- Re-exports
module Control.Distributed.Fork.AWS
)
where

--------------------------------------------------------------------------------
import Conduit
import Control.Distributed.Closure
import Control.Lens
import Control.Monad
import Control.Monad.Trans.AWS (AWST)
import qualified Data.Text as T
import Network.AWS
import Network.AWS.Data.Body (RsBody (_streamBody))
import qualified Network.AWS.S3 as S3
import qualified Network.AWS.S3.StreamingUpload as S3
import System.IO.Unsafe
import Conduit
import Control.Distributed.Closure
--------------------------------------------------------------------------------
import Control.Distributed.Dataset.ShuffleStore
import Control.Distributed.Fork.AWS
import Control.Distributed.Dataset.ShuffleStore
import Control.Distributed.Fork.AWS
import Control.Lens
import Control.Monad
import Control.Monad.Trans.AWS (AWST)
import qualified Data.Text as T
import Network.AWS
import Network.AWS.Data.Body (RsBody (_streamBody))
import qualified Network.AWS.S3 as S3
import qualified Network.AWS.S3.StreamingUpload as S3
import System.IO.Unsafe

--------------------------------------------------------------------------------

-- |
@@ -30,29 +32,36 @@ import Control.Distributed.Fork.AWS
-- TODO: Cleanup
-- TODO: Use a temporary bucket created by CloudFormation
s3ShuffleStore :: T.Text -> T.Text -> ShuffleStore
s3ShuffleStore bucket' prefix'
= ShuffleStore
{ ssGet = static (\bucket prefix num range -> do
ret <- runAWS globalAWSEnv $
send $ S3.getObject
(S3.BucketName bucket)
(S3.ObjectKey $ prefix <> T.pack (show num))
& S3.goRange
.~ (case range of
RangeAll -> Nothing
RangeOnly lo hi ->
Just . T.pack $ "bytes=" <> show lo <> "-" <> show hi
)
_streamBody $ ret ^. S3.gorsBody
) `cap` cpure (static Dict) bucket' `cap` cpure (static Dict) prefix'
, ssPut = static (\bucket prefix num ->
void . transPipe @(AWST (ResourceT IO)) (runAWS globalAWSEnv)
$ S3.streamUpload Nothing $ S3.createMultipartUpload
(S3.BucketName bucket)
(S3.ObjectKey $ prefix <> T.pack (show num))
) `cap` cpure (static Dict) bucket' `cap` cpure (static Dict) prefix'


s3ShuffleStore bucket' prefix' =
ShuffleStore
{ ssGet = static
( \bucket prefix num range -> do
ret <-
runAWS globalAWSEnv $
send $
S3.getObject
(S3.BucketName bucket)
(S3.ObjectKey $ prefix <> T.pack (show num)) &
S3.goRange .~
( case range of
RangeAll -> Nothing
RangeOnly lo hi ->
Just . T.pack $ "bytes=" <> show lo <> "-" <> show hi
)
_streamBody $ ret ^. S3.gorsBody
) `cap`
cpure (static Dict) bucket' `cap`
cpure (static Dict) prefix'
, ssPut = static
( \bucket prefix num ->
void . transPipe @(AWST (ResourceT IO)) (runAWS globalAWSEnv) $
S3.streamUpload Nothing $
S3.createMultipartUpload
(S3.BucketName bucket)
(S3.ObjectKey $ prefix <> T.pack (show num))
) `cap`
cpure (static Dict) bucket' `cap`
cpure (static Dict) prefix'
}

-- FIXME
@@ -64,4 +73,3 @@
globalAWSEnv :: Env
globalAWSEnv = unsafePerformIO $ newEnv Discover
{-# NOINLINE globalAWSEnv #-}

8 changes: 4 additions & 4 deletions distributed-dataset-aws/src/Control/Distributed/Fork/AWS.hs
@@ -1,9 +1,9 @@
module Control.Distributed.Fork.AWS
( module Control.Distributed.Fork.AWS.Lambda
) where
)
where

--------------------------------------------------------------------------------
import Control.Distributed.Fork.AWS.Lambda
--------------------------------------------------------------------------------

import Control.Distributed.Fork.AWS.Lambda

--------------------------------------------------------------------------------