Commit

update contributing guide (#285)
cosmicBboy committed Oct 6, 2020
1 parent 47bc7b8 commit 3c4589d
Showing 2 changed files with 52 additions and 29 deletions.
77 changes: 50 additions & 27 deletions .github/CONTRIBUTING.md
@@ -19,51 +19,74 @@ create a development environment that is separate from your existing Python
environment so that you can make and test changes without compromising your
own work environment.

### Dataframe Style Guides
We have guidelines regarding dataframe and schema styles that are enforced for
each pull request:
### Dataframe Schema Style Guides

We have guidelines regarding dataframe and schema styles that are encouraged
for each pull request:

- If specifying a single column DataFrame, this can be expressed as a one-liner:
```DataFrameSchema({"col1": Column(...)})```
```python
DataFrameSchema({"col1": Column(...)})
```

- If specifying one column with multiple lines, or multiple columns:
```
DataFrameSchema({
    "col1": Column(type, checks=[
        Check(...),
        Check(...),
    ]),
})
DataFrameSchema({
    "col1": Column(...),
    "col2": Column(...),
})
```
```python
DataFrameSchema(
    {
        "col1": Column(
            int,
            checks=[
                Check(...),
                Check(...),
            ]
        ),
    }
)
```

- If specifying single columns with additional arguments
```
DataFrameSchema({"a": Column(Int, nullable=True)},
    strict=True)
```
- If specifying columns with additional arguments that fit in one line:
```python
DataFrameSchema(
    {"a": Column(int, nullable=True)},
    strict=True
)
```

- If specifying columns with additional arguments
- If specifying columns with additional arguments that don't fit in one line:
```python
DataFrameSchema(
    {
        "col1": Column(...),
        "col2": Column(...),
        "a": Column(
            int,
            nullable=True,
            coerce=True,
            ...
        ),
        "b": Column(
            ...,
        )
    },
    strict=True)
```
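
Putting the conventions above together, a schema written in this style might look roughly like the following sketch. The column names, dtypes, and checks are purely illustrative and are not part of the contributing guide; they only assume the public `pandera` API (`DataFrameSchema`, `Column`, `Check`, `schema.validate`):

```python
import pandas as pd
from pandera import Check, Column, DataFrameSchema

# Illustrative schema only -- the column names and checks are hypothetical.
schema = DataFrameSchema(
    {
        "price": Column(
            float,
            checks=[
                Check.greater_than_or_equal_to(0),
                Check.less_than(1_000_000),
            ],
        ),
        "currency": Column(str, Check.isin(["USD", "EUR"]), nullable=True),
    },
    strict=True,
)

# Validation raises a pandera SchemaError if the dataframe violates the schema.
df = pd.DataFrame({"price": [9.99, 24.5], "currency": ["USD", "EUR"]})
validated = schema.validate(df)
```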

### Set up `pre-commit`

This project uses [pre-commit](https://pre-commit.com/) to ensure that code
standard checks pass locally before pushing to the remote project repo. Follow
the [installation instructions](https://pre-commit.com/#installation), then
set up hooks with `pre-commit install`. After that, the `pylint` and `mypy` checks
will run automatically on every commit.
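
For reference, a typical local setup looks roughly like the following (a sketch that assumes you install `pre-commit` via `pip`; see the linked installation instructions for alternatives):

```bash
# Install pre-commit into your development environment.
pip install pre-commit

# Register the git hooks defined by the project's pre-commit configuration.
pre-commit install

# Optionally, run every hook against the whole codebase once.
pre-commit run --all-files
```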

### Run the tests
Before submitting your changes for review, make sure to check that your changes
do not break any tests by running: ``pytest tests/``
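
If you only want to exercise part of the suite while iterating, `pytest`'s keyword filter can help (the keyword expression below is just an example, not a project convention):

```bash
# Run the full test suite.
pytest tests/

# Run only tests whose names match a keyword expression.
pytest tests/ -k "schema"
```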

Additionally, sphinxdocs may block you; make sure that the docs build successfully:
``python -m sphinx -E -W -b=doctest "docs/source" "docs/_build"``

```
make docs
```

### Raising Pull Requests

4 changes: 2 additions & 2 deletions README.md
@@ -136,7 +136,7 @@ Here are a few other alternatives for validating Python data structures.
- [pandas-validator](https://github.com/c-data/pandas-validator)
- [table_enforcer](https://github.com/xguse/table_enforcer)

**Other tools that include data validation**
**Other tools for data validation**

- [great_expectations](https://github.com/great-expectations/great_expectations)

@@ -147,7 +147,7 @@ Here are a few other alternatives for validating Python data structures.
- `check_input` and `check_output` decorators enable seamless integration with
existing code.
- `Check`s provide flexibility and performance by providing access to `pandas`
API by design.
API by design, and offer built-in checks for common data tests.
- `Hypothesis` class provides a tidy-first interface for statistical hypothesis
testing.
- `Check`s and `Hypothesis` objects support both tidy and wide data validation.
