Merge pull request #89 from treasure-data/doc-issue-87-88
Doc: URL fix & comparison between pytd, td-client-python, and pandas-td
takuti authored May 11, 2020
2 parents 78409e0 + 7bb739e commit 1862dc5
Showing 9 changed files with 51 additions and 23 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.rst
@@ -32,7 +32,7 @@ v0.8.0 (2019-09-17)
(`#43 <https://github.com/treasure-data/pytd/pull/43>`__, `#44 <https://github.com/treasure-data/pytd/pull/44>`__)
- Disable ``type``, one of the Treasure Data-specific query parameters, because it conflicts with the ``engine`` option.
(`#45 <https://github.com/treasure-data/pytd/pull/45>`__)
- Add `td-pyspark <https://pypi.org/project/td-pyspark/>`__ dependency for easy access to the `td-spark <https://support.treasuredata.com/hc/en-us/articles/360001487167-Apache-Spark-Driver-td-spark-FAQs>`__ functionalities.
- Add `td-pyspark <https://pypi.org/project/td-pyspark/>`__ dependency for easy access to the `td-spark <https://treasure-data.github.io/td-spark/>`__ functionalities.
(`#46 <https://github.com/treasure-data/pytd/pull/46>`__, `#47 <https://github.com/treasure-data/pytd/pull/47>`__)

v0.7.0 (2019-08-23)
45 changes: 33 additions & 12 deletions README.rst
@@ -6,7 +6,7 @@ pytd
**pytd** provides user-friendly interfaces to Treasure Data’s `REST
APIs <https://github.com/treasure-data/td-client-python>`__, `Presto
query
engine <https://support.treasuredata.com/hc/en-us/articles/360001457427-Presto-Query-Engine-Introduction>`__,
engine <https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083607/Presto+Query+Engine+Introduction>`__,
and `Plazma primary
storage <https://www.slideshare.net/treasure-data/td-techplazma>`__.

@@ -29,9 +29,9 @@ Usage
Colaboratory <https://colab.research.google.com/drive/1ps_ChU-H2FvkeNlj1e1fcOebCt4ryN11>`__

Set your `API
key <https://support.treasuredata.com/hc/en-us/articles/360000763288-Get-API-Keys>`__
key <https://tddocs.atlassian.net/wiki/spaces/PD/pages/1081428/Getting+Your+API+Keys>`__
and
`endpoint <https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints>`__
`endpoint <https://tddocs.atlassian.net/wiki/spaces/PD/pages/1085143/Sites+and+Endpoints>`__
to the environment variables, ``TD_API_KEY`` and ``TD_API_SERVER``,
respectively, and create a client instance:
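A minimal, hedged sketch of that setup (the helper name ``make_client`` is ours, and ``pytd`` must be installed with valid credentials set for it to actually connect):

```python
import os


def make_client(database='sample_datasets'):
    """Build a pytd client from the TD_API_KEY / TD_API_SERVER environment variables."""
    import pytd  # imported lazily so the sketch reads without pytd installed

    return pytd.Client(
        apikey=os.environ['TD_API_KEY'],
        endpoint=os.environ['TD_API_SERVER'],
        database=database,
    )
```

Per the text above, the client also resolves these environment variables on its own when the arguments are omitted.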

@@ -93,7 +93,7 @@ data to Treasure Data:
query through the Presto query engine.
- Recommended only for a small volume of data.

3. `td-spark <https://support.treasuredata.com/hc/en-us/articles/360001487167-Apache-Spark-Driver-td-spark-FAQs>`__:
3. `td-spark <https://treasure-data.github.io/td-spark/>`__:
``spark``

- Local customized Spark instance directly writes ``DataFrame`` to
@@ -137,8 +137,36 @@ with ``td_spark_path`` option would be helpful.
writer = SparkWriter(apikey='1/XXX', endpoint='https://api.treasuredata.com/', td_spark_path='/path/to/td-spark-assembly.jar')
client.load_table_from_dataframe(df, 'mydb.bar', writer=writer, if_exists='overwrite')
Comparison between pytd, td-client-python, and pandas-td
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Treasure Data offers three different Python clients on GitHub, and the following list summarizes their characteristics.

1. `td-client-python <https://github.com/treasure-data/td-client-python>`__

- Basic REST API wrapper.
- Similar functionalities to td-client-{`ruby <https://github.com/treasure-data/td-client-ruby>`__, `java <https://github.com/treasure-data/td-client-java>`__, `node <https://github.com/treasure-data/td-client-node>`__, `go <https://github.com/treasure-data/td-client-go>`__}.
- The capability is limited by `what Treasure Data REST API can do <https://tddocs.atlassian.net/wiki/spaces/PD/pages/1085354/REST+APIs+in+Treasure+Data>`__.

2. **pytd**

- Access to Plazma via td-spark as introduced above.
- Efficient connection to Presto based on `presto-python-client <https://github.com/prestodb/presto-python-client>`__.
- Multiple data ingestion methods and a variety of utility functions.

3. `pandas-td <https://github.com/treasure-data/pandas-td>`__ *(deprecated)*

- Old tool optimized for `pandas <https://pandas.pydata.org>`__ and `Jupyter Notebook <https://jupyter.org>`__.
- **pytd** offers a compatible function set (see below for details).

The optimal choice of package depends on your specific use case, but common guidelines are as follows:

- Use td-client-python if you want to execute *basic CRUD operations* from Python applications.
- Use **pytd** for (1) *analytical purposes* relying on pandas and Jupyter Notebook, and (2) achieving *more efficient data access* with ease.
- Do not use pandas-td. If you are using pandas-td, replace the code with pytd based on the following guidance as soon as possible.
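The contrast between the first two guidelines can be sketched as follows (hypothetical helper functions of our own; each requires its package installed and valid credentials to actually run):

```python
def list_databases_with_tdclient(apikey):
    # td-client-python: thin REST API wrapper, suited to basic CRUD operations.
    import tdclient

    with tdclient.Client(apikey) as client:
        return [db.name for db in client.databases()]


def read_with_pytd(apikey, endpoint):
    # pytd: analytics-oriented, returns query results ready for pandas.
    import pandas as pd
    import pytd

    client = pytd.Client(apikey=apikey, endpoint=endpoint, database='sample_datasets')
    res = client.query('select symbol, count(1) as cnt from nasdaq group by 1')
    return pd.DataFrame(res['data'], columns=res['columns'])
```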

How to replace pandas-td
------------------------
^^^^^^^^^^^^^^^^^^^^^^^^

**pytd** offers
`pandas-td <https://github.com/treasure-data/pandas-td>`__-compatible
@@ -180,13 +208,6 @@ Consequently, all ``pandas_td`` code should keep running correctly with
`here <https://github.com/treasure-data/pytd/issues/new>`__ if you
noticed any incompatible behaviors.
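For instance, a typical migration only swaps the import line; a sketch (wrapped in a hypothetical function so it can be read without running, and with placeholder credentials):

```python
def read_access_counts(apikey, endpoint):
    # Before: import pandas_td as td
    import pytd.pandas_td as td  # After: same call pattern, different package

    con = td.connect(apikey=apikey, endpoint=endpoint)
    engine = td.create_engine('presto:sample_datasets', con=con)
    return td.read_td('select count(1) as cnt from www_access', engine)
```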

.. note:: There is a known difference to the ``pandas_td.to_td`` function in type conversion.
   Since :class:`pytd.writer.BulkImportWriter`, the default writer in pytd, uses CSV as an
   intermediate file before uploading a table, column types may change via ``pandas.read_csv``.
   To preserve column types as much as possible, pass the ``fmt="msgpack"`` argument to the
   ``to_td`` function.

   For more details, see the ``fmt`` option of :func:`pytd.pandas_td.to_td`.

.. |Build status| image:: https://github.com/treasure-data/pytd/workflows/Build/badge.svg
:target: https://github.com/treasure-data/pytd/actions/
.. |PyPI version| image:: https://badge.fury.io/py/pytd.svg
7 changes: 7 additions & 0 deletions doc/index.rst
@@ -5,6 +5,13 @@
.. include:: ../README.rst

.. note:: There is a known difference to the ``pandas_td.to_td`` function in type conversion.
   Since :class:`pytd.writer.BulkImportWriter`, the default writer in pytd, uses CSV as an
   intermediate file before uploading a table, column types may change via ``pandas.read_csv``.
   To preserve column types as much as possible, pass the ``fmt="msgpack"`` argument to the
   ``to_td`` function.

   For more details, see the ``fmt`` option of :func:`pytd.pandas_td.to_td`.
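The type-conversion pitfall is easy to reproduce with plain pandas; this illustrative round trip (our own example, not pytd code) shows a string column losing its leading zeros:

```python
import io

import pandas as pd

# A string column that merely looks numeric.
df = pd.DataFrame({'code': ['001', '002']})

# Round-trip through CSV, analogous to writing an intermediate file.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

print(df['code'].tolist())   # ['001', '002']
print(df2['code'].tolist())  # [1, 2] -- inferred as int64, leading zeros lost
```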

More Examples
-------------

4 changes: 2 additions & 2 deletions pytd/client.py
@@ -25,7 +25,7 @@ class Client(object):
endpoint : str, optional
Treasure Data API server. If not given, ``https://api.treasuredata.com`` is
used by default. List of available endpoints is:
https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1085143/Sites+and+Endpoints
database : str, default: 'sample_datasets'
Name of connected database.
@@ -203,7 +203,7 @@ def query(self, query, engine=None, **kwargs):
- ``wait_callback`` (function): called every interval against job itself
- ``engine_version`` (str): run query with Hive 2 if this parameter
is set to ``"experimental"`` and ``engine`` denotes Hive.
https://support.treasuredata.com/hc/en-us/articles/360027259074-How-to-use-Hive-2
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083123/Using+Hive+2+to+Create+Queries
Meanwhile, when the following argument is set to ``True``, the query is
deterministically issued via ``tdclient``.
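A hedged sketch of passing those parameters through ``Client.query`` (hypothetical helper and query; requires a connected client to actually run):

```python
def run_on_hive2(client):
    # engine_version='experimental' switches the job to Hive 2 when engine is Hive.
    return client.query(
        'select count(1) as cnt from www_access',
        engine='hive',
        engine_version='experimental',
    )
```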
4 changes: 2 additions & 2 deletions pytd/pandas_td/__init__.py
@@ -23,7 +23,7 @@ def connect(apikey=None, endpoint=None, **kwargs):
endpoint : str, optional
Treasure Data API server. If not given, ``https://api.treasuredata.com`` is
used by default. List of available endpoints is:
https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1085143/Sites+and+Endpoints
kwargs : dict, optional
Optional arguments
@@ -174,7 +174,7 @@ def read_td_query(
- ``wait_callback`` (function): called every interval against job itself
- ``engine_version`` (str): run query with Hive 2 if this parameter is
set to ``"experimental"`` in ``HiveQueryEngine``.
https://support.treasuredata.com/hc/en-us/articles/360027259074-How-to-use-Hive-2
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083123/Using+Hive+2+to+Create+Queries
Returns
-------
6 changes: 3 additions & 3 deletions pytd/query_engine.py
@@ -75,7 +75,7 @@ def execute(self, query, **kwargs):
- ``wait_callback`` (function): called every interval against job itself
- ``engine_version`` (str): run query with Hive 2 if this parameter
is set to ``"experimental"`` in ``HiveQueryEngine``.
https://support.treasuredata.com/hc/en-us/articles/360027259074-How-to-use-Hive-2
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083123/Using+Hive+2+to+Create+Queries
Meanwhile, when the following argument is set to ``True``, the query is
deterministically issued via ``tdclient``.
@@ -179,7 +179,7 @@ def _get_tdclient_cursor(self, con, **kwargs):
- ``wait_callback`` (function): called every interval against job itself
- ``engine_version`` (str): run query with Hive 2 if this parameter
is set to ``"experimental"`` in ``HiveQueryEngine``.
https://support.treasuredata.com/hc/en-us/articles/360027259074-How-to-use-Hive-2
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083123/Using+Hive+2+to+Create+Queries
Returns
-------
@@ -399,7 +399,7 @@ def cursor(self, force_tdclient=True, **kwargs):
- ``wait_callback`` (function): called every interval against job itself
- ``engine_version`` (str): run query with Hive 2 if this parameter
is set to ``"experimental"``.
https://support.treasuredata.com/hc/en-us/articles/360027259074-How-to-use-Hive-2
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083123/Using+Hive+2+to+Create+Queries
Returns
-------
2 changes: 1 addition & 1 deletion pytd/spark.py
@@ -68,7 +68,7 @@ def fetch_td_spark_context(
endpoint : str, optional
Treasure Data API server. If not given, ``https://api.treasuredata.com`` is
used by default. List of available endpoints is:
https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1085143/Sites+and+Endpoints
td_spark_path : str, optional
Path to td-spark-assembly_x.xx-x.x.x.jar. If not given, seek a path
2 changes: 1 addition & 1 deletion pytd/table.py
@@ -73,7 +73,7 @@ def create(self, column_names=[], column_types=[]):
column_types : list of str, optional
Column types corresponding to the names. Note that Treasure Data
supports a limited set of types, as documented in:
https://support.treasuredata.com/hc/en-us/articles/360001266468-Schema-Management
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083743/Schema+Management
"""
if len(column_names) > 0:
schema = ", ".join(
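The truncated ``join`` above presumably assembles a ``"name type, ..."`` schema string; a self-contained approximation (column names and types are hypothetical):

```python
# Approximate the schema-string assembly used by Table.create (illustrative only).
column_names = ['user_id', 'amount']
column_types = ['bigint', 'double']

schema = ', '.join(
    f'{name} {ctype}' for name, ctype in zip(column_names, column_types)
)
print(schema)  # user_id bigint, amount double
```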
2 changes: 1 addition & 1 deletion pytd/writer.py
@@ -235,7 +235,7 @@ def _insert_into(self, table, list_of_tuple, column_names, column_types, if_exis
column_types : list of str
Column types corresponding to the names. Note that Treasure Data
supports a limited set of types, as documented in:
https://support.treasuredata.com/hc/en-us/articles/360001266468-Schema-Management
https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083743/Schema+Management
if_exists : {'error', 'overwrite', 'append', 'ignore'}
What happens when a target table already exists.
