
[docs] [migrate] changes to import and export sections using the COPY command. #13164

Merged · 10 commits · Jul 25, 2022
15 changes: 11 additions & 4 deletions docs/content/preview/migrate/manual-import/_index.md
@@ -2,9 +2,9 @@
title: Manual import
headerTitle: Manual import
linkTitle: Manual import
description: Manual PostgreSQL import to YugabyteDB.
image: /images/section_icons/develop/learn.png
headcontent: Manual PostgreSQL import to YugabyteDB.
aliases:
- /preview/migrate/migrate-from-postgresql/
menu:
@@ -17,11 +17,18 @@ type: indexpage

The steps below cover how to manually migrate PostgreSQL data and applications to YugabyteDB.

- [Convert a PostgreSQL schema](migrate-schema/)
- [Migrate a PostgreSQL application](migrate-application/)
- [Export PostgreSQL data](export-data/)
- [Prepare a cluster](prepare-cluster/)
- [Import PostgreSQL data](import-data/)
- [Verify a migration](verify-migration/)

{{< tip title="Migrate using YugabyteDB Voyager" >}}
To automate your migration from PostgreSQL to YugabyteDB, use [YugabyteDB Voyager](../yb-voyager/).
{{< /tip >}}

<!-- <div class="row">

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="migrate-schema/">
@@ -107,4 +114,4 @@
</a>
</div>

</div> -->
83 changes: 81 additions & 2 deletions docs/content/preview/migrate/manual-import/export-data.md
@@ -13,7 +13,86 @@
type: docs
---

The recommended way to export data from PostgreSQL for the purpose of importing it to YugabyteDB is via CSV files using the COPY command. However, to export an entire database made up of smaller datasets, you can use the YugabyteDB [`ysql_dump`](../../../admin/ysql-dump/) utility.

## Export data into CSV files using the COPY command

To export the data, connect to the source PostgreSQL database using the psql tool, and execute the COPY TO command as follows:

```sql
COPY <table_name>
TO '<table_name>.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```
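
Note that COPY ... TO writes the file on the database server, so the path must be writable by the PostgreSQL server process. If you want the CSV file on the machine you are connecting from, psql's client-side `\copy` meta-command accepts the same options. A minimal sketch, assuming a hypothetical `users` table:

```sql
-- Server-side export: the file lands on the database host, so prefer an
-- absolute path that the server process can write to.
COPY users TO '/tmp/users.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER);

-- Client-side alternative (psql meta-command, written on one line): the rows
-- are streamed to the machine where psql is running.
\copy users TO 'users.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER)
```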

{{< note title="Note" >}}

The COPY TO command exports a single table, so you should execute it for every table that you want to export.

{{< /note >}}
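
If you have many tables, you can let the catalog generate the per-table commands for you. A sketch, assuming all tables live in the `public` schema and that server-side paths under `/tmp` are acceptable:

```sql
-- Generate one COPY command per table in the public schema; in psql or
-- ysqlsh you can execute the generated statements directly with \gexec.
SELECT format(
    $$COPY %I TO '/tmp/%s.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER)$$,
    tablename, tablename
)
FROM pg_tables
WHERE schemaname = 'public';
```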

It is also possible to export a subset of rows based on a condition:

```sql
COPY (
SELECT * FROM <table_name>
WHERE <condition>
)
TO '<table_name>.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```
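
For example, to export only recent rows from a hypothetical `orders` table (the table and column names here are illustrative):

```sql
-- Export only the rows matching the predicate.
COPY (
    SELECT * FROM orders
    WHERE created_at >= DATE '2022-01-01'
)
TO '/tmp/orders_2022.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```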

For all available options provided by the COPY TO command, refer to the [PostgreSQL documentation](https://www.postgresql.org/docs/current/sql-copy.html).

### Parallelize large table export

For large tables, it might be beneficial to parallelize the process by exporting data in chunks as follows:

```sql
COPY (
SELECT * FROM <table_name>
ORDER BY <primary_key_col>
LIMIT <num_rows_per_export> OFFSET 0
)
TO '<table_name>_1.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

```sql
COPY (
SELECT * FROM <table_name>
ORDER BY <primary_key_col>
LIMIT <num_rows_per_export> OFFSET <num_rows_per_export>
)
TO '<table_name>_2.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

```sql
COPY (
SELECT * FROM <table_name>
ORDER BY <primary_key_col>
LIMIT <num_rows_per_export> OFFSET <num_rows_per_export> * 2
)
TO '<table_name>_3.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

You can run the above commands in parallel to speed up the process. This approach also produces multiple CSV files, allowing for parallel import on the YugabyteDB side.
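
One caveat: OFFSET forces the source database to scan and discard all preceding rows, so later chunks become progressively more expensive. When the table has a monotonically increasing key, range predicates on that key keep every chunk equally cheap. A sketch assuming a hypothetical `events` table with an integer `id` primary key and one million rows per chunk:

```sql
-- Each command reads only its own key range, so the chunks can run in
-- parallel sessions without rescanning earlier rows.
COPY (SELECT * FROM events WHERE id > 0 AND id <= 1000000)
TO '/tmp/events_1.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER);

COPY (SELECT * FROM events WHERE id > 1000000 AND id <= 2000000)
TO '/tmp/events_2.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER);
```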

## Export data into SQL script using ysql_dump

An alternative way to export the data is to use the YugabyteDB [`ysql_dump`](../../../admin/ysql-dump/) backup utility, which is derived from PostgreSQL's `pg_dump`.

```sh
$ ysql_dump -d <database_name> > <database_name>.sql
```

`ysql_dump` is the ideal option for smaller datasets, because it allows you to export a whole database by running a single command. However, the COPY command is recommended for large databases, because it performs significantly better.

<!--

## Exporting an entire database

@@ -66,4 +145,4 @@
) TO 'export-2.csv' WITH CSV;

...
``` -->
56 changes: 41 additions & 15 deletions docs/content/preview/migrate/manual-import/import-data.md
@@ -13,28 +13,54 @@
type: docs
---

The next step is to import the PostgreSQL data into YugabyteDB.

## Import data from CSV files

To import data that was previously exported into CSV files, use the COPY FROM command as follows:

```sql
COPY <table_name>
FROM '<table_name>.csv'
WITH (
    FORMAT CSV, DELIMITER ',', HEADER,
    ROWS_PER_TRANSACTION 1000,
    DISABLE_FK_CHECK
);
```

In the command above, the `ROWS_PER_TRANSACTION` parameter splits the import into smaller transactions (1,000 rows each in this example), instead of running a single transaction spanning all the data in the file. Additionally, the `DISABLE_FK_CHECK` parameter skips foreign key checks for the duration of the import process.

Both the `ROWS_PER_TRANSACTION` and `DISABLE_FK_CHECK` parameters are recommended for the initial import of the data, especially for large tables, because they significantly reduce the total import time. If you exported the data into multiple CSV files, run the command for every file; you can import multiple files in parallel to further speed up the process.
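
Continuing the hypothetical `events` example from the export step, each chunk file gets its own COPY FROM, and the commands can be run from separate sessions:

```sql
-- Run each statement from its own session to import the chunks in parallel.
COPY events FROM '/tmp/events_1.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER, ROWS_PER_TRANSACTION 1000, DISABLE_FK_CHECK);

COPY events FROM '/tmp/events_2.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER, ROWS_PER_TRANSACTION 1000, DISABLE_FK_CHECK);
```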

For detailed information on the COPY FROM command, refer to the [COPY](../../../api/ysql/the-sql-language/statements/cmd_copy/) statement reference.

### Error handling

If the COPY FROM command fails partway through, you can rerun it. However, you don't have to rerun the entire file: COPY FROM imports rows individually, starting from the top of the file, so if you know that some of the rows were successfully imported before the failure, you can safely skip those rows by adding the SKIP parameter.

For example, to skip the first 5000 rows in a file, run the command as follows:

```sql
COPY <table_name>
FROM '<table_name>.csv'
WITH (
    FORMAT CSV, DELIMITER ',', HEADER,
    ROWS_PER_TRANSACTION 1000,
    DISABLE_FK_CHECK,
    SKIP 5000
);
```
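
One way to pick the SKIP value, assuming the target table was empty when the import started: because `ROWS_PER_TRANSACTION` commits completed batches, the current row count tells you how many rows from the top of the file have already landed.

```sql
-- Rows committed before the failure; use this count as the SKIP value.
SELECT count(*) FROM <table_name>;
```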

## Import data from SQL script

To import an entire database from a `pg_dump` or `ysql_dump` export, use `ysqlsh` as follows:

```sh
$ ysqlsh -f <database_name>.sql
```

{{< tip title="Tip" >}}
The `ysqlsh` tool is a derivative of the PostgreSQL tool, `psql`. All `psql` commands work in `ysqlsh`.
{{< /tip >}}

{{< note title="Note" >}}

After the data import step, remember to recreate any constraints and triggers that might have been disabled to speed up loading the data. This ensures that the database will perform relational integrity checking for data going forward.

{{< /note >}}