
Implements Hive Input/Output format to use vineyard as the storage backend #1520

Closed · wants to merge 3 commits


@vegetableysm (Collaborator) commented Aug 14, 2023

What do these changes do?

Related issue number

Fixes #1420

Refer to:
PR #1554
PR #1551
PR #1548
PR #1552

@netlify bot commented Aug 14, 2023

Deploy Preview for v6d ready!

🔨 Latest commit: 72bd707
🔍 Latest deploy log: https://app.netlify.com/sites/v6d/deploys/64f57f5a760e6d0007ca80f1
😎 Deploy Preview: https://deploy-preview-1520--v6d.netlify.app

@sighingnow (Member) left a comment

/cc @vegetableysm

A quick note about string arrays and large string arrays: a string array uses 32-bit signed offsets, so its total character data cannot exceed 2^31-1 bytes, while a large string array uses 64-bit offsets and can hold far more.

In vineyard, vineyard::StringArray is Arrow's large string array (LargeVarCharVector in Java), so you should never put a regular string array into vineyard. When Hive gives you a string array, cast it to a large string array first, then put it into vineyard.
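
To make the limit concrete, here is a minimal Java sketch of the two offset layouts. It does not use the Arrow or vineyard APIs. The class and methods (`OffsetWidths`, `buildOffsets`, `widenOffsets`, `fitsInStringArray`) are hypothetical and only model the idea that a string array stores 32-bit offsets into one shared byte buffer, while a large string array stores 64-bit offsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical model of a variable-length string column (not the Arrow or
 * vineyard API): values are concatenated into one byte buffer, and an
 * offsets buffer of length n + 1 marks where each value starts and ends.
 */
class OffsetWidths {
    // A regular string array stores offsets as signed ints, so its total
    // character data is capped at Integer.MAX_VALUE (2^31 - 1) bytes.
    static final long STRING_ARRAY_MAX_BYTES = Integer.MAX_VALUE;

    // Builds 32-bit offsets for the UTF-8 encoding of the given values,
    // failing once the accumulated data would no longer fit in an int.
    static int[] buildOffsets(String[] values) {
        int[] offsets = new int[values.length + 1];
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i].getBytes(StandardCharsets.UTF_8).length;
            if (total > STRING_ARRAY_MAX_BYTES) {
                throw new IllegalStateException(
                        "data exceeds 2^31 - 1 bytes; use a large string array");
            }
            offsets[i + 1] = (int) total;
        }
        return offsets;
    }

    // Models the cast from string array to large string array: the byte
    // buffer stays as-is, only each 32-bit offset is widened to 64 bits.
    static long[] widenOffsets(int[] offsets) {
        long[] wide = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            wide[i] = offsets[i];
        }
        return wide;
    }

    // Reports whether a column with the given total data size fits the
    // 32-bit layout.
    static boolean fitsInStringArray(long totalBytes) {
        return totalBytes <= STRING_ARRAY_MAX_BYTES;
    }

    public static void main(String[] args) {
        String[] values = {"vineyard", "hive", ""};
        int[] offsets = buildOffsets(values);
        long[] large = widenOffsets(offsets);
        System.out.println(Arrays.toString(offsets)); // [0, 8, 12, 12]
        System.out.println(Arrays.toString(large));   // [0, 8, 12, 12]
        System.out.println(fitsInStringArray(3L * 1024 * 1024 * 1024)); // false
    }
}
```

In this model the widening step never touches the character bytes; only the offsets buffer grows from 4 to 8 bytes per entry, which is why converting before writing into vineyard is a cheap way to always satisfy the large-string layout.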

@sighingnow (Member) left a comment

LGTM.

@sighingnow (Member) commented

@vegetableysm please

Thanks!

@sighingnow changed the title from "Alternative solution for integrating Spark/Hive" to "Implements Hive Input/Output format to use vineyard as the storage backend" Aug 22, 2023
@vegetableysm (Collaborator, Author) commented Aug 23, 2023

performance test

prepare data

create table vineyard_table(
    src_id int,
    dst_id int)
row format serde "io.v6d.hive.ql.io.VineyardSerDe"
stored as
    INPUTFORMAT 'io.v6d.hive.ql.io.VineyardInputFormat'
    OUTPUTFORMAT 'io.v6d.hive.ql.io.VineyardOutputFormat'
LOCATION "vineyard:///opt/hive/data/warehouse/vineyard_table";

create table nomal_table(
    src_id int,
    dst_id int);

create table hive_test_data_livejournal(
    src_id int,
    dst_id int)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile;

load data local inpath "file:///opt/hive/data/warehouse/soc-livejournal.csv" into table hive_test_data_livejournal;

insert test

insert into vineyard_table select * from hive_test_data_livejournal; 
insert into nomal_table select * from hive_test_data_livejournal; 

select test

select * from vineyard_table where src_id = 1;
select * from nomal_table where src_id = 1;

vegetableysm and others added 3 commits September 4, 2023 14:54
…ckend

Signed-off-by: vegetableysm <yuanshumin.ysm@alibaba-inc.com>
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
…ters

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Development

Successfully merging this pull request may close these issues.

Hive integration: vineyard serve as the storage backend for hive
2 participants