Add libsvm dataset support #32

Merged: 3 commits, Dec 22, 2018

Conversation


@yupbank yupbank commented Dec 19, 2018

Addresses #10

Add a make_libsvm_dataset function, which returns a dataset containing one (feature, label) pair per row.

def make_libsvm_dataset(file_names,
                         num_features,
                         dtype=None,
                         label_dtype=None,
                         batch_size=1,
                         compression_type='',
                         buffer_size=None,
                         num_parallel_parser_calls=None,
                         drop_final_batch=False,
                         prefetch_buffer_size=0):
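
For readers unfamiliar with the format, here is a minimal pure-Python sketch of what turning one libsvm line into a (feature, label) pair can look like. It is not the parsing kernel from this PR. The helper name `parse_libsvm_line` is hypothetical, and the sketch assumes the common libsvm convention of `label index:value ...` with 1-based feature indices, producing a dense row of length `num_features`.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str, num_features: int) -> Tuple[List[float], float]:
    """Parse one libsvm-format line into a dense (features, label) pair.

    Hypothetical helper for illustration only; not part of this PR.
    """
    # Drop a trailing '#' comment, if any, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    if not tokens:
        raise ValueError("empty libsvm line")

    # The first token is the label; the rest are index:value pairs.
    label = float(tokens[0])
    features = [0.0] * num_features

    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        # Assumption: indices are 1-based, as in the usual libsvm convention.
        if not 1 <= index <= num_features:
            raise ValueError(f"feature index {index} out of range")
        features[index - 1] = float(value_text)

    return features, label


if __name__ == "__main__":
    print(parse_libsvm_line("1 1:0.5 3:2.0", num_features=4))
```

Features that are absent from the line stay at 0.0, which is why a dense layout needs `num_features` to be known up front.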

@yupbank yupbank changed the title [WIP] Add libsvm dataset support Add libsvm dataset support Dec 19, 2018
@yongtang

Overall looks good, though I am wondering if we could expose a class interface such as class LibSVMDataset(dataset_ops.DatasetSource)? Maybe we could add the class interface on top of the current implementation?

yupbank commented Dec 21, 2018

That is hard: we only have a parsing kernel for now, so we would basically need to implement a datasource kernel to support it.

If it is really worth it, I can make a second PR to port the current parsing kernel into a datasource kernel.

The function pattern also follows TensorFlow core: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/data/experimental/ops/readers.py#L311

@yongtang yongtang left a comment

Overall, I think this is good. We could consider adding DatasetSource support later. LGTM

@yongtang yongtang merged commit 726907c into tensorflow:master Dec 22, 2018
@yupbank yupbank deleted the add-libsvm branch December 22, 2018 18:25
@yongtang

@yupbank Added a PR #38 to fix some minor issues. Please take a look.

yongtang pushed a commit that referenced this pull request Jan 27, 2022
* feat: reading from bigtable (#2)

Implements reading from bigtable in a synchronous manner.

* feat: RowRange and RowSet API.

* feat: parallel read (#4)

In this pr we make the read methods accept a row_set reading only rows specified by the user.
We also add a parallel read, that leverages the sample_row_keys method to split work among workers.

* feat: version filters (#6)

This PR adds support for Bigtable version filters.

* feat: support for other data types (#5)

* fix: linter fixes (#8)

* feat docs (#9)

* fix: building on windows (#12)

* fix: refactor bigtable package to api folder (#14)

moved bigtable to tensorflow_io.python.api

* fix: tests hanging (#30)

changed path to bigtable emulator and cbt in tests

moved arguments' initializations to the body of the function in bigtable_ops.py

fixed interleaveFromRange of column filters when using only one column

* fix: temporarily disable macos tests (#32)

* disable tests on macos

Co-authored-by: Kajetan Boroszko <kajetan@unoperate.com>
Co-authored-by: Kajetan Boroszko <kajetan.boroszko@gmail.com>