Allow specifying multiple areas and areas by % using -a option #209

Closed
asheeshrana opened this issue Mar 8, 2018 · 10 comments

@asheeshrana

From the command line, we can currently specify only one area at a time. For my use case, I want to specify multiple areas, and also to give top, left, bottom, and right as a % of the page height and width.

Multiple areas are helpful when extracting tables from several parts of a page from the command line.

Specifying areas as a % of height and width is useful when I don't want to calculate absolute values for the top/left/bottom/right points that define an area.

I will be happy to make the changes and create a pull request myself if such a change is acceptable.

@jazzido
Contributor

jazzido commented Mar 9, 2018

Hi @asheeshrana,

That's a good suggestion, and it should be pretty easy to implement. My only concern is keeping the command line options consistent and easy to understand.

BTW, we'd love to get a pull request for this feature :)

@asheeshrana
Author

asheeshrana commented Mar 9, 2018

Thanks for the response.
I propose changing the behavior of the "-a" option as follows:
-a top,left,bottom,right,top,left,bottom,right...

Each group of 4 parameters would define one area from which a table could be extracted.

If the values of top, left, bottom, and right are all between 0 and 1 (inclusive), we can treat them as fractions of the page height and width. For example, the parameters
-a 0,0,1,0.5,0,0.5,1,1 would define two areas such that
area 1 = (top = 0 x height, left = 0 x width, bottom = 1 x height, right = 0.5 x width)
area 2 = (top = 0 x height, left = 0.5 x width, bottom = 1 x height, right = 1 x width)
In other words, it splits the page down the middle into a left half and a right half.
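
To make the arithmetic in this proposal concrete, here is a minimal sketch in Java (the language of tabula-java). The class and method names are hypothetical and not part of tabula-java, and the 612 x 792 pt page size (US Letter) is only an assumed example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not tabula-java code.
final class FractionalAreas {

    // Splits a flat "top,left,bottom,right,top,left,..." list into groups of four.
    static List<double[]> group(String spec) {
        String[] parts = spec.split(",");
        if (parts.length % 4 != 0) {
            throw new IllegalArgumentException(
                "Expected a multiple of 4 values, got " + parts.length);
        }
        List<double[]> areas = new ArrayList<>();
        for (int i = 0; i < parts.length; i += 4) {
            double[] area = new double[4];
            for (int j = 0; j < 4; j++) {
                area[j] = Double.parseDouble(parts[i + j].trim());
            }
            areas.add(area);
        }
        return areas;
    }

    // Scales fractions to points: top/bottom by page height, left/right by page width.
    static double[] scale(double[] f, double pageWidth, double pageHeight) {
        return new double[] {
            f[0] * pageHeight, f[1] * pageWidth, f[2] * pageHeight, f[3] * pageWidth
        };
    }

    public static void main(String[] args) {
        // Assumed page size: 612 x 792 pt.
        for (double[] f : group("0,0,1,0.5,0,0.5,1,1")) {
            System.out.println(Arrays.toString(scale(f, 612, 792)));
        }
    }
}
```

For the assumed page size, the two areas come out as (0, 0, 792, 306) and (0, 306, 792, 612) in top, left, bottom, right order.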

The description for the -a option would then be updated to:
-a/--area = Portion of the page to analyze. Accepts sets of (top,left,bottom,right). Example: --area 269.875,12.75,790.5,561,300,570,500,700. Additionally, if all values are between 0-1 (inclusive), input will be taken as fraction of actual height or width of the page. Example: --area 0,0,1,0.5,0,0.5,1,1. Default is entire page

Let me know if you have any concerns... Expect a pull request soon :).

@criztovyl
Contributor

I have concerns about overloading --area with the between-0-and-1 rule; in my view, a parameter should not accept ambiguous formats.
I would rather require you to state explicitly that you want percentages or fractions.

I would suggest using a prefix, for example % (percent) and / (fraction): --area %0,0,100,50,0,50,100,100 and --area /0,0,1,0.5,0,0.5,1,1.

--fraction and --percent(-age) options might also be possible, but -p is already taken by --pages, and I don't know about -f. :D

@asheeshrana
Author

I would suggest using a prefix, for example % (percent) and / (fraction): --area %0,0,100,50,0,50,100,100 and --area /0,0,1,0.5,0,0.5,1,1.

@criztovyl
I will take your suggestion and use / to explicitly indicate that it is a fraction. Using both / and % seems redundant so I will only allow "/".

New description
-a/--area = Portion of the page to analyze. Accepts sets of (top,left,bottom,right). Example: --area 269.875,12.75,790.5,561,300,570,500,700. Additionally, if all values are between 0-1 (inclusive) and preceded by "/", input will be taken as fraction of actual height or width of the page. Example: --area /0,0,1,0.5,0,0.5,1,1. Default is entire page

PS: -f is already used for format (JSON, CSV, etc.).

@jazzido
Contributor

jazzido commented Mar 12, 2018

+1 to the prefix.

BTW, let's make sure that we're validating the input.

Usability-wise, percentages (0-100) strike me as better than 0-1 fractional values. What do you think?

@asheeshrana
Author

I don't have a preference for either, so I will go with % then.

New description
-a/--area = Portion of the page to analyze. Accepts sets of (top,left,bottom,right). Example: --area 269.875,12.75,790.5,561,300,570,500,700. Additionally, if all values are between 0-100 (inclusive) and preceded by "%", input will be taken as % of actual height or width of the page. Example: --area %0,0,100,50,0,50,100,100. Default is entire page

@jazzido
Contributor

jazzido commented Mar 12, 2018

BTW, I'm not sure about extending the valid inputs for -a so it accepts sets of rectangular areas.

What about changing CommandLineApp so it allows repeated appearances of the -a argument? Your example would be something like:

-a 269.875,12.75,790.5,561 -a 300,570,500,700.
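
A rough sketch of how repeated -a occurrences could be collected, written in plain Java without any option-parsing library. It does not reflect how CommandLineApp actually parses its arguments; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the real CommandLineApp logic.
final class RepeatedAreaArgs {

    // Returns the value following each -a / --area flag, in command-line order.
    static List<String> collectAreas(String[] args) {
        List<String> areas = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-a") || args[i].equals("--area")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value after " + args[i]);
                }
                areas.add(args[++i]); // consume the flag's value
            }
        }
        return areas;
    }

    public static void main(String[] args) {
        String[] example = {
            "-a", "269.875,12.75,790.5,561", "-a", "300,570,500,700"
        };
        // Each -a yields exactly one area string.
        for (String area : collectAreas(example)) {
            System.out.println(area);
        }
    }
}
```

With this shape, each -a value is always exactly one rectangle, so no grouping of a long flat list is needed.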

@asheeshrana
Author

I agree that multiple occurrences of -a are much cleaner.

Updated description...
-a/--area = Portion of the page to analyze. Accepts top,left,bottom,right. Example: --area 269.875,12.75,790.5,561. If all values are between 0-100 (inclusive) and preceded by "%", input will be taken as % of actual height or width of the page. Example: --area %0,0,100,50. To specify multiple areas, repeat the -a option. Default is entire page
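
As an illustration of the description above, the following Java sketch parses a single -a value, validates it, and converts a %-prefixed value to points. It is not the code from the pull request: the class, its fields, and its methods are hypothetical, and the check that top < bottom and left < right is an added assumption about what "validating the input" might include.

```java
// Hypothetical value class for illustration only; not tabula-java code.
final class AreaOption {
    final boolean percent;
    final double top, left, bottom, right;

    private AreaOption(boolean percent, double[] v) {
        this.percent = percent;
        this.top = v[0];
        this.left = v[1];
        this.bottom = v[2];
        this.right = v[3];
    }

    // Parses "top,left,bottom,right", optionally prefixed with "%".
    static AreaOption parse(String value) {
        boolean percent = value.startsWith("%");
        String[] parts = (percent ? value.substring(1) : value).split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 values, got " + parts.length);
        }
        double[] v = new double[4];
        for (int i = 0; i < 4; i++) {
            // NumberFormatException is itself an IllegalArgumentException.
            v[i] = Double.parseDouble(parts[i].trim());
            if (percent && (v[i] < 0 || v[i] > 100)) {
                throw new IllegalArgumentException("Percent value out of 0-100: " + v[i]);
            }
        }
        // Assumption: a usable area has positive height and width.
        if (v[0] >= v[2] || v[1] >= v[3]) {
            throw new IllegalArgumentException("Need top < bottom and left < right");
        }
        return new AreaOption(percent, v);
    }

    // Returns {top, left, bottom, right} in points for the given page size.
    double[] toPoints(double pageWidth, double pageHeight) {
        if (!percent) {
            return new double[] {top, left, bottom, right};
        }
        return new double[] {
            top / 100 * pageHeight, left / 100 * pageWidth,
            bottom / 100 * pageHeight, right / 100 * pageWidth
        };
    }
}
```

For example, on an assumed 612 x 792 pt page, `AreaOption.parse("%0,0,100,50").toPoints(612, 792)` gives top 0, left 0, bottom 792, right 306, while an absolute value such as 269.875,12.75,790.5,561 is passed through unchanged.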

@asheeshrana
Author

Created the pull request... tabula-java allow specifying multiple areas

@asheeshrana
Author

The PR was merged, so I am closing this issue.
