Skip to content

Conversation

@ahadas
Copy link
Contributor

@ahadas ahadas commented Jul 18, 2020

The background for this PR is that file-search functionality was recently added to muCommander but we currently only search files by their name, there's no way to search by the content of the files (that appears to be desired). It seems like the grep implementation in unix4j can be used for this.

However, in muCommander we abstract files with the AbstractFile class. We have concrete implementations of the AbstractFile for various formats, like SFTP, SMB and so on, that we cannot use Java's File/Path for. The content of AbstractFiles can be read using Java's InputStream.

Looking at unix4j it seems it intended to use Java's FileInputStream but (a) Looks like it's not used at the moment (at least not by the grep command; (b) in muCommander, the InputStreams are not necessarily FileInputStreams.

This PR makes it possible for the grep command to operate on Java's InputStream.

@ahadas
Copy link
Contributor Author

ahadas commented Jul 18, 2020

@terzerm @benjwarner hi, could you please let me know what's your take on this before I proceed with unit tests and documentation?

@coveralls
Copy link

coveralls commented Jul 18, 2020

Coverage Status

Coverage decreased (-0.07%) to 69.893% when pulling 8f60ad6 on ahadas:streams into 4db49ef on tools4j:master.

@terzerm terzerm self-assigned this Jul 18, 2020
@terzerm terzerm self-requested a review July 18, 2020 23:56
Copy link
Member

@terzerm terzerm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi

This is not required.

Unix4j supports redirection of input and output with different built in types and the option to implement your own custom type.

Grepping from input stream can be done via

Unix4j.from(inputStream).grep("Subject", testFile).toStringResult());

@terzerm
Copy link
Member

terzerm commented Jul 19, 2020

closing as this is not required.

@terzerm
Copy link
Member

terzerm commented Jul 19, 2020

Unit test was added demonstrating use of non-file input stream at the end of
https://github.com/tools4j/unix4j/blob/master/unix4j-core/unix4j-command/src/test/java/org/unix4j/unix/GrepTest.java

@terzerm
Copy link
Member

terzerm commented Jul 19, 2020

Are you trying to support grep from multiple input files given as input streams ? That is not supported so I might reconsider

@terzerm terzerm reopened this Jul 19, 2020
@ahadas
Copy link
Contributor Author

ahadas commented Jul 19, 2020

@terzerm thanks for the quick feedback!
I thought about executing grep per-file (I don't really need the matched content but only to know if there's any match within the stream's content) - so I think I can follow your suggestion. Do you see any benefit in doing that on multiple streams? I guess I can do that with -l and then compare the files that were passed with the output - but that seems like extra work that I'm not sure that would worth it.
Btw, thinking about the -l option, I wonder if there's an optimization in the code of the grep command that stops searching the content after the first match when this flag is set, is there? maybe it worth setting that even for a single file in that case

@terzerm
Copy link
Member

terzerm commented Jul 20, 2020

Hi

Definitely the matching files option is expected to be faster --- in particular if a file has many matching lines. You can see this in WriteFilesWithMatchingLinesProcessor --- early return if first match found. I would do a small test but unix4j has generally been performance tested and performs reasonably well. In some cases even faster than the unix original :-)

@terzerm
Copy link
Member

terzerm commented Jul 20, 2020

FYI, support for multiple inputs has been added to the following commands: cat, grep, head, sort, tail, wc

You can either pass StreamInput instances to the newly added method or you create your own Input implementation for the AbstractFile class mentioned above (best with a good to-string implementation for commands that print the input name).

See unit tests for examples.

Change added as per 77f1847

@terzerm terzerm closed this Jul 20, 2020
@ahadas
Copy link
Contributor Author

ahadas commented Jul 21, 2020

Thanks @terzerm , I'll try :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants