Grepping the contents of thousands of emails. #140

Closed
lawlist opened this issue Feb 16, 2017 · 6 comments

@lawlist

lawlist commented Feb 16, 2017

Q: How can I grep thousands of emails within Wanderlust while avoiding the dreaded error of "Too many arguments" or finding zero matches (when there should be one or more)?

Wanderlust (and related elmo libraries) support searching the contents of emails by using grep: http://wanderlust.github.io/wl-docs/wl.html#grep . This works well for a small number of emails in the directory to be searched, but does not work when there are thousands of files in the directory -- i.e., zero results are returned even though there should have been one or more hits.

http://emacs.stackexchange.com/questions/30769/wanderlust-how-to-grep-the-contents-of-thousands-of-emails


A: The limit on the number of files comes from the number of arguments that can be passed to grep before an error of "Too many arguments" is thrown. Wanderlust and the related elmo libraries are filled with condition-case statements that mask errors and make troubleshooting extremely time-consuming. The default configuration for grep uses the function elmo-search-grep-target, which creates a listing of files -- sometimes too many.

In researching this answer, I found it helpful to inspect the functions elmo-search-engine-do-search and elmo-map-folder-list-message-locations. The latter, when supplied with a handy (message ...), told me that the problem was due to "Too many arguments". This error message led me to an answer by Barmar suggesting the use of grep's recursive feature with a directory argument, rather than a zillion files. As a bonus, I had also been wanting to enable recursive grepping, which the default configuration did not provide.

So, we create a new function called elmo-search-rgrep-target and add the r flag to the grep arguments. The method of searching is essentially the same: from within the Summary or Folder buffer, type the letter "g" and then enter into the minibuffer something like [hello-world]/path/to/be/recursively/searched!grep

(defun elmo-search-rgrep-target (engine pattern)
  (let ((dirname (expand-file-name (elmo-search-engine-param-internal engine))))
    dirname))

;;; Setup `elmo-search-engine-alist'
(unless noninteractive
  (or (assq 'namazu elmo-search-engine-alist)
      (elmo-search-register-engine
        'namazu 'local-file
        :prog "namazu"
        :args '("--all" "--list" "--early" pattern elmo-search-namazu-index)
        :charset 'iso-2022-jp))
  (or (assq 'mu elmo-search-engine-alist)
      (elmo-search-register-engine
       'mu 'local-file
       :prog "/path/to/executable/mu"
       :args '("find" pattern "--fields" "l" "--muhome=/path/to/muhome/.mu")
       :charset 'utf-8))
  (or (assq 'grep elmo-search-engine-alist)
      (elmo-search-register-engine
       'grep 'local-file
       :prog "grep"
       ;; :args '("-l" "-e" pattern elmo-search-grep-target)
       :args '("-rle" pattern elmo-search-rgrep-target))))
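
For readers who want to see the limit outside Emacs, here is a minimal shell sketch of the idea. It assumes a grep that supports -r (GNU and BSD grep do, although POSIX does not require it). The temporary directory and the file names are invented for illustration and are not part of Wanderlust.

```shell
#!/bin/sh
# Sketch only: the directory and file names below are made up for illustration.
maildir=$(mktemp -d)

# Create 50 fake messages that do not contain the pattern.
i=1
while [ "$i" -le 50 ]; do
    printf 'Subject: message %s\n\nnothing here\n' "$i" > "$maildir/msg$i"
    i=$((i + 1))
done
# Create one fake message that does contain the pattern.
printf 'Subject: hi\n\nhello-world\n' > "$maildir/match"

# Upper bound, in bytes, on the arguments plus environment handed to a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"

# One argument per file: the command line grows with the folder and can
# eventually exceed ARG_MAX when there are many thousands of messages.
#   grep -l -e 'hello-world' "$maildir"/*

# One path argument in total: grep walks the directory itself (assumes -r support).
hits=$(grep -rle 'hello-world' "$maildir")
echo "$hits"

# Remove the scratch directory when done:
#   rm -rf "$maildir"
```

The recursive call passes a single path no matter how many messages the folder holds, which is why it sidesteps the limit.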
@ikazuhiro
Member

Recursive grep returns a different result when the target directory has subdirectories. How about using the find program with grep?

(defun elmo-search-rgrep-target (engine pattern)
  (expand-file-name (elmo-search-engine-param-internal engine)))

(elmo-search-register-engine
 'grep 'local-file
 :prog "find"
 :args '(elmo-search-rgrep-target
         "-type" "f" "-exec" "grep" "-l" "-e" pattern "/dev/null" "{}" "+"))

@lawlist
Author

lawlist commented Mar 3, 2017

I was not able to achieve reliable results with the plus sign at the end, which I am assuming is producing a list of results. In some cases zero results were found (when there should have been a couple of hundred), whereas in other cases some results were obtained. Using a semicolon, however, worked well for my primary test case. A semicolon produces a line by line result. Here is a link to the stackoverflow thread that I consulted: http://stackoverflow.com/a/6085237/2112489

@ikazuhiro
Member

I don't think I understand your result. Do you mean that the commands below return different results?

$ find /path/to/target/directory -type f -exec grep -l -e 'pattern' '{}' +
$ find /path/to/target/directory -type f -exec grep -l -e 'pattern' '{}' \;

I know the difference between + and ;, but it should not affect the result, and using ; causes a severe performance penalty.
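
The performance point can be illustrated with a small shell sketch. It counts how many times find runs the -exec command under each terminator, using a throwaway directory and file names invented for this example. With +, find appends as many file names as fit to a single command line. With \;, it starts one process per file.

```shell
#!/bin/sh
# Sketch only: a scratch directory with six small files (names are made up).
dir=$(mktemp -d)
for n in 1 2 3 4 5 6; do
    printf 'message %s\n' "$n" > "$dir/m$n"
done

# Each executed command prints exactly one line, so counting lines
# counts how many processes find started.
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

echo "invocations with +:  $batched"
echo "invocations with \\;: $per_file"

# Remove the scratch directory when done:
#   rm -rf "$dir"
```

For a folder with thousands of messages, the \; form starts thousands of grep processes, while the + form needs only as many as it takes to stay under the argument-length limit.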

@lawlist
Author

lawlist commented Mar 4, 2017

It may be a problem with this particular version of find and grep on OSX 10.6.8. I used a last name of one person that I know is distinct, and ran both commands in the terminal:

The following yields zero results:

find /Users/HOME/.0.data/.0.emacs/.0.elmo/Maildir/.0.offlineimap/INBOX/cur/ -type f -exec grep -l -e 'glickstein' '{}' +

The following yields a couple hundred results:

find /Users/HOME/.0.data/.0.emacs/.0.elmo/Maildir/.0.offlineimap/INBOX/cur/ -type f -exec grep -l -e 'glickstein' '{}' \;

I realize this is outside the scope of Emacs since the results are different in a terminal without using Emacs. It may be a bug in either this older version of find or this older version of grep, or both.
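
One way to narrow down a discrepancy like this is to save the output of both variants and list the files reported by only one of them. The following is a hedged diagnostic sketch, not a fix: it builds a small scratch directory with invented files so that it runs anywhere, and the real Maildir path would take the place of the hypothetical target variable.

```shell
#!/bin/sh
# Sketch only: "target" is a hypothetical stand-in for the real Maildir path.
target=$(mktemp -d)
printf 'From: glickstein\n' > "$target/a"
printf 'From: someone else\n' > "$target/b"
mkdir "$target/sub"
printf 'glickstein again\n' > "$target/sub/c"

plus_out=$(mktemp)
semi_out=$(mktemp)

# Run both variants and sort the results so they can be compared line by line.
find "$target" -type f -exec grep -l -e 'glickstein' {} + | sort > "$plus_out"
find "$target" -type f -exec grep -l -e 'glickstein' {} \; | sort > "$semi_out"

# comm -3 prints only the lines unique to one input;
# empty output means both variants found the same files.
only_one=$(comm -3 "$plus_out" "$semi_out")
if [ -z "$only_one" ]; then
    echo "both variants agree"
else
    echo "files reported by only one variant:"
    echo "$only_one"
fi

# Remove the scratch files when done:
#   rm -rf "$target" "$plus_out" "$semi_out"
```

On a system where the two forms behave the same, the sketch prints "both variants agree". Any file names it lists instead are a starting point for checking the find and grep versions involved.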

@ikazuhiro
Member

I've added a new search method, rgrep, and related documentation. The rgrep method calls grep with the -r option.

@lawlist
Author

lawlist commented Mar 13, 2017

Thank you very much -- greatly appreciated! :)
