omindex: delay libmagic checks(ticket#743) #153

Closed

caiyulun

Link to the Ticket: https://trac.xapian.org/ticket/743

The libmagic call to get the MIME type is expensive, so we should make it as late as possible. For example, before calling libmagic we can check the file size with stat, check the timestamps, and check the DB for an existing entry.

This PR currently implements the file size check before the libmagic call.
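
A minimal sketch of the size-before-libmagic ordering, not the code from this PR. `MimeDetector`, `worth_detecting` and `mime_type_if_needed` are hypothetical names, and the detector callback stands in for the real libmagic lookup so that the example needs nothing beyond POSIX `stat`. The regular-file check is an extra assumption of the sketch:

```cpp
#include <sys/stat.h>

#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for the expensive libmagic lookup (not omindex's API).
using MimeDetector = std::function<std::string(const std::string&)>;

// Cheap filter that uses only what stat() has already returned.
// Assumption: only non-empty regular files are worth a MIME lookup.
bool worth_detecting(const struct stat& st) {
    return S_ISREG(st.st_mode) && st.st_size > 0;
}

// Returns the detected MIME type, or an empty string if the file was
// skipped before the expensive detector was ever called.
std::string mime_type_if_needed(const std::string& path,
                                const MimeDetector& detect_mime) {
    assert(detect_mime && "a detector callback is required");
    struct stat st;
    if (stat(path.c_str(), &st) != 0) {
        return std::string();  // Can't stat the file: skip it.
    }
    if (!worth_detecting(st)) {
        return std::string();  // Empty or not a regular file: skip it.
    }
    return detect_mime(path);  // Only now pay for the expensive call.
}
```

Passing a lambda that wraps the real lookup as `detect_mime` keeps the expensive call behind the cheap checks.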

Still to be solved:

Checking a file's timestamp sometimes requires a database lookup. When the timestamp returned by the stat call (T1 for convenience) is older than the last time we ran omindex, we need to look in the database to determine whether the file is indexed and, if so, get the modification time recorded when it was indexed (T2 for convenience). If T2 is older than T1, we should re-index the file.

Going through the DB is expensive, so the next step is to figure out the order of the libmagic check and the timestamp check.
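
The T1/T2 rule above could be sketched roughly as follows. This is only an illustration: `IndexedMtimeLookup` and `needs_reindex` are hypothetical names, the callback stands in for the real database lookup, and the first branch (a file modified at or after the last run is indexed without consulting the DB) is an assumption that the description only implies:

```cpp
#include <ctime>
#include <functional>
#include <string>

// Hypothetical DB lookup: returns true and sets indexed_mtime (T2) if the
// URL is already in the database, false if it has never been indexed.
using IndexedMtimeLookup =
    std::function<bool(const std::string& url, std::time_t& indexed_mtime)>;

// t1 is the modification time from stat(); last_run is when omindex last ran.
bool needs_reindex(const std::string& url, std::time_t t1,
                   std::time_t last_run, const IndexedMtimeLookup& lookup) {
    // Assumption: modified at or after the last run means it has changed,
    // so the expensive DB lookup is not needed to decide.
    if (t1 >= last_run) {
        return true;
    }
    std::time_t t2 = 0;
    if (!lookup(url, t2)) {
        return true;  // Never indexed before.
    }
    return t2 < t1;   // Indexed copy is older than the file on disk.
}
```

Where the libmagic call fits relative to this check is the open question; the sketch only shows when the DB lookup can be avoided.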

The libmagic call to get the MIME type is more expensive than
calling stat to get the file size, so we can check the size
via stat before calling libmagic.

ojwb commented Apr 26, 2017

Merged to git master as d32e135, thanks.

ojwb closed this Apr 26, 2017
@@ -133,6 +120,20 @@ index_file(const string &file, const string &url, DirectoryIterator & d,
return;
}

// if can't get the mime type from extension,call libmagic to get it

One small but quite important point - you should aim to make comments clear and unambiguous, and getting the grammar and punctuation right helps with that (though I realise that's often harder for non-native speakers).

So better to capitalise the sentence, give the verb a subject ("we can't" rather than "can't"), put a space after the comma, and a full stop (period) at the end. I've pushed a commit to fix these in this case.

barufa mentioned this pull request Apr 6, 2019