"Buggy" initial scan? #1
Hi @ExSport,

Regarding the "1 changed": this is the root share folder itself. EFUTool uses each folder's timestamp to determine whether its contents have changed. However, a share has no timestamp, so EFUTool always considers the root folder changed in order to force a rescan/refresh of the root contents. That's why it will always say "1 changed". If your root folder is not a share (like "\\server\share\folder"), then a valid timestamp will exist and you will not see this behavior.

Your rescan times are also a bit strange: the 12s one seems to indicate some caching going on (on the share server). I would also expect subsequent scans to be faster than the first scan, but they're not (except that 12s one). EFUTool is really great for huge volumes (I test it on shares with millions of files and about 100K folders) - perhaps it's not so fast for your particular setup.

Let me know if you find any other issues - GitHub notifications are enabled, but add a mention to @zybexXL just in case. |
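For readers following along, here is a minimal Python sketch of the timestamp-based change detection described in the comment above. It is not EFUTool's code (EFUTool is written in C#); the names `snapshot` and `folder_changed` and the `is_root` flag are hypothetical and only illustrate the idea that a folder is rescanned when its stored timestamp differs, while the root is always treated as changed.

```python
import os


def snapshot(root):
    """Record the modification time of every folder under root (hypothetical helper)."""
    return {dirpath: os.stat(dirpath).st_mtime_ns for dirpath, _, _ in os.walk(root)}


def folder_changed(path, known_mtimes, is_root=False):
    """Return True if the folder should be rescanned.

    known_mtimes maps folder path -> modification time from the previous scan.
    """
    if is_root:
        # Assumption taken from the comment above: the root has no reliable
        # timestamp, so it is always reported as changed.
        return True
    previous = known_mtimes.get(path)
    if previous is None:
        return True  # folder was not present in the previous scan
    return os.stat(path).st_mtime_ns != previous
```

With this rule, a rescan of an unchanged tree still reports exactly one changed folder, the root, which would match the "1 changed" seen in the report.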
Hello @zybexXL
Thanks for looking into it 👍 |
Hi @ExSport,

You are right, the root folder is ALWAYS marked as new/changed. I did this because of DFS links (mountpoints), which are common in server environments. DFS links have a static timestamp which does not change even when the contents of the root folder change, and it's not trivial to detect whether a given folder is a DFS link or not (there's no attribute indicating it). To work around this, and since DFS links are mostly used as root folders/shares, I've decided to always consider the root folders as "changed".

I've published v1.07, which fixes the trailing slash issue. You would see "2 changed" because the app would count both the root folder you indicated on the command line and the root folder already present in the existing EFU file. In the end they would be found to be the same folder (so scan results are correct), but they were still counted as 2 changes. In practice, when doing re-scans, you don't need to specify which folder you are scanning - the app already gets that from the existing EFU file.

Regarding scan times: a compressed volume forces the OS to extract an entire directory even when only a single entry is requested. So when EFUTool scans for changes, all directories have to be decompressed (by the OS), and I'm guessing that negates any speed advantage of the change-detection algorithm.

The file size difference you saw... are you sure that isn't real? Maybe some app added/deleted a file in one of the subfolders of that path between your scans. Sometimes just going to a folder with images causes Windows to create/update a hidden thumbs.db file - you should see this change in your EFU comparison too, though.

Br, |
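To illustrate the trailing slash issue, here is a small Python sketch of how two spellings of the same root folder can be collapsed before counting. This is an assumption about the general technique, not the actual v1.07 fix; `normalize_root` and `merge_roots` are hypothetical names.

```python
import ntpath  # Windows path rules, usable on any platform


def normalize_root(path):
    """Canonical form for comparing Windows root folders (hypothetical helper)."""
    norm = ntpath.normpath(path)   # collapses separators, drops a trailing slash
    return ntpath.normcase(norm)   # lowercases, turns '/' into '\\'


def merge_roots(cli_roots, efu_roots):
    """Combine roots from the command line and from an existing EFU file,
    keeping one entry per distinct folder."""
    seen = {}
    for root in list(efu_roots) + list(cli_roots):
        seen.setdefault(normalize_root(root), root)
    return list(seen.values())
```

For example, `merge_roots(["C:\\Data\\"], ["C:\\Data"])` yields a single root, so a rescan would count it once rather than twice.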
Hi @zybexXL
About the speed: OK. I had hoped the slow rescans were the glitch, so that 20 seconds would be the default behavior after the fix. Instead it seems that the 20-second rescan was itself caused by some weird glitch, which is not good: for example, an intermittent network problem during the EFUTool rescan could have led to "something" being skipped, which produced such a short scan. |
@ExSport, thanks for all the feedback! You do have an environment which is not ideal for performance... Wifi+VPN+Compression... :) We can't easily know what's going on. The 20-second rescan time is what I would expect under normal circumstances (7x faster than the first scan), but there's nothing in the code that would make a 3rd scan different. Getting DFS info for each folder would slow things down too much. NetDfsGetClientInfo is fast, but it only returns info already stored in the local client cache - if a given server folder is not in the cache yet, then there's no info. |
@zybexXL |
@ExSport It sounds like there's some TMP file being created on the target volume (taking 1 or 2 clusters), which sometimes happens during scanning. You should see this file on your EFU compares though. I suggest you run the same stress test with a simple "dir /s \target\share > someLogFile.txt" and see if you spot the same difference in folder sizes/contents. This is basically what EFUTool is doing :) |
@zybexXL As stated above, no new TMP file is indexed in the EFU file and no timestamp changes; only the TOTAL is incorrect (sometimes an EFU file has 2 incorrect TOTALS, other times zero or ten; it is unpredictable). I already ran this dir traversal test during the stress tests I described this morning, and there was no difference. I ran it on two different servers at a time when the EFU file had incorrect TOTALS on one server but was OK on the other; even then, when the EFU files differed between servers, the DIR logs were exactly the same. No size change, no file change, no timestamp change, nothing. Totally weird :)

PowerShell and cmd are my daily job, so I did a lot of stress tests to see what is happening. Unfortunately some tests are so slow that they are useless, and when I reduced the scope the problem was hard to reproduce. I also tried it locally, where speed is OK, but I was not able to reproduce it.

Btw, how do you get the TOTALS for folders? Never mind, we are on GitHub... C#... as I said, you add up the files to get the folder size :) So again, in case you misunderstood me: the problem is not the size of files you get via the OS API; the problem sometimes arises in the TOTALS which are calculated for folders. Unfortunately, reproducing it is a nightmare :))

Thx for the support... Btw, don't lose much time with it, it is nothing critical. I only reported it because I spotted this behavior during a few tests (I don't use your tool for anything right now, I only wanted to test the speed difference between your tool and Everything 👍) |
@ExSport If the inputs are always the same, then the outputs MUST always be the same. This is what I mean by "deterministic". There are only 3 ways to get different results on different scans:
Perhaps you can send me a couple of EFU files with those differences? |
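A quick way to pin down which entries differ between two EFU files is to compare their size columns. The sketch below is only a rough helper under stated assumptions: it assumes the EFU file is CSV with a header row containing `Filename` and `Size` columns (adjust `name_col` and `size_col` if your files differ), and the function names are hypothetical.

```python
import csv
import io


def load_sizes(efu_text, name_col="Filename", size_col="Size"):
    """Map each path to its recorded size (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(efu_text))
    return {row[name_col]: row[size_col] for row in reader}


def size_differences(old_text, new_text):
    """Return {path: (old_size, new_size)} for entries whose size differs.

    A path missing from one side is reported with None for that side.
    """
    old, new = load_sizes(old_text), load_sizes(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

If only folder rows show up in the result while every file row matches, that points at the calculated folder totals rather than at the file sizes returned by the OS.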
@zybexXL
I will try to reproduce it again on some general data and upload it. |
@zybexXL
|
@ExSport When asking Windows for folder info (FindFirstFileW/FindNextFileW), the returned size for directories SHOULD be zero. I was assuming that is the case, so I was using the returned value as the initial folder size. However, it is possible that the Windows API does not always reset that field to zero and sometimes returns 4K/8K/whatever. In v8, I'm now ignoring the returned folder size and setting it manually to zero. Let me know if you can still reproduce it. |
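The fix described above can be sketched in a few lines of Python. This is an illustration of the idea only, not EFUTool's C# implementation; `folder_total` is a hypothetical name. The point is that the running total starts at zero and only file sizes are added, so whatever size the OS reports for a directory entry never enters the sum.

```python
import os


def folder_total(path):
    """Total size in bytes of all files below path.

    Directory entries contribute nothing of their own, regardless of any
    size the OS reports for them.
    """
    total = 0  # start from zero, not from the directory entry's reported size
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += folder_total(entry.path)  # recurse into subfolders
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Because the total depends only on file sizes, two scans of an unchanged tree give the same folder totals even if the directory entries themselves report different sizes between runs.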
@zybexXL That is exactly what I suspected was happening: that the API sometimes returns a nonzero value for folders. I even wrote a post about this possible behaviour, but then deleted it because in the meantime I had tested it in PowerShell and saw that the size property is not initialized at all :) |
@zybexXL |
Great! Thanks for finding the issue and sticking with it until we solved it. |
Hello
Tried your tool and spotted strange behavior:
The first run is the initial SCAN; all others are RESCANs.
The first oddity is that all rescans show one difference, but the EFU file is exactly the same every time:
The second oddity is that the first rescan was, for some reason, very fast (12 sec compared to 2-3 min), with weird statistics.
It seems the initial scan created a smaller EFU file that did not contain all files.
On subsequent rescans, the CONTENTS are consistent:
So I tried the whole procedure again and confirmed that the initial scan creates a file with the same number of lines as a rescan, but not all folder attributes are populated or fully counted.
This time, the first rescan took about 2.5 minutes. Unfortunately I didn't check the EFU file from the run that took only 12 seconds, and all other retries of initial scan + rescan took the usual 2-3 minutes.
The initial EFU files are exactly the same across retries. The same goes for the rescanned/updated files (which differ slightly from the initial scan).
Initial Scan (example)
Rescan (example)
Thanks for looking into it. 👍