CPU usage #34
Comments
My experimentation shows a much smaller difference between yauzl and unzip. It's expected that the more metadata there is in a zipfile, the more the "control code" written in JavaScript will slow down the processing. The more data there is in large compressed files, the more processing will be happening in C (specifically the zlib module).
So to test how much room there is for improvement, we can examine the two ends of the spectrum. Let's begin with a test case where we start with 1GB of 0's and compress it as a single file into a .zip archive. Extracting that should have a trivial amount of metadata, and most of the time should be spent in the zlib module inflating all those 0's.
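The all-zeros case can be approximated without a .zip container at all. The sketch below is only an illustration, not the benchmark used in this thread: it assumes a 16 MiB buffer as a quick stand-in for the 1GB file, and `inflateZerosBenchmark` is a hypothetical helper name. It uses raw DEFLATE from node's zlib module, which is the compression method a typical .zip entry uses.

```javascript
const zlib = require("zlib");

// Assumption: 16 MiB stands in for the 1GB test case so the sketch runs quickly.
const ZERO_SIZE = 16 * 1024 * 1024;

// Hypothetical helper: compress a run of zeros, then time only the inflate step.
function inflateZerosBenchmark(size) {
  const zeros = Buffer.alloc(size); // Buffer.alloc zero-fills the buffer
  const compressed = zlib.deflateRawSync(zeros); // raw DEFLATE, no zlib/gzip wrapper
  const start = process.hrtime.bigint();
  const inflated = zlib.inflateRawSync(compressed);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return {
    compressedBytes: compressed.length,
    inflatedBytes: inflated.length,
    elapsedMs,
  };
}

const zeroResult = inflateZerosBenchmark(ZERO_SIZE);
console.log(zeroResult);
```

Almost all of the measured time here is spent inside zlib, so the JavaScript side contributes very little, which is the point of this end of the spectrum.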
Next, we can use random bits instead of all 0's. We'll skip the zlib compression, since we can guess that it will consistently fall back to the literal encoding of data, so that won't be very interesting. Let's just use the STORE mode in the .zip file to skip zlib entirely. Since this zip file also has minimal metadata, we expect that yauzl's logic will play a trivial role. Most of the work should be done in node's fs module for piping a file read stream to a file write stream.
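The guess that random data gains nothing from zlib is easy to check. This is a small sketch, assuming 1 MiB samples and a hypothetical helper named `deflateRatio`; it compares the compressed-to-original size ratio of random bytes against zeros.

```javascript
const crypto = require("crypto");
const zlib = require("zlib");

// Hypothetical helper: compressed size divided by original size (raw DEFLATE).
function deflateRatio(buf) {
  return zlib.deflateRawSync(buf).length / buf.length;
}

const SAMPLE_SIZE = 1024 * 1024; // assumption: 1 MiB is enough to show the effect
const randomRatio = deflateRatio(crypto.randomBytes(SAMPLE_SIZE));
const zeroRatio = deflateRatio(Buffer.alloc(SAMPLE_SIZE));

// Random bytes should stay at roughly their original size; zeros shrink to almost nothing.
console.log({ randomRatio, zeroRatio });
```

A ratio near 1 for the random input is why STORE mode loses nothing for this test case.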
Now we get to the other end of the spectrum, which is much more grim. Let's create a zipfile composed of 100000 empty files. Extracting these files is mostly an exercise in metadata parsing (and file system metadata writing), which means the difference between JavaScript and C will be quite noticeable.
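To make "mostly metadata" concrete, here is a rough sketch that builds such an archive in memory by hand. It is not how the test file in this thread was created. The helper name `buildEmptyFilesZip` and the entry names are made up for illustration, and it stops below 65535 entries because the classic end-of-central-directory record stores the entry count in 16 bits; 100000 entries would need the ZIP64 records, which this sketch leaves out.

```javascript
// Hypothetical helper: build a .zip with `count` empty STORE entries, in memory.
function buildEmptyFilesZip(count) {
  if (count >= 0xffff) {
    throw new RangeError("65535 or more entries needs ZIP64, which this sketch omits");
  }
  const locals = [];
  const centrals = [];
  let offset = 0; // byte offset of the next local file header
  for (let i = 0; i < count; i++) {
    const name = Buffer.from(`empty-${i}.txt`, "utf8"); // made-up entry name

    // Local file header: 30 fixed bytes, zero-filled by Buffer.alloc.
    const local = Buffer.alloc(30);
    local.writeUInt32LE(0x04034b50, 0); // signature
    local.writeUInt16LE(20, 4); // version needed to extract
    local.writeUInt16LE(0x0021, 12); // DOS date 1980-01-01
    local.writeUInt16LE(name.length, 26); // file name length
    // Method (0 = STORE), CRC-32 and both sizes stay 0 for an empty file.

    // Central directory header: 46 fixed bytes.
    const central = Buffer.alloc(46);
    central.writeUInt32LE(0x02014b50, 0); // signature
    central.writeUInt16LE(20, 4); // version made by
    central.writeUInt16LE(20, 6); // version needed to extract
    central.writeUInt16LE(0x0021, 14); // DOS date 1980-01-01
    central.writeUInt16LE(name.length, 28); // file name length
    central.writeUInt32LE(offset, 42); // offset of the matching local header

    locals.push(local, name);
    centrals.push(central, name);
    offset += local.length + name.length;
  }

  const centralSize = centrals.reduce((n, b) => n + b.length, 0);

  // End of central directory record: 22 bytes, no archive comment.
  const eocd = Buffer.alloc(22);
  eocd.writeUInt32LE(0x06054b50, 0); // signature
  eocd.writeUInt16LE(count, 8); // entries on this disk
  eocd.writeUInt16LE(count, 10); // total entries
  eocd.writeUInt32LE(centralSize, 12); // central directory size
  eocd.writeUInt32LE(offset, 16); // central directory offset

  return Buffer.concat([...locals, ...centrals, eocd]);
}

const metadataZip = buildEmptyFilesZip(1000);
console.log(metadataZip.length, "bytes of headers for 1000 empty files");
```

Every byte in the result is a header or a file name, so any time spent reading it is spent in parsing logic rather than in decompression.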
Between these two ends of the spectrum is some (mostly) monotonic function of the ratio of C performance to JavaScript performance. However, system capabilities and limitations (such as SSD vs HDD, RAM availability, and even node version and configuration) can affect what this curve looks like. The zipfile use case you provided gives us a point in the middle of the spectrum where we can evaluate the ratio of performances. It appears that for you, the ratio is about 2:1, and for me the ratio is closer to 6:5.
And now that I've said all that and tried to sound educated, I must admit I have no idea why there's a bigger performance difference on your machine than on mine. All I know for sure is that when I designed yauzl's code, I tried to make sure not to waste any CPU cycles doing something that could have easily been avoided (extensive API argument validation, for example). I tried to design yauzl with (reasonably) optimal performance, and I knew I wouldn't be able to compete with C for metadata-heavy test cases. But since yauzl does so well for the cases where the metadata is minimal, I think it's a good sign that I didn't do anything stupid, like buffer entire files in RAM or copy everything to a temporary file for no reason, or anything like that. I'm sure there are countless tiny optimizations that could be made by taking advantage of the subtleties of JavaScript and v8, but I don't feel that pouring effort in that direction will be worthwhile.
So I guess my summary of all this and my direct answer to your question is: No, I don't think there's any room for improvement. However, I've got an open mind, and if someone has a pull request that demonstrates a performance boost, I will be thrilled!
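The ratios quoted in this thread are about CPU time rather than wall-clock time. One possible way to take a comparable reading from inside node is `process.cpuUsage()`. The sketch below is an assumption about how one might measure it, not the method used for the numbers above; `cpuMillis` and `busyLoop` are hypothetical names, and the busy loop is only a placeholder workload.

```javascript
// Hypothetical helper: CPU time (user + system) in milliseconds used while fn runs.
// Only meaningful for synchronous work; an asynchronous extraction would need the
// two process.cpuUsage() calls placed around the whole operation instead.
function cpuMillis(fn) {
  const before = process.cpuUsage();
  fn();
  const delta = process.cpuUsage(before); // difference since `before`, in microseconds
  return (delta.user + delta.system) / 1000;
}

// Placeholder workload standing in for "extract the archive".
function busyLoop(iterations) {
  let x = 0;
  for (let i = 0; i < iterations; i++) x += Math.sqrt(i);
  return x;
}

const lightCpu = cpuMillis(() => busyLoop(1e6));
const heavyCpu = cpuMillis(() => busyLoop(5e7));
console.log({ lightCpu, heavyCpu });
```

For the command-line side, the shell's `time unzip ...` reports user and system CPU time that can be compared against the same sum.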
Thank you very much for the extensive explanation, I wasn't expecting it.
The more the control code in JavaScript, the more slowdown, so it's about how many files are to be decompressed, because for every file contained in the zip, a piece of JavaScript code is executed. So, if there was a plain extraction method Would that additional approach be possible?
Thanks again.
I suppose someone could write an unzip implementation in C and provide JavaScript bindings for it. That would speed things up pretty much optimally. That's not the goal of yauzl though, so I don't think I will do that. You may be able to find bindings like that already out there somewhere. And of course you could always shell out to the unzip command-line utility.
For yauzl, I want to keep everything in JavaScript.
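Shelling out could look roughly like the sketch below. It assumes Info-ZIP's `unzip` is installed and on the PATH; `unzipArgs` and `extractWithUnzip` are hypothetical helper names, and the paths in the usage comment are placeholders.

```javascript
const { execFile } = require("child_process");

// Arguments for Info-ZIP's unzip: -o overwrites without prompting,
// -q keeps output quiet, -d sets the destination directory.
function unzipArgs(zipPath, destDir) {
  return ["-o", "-q", zipPath, "-d", destDir];
}

// Hypothetical helper: run unzip as a child process and resolve when it exits cleanly.
function extractWithUnzip(zipPath, destDir) {
  return new Promise((resolve, reject) => {
    execFile("unzip", unzipArgs(zipPath, destDir), (err) => {
      if (err) reject(err); // non-zero exit code, or unzip not found
      else resolve();
    });
  });
}

// Usage (placeholder paths):
// extractWithUnzip("archive.zip", "out").then(() => console.log("done"), console.error);
```

Using `execFile` rather than `exec` passes the paths as separate arguments, so no shell quoting is involved.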
Hi!
It seems that at last I found an unzip library that doesn't leak memory, so thank you.
Having said that, I'm observing that yauzl takes twice the amount of CPU time that the unzip command-line utility takes to decompress an identical ZIP. The code used is the one provided in README by you. The ZIP file is public: https://downloads.mariadb.org/f/mariadb-10.1.16/winx64-packages/mariadb-10.1.16-winx64.zip
While unzip takes 16 seconds of CPU, yauzl takes 33. Is there any room for improvement?
Thank you,
Albert