how to hash very large files
We have deprecated the old std/hash code in favor of the crypto module, but I do not know what the standard approach to hashing something piecemeal looks like. The advice around the crypto digest generally looks like this:
This works fine on smaller files, but I occasionally need to hash a big file, around 2 GB. That is a lot of data to hold in memory, so ideally I could leverage readable streams. If I need to do that, should I simply feed the hash_buffer back into itself, concatenating new data as I go?
8 Replies
See the conversation here https://discord.com/channels/684898665143206084/1064614652081885394/1064614652081885394
I had the same issue. The conclusion was that I'm not quite sure how to do this with current web specs
hmm, well, it's a question for everyone using the web, not just Deno, so there will definitely be an answer eventually
I'll keep researching. IMO a readable stream interface in std/crypto would add a lot of value though
I believe webcrypto allows streaming of some sort but I forget
I found this, which seems to say it can't: https://github.com/w3c/webcrypto/issues/73
I think here's the crux of it. I can make a very simple function that hashes my file chunkwise, appending data as I go. I will have consistency within my app, because I will always hash it the same way. However, that will not be the same standard SHA-256 that I might get out of the Linux command line, or any number of other tools. I would need to reimplement the standard SHA-256 algorithm for chunks, which is going to take a bit of doing
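A note on the chunkwise idea: an incremental hash API that takes data through repeated update calls yields the same standard SHA-256 digest as hashing the whole input at once, so a reimplementation should not be needed. The sketch below assumes the runtime exposes Node's node:crypto module (Node does, and Deno does through its Node compatibility layer). sha256OfChunks and sha256OfStream are hypothetical helper names, not part of any library.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: feed chunks into one hash object, one at a time.
// Only the current chunk needs to be in memory.
function sha256OfChunks(chunks: Iterable<Uint8Array>): string {
  const hash = createHash("sha256");
  for (const chunk of chunks) {
    hash.update(chunk);
  }
  return hash.digest("hex");
}

// Hypothetical async variant for anything that yields chunks over time,
// such as a readable stream that supports async iteration.
async function sha256OfStream(
  chunks: AsyncIterable<Uint8Array>,
): Promise<string> {
  const hash = createHash("sha256");
  for await (const chunk of chunks) {
    hash.update(chunk);
  }
  return hash.digest("hex");
}

// The digest does not depend on where the input is split.
const encoder = new TextEncoder();
const oneShot = sha256OfChunks([encoder.encode("abc")]);
const piecewise = sha256OfChunks([encoder.encode("a"), encoder.encode("bc")]);
console.log(oneShot === piecewise);
```

For a real file, the file's readable stream could be passed to sha256OfStream so that a 2 GB file is never fully buffered. How that stream is obtained depends on the runtime, so treat that part as an assumption to verify.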
that seems to be the case, yeah
unfortunate when you just want to verify a checksum though
looks like https://github.com/wintercg/proposal-webcrypto-streams could be the solution to this problem long-term
unfortunate that std/hash was removed before all use cases were fully considered
well, I suppose there is nothing broken in the older hashing code, it's just wasm vs native so a tad slower
https://deno.land/std@0.160.0/hash/mod.ts
I'll probably just use this for now
fwiw, web devs outside of Deno have been feeling this pain as well: https://lists.w3.org/Archives/Public/public-webcrypto/2016Nov/0000.html