andykais
andykais2y ago

how to hash very large files

We have deprecated the old std/hash code in favor of the crypto module, but I do not know what the standard approach to hashing something piecemeal looks like. The advice around the crypto digest generally looks like this:
async function crypto_hash(filepath: string) {
  const file_buffer = await Deno.readFile(filepath)
  const hash_buffer = await crypto.subtle.digest('SHA-256', file_buffer)
  const hash_array = Array.from(new Uint8Array(hash_buffer))
  const hash_hex = hash_array.map((b) => b.toString(16).padStart(2, '0')).join('')
  return hash_hex
}
This works fine on smaller files, but I occasionally need to hash a big file, around 2GB. That is a lot of data to hold in memory, so ideally I could leverage readable streams. If I go that route, should I simply feed the hash_buffer back into itself, concatenating new data as I go?
8 Replies
ioB
ioB2y ago
See the conversation here: https://discord.com/channels/684898665143206084/1064614652081885394/1064614652081885394 I had the same issue. The conclusion was that I'm not quite sure how to do this with current web specs.
andykais
andykais2y ago
Hmm, well, it's a question for everyone using the web, not just Deno, so there will definitely be an answer eventually. I'll keep researching. IMO a readable stream interface in std/crypto would add a lot of value though.
ioB
ioB2y ago
I believe WebCrypto allows streaming of some sort, but I forget.
andykais
andykais2y ago
I found this, which seems to say it can't: https://github.com/w3c/webcrypto/issues/73 I think here's the crux of it: I can make a very simple function that hashes my file chunkwise, appending data as I go. I will have consistency within my app, because I will always hash it the same way. However, the result will not be the same standard SHA-256 that I would get from the Linux command line or any number of other tools. To match those, I would need to reimplement the standard SHA-256 algorithm so that it accepts chunks, which is going to take a bit of doing.
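To make that crux concrete, here is a rough sketch of the "feed the hash back into itself" idea next to a plain one-shot hash. It assumes Node's `node:crypto` module is importable (it is only used here to get a synchronous SHA-256 for the illustration; nothing in this thread depends on it), and `chainedHash` / `standardHash` are made-up names:

```typescript
import { createHash } from "node:crypto";

// Join several byte arrays into one contiguous array.
function concat(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// One-shot SHA-256 of a byte array (assumption: node:crypto is available).
function sha256(data: Uint8Array): Uint8Array {
  return new Uint8Array(createHash("sha256").update(data).digest());
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes).map((b) => b.toString(16).padStart(2, "0")).join("");
}

// The standard digest: hash all of the bytes at once.
function standardHash(chunks: Uint8Array[]): string {
  return toHex(sha256(concat(chunks)));
}

// The homemade scheme: hash(previous digest + next chunk), repeated per chunk.
function chainedHash(chunks: Uint8Array[]): string {
  let running: Uint8Array = new Uint8Array(0);
  for (const chunk of chunks) {
    running = sha256(concat([running, chunk]));
  }
  return toHex(running);
}
```

The chained value is stable for a given chunking, but as soon as there is more than one chunk it no longer matches what `sha256sum` would print, and it also changes if the chunk boundaries change.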
ioB
ioB2y ago
That seems to be the case, yeah. Unfortunate when you just want to verify a checksum though.
ioB
ioB2y ago
looks like https://github.com/wintercg/proposal-webcrypto-streams could be the solution to this problem long-term
ioB
ioB2y ago
Unfortunate that std/hash was removed before all use cases were fully considered.
andykais
andykais2y ago
Well, I suppose there is nothing broken in the older hashing code; it's just WASM vs. native, so a tad slower: https://deno.land/std@0.160.0/hash/mod.ts I'll probably just use this for now. FWIW, web devs outside of Deno have been feeling this pain as well: https://lists.w3.org/Archives/Public/public-webcrypto/2016Nov/0000.html
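For anyone landing here later, the shape that makes this work is an incremental hasher: create it once, call `update` for every chunk, and read the digest at the end. As far as I recall, the old std/hash `createHash` has that same update-then-digest shape. The sketch below is not that module, though: it assumes Node's `node:crypto` is importable so it stays self-contained, and `hashChunks` / `hashStream` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Hash an in-memory sequence of chunks without concatenating them first.
function hashChunks(chunks: Iterable<Uint8Array>): string {
  const hasher = createHash("sha256");
  for (const chunk of chunks) {
    hasher.update(chunk); // internal state absorbs each chunk in order
  }
  return hasher.digest("hex");
}

// Same idea for an async source, e.g. a file read as a stream of chunks.
async function hashStream(chunks: AsyncIterable<Uint8Array>): Promise<string> {
  const hasher = createHash("sha256");
  for await (const chunk of chunks) {
    hasher.update(chunk);
  }
  return hasher.digest("hex");
}
```

Because the hasher keeps the SHA-256 state between `update` calls, the result is the standard digest regardless of where the chunk boundaries fall, and only one chunk needs to be in memory at a time. In Deno, something like passing the `readable` stream of an opened file to `hashStream` should work, assuming the runtime supports async iteration over `ReadableStream`.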