Updating digest with web crypto
For some context, I'm updating some code that was importing from `std/hash` (specifically `std/hash/sha256`) to use `std/crypto`, since the hash submodule is long gone.
I've run up against a case where I'm not really sure what I should do to move forward. The code looks something like this:
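(Heavily simplified sketch with made-up identifiers; the real function does more.)
```ts
// Old code built on the removed std/hash module, pinned to an older std
// version that still shipped it. Illustrative sketch, not the real code.
import { Sha256 } from "https://deno.land/std@0.110.0/hash/sha256.ts";

async function download_with_sha(
  stream: AsyncIterable<Uint8Array>,
  file: Deno.FsFile,
  writeToDisk: boolean,
): Promise<string> {
  const digest = new Sha256();
  for await (const chunk of stream) {
    digest.update(chunk); // hash incrementally, chunk by chunk
    if (writeToDisk) {
      // the "second conditional": sometimes each chunk is also persisted
      await file.write(chunk);
    }
  }
  return digest.hex(); // hex-encoded SHA-256 of the whole stream
}
```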
Is this possible to handle in the web crypto standard? I'm pretty sure I could just aggregate the chunks into one massive `Uint8Array` or something, but I feel like there's no reason to buffer the whole thing into memory.
Any ideas?
@iuioiua as the main contributor to the removal of `std/hash`, do you have any ideas? I tried buffering the input, but this resulted in a roughly 10x slowdown, so not really ideal.

What code are you currently using that's causing the slowdown?
It looks like the bulk of the work lies within that `for await` loop. If that second conditional wasn't there, you could just do the following, because `.digest()` accepts `AsyncIterable`s:
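(Sketch; assumes the chunks ultimately come from a file's readable stream.)
```ts
import { crypto } from "https://deno.land/std/crypto/mod.ts";

// std/crypto's digest() accepts AsyncIterable<BufferSource>, so the
// stream is consumed chunk by chunk rather than buffered whole.
const file = await Deno.open("large-file.tar.gz", { read: true });
const hash = await crypto.subtle.digest("SHA-256", file.readable);
console.log(new Uint8Array(hash));
```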
If that conditional is required, I'd:
1. Create one `Uint8Array` from the `AsyncIterable` using `BytesList` from `std/bytes` (https://deno.land/std/bytes/mod.ts?s=BytesList), specifically its `.add()` and `.concat()` methods.
2. Pass the resulting data into `crypto.subtle.digest()`, as sketched below.
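A rough sketch of those two steps (assuming `stream` is the `AsyncIterable` of `Uint8Array` chunks in question):
```ts
import { BytesList } from "https://deno.land/std/bytes/mod.ts";
import { crypto } from "https://deno.land/std/crypto/mod.ts";

declare const stream: AsyncIterable<Uint8Array>; // your chunk source

const list = new BytesList();
for await (const chunk of stream) {
  list.add(chunk); // records the chunk; no copying happens yet
  // ...per-chunk conditional work goes here...
}
// concat() makes a single copy into one contiguous Uint8Array
const hash = await crypto.subtle.digest("SHA-256", list.concat());
```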
I haven't looked into the impl of `BytesList`. Would this be more efficient than the code I presented?
The conditional is necessary. Also, this wouldn't work for other reasons, because the example I gave is a drastic oversimplification of the implementation I'm replacing.
AFAIK, using `digest.update()` in your initial snippet loads the data into memory behind the scenes anyway, so having a single `Uint8Array` might be fine to do.
If having a single `Uint8Array` is fine performance-wise, then yeah, using either `BytesList` or `concat()` from `std/bytes` would be efficient.

afaik, it definitely did not, going by the std implementation.
I'm really surprised that no one, as far as I can tell, has run into the issue of hashing a large file. That seems like a pretty obvious use case?
To clarify my statement that it did not load more data into memory: SHA-256 works by keeping some blocks behind the scenes. Each block holds a fixed number of bytes, and some magic bit shifting lets you continuously add on to the hash.
It's a fixed-size hash function, so even without looking at the implementation, this is how I would imagine it working.
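(Conceptually something like this; not the actual `std/hash` source, just the shape of the idea.)
```ts
// Conceptual sketch of a streaming SHA-256 update(): only one 64-byte
// block is buffered at a time, so memory stays constant no matter how
// much data is hashed.
class StreamingSha256 {
  #block = new Uint8Array(64); // SHA-256 operates on 512-bit (64-byte) blocks
  #used = 0;

  update(data: Uint8Array): void {
    for (const byte of data) {
      this.#block[this.#used++] = byte;
      if (this.#used === 64) {
        this.#compress(this.#block); // fold the full block into the running state
        this.#used = 0; // reuse the same buffer for the next block
      }
    }
  }

  #compress(_block: Uint8Array): void {
    // the "magic bit shifting": 64 rounds of the SHA-256 compression function
  }
}
```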
I will see if the perf of this is any good. I will return with my findings.
`std/crypto` wouldn't have any issues hashing a large file if it's simply passed as an `AsyncIterable`, which would be the usual case. This case becomes more interesting because of the conditional.
Ok, I see what you might mean.
If your code is publicly available, I'd love to take a look.

It's an OSS project. Here's the PR I'm working on: https://github.com/teaxyz/cli/pull/312 (probably should have linked this earlier 😛).
https://github.com/teaxyz/cli/blob/c908653b66df4a8f89b654fd357b5fb2bbfe977b/src/hooks/useDownload.ts#L112
Specifically, it's the `download_with_sha` function in `useDownload.ts`.
Ok, what's the simplest version of that function that demonstrates the performance regression? I'd like to replicate it better on my end. Can you provide a pure function?
Let me try to get something, hold on.
Hm, so from my messing around with it, it looks like web crypto is obviously much faster than the std implementation. That being said, if the file is bigger than a certain size (in the gigabytes), the std implementation keeps working while the web crypto implementation crashes, because V8 restricts how much memory can be allocated.
I think in this case we'll just have to pin the hash to an older std version and maybe come back to fixing this more elegantly another day
If `std/crypto` currently doesn't handle large streams well, we should open an issue. WDYT, @iobdas?

Will attempt a proper implementation @tea soon and report back.
for posterity’s sake please check out https://discord.com/channels/684898665143206084/684898665151594506/1087454360243540062
Also, I did implement it in tea using this method.