Updating digest with web crypto

ioB · 1/16/2023
For some context, I'm updating some code that was importing from std/hash (specifically std/hash/sha256) to use std/crypto since the hash submodule is long gone. I've run up against a case where I'm not really sure what I should do to move forward. The code looks something like this:
const digest = new Sha256()

for await (const chunk of generate.chunks()) {
  digest.update(chunk)
}

if (some_condition) {
  digest.update(bonus_chunk)
}

return digest.hex()
Is this possible to handle with the web crypto standard? I'm pretty sure I could just aggregate the chunks into one massive Uint8Array or something, but I feel like there is no reason to buffer the whole thing into memory. Any ideas? @iuioiua as the main contributor to the removal of std/hash, do you have any ideas? I tried buffering the input, but this resulted in roughly a 10x slowdown, so not really ideal.
iuioiua · 1/16/2023
What code are you currently using that’s causing the slowdown?
ioB · 1/16/2023
let buffer = new Uint8Array(0)

for await (const chunk of generate.chunks()) {
  // reallocate and copy the whole accumulated buffer on every chunk
  const appendedBuffer = new Uint8Array(buffer.length + chunk.length)
  appendedBuffer.set(buffer, 0)
  appendedBuffer.set(chunk, buffer.length)
  buffer = appendedBuffer
}

if (some_condition) {
  // same as above
}

return toHashString(await crypto.subtle.digest("SHA-256", buffer))
iuioiua · 1/16/2023
It looks like the bulk of the work lies within that for await loop. If that second conditional weren't there, you could just do the following, because .digest accepts AsyncIterables:
return toHashString(await crypto.subtle.digest("SHA-256", generate.chunks()));
If that conditional is required, I'd: 1. Create one Uint8Array from the AsyncIterable using BytesList from std/bytes (https://deno.land/std/bytes/mod.ts?s=BytesList), specifically its .add() and .concat() methods. 2. Pass the resulting data into crypto.subtle.digest().
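For reference, a minimal sketch of that suggestion, assuming the chunks are Uint8Arrays; the pinned std version, the import paths, and the digestChunks wrapper are illustrative, not the thread's actual code:

import { BytesList } from "https://deno.land/std@0.177.0/bytes/mod.ts";
import { crypto, toHashString } from "https://deno.land/std@0.177.0/crypto/mod.ts";

async function digestChunks(
  chunks: AsyncIterable<Uint8Array>,
  bonus?: Uint8Array,
): Promise<string> {
  const bytes = new BytesList();
  for await (const chunk of chunks) {
    bytes.add(chunk); // records each chunk; the copy is deferred
  }
  if (bonus) {
    bytes.add(bonus);
  }
  // concat() copies everything into a single Uint8Array exactly once
  const data = bytes.concat();
  return toHashString(await crypto.subtle.digest("SHA-256", data));
}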
ioB · 1/16/2023
I haven't looked into the impl of BytesList. Would this be more efficient than the code I presented? The conditional is necessary. Also, this wouldn't work for other reasons, because the example I gave is a drastic oversimplification of the current implementation I am replacing.
iuioiua · 1/16/2023
AFAIK, using digest.update() in your initial snippet loads the data into memory behind the scenes anyway, so having a single Uint8Array might be fine to do. Assuming that is fine performance-wise, then yeah, using either BytesList or concat() from std/bytes would be efficient.
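A hedged variant of the same idea using concat() from std/bytes: collect the chunks in an array and copy once at the end. The spread signature of concat(), the version pin, and the collectChunks helper are assumptions:

import { concat } from "https://deno.land/std@0.177.0/bytes/mod.ts";

async function collectChunks(
  chunks: AsyncIterable<Uint8Array>,
  bonus?: Uint8Array,
): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  for await (const chunk of chunks) {
    parts.push(chunk);
  }
  if (bonus) {
    parts.push(bonus);
  }
  // one allocation and one pass of copying, instead of one per chunk
  return concat(...parts);
}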
ioB · 1/16/2023
AFAIK, it definitely did not. Here's the old std implementation:
/** Update hash
 *
 * @param message The message you want to hash.
 */
update(message: Message): this {
  if (this.#finalized) {
    return this;
  }

  let msg: string | number[] | Uint8Array | undefined;
  if (message instanceof ArrayBuffer) {
    msg = new Uint8Array(message);
  } else {
    msg = message;
  }

  let index = 0;
  const length = msg.length;
  const blocks = this.#blocks;

  while (index < length) {
    let i: number;
    if (this.#hashed) {
      this.#hashed = false;
      blocks[0] = this.#block;
      blocks[16] =
        blocks[1] =
        blocks[2] =
        blocks[3] =
        blocks[4] =
        blocks[5] =
        blocks[6] =
        blocks[7] =
        blocks[8] =
        blocks[9] =
        blocks[10] =
        blocks[11] =
        blocks[12] =
        blocks[13] =
        blocks[14] =
        blocks[15] =
          0;
    }

    if (typeof msg !== "string") {
      for (i = this.#start; index < length && i < 64; ++index) {
        blocks[i >> 2] |= msg[index] << SHIFT[i++ & 3];
      }
    } else {
      for (i = this.#start; index < length && i < 64; ++index) {
        let code = msg.charCodeAt(index);
        if (code < 0x80) {
          blocks[i >> 2] |= code << SHIFT[i++ & 3];
        } else if (code < 0x800) {
          blocks[i >> 2] |= (0xc0 | (code >> 6)) << SHIFT[i++ & 3];
          blocks[i >> 2] |= (0x80 | (code & 0x3f)) << SHIFT[i++ & 3];
        } else if (code < 0xd800 || code >= 0xe000) {
          blocks[i >> 2] |= (0xe0 | (code >> 12)) << SHIFT[i++ & 3];
          blocks[i >> 2] |= (0x80 | ((code >> 6) & 0x3f)) << SHIFT[i++ & 3];
          blocks[i >> 2] |= (0x80 | (code & 0x3f)) << SHIFT[i++ & 3];
        } else {
          code = 0x10000 +
            (((code & 0x3ff) << 10) | (msg.charCodeAt(++index) & 0x3ff));
          blocks[i >> 2] |= (0xf0 | (code >> 18)) << SHIFT[i++ & 3];
          blocks[i >> 2] |= (0x80 | ((code >> 12) & 0x3f)) << SHIFT[i++ & 3];
          blocks[i >> 2] |= (0x80 | ((code >> 6) & 0x3f)) << SHIFT[i++ & 3];
          blocks[i >> 2] |= (0x80 | (code & 0x3f)) << SHIFT[i++ & 3];
        }
      }
    }

    this.#lastByteIndex = i;
    this.#bytes += i - this.#start;
    if (i >= 64) {
      this.#block = blocks[16];
      this.#start = i - 64;
      this.hash();
      this.#hashed = true;
    } else {
      this.#start = i;
    }
  }
  if (this.#bytes > 4294967295) {
    this.#hBytes += (this.#bytes / 4294967296) << 0;
    this.#bytes = this.#bytes % 4294967296;
  }
  return this;
}
I'm really surprised that, as far as I can tell, no one has run into the issue of hashing a large file. That seems like a pretty obvious use case? To clarify my statement that it did not load more data into memory: SHA-256 works by keeping some blocks behind the scenes. These blocks each hold a certain number of bytes, and then some magic bit shifting lets you continuously add on to the hash. It's a fixed-size hash function, so even without looking at the implementation, this is how I would imagine it working. I will see if the perf of this is any good and return with my findings.
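A hypothetical check (not from the thread) of the property described above, using an old std version that still ships std/hash: chunked updates produce the same digest as hashing the whole message, while only a fixed-size block buffer is kept in memory.

import { Sha256 } from "https://deno.land/std@0.119.0/hash/sha256.ts";

const encoder = new TextEncoder();
const first = encoder.encode("hello ");
const second = encoder.encode("world");

// update() only folds bytes into the fixed-size internal block state,
// so incremental updates match hashing the concatenated message
const chunked = new Sha256().update(first).update(second).hex();
const whole = new Sha256().update(encoder.encode("hello world")).hex();

console.log(chunked === whole); // true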
iuioiua · 1/16/2023
std/crypto wouldn't have any issues hashing a large file if it's simply passed as an AsyncIterator, which would be the usual case. This case becomes more interesting because of the conditional. Ok, I see what you might mean. If your code is publicly available, I'd love to take a look.
ioB · 1/16/2023
It's an OSS project. Here's the PR I'm working on: https://github.com/teaxyz/cli/pull/312 (probably should have linked this earlier 😛). Specifically, it's the download_with_sha function in useDownload.ts: https://github.com/teaxyz/cli/blob/c908653b66df4a8f89b654fd357b5fb2bbfe977b/src/hooks/useDownload.ts#L112
iuioiua · 1/16/2023
Ok, what's the simplest version of that function that demonstrates the performance regression? I'd like to replicate it better on my end. Can you provide a pure function?
ioB · 1/16/2023
Let me try to get something, hold on. Hm, so from my messing around with it, it looks like web crypto is obviously much faster than the std implementation. That being said, if the file is bigger than a certain size (in the gigabytes), the std implementation keeps working while the web crypto implementation crashes, because V8 restricts the memory usage. I think in this case we'll just have to pin the hash to an older std version and maybe come back to fixing this more elegantly another day.
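For posterity, a hedged sketch of a streaming alternative that avoids building the giant buffer, assuming std/crypto's digest() really does consume an async iterable chunk by chunk as suggested above; the version pin, import paths, and the streamingSha256 helper are illustrative:

import { crypto, toHashString } from "https://deno.land/std@0.177.0/crypto/mod.ts";

async function streamingSha256(
  chunks: AsyncIterable<Uint8Array>,
  bonus?: Uint8Array,
): Promise<string> {
  // wrap the conditional extra chunk inside an async generator so the
  // whole input can be fed to digest() without buffering it in memory
  async function* withBonus() {
    yield* chunks;
    if (bonus) {
      yield bonus;
    }
  }
  return toHashString(await crypto.subtle.digest("SHA-256", withBonus()));
}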
iuioiua · 1/31/2023
If std/crypto currently doesn't handle large streams well, we should open an issue. WDYT, @iobdas?
ioB · 3/21/2023
Will attempt a proper implementation @tea soon and report back. For posterity's sake, please check out https://discord.com/channels/684898665143206084/684898665151594506/1087454360243540062. Also, I did implement it in tea using this method.
