carragom•6mo ago

How to go from a JS string[] to a C array of pointers

Hi, I'm trying to call a C function with FFI but am struggling to convert a string[] to an array of pointers. Let's assume the following C function:
#include <stdio.h>

void test_multi(int argc, const char * const *argv) {
    for (int i = 0; i < argc; i++) {
        puts(argv[i]);
    }
}
And here is the JS code I'm using to try to call it:
const enc = new TextEncoder()
const test = Deno.dlopen('./test.so'), {
  test_multi: {
    parameters: ['i32', 'buffer'],
    result: 'void'
  }
})

const params = [ 'ASD', 'ZXC' ]
test.symbols.test_multi(params.length, ??????)
Is buffer the right type for that parameter, or would I be better off using pointer? And finally, how do I convert the params array to something that can be sent to the C function? I'd appreciate the help.
13 Replies
Deno AI Helper•6mo ago
I wasn't able to find the answer to your question in the official Deno documentation. However, based on the code snippet you provided, I noticed a potential issue you might want to address: in the JavaScript code, there is a syntax error in the dlopen call. The closing parenthesis right after './test.so' ends the call early, so the symbol definitions object is never passed as the second argument. It should be:
const test = Deno.dlopen('./test.so', {
  test_multi: {
    parameters: ['i32', 'buffer'],
    result: 'void'
  }
});
Regarding your question about whether to use buffer or pointer for the FFI type, and how to convert the params array into a format that can be sent to the C function, this is not directly covered in the Deno documentation. Typically, though, you would encode each string as a null-terminated C string in its own buffer and then build an array holding a pointer to each of those buffers. Please note that this advice is not based on the Deno documentation, so you should double-check it. For the most accurate and up-to-date information, consult the Deno FFI documentation or ask the Deno community.
AapoAlas•6mo ago
GitHub: libclang_deno/lib/utils.ts at main · aapoalas/libclang_deno (Deno FFI bindings for libclang)
AapoAlas•6mo ago
The basic idea is this: Create a buffer that is large enough to hold params.length pointers plus all of the param strings encoded. Then encode the strings into the buffer and write in pointers that point to the start of each string (so essentially self-references; the buffer contains pointers that point within the buffer).
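The self-referencing layout described above can be sketched as follows. This is a minimal sketch, not the libclang_deno code: the helper names `layoutCStringArray` and `writeSelfPointers` are hypothetical, it assumes 64-bit pointers, and the buffer's base address is passed in as a parameter so the layout logic can be exercised without FFI.

```typescript
// Hypothetical helpers (not from libclang_deno). Layout of the single buffer,
// assuming 64-bit pointers:
//   [ptr 0][ptr 1]...[ptr N-1][bytes of str 0, \0][bytes of str 1, \0]...
interface CStringArrayLayout {
  buf: Uint8Array; // the whole block, to be passed to FFI as "buffer"
  offsets: number[]; // byte offset of each string within buf
}

function layoutCStringArray(strings: string[]): CStringArrayLayout {
  const enc = new TextEncoder();
  const encoded = strings.map((s) => enc.encode(s + "\0"));
  const tableBytes = strings.length * 8; // one 64-bit pointer per string
  const dataBytes = encoded.reduce((sum, e) => sum + e.byteLength, 0);
  // The block itself is one allocation: pointer table followed by string bytes
  const buf = new Uint8Array(tableBytes + dataBytes);
  const offsets: number[] = [];
  let offset = tableBytes; // strings start right after the pointer table
  for (const e of encoded) {
    buf.set(e, offset);
    offsets.push(offset);
    offset += e.byteLength;
  }
  return { buf, offsets };
}

// Fill the pointer table with absolute addresses that point back into buf.
// `base` must be the address of buf[0] (assumption: obtained via FFI).
function writeSelfPointers(layout: CStringArrayLayout, base: bigint): void {
  const table = new BigUint64Array(
    layout.buf.buffer,
    layout.buf.byteOffset,
    layout.offsets.length,
  );
  layout.offsets.forEach((off, i) => {
    table[i] = base + BigInt(off);
  });
}
```

In Deno, the base address could be obtained the same way as elsewhere in this thread, with `BigInt(Deno.UnsafePointer.value(Deno.UnsafePointer.of(layout.buf)))`, after which `layout.buf` would be passed as the `buffer` argument.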
AapoAlas•6mo ago
GitHub: libclang_deno/lib/mod.ts at 5d6d8ed33627c6f12949efc15c3f7ccf07f26b3... (Deno FFI bindings for libclang)
carragom•6mo ago
Thanks a lot!!! This was driving me crazy. I think it's safe to say that this is a complex task to get right, and there should be a helper function in the std library for this conversion; it seems like a very common data structure to find as a function parameter. To understand your code, I wrote a function based on it. It should achieve the same thing and should handle multi-byte characters, but do mind the possible bugs. If you find anything weird with it, let me know. Here it is:
const enc = new TextEncoder()

function toCStringArray(strings: string[]): Uint8Array | null {
  if (strings === undefined || strings === null || strings.length === 0) {
    return null
  }

  // Encode strings into byte arrays appending null byte
  const encoded = strings.map((str) => enc.encode(str + '\0'))
  // Total bytes required by encoded strings (each already includes its null byte)
  const encodedLength = encoded.reduce((acc, cur) => acc + cur.byteLength, 0)
  // Total bytes required by pointers (8 bytes each on a 64-bit platform)
  const pointersLength = encoded.length * 8

  const buf = new Uint8Array(pointersLength + encodedLength)
  const ptrSegment = new BigUint64Array(buf.buffer, 0, encoded.length)
  const dataSegment = buf.subarray(pointersLength)

  // Get a pointer to the start of the data segment
  const dataPointer = BigInt(
    Deno.UnsafePointer.value(Deno.UnsafePointer.of(dataSegment)),
  )

  let offset = 0
  for (const [index, data] of encoded.entries()) {
    dataSegment.set(data, offset) // Copy the data to the strings buffer
    ptrSegment[index] = dataPointer + BigInt(offset) // Store a pointer to the inserted data
    offset += data.byteLength
  }

  return buf
}
AapoAlas•6mo ago
This does suffer from doing two memory copies for the strings instead of one. Also, you're doing N+1 memory allocations, where N is the number of strings. You could calculate the encoded length from the string lengths times some constant. Any constant below 3 (IIRC) does mean that it's theoretically possible for the estimated encoded length to be too small, but that is fairly unlikely. This would mean no extra memory copy and only one allocation. If the buffer ends up being too small, you can detect that based on the encoder's return value (bytes read, bytes written) and create a new, larger buffer to replace the first one. This would be an extra memory copy, but it's very, very unlikely to happen.
Alternatively, you can just encode the strings one by one and create pointers to those resulting buffers. Then, allocate an array of pointers and write the pointers into that. Finally, use a WeakMap to bind the string buffers to stay alive as long as the pointers buffer is alive. This gets rid of the extra copy but retains the N+1 allocations.
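A minimal sketch of the single-allocation "estimate, then verify" idea, using a hypothetical helper `encodeCStringsInto`: it sizes one zero-filled buffer from the UTF-16 string lengths, encodes directly into it with `TextEncoder.encodeInto`, and uses the returned `read` count to detect a too-small estimate. A factor of 3 is always sufficient, since a single UTF-16 code unit never needs more than 3 UTF-8 bytes; smaller factors save memory at the cost of an occasional retry. On a retry this sketch simply re-encodes everything into a larger buffer, and the pointer table is left out (it could be placed in front of the string bytes, as described earlier in the thread).

```typescript
// Hypothetical helper: encode all strings as NUL-terminated UTF-8 into ONE
// zero-filled buffer, sized from an estimate of `factor` bytes per UTF-16 unit.
function encodeCStringsInto(
  strings: string[],
  factor = 3,
): { bytes: Uint8Array; offsets: number[] } {
  const enc = new TextEncoder();
  // Estimated size: factor bytes per UTF-16 code unit, plus 1 per NUL.
  let capacity = strings.reduce((sum, s) => sum + s.length * factor + 1, 0);
  for (;;) {
    const bytes = new Uint8Array(capacity); // zero-filled, so NULs are already there
    const offsets: number[] = [];
    let offset = 0;
    let fits = true;
    for (const s of strings) {
      const { read = 0, written = 0 } = enc.encodeInto(s, bytes.subarray(offset));
      // Too small if not every code unit was consumed or no byte is left for the NUL.
      if (read !== s.length || offset + written >= capacity) {
        fits = false;
        break;
      }
      offsets.push(offset);
      offset += written + 1; // skip one zero byte: the NUL terminator
    }
    if (fits) return { bytes, offsets };
    capacity *= 2; // only reachable when factor < 3: retry with a larger buffer
  }
}
```

For example, `encodeCStringsInto(["ASD", "ZXC"])` places the strings at offsets 0 and 4, each followed by a zero byte.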
carragom•6mo ago
Great insight. Extra allocations are definitely something to watch out for. I love the idea of creating pointers to the encoded chunks; it seems to have the best of all worlds. Balancing unsafe memory with a WeakMap, on the other hand, feels like a very delicate procedure, but hey, at this point everything feels a bit brittle. I will give it a try and report back.
AapoAlas•6mo ago
Deno internally does fun stuff with a WeakMap in Deno.UnsafePointer.of(arrayBuffer). This returns an opaque pointer object and we put it in as a key into a WeakMap with arrayBuffer as the value. The result is that as long as the pointer object lives, the buffer memory also lives.
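The keep-alive pattern described above can be modelled in a few lines of TypeScript. This is a toy sketch with hypothetical names (`makeHandle`, `bufferOf`), not Deno's internal code; it only illustrates the direction of the WeakMap entry: the object whose lifetime you rely on is the key, and the memory that must stay alive is the value.

```typescript
// Toy model (hypothetical names): an opaque handle object is the WeakMap key
// and the buffer it refers to is the value. A WeakMap holds its value strongly
// for as long as the key is reachable, so the buffer cannot be collected while
// the handle is alive.
const HANDLE_TO_BUFFER = new WeakMap<object, ArrayBufferView>();

function makeHandle(buffer: ArrayBufferView): object {
  const handle: object = Object.freeze(Object.create(null)); // opaque, carries no data
  HANDLE_TO_BUFFER.set(handle, buffer);
  return handle;
}

function bufferOf(handle: object): ArrayBufferView | undefined {
  return HANDLE_TO_BUFFER.get(handle);
}
```

Once the handle itself becomes unreachable, the entry no longer keeps the buffer alive, which is why the key should be an object that callers actually hold on to.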
carragom•6mo ago
Well, that worked, and it's even simpler and more readable now. I would appreciate your keen eye on it, especially the part about the WeakMap: I have no idea if that is enough to prevent garbage collection.
function toCStringArray(strings: string[]): Uint8Array | null {
  if (strings === undefined || strings === null || strings.length === 0) {
    return null
  }

  // Encode strings into Uint8Array appending null byte
  const chunks = strings.map((str) => enc.encode(str + '\0'))
  const buf = new BigUint64Array(chunks.length)
  const map = new WeakMap<Deno.PointerObject, Uint8Array>()

  for (const [index, chunk] of chunks.entries()) {
    const ptr = Deno.UnsafePointer.of(chunk)

    if (ptr === null) {
      // No idea how this could happen
      throw new Error('Failed to create pointer')
    } else {
      // Map pointers to chunks to prevent GC from collecting them
      map.set(ptr, chunk)
      buf[index] = BigInt(Deno.UnsafePointer.value(ptr))
    }
  }

  // Return the buffer as a Uint8Array
  return new Uint8Array(buf.buffer)
}
I also have a couple of questions: 1. Will the return statement allocate new memory or reuse the underlying buffer? 2. Is there any advantage in returning a Uint8Array instead of just the BigUint64Array?
AapoAlas•6mo ago
Your WeakMap should map from your buf to the chunks. Right now you're saying "as long as ptr exists, chunk must exist." Then you immediately let the ptr object go out of scope, so it can be garbage collected, and the chunk is then free to be GC'd as well.
The return reuses the underlying buffer: new Uint8Array(buf.buffer) creates a new view over the same memory rather than allocating a copy. And there is indeed an advantage to making Uint8Array your go-to choice whenever you pass data into FFI as "buffer": for various reasons, Deno has to choose one buffer type to optimise for, and Uint8Array is the most versatile one, so that's the one it is. As a result, a trivial benchmark may show perhaps as much as a 100x performance difference between the exact same FFI call made with a BigUint64Array and with a Uint8Array.
Re: the WeakMap. It should actually be created outside of the function scope, so that it cannot itself be GC'd. You can then start code-golfing to something like this (pseudo-code incoming because I'm on mobile):
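// Module-level so the map itself is never garbage collected
const BUF_TO_STRING_BUFS_MAP = new WeakMap<Uint8Array, Uint8Array[]>();
const encoder = new TextEncoder();

function toCStringArray(strings: string[]): Uint8Array {
  const ptrs = new BigUint64Array(strings.length);
  const chunks = strings.map((str, i) => {
    const chunk = encoder.encode(str + "\0");
    ptrs[i] = BigInt(Deno.UnsafePointer.value(Deno.UnsafePointer.of(chunk)));
    return chunk;
  });
  const buf = new Uint8Array(ptrs.buffer);
  // Key on the returned view: while the caller holds buf, the chunks stay alive
  BUF_TO_STRING_BUFS_MAP.set(buf, chunks);
  return buf;
}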
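The answer to the first question can be checked with plain typed arrays, no FFI needed: constructing a Uint8Array over another view's `.buffer` creates a new view object over the same memory instead of copying the bytes. A minimal check:

```typescript
// Both views share one ArrayBuffer: the Uint8Array constructor copies nothing.
const ptrs = new BigUint64Array(2);
const bytes = new Uint8Array(ptrs.buffer);

bytes[0] = 0xff; // write through the byte view...
console.log(ptrs[0] !== 0n); // ...and the 64-bit view sees it: true
console.log(bytes.buffer === ptrs.buffer); // same backing store: true
console.log(bytes.byteLength === ptrs.byteLength); // 16 bytes each: true
```

Note that only the view object is new, which is also why the WeakMap entry has to be keyed on the Uint8Array that is actually returned to the caller.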
AapoAlas•6mo ago
Related: if I haven't plugged these yet, take a look at:
* Denonomicon: https://denonomicon.deno.dev/introduction
* WTF is FFI? videos: https://youtu.be/9f9Ujeods5o?si=qV9YeRyI_TiHxYsK
* I also spend most of this seemingly Rust talk talking about Deno FFI: https://youtu.be/QDd4Iu_-MTw?si=VdZm_TF8KnNxl7xW
YouTube (Deno): WTF is FFI? An intro to foreign function interface with Deno, pt. 1
Foreign function interface, or "FFI", is a way to access data and call functions from native dynamic libraries (like C, C++, and Rust). View the source code from this episode's demo here: https://github.com/aapoalas/deno-ffi-tutorial This is part of a longer series where Andy and Aapo talk about FFI, how to use it with Deno, and advance use ca...
YouTube (SoC Hub): Aapo Alasuutari: FFI & libffi (@ Finland Rust-lang group meetup 11/...
Aapo Alasuutari's presentation "FFI & libffi" at the November 2022 meetup of the Finland Rust-lang group in Tampere (organized in cooperation with SoC Hub)
carragom•5mo ago
Great resources; they helped me a lot. Here is the current implementation. If you get a chance, let me know if you spot anything problematic.
const TO_STR_ARRAY_MAP = new WeakMap<Uint8Array, Uint8Array[]>()
export function toCStringArray(strings: string[]): Uint8Array | null {
  if (strings === undefined || strings === null || strings.length === 0) {
    return null
  }

  // Encode strings into Uint8Array appending null byte
  const chunks = strings.map((str) => toCString(str))
  const ptrs = new BigUint64Array(chunks.length)

  for (const [index, chunk] of chunks.entries()) {
    const ptr = Deno.UnsafePointer.of(chunk)

    if (ptr === null) {
      // Should never be possible ??
      throw new Error('Failed to create pointer')
    } else {
      ptrs[index] = BigInt(Deno.UnsafePointer.value(ptr))
    }
  }

  const buf = new Uint8Array(ptrs.buffer)
  // Maps the ptrs buffer to the chunks to prevent GC from collecting them
  TO_STR_ARRAY_MAP.set(buf, chunks)

  // Return the buffer as a Uint8Array
  return buf
}
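The implementation above calls a `toCString` helper that is not shown in the thread. A minimal sketch, assuming it does the same thing as the earlier versions in this thread (UTF-8 encode the string and append a single NUL byte); the name and signature are taken from the call site, the body is an assumption:

```typescript
// Hypothetical implementation of the toCString helper used above:
// UTF-8 encode the string and append one NUL byte so C sees a terminated string.
const encoder = new TextEncoder();

function toCString(str: string): Uint8Array {
  return encoder.encode(str + "\0");
}
```

For example, `toCString("ASD")` yields the bytes `[65, 83, 68, 0]`. A JS string that itself contains `"\0"` would appear truncated at that point on the C side.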
AapoAlas•5mo ago
Looks good to me! 🙂