Parsing HTML/XML
Lately I've been researching how to write a simple HTML/XML -> JSON converter. My syntax isn't completely standard HTML (i.e., <Button> is possible), which makes this a notch harder than it probably should be. Here's an example of how things should work: https://github.com/gustwindjs/gustwind/blob/feat/html-prototype/html-to-breezewind/tests/element_test.ts
The current implementation relies on https://deno.land/x/deno_dom, but the problem is that it loses information for tagName. I cannot use it to tell a button apart from a Button, as it uppercases the tag name by default.
As an alternative, I looked into https://deno.land/x/xml, but it loses structural information due to its automatic grouping (i.e., it folds divs into a single array and loses their positions relative to other siblings, so I cannot map the structure back to the original later on).
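To make the grouping problem concrete, here is a small self-contained sketch. It does not use the xml module itself; the Child type and the groupByTag function are hypothetical stand-ins that only mimic the "fold same-named siblings into one array" behavior described above.

```typescript
// Hypothetical node shape for illustration; not the output format of any real library.
type Child = { tag: string; text: string };

// Fold same-named siblings into one array per tag name.
function groupByTag(children: Child[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {};
  for (const child of children) {
    if (!grouped[child.tag]) grouped[child.tag] = [];
    grouped[child.tag].push(child.text);
  }
  return grouped;
}

// Two documents that differ only in where the span sits between the divs.
const orderA: Child[] = [
  { tag: "div", text: "first" },
  { tag: "span", text: "middle" },
  { tag: "div", text: "last" },
];
const orderB: Child[] = [
  { tag: "div", text: "first" },
  { tag: "div", text: "last" },
  { tag: "span", text: "middle" },
];

// Both produce the same grouped object, so the original order cannot be recovered.
const sameAfterGrouping =
  JSON.stringify(groupByTag(orderA)) === JSON.stringify(groupByTag(orderB));
console.log(sameAfterGrouping); // true
```

An ordered children array, as opposed to an object keyed by tag name, is what keeps the two inputs distinguishable.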
Any insight on the issue would be valuable. Maybe I have to fork one of them to add the missing functionality, but I hope to avoid that.
2 Replies
Using an existing HTML parser will be difficult because tag names are case insensitive in HTML. If you don't need to support all the crazy edge cases and could live with a more constrained language like JSX, you could make parsing a lot easier. In htm, for example, which is a mix of HTML + JSX, we handrolled our own parser for a similar purpose: https://github.com/developit/htm/blob/master/src/build.mjs

that's a good point. JSX would be an option perhaps. i could just skip using their templating syntax (or even inject from context later and skip JS expressions that don't map to JSON). i'll try this direction in my next attempt. thanks!
looking closer, it's definitely going to be some variant of your treeify, so that's a huge help already
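
For anyone landing here later, the hand-rolled direction suggested in the reply could look roughly like the sketch below. This is not htm's treeify and not the gustwind implementation; the JsonNode shape and the parse function are made up for illustration. It assumes a constrained syntax: double-quoted attributes only, no comments, no entities, no JS expressions. Tag names are kept exactly as written and children stay in document order.

```typescript
// Hypothetical output shape; adapt it to whatever JSON format you actually target.
type JsonNode = {
  type: string; // tag name with its original casing
  attributes: Record<string, string>;
  children: (JsonNode | string)[]; // document order is preserved
};

function parse(input: string): (JsonNode | string)[] {
  const root: JsonNode = { type: "", attributes: {}, children: [] };
  const stack: JsonNode[] = [root];
  // Closing tag | opening or self-closing tag with attributes | run of text.
  const token =
    /<\/([A-Za-z][\w.-]*)\s*>|<([A-Za-z][\w.-]*)((?:\s+[\w-]+="[^"]*")*)\s*(\/?)>|([^<]+)/g;
  const attr = /([\w-]+)="([^"]*)"/g;
  let pos = 0;
  while (pos < input.length) {
    token.lastIndex = pos;
    const m = token.exec(input);
    // Reject anything the grammar does not cover instead of skipping it silently.
    if (m === null || m.index !== pos) {
      throw new Error(`Unexpected input at offset ${pos}`);
    }
    pos = token.lastIndex;
    const parent = stack[stack.length - 1];
    const [, closeName, openName, rawAttrs, selfClosing, text] = m;
    if (closeName !== undefined) {
      // Case-sensitive comparison: </button> does not close <Button>.
      if (stack.length === 1 || parent.type !== closeName) {
        throw new Error(`Unexpected closing tag </${closeName}>`);
      }
      stack.pop();
    } else if (openName !== undefined) {
      const node: JsonNode = { type: openName, attributes: {}, children: [] };
      let a: RegExpExecArray | null;
      while ((a = attr.exec(rawAttrs)) !== null) node.attributes[a[1]] = a[2];
      parent.children.push(node);
      if (selfClosing !== "/") stack.push(node);
    } else if (text !== undefined && text.trim() !== "") {
      parent.children.push(text.trim());
    }
  }
  if (stack.length !== 1) {
    throw new Error(`Unclosed tag <${stack[stack.length - 1].type}>`);
  }
  return root.children;
}

const tree = parse('<div><Button kind="primary">Save</Button><button/></div>');
console.log(JSON.stringify(tree, null, 2));
```

In the example input, the div ends up with two children whose type values are "Button" and "button" respectively, which is the distinction that deno_dom's uppercased tagName loses. Whitespace-only text is dropped and other text is trimmed, which may or may not be what you want.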