Trying to get my head around whether opening a write stream in Node with the 'as' flags is the same as running fsync() after every write… Anyone have any experience with this? The docs are sketchy…

(Basically, I’d like my writes to be as “safe” as possible—not a trivial issue with file systems, I’m finding—but I’d rather avoid write/fsync race conditions and callback hell around using fsync with write streams if possible.)
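(For concreteness, here’s roughly what I’m comparing; a minimal sketch, file name made up:)

const fs = require('fs')

// Option A: open the stream with the 'as' flags. The 'a' opens the file
// for appending and the 's' opens it in synchronous mode (O_SYNC), so
// the OS flushes each write through to disk.
const syncStream = fs.createWriteStream('data.txt', { flags: 'as' })
syncStream.write('entry\n')

// Option B: a plain append stream plus a manual fsync() after each
// write; this is the coordination/callback dance I’d like to avoid.
const stream = fs.createWriteStream('data.txt', { flags: 'a' })
stream.once('open', fd => {
  stream.write('entry\n', () => {
    fs.fsync(fd, error => {
      if (error) console.error('fsync failed:', error)
    })
  })
})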

Asked in more detail here: github.com/nodejs/node/issues/

Thanks in advance ;)

After more tests, I’ve empirically narrowed it down to:

- Up to ~67MB source size: require() is faster
- At ~67MB: they’re about equal
- Beyond ~67MB: line-by-line streaming and eval is faster

These tests are with the version of V8 that’s in Node 12.x. Results may differ for other versions, depending on whether they include large-string-handling optimisations.
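(For reference, the streaming approach is roughly this; a sketch, names made up, and it assumes every line of the source file is a complete statement:)

const fs = require('fs')
const readline = require('readline')

async function loadLineByLine (filePath) {
  // The evaluated lines are assumed to build up this structure
  // (e.g., lines like `table.push({ … })`).
  const table = []
  const lines = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity
  })
  for await (const line of lines) {
    eval(line) // direct eval, so each line can see `table` in scope
  }
  return table
}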

@mathew Past ~67MB of source size, it looks like V8’s string handling becomes the bottleneck: require() is faster below that threshold and the line-by-line/eval approach pulls ahead beyond it. The timings include the evals. (And the code runs sequentially to simply create a data structure, so runtime optimisations shouldn’t be an issue.)

Very interesting: on a 200MB JavaScript source file where each line is a complete statement, I’m seeing about 2x faster loads with a line-by-line stream and eval() than with require().

Context: mastodon.ar.al/@aral/104915201
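(A rough sketch of how one might time that comparison; the file name is made up and loadLineByLine is the hypothetical loader sketched above. In practice, run each branch in a separate process so the measurements stay independent:)

const sourcePath = '/tmp/big-table.js'

console.time('require')
require(sourcePath)
console.timeEnd('require')

console.time('stream + eval')
loadLineByLine(sourcePath).then(() => console.timeEnd('stream + eval'))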

@gert Just had a quick skim of the source; looks like a very light/simple implementation

PS. Have you published the package? Folks might be more willing to do an npm install --global ;)

@gert I guess that makes your toot the web page ;)

@gert Sounds neat – looking forward to playing with it – is there a link to a web page where I can find out more?

Pinched peroneal nerve. Do not recommend. Ouch!

*limps away*

@stoically It would if it were a flat structure (e.g., rows in an array), but not for deeply-nested structures.

@zensaiyuki (Reading > 1GB files line-by-line also works. It just means they cannot be compacted, as you cannot create a JSON string that’s > 1GB. So basically, if you really need a large database, this should scale to available memory, with instant read/write times during use.)

@zensaiyuki Also, even these aren’t really hard limits. You can increase the heap size by launching node with (e.g.) --max-old-space-size=8192 (with which I just wrote a 2GB table). And instead of using require (e.g., if it throws), you could stream the file in line-by-line and eval it.
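(A sketch of that fallback, assuming the hypothetical loadLineByLine loader from above; tablePath is illustrative:)

// Launch with a bigger heap if needed:
//   node --max-old-space-size=8192 app.js
async function loadTable (tablePath) {
  try {
    return require(tablePath)
  } catch (error) {
    // e.g., “Cannot create a string longer than 0x3fffffe7 characters”
    return await loadLineByLine(tablePath)
  }
}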

Again, though, I’m not trying to encourage people to store as much data on the server as possible. And that’s butting up against my desire to make every piece of this as nice as possible. :)

@zensaiyuki So I just did a little test to see where it would break Node. Crashed it at around the 1.4GB mark with V8 running out of heap memory. Attempting to load in the 1.3GB file that attempt created also crashed Node, with the answer to your question: “Cannot create a string longer than 0x3fffffe7 characters.” (So 1,073,741,799 characters.) So the upper limit for data set sizes for this is ~1GB. I’m cool with that for its use case of small server-side data for Small Web sites/apps. :)

@zensaiyuki (Will make sure to test. Node.js has a ~1.4GB memory limit anyway, so I think we’d hit that before we hit the source-file size limit, but always good to verify regardless so we know where the limits are) :)

@fosshermit Nothing’s impossible. It’s just in the interests of those who benefit from the status quo that you believe it is ;)

@david Ah thanks, about to hit the sack; will fix it in the morning :)

What if data was code?

ar.al/2020/09/23/what-if-data-

With thanks to @zensaiyuki @Moon @cjd @vertigo @gert @clacke @pizza_pal and everyone else who took part in my little online brainstorming session earlier today.
