Trying to get my head around whether opening a write stream in Node with the 'as' flag is the same as running fsync() after every write… anyone have any experience with this? The docs are sketchy…
(Basically, I’d like my writes to be as “safe” as possible—not a trivial issue with file systems, I’m finding—but I’d rather avoid write/fsync race conditions and callback hell around using fsync with write streams if possible.)
Asked in more detail here: https://github.com/nodejs/node/issues/28513#issuecomment-699680062
Thanks in advance ;)
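For anyone following along, here is a minimal sketch of the two approaches being compared. It uses the synchronous fs calls so the order of operations is obvious; fs.createWriteStream(file, { flags: 'as' }) accepts the same flag string. The file name and both helper functions are made up for illustration, and this only shows how each variant is written. It does not prove that the two give identical durability guarantees on every OS and file system.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical scratch file, used only for this illustration.
const filePath = path.join(os.tmpdir(), `as-flag-demo-${process.pid}.log`)

// Variant 1: open for appending in synchronous mode ('as').
// The intent is that each write reaches the disk before the call returns,
// but the exact guarantee is up to the OS and file system.
function appendWithSyncFlag (file, line) {
  const fd = fs.openSync(file, 'as')
  try {
    fs.writeSync(fd, line)
  } finally {
    fs.closeSync(fd)
  }
}

// Variant 2: ordinary append, followed by an explicit fsync().
function appendThenFsync (file, line) {
  const fd = fs.openSync(file, 'a')
  try {
    fs.writeSync(fd, line)
    fs.fsyncSync(fd) // ask the OS to flush this file's data to the device
  } finally {
    fs.closeSync(fd)
  }
}

appendWithSyncFlag(filePath, 'first\n')
appendThenFsync(filePath, 'second\n')

const contents = fs.readFileSync(filePath, 'utf8')
fs.unlinkSync(filePath) // clean up the scratch file
```

Both variants end up with the same bytes in the file; the open question in the post is whether they are equally safe if the machine loses power mid-write.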
After more tests, I’ve empirically narrowed it down to:
- Up to ~67MB source size: require() is faster
- At around 67MB, they’re about equal
- Beyond 67MB, line-by-line streaming and eval is faster.
These tests are with the version of V8 that’s in Node 12.x. It may differ for other versions depending on whether they contain large string handling optimisations.
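A rough sketch of the two loading strategies being compared, for context. The file format here (one self-contained statement per line, each mutating a variable called table) is an assumption made for the example, as are the file name and the loadLineByLine helper. It is not a benchmark and says nothing about where the crossover point lies.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { StringDecoder } = require('string_decoder')

// Hypothetical data file: one statement per line, each mutating `table`.
const dataFile = path.join(os.tmpdir(), `line-eval-demo-${process.pid}.js`)
fs.writeFileSync(
  dataFile,
  [0, 1, 2].map(id => `table.push({ id: ${id} })`).join('\n') + '\n'
)

// Strategy 1: require() reads, compiles and runs the whole file in one go.
globalThis.table = []
require(dataFile)
const viaRequire = globalThis.table
delete globalThis.table

// Strategy 2: read the file in chunks and evaluate one line at a time,
// so the full source never has to exist as a single string.
function loadLineByLine (file, target) {
  const fd = fs.openSync(file, 'r')
  const chunk = Buffer.alloc(64 * 1024)
  const decoder = new StringDecoder('utf8') // copes with characters split across chunks
  const run = line => {
    if (line.trim() !== '') new Function('table', line)(target)
  }
  let pending = ''
  let bytesRead
  try {
    while ((bytesRead = fs.readSync(fd, chunk, 0, chunk.length, null)) > 0) {
      pending += decoder.write(chunk.subarray(0, bytesRead))
      const lines = pending.split('\n')
      pending = lines.pop() // the last piece may be an incomplete line
      lines.forEach(run)
    }
    run(pending + decoder.end())
  } finally {
    fs.closeSync(fd)
  }
  return target
}

const viaLines = loadLineByLine(dataFile, [])
fs.unlinkSync(dataFile) // clean up the scratch file
```

Evaluating file contents like this is only reasonable when the file is one the program wrote itself.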
@zensaiyuki So I just did a little test to see where it would break Node. Crashed it at around the 1.4GB mark with V8 running out of heap memory. Attempting to load the 1.3GB file that attempt created also crashed Node, with the answer to your question: “Cannot create a string longer than 0x3fffffe7 characters.” (So 1,073,741,799 characters.) So the upper limit for data set sizes for this is ~1GB. I’m cool with that for its use case of small server-side data for Small Web sites/apps. :)
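Side note: Node exposes that string length limit as buffer.constants.MAX_STRING_LENGTH, so a loader could check up front instead of crashing. A small sketch, where fitsInOneString and the scratch file are made-up names for the example. The value depends on the Node/V8 version and platform, so it should not be hard-coded.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { constants } = require('buffer')

// For UTF-8 text the size in bytes is an upper bound on the JavaScript
// string length, so this check is conservative: a file that fails it
// might still fit, but a file that passes will not hit the string limit.
function fitsInOneString (file) {
  return fs.statSync(file).size <= constants.MAX_STRING_LENGTH
}

// Hypothetical scratch file, used only for this illustration.
const smallFile = path.join(os.tmpdir(), `string-limit-demo-${process.pid}.txt`)
fs.writeFileSync(smallFile, 'hello')

const smallFileFits = fitsInOneString(smallFile)
fs.unlinkSync(smallFile) // clean up the scratch file
```

This only covers the string length limit. Running out of heap memory, as in the 1.4GB case, is a separate limit.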
Thank you to everyone who just helped me think through this and also for letting me rubber duck with you – appreciate it :)
… But, of course, given that changes to the data structure in Node.js cannot conflict (they are synchronous in memory), what if we didn’t use JSON.stringify at all but simply streamed the history of all changes to persist them? At server start (where performance is not the bottleneck), the in-memory object graph would be recreated by playing back the history. And that should give us almost instant writes in addition to almost instant in-memory reads. Ooh, going to experiment with that now :)
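A minimal sketch of that idea, to make it concrete: every change is appended to a log as it happens, and the object graph is rebuilt by replaying the log. The record format ({ keyPath, value }, one per line) and the function names are assumptions for the example. JSON is used here only for each small change record, never for the whole graph.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical log file, used only for this illustration.
const logFile = path.join(os.tmpdir(), `change-log-demo-${process.pid}.log`)

// Persist a single change by appending one line to the log.
function record (file, keyPath, value) {
  fs.appendFileSync(file, JSON.stringify({ keyPath, value }) + '\n')
}

// Apply one change to an in-memory object, creating parents as needed.
function applyChange (root, { keyPath, value }) {
  let node = root
  for (const key of keyPath.slice(0, -1)) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {}
    node = node[key]
  }
  node[keyPath[keyPath.length - 1]] = value
}

// Rebuild the object graph at start-up by replaying the history in order.
function replay (file) {
  const root = {}
  for (const line of fs.readFileSync(file, 'utf8').split('\n')) {
    if (line !== '') applyChange(root, JSON.parse(line))
  }
  return root
}

record(logFile, ['settings', 'theme'], 'dark')
record(logFile, ['settings', 'theme'], 'light') // a later change wins on replay
record(logFile, ['count'], 1)

const restored = replay(logFile)
fs.unlinkSync(logFile) // clean up the scratch file
```

The trade-off is that the log grows with every change, so something would eventually need to compact it.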
Thinking of experimenting with delta updates of serialised JSON (as opposed to full serialisation every time, which is expensive for large collections) and wondering if anyone knows of any existing libraries, experiments, etc., that use special object IDs to mark the start and end of objects to enable delta string substitution in serialised JSON. My search engine fu is not returning any results.
Very impressed with json-stream-stringify (https://github.com/Faleij/json-stream-stringify) – clocking loop delay in the 0.x ms range during use. Now to try and reduce CPU usage somehow…
Added some more tests to WhatDB (What Database? The tiny, transparent server-side persistence/query layer I’m working on for Site.js) and I’m happy with the behaviour and performance of the persistence layer, I think.
Calling it a day. Will continue over the weekend with implementing the query interface.
Work in progress, but feel free to look around, have a play, share your thoughts, etc.
Keeping the raw data object and a lazily-instantiated Proxy structure to mirror it = an order of magnitude decrease in serialisation time on WhatDB, the tiny write-on-update in-memory server data layer I’m coding for Site.js. It’s meant to be used for small amounts of data and is blisteringly fast at that. Taking my time iterating on the design but should hopefully make better progress now that the core relationships are shaping up. Going to test memory/performance with a non-lazy version also.
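To illustrate the general pattern (not WhatDB's actual code): the raw object stays a plain object, so serialising it never goes through Proxy traps, while proxies for nested objects are only created the first time they are accessed. The mirror function and the onChange callback are hypothetical names.

```javascript
// Wrap `raw` in a Proxy that reports writes via `onChange`.
// Nested objects are wrapped lazily, on first access, and cached
// so that the same raw object always maps to the same proxy.
function mirror (raw, onChange, cache = new WeakMap()) {
  if (cache.has(raw)) return cache.get(raw)

  const proxy = new Proxy(raw, {
    get (target, key) {
      const value = target[key]
      return typeof value === 'object' && value !== null
        ? mirror(value, onChange, cache) // lazily mirror nested data
        : value
    },
    set (target, key, value) {
      target[key] = value // writes land on the raw object
      onChange()
      return true
    },
    deleteProperty (target, key) {
      delete target[key]
      onChange()
      return true
    }
  })

  cache.set(raw, proxy)
  return proxy
}

const raw = { people: [{ name: 'Ada' }] }
let changes = 0
const db = mirror(raw, () => { changes++ })

db.people[0].name = 'Grace' // goes through the lazily created proxies

// Serialise the raw object directly: no traps fire during stringify.
const serialised = JSON.stringify(raw)
```

In a real persistence layer, onChange would schedule a save rather than bump a counter.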
Building tech for freedom?
Is it intuitive, easy to use, focused, and consistent?
Does it have beautiful defaults?
- Private by default?
- Secure by default?
- Usable by default?
Design philosophy: do the right thing by default and make the dangerous stuff hard to do accidentally.
Linux philosophy: do the wrong thing that’s also a massive security hole by default and then ridicule the person using it for not knowing the twenty-six command-line options it takes to make it do the right thing securely.
I make Small Tech.