by Maksadbek on 4/12/2023, 10:23:42 PM
by zylepe on 4/13/2023, 11:37:55 AM
I spent a while optimizing sqlite inserts for planetiler, this is what I came up with:
https://github.com/onthegomap/planetiler/blob/db0ab02263baaa...
It batches inserts into bulk statements and is able to do writes in the 500k+ per second range, and reads are 300-400k/s using those settings.
by vaughan on 4/13/2023, 8:41:54 AM
I hate the way so much of programming is just fiddling knobs on some old systems. It always gets to a point where I would prefer to rewrite the whole thing myself with only the functionality I need and to truly understand what is going on under the hood. I just wish these projects were built in a modular way so you can re-assemble parts of them back together easily.
by zackees on 4/13/2023, 3:27:34 AM
WAL mode has some issues where depending on the write pattern, the WAL size can grow to infinity, slowing down performance a lot. I think this usually happens when you have lots of writes that lock the table so sqlite never gets to doing wal_autocheckpoint.
I believe that WAL2 fixes this: Wal2 mode does not have this problem. In wal2 mode, wal files do not grow indefinitely even if the checkpointer never has a chance to finish uninterrupted
https://sqlite.org/cgi/src/doc/wal2/doc/wal2.mdby freddw on 4/13/2023, 7:38:44 AM
It might be helpful to link the docs for some of these configs in the section where you mention them. Some thoughts:
Based on https://www.sqlite.org/wal.html it seems the WAL index is mmapped as a workaround to some edge cases not relevant to many application developers. They say it shouldn’t matter, but with the larger page sizes you’re using, using the original implementation approach they describe (volatile shared memory) actually might improve performance slightly - do you know if your WAL index ever exceeds 32KiB? Not sure as to the difficulty of resurrecting that old functionality though.
Also, this case seems like something that could happen from time to time depending on what you’re doing. Did you encounter it? Case:
> When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a second database tries to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.
Both the WAL docs and the article mention blocking checkpointing/a need for reader gaps to ensure the WAL flushes, or a possibility that WAL files will grow indefinitely. I had some speculation that this was an implementation limitation, and it turns out another comment mentions WAL2 may relax this requirement by using two WAL files split between “hot” and “cold”. Curious how the performance might compare with this: https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md
by avinassh on 4/13/2023, 5:30:13 AM
I am experimenting with SQLite, where I try inserting 1B rows in under a minute. The current best is inserting 100M rows at 23s. I cut many corners to get performance, but the tweaks might suit your workload.
I have explained my rationale and approach here - https://avi.im/blag/2021/fast-sqlite-inserts/
the repo link - https://github.com/avinassh/fast-sqlite3-inserts
by datadeft on 4/13/2023, 5:08:49 AM
We are in the process of moving from Postgres to SQLite so these suggestions are very useful.
by d1l on 4/13/2023, 9:31:02 AM
The fact that he doesn't understand why his wal file is growing without bounds should be a warning to take his suggestions with a grain of salt. This is yet another lazy benchmark with lazy suggestions by someone who is uselessly retreading the path laid down in the SQLite documentation.
by yencabulator on 4/14/2023, 1:38:16 AM
> Normal is still completely corruption safe in WAL mode, and means only WAL checkpoints have to wait for FSYNC.
I would consider data loss on crash to be "corruption", for sure. And synchronous=normal + journal_mode=WAL can lose data:
https://www.sqlite.org/pragma.html#pragma_synchronous
> A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
by Scarbutt on 4/13/2023, 4:39:07 AM
'Multiple GB' is ambiguous(is it 20GB or 200GB?) and the article doesn't specify the size.
by Dave3of5 on 4/13/2023, 4:00:11 PM
Doesn't this put the entire dataset into memory no wonder it's so fast.
by withinboredom on 4/13/2023, 10:06:35 AM
I'd be curious for a similar tuning with Dqlite: https://github.com/canonical/dqlite
I was benchmarking SQLite3 with Python on my MBP M1 and with default settings could achieve 2-4000 inserts per sec. Then applied suggested settings:
As a result the average number inserts per second was 80K