Turbase the Network

By Artem V. Shamsutdinov
December 6th, 2023

Storing Repository data is pretty straightforward: every entry in its transaction log can be a separate file, timestamped at the time of its creation (preferably the proven time of persistence of the transaction log entry). Retrieving that data is a lot more complex. The original idea was to store each repository in a folder, with data broken out by chronological increments (days, months, years) into their own folders. Eventually the idea of rolling up older transactions into single files was added, so that the data for a given year can be rolled up into a single file for faster retrieval. The result is a flat directory structure containing year, month and day aggregates plus recent individual log entries, all of which can be retrieved at the same time once the directory contents have been read.
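To make the flat layout concrete, here is a minimal sketch of how a log entry could be mapped to the file it ends up in. The file naming scheme (`2021.log`, `2023-03.log`, `.entry` files) and the function names are my own illustrative assumptions, not a settled format, and the sketch assumes entries are never dated in the future.

```python
from datetime import datetime
from typing import Iterable, List


def rollup_name(entry_time: datetime, now: datetime) -> str:
    """Return the (hypothetical) flat-directory file name holding an entry.

    Assumes entry_time <= now. Older data lands in coarser aggregates.
    """
    if entry_time.year < now.year:
        return f"{entry_time:%Y}.log"          # a whole past year in one file
    if entry_time.month < now.month:
        return f"{entry_time:%Y-%m}.log"       # a past month of the current year
    if entry_time.day < now.day:
        return f"{entry_time:%Y-%m-%d}.log"    # a past day of the current month
    # Entries from today have not been rolled up yet: one file per entry.
    return f"{entry_time:%Y-%m-%dT%H%M%S.%f}.entry"


def files_to_fetch(entry_times: Iterable[datetime], now: datetime) -> List[str]:
    """List the distinct files a client would request after reading the directory."""
    return sorted({rollup_name(t, now) for t in entry_times})


now = datetime(2023, 12, 6, 12, 0, 0)
entries = [
    datetime(2021, 5, 1),
    datetime(2021, 9, 30),
    datetime(2023, 3, 9),
    datetime(2023, 12, 2),
    datetime(2023, 12, 6, 8, 30, 0),
]
print(files_to_fetch(entries, now))
```

Note that the two 2021 entries collapse into a single file, which is the whole point of the rollup: the older the data, the fewer files there are to fetch.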

AIRport runs primarily on the client device, with the transaction logs being retrieved and composed into a working database right on users' phones or computers. This is needed to ensure data privacy, with transaction logs being encrypted (and decrypted) on user devices. This means that even in the optimal case, where all data for a repository was created in a single year (and thus lives in a single file), two sequential network calls are required to retrieve the data: one to read the directory contents and one to fetch the file. Ideally, a single network call would retrieve all of the data for a Repository.

So my latest thinking has been about how to accomplish that, and so far I see no way of doing so without introducing a dedicated network of database nodes that aggregate the data. This is further reinforced by my recent findings on how Filecoin and Arweave work: both require CDN-like networks on top of them to ensure fast data retrieval and persistence.

This means that the original idea behind Turbase is now expanded into a full-blown network of nodes that provide persistence and retrieval of database transaction logs. Internally these nodes can rely on Filecoin for data storage (primarily for backup purposes) and on Arweave for optional permanent archival. But for timely data retrieval they will likely have to keep the transaction logs themselves until the user "files" them into infrequently accessed long-term storage (on Filecoin or Arweave).

The need for a full-blown network of nodes is further reinforced by the fact that not all of the data will belong to the users: quite a bit of it will be open data shared by communities. Having Applications own that data isn't a good solution either, since they will have a financial incentive to restrict access to it.

This is where the original idea behind Turbase re-enters, but in a new light. Instead of simply being a website that hosts the code necessary for a client device to talk to storage networks and run a database locally, it now becomes a full-blown network of database nodes on top of which Applications run.

Arguments on how to make sure such a network remains independent and decentralized are yet to be made. From a technological point of view, though, this network is different from any distributed database because the Relational Database engine and the applications still run on user devices. This means that for basic operation all that is needed is a (fast retrieval) file server (for composite records) and an efficient NoSQL database (for recent records that are yet to be combined into the composite). Of course, data for community Apps will likely also need Full Text Search capability and the ability to run aggregate queries across repositories, so additional (FTS and RDBMS) functionality can be provided by these nodes as well. Additional fast-access storage may also be provided for the non-database data (images, videos, etc.) that is referenced from the (not yet archived) Repositories.
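The basic operation described above can be sketched as a node that keeps two stores per Repository and answers a read with both in one response. This is only an illustration under my own assumptions: the class and method names are hypothetical, in-memory dictionaries stand in for the file server and the NoSQL database, and the entries are treated as opaque bytes because they are encrypted on the user's device.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepositoryNode:
    # Stand-in for the file server: already combined (composite) records.
    composites: Dict[str, List[bytes]] = field(default_factory=dict)
    # Stand-in for the NoSQL database: recent records not yet combined.
    recent: Dict[str, List[bytes]] = field(default_factory=dict)

    def append(self, repo_id: str, encrypted_entry: bytes) -> None:
        """Persist a new transaction log entry; the node never decrypts it."""
        self.recent.setdefault(repo_id, []).append(encrypted_entry)

    def compact(self, repo_id: str) -> None:
        """Move all recent entries of a Repository into its composite."""
        pending = self.recent.pop(repo_id, [])
        self.composites.setdefault(repo_id, []).extend(pending)

    def fetch(self, repo_id: str) -> List[bytes]:
        """Return the full log in a single call: composite first, then recent."""
        return self.composites.get(repo_id, []) + self.recent.get(repo_id, [])


node = RepositoryNode()
node.append("repo-1", b"entry-a")
node.append("repo-1", b"entry-b")
node.compact("repo-1")             # a and b now live in the composite
node.append("repo-1", b"entry-c")  # c is still a recent record
print(node.fetch("repo-1"))
```

Because the node does the aggregation, the client gets the whole log with one request instead of a directory read followed by file fetches.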

And since the network will serve both private and community-based storage needs, it can be scaled naturally along already existing community lines (metro areas, states/regions and countries). Given that most human activity is usually limited to these geographical areas, it only makes sense to partition the (non-global) data in a corresponding way (but more on that in a future post).

Update - Dec 7th, 2023

Node operator incentives should be structured around the fact that Turbase will host community and private data, as well as provide open (but possibly paid) access to aggregated relational data. Yes, node operators should be compensated for their costs and should have a profit margin, but making a profit shouldn't be the primary motivation. First and foremost, node operators will be in the business of providing a community service. They should be incentivized to prioritize allocating hardware (and not deallocating it) because their community relies on them, not because Turbase provides the biggest profit margin.

This segues into the fact that Turbase data is meant to be partitioned into community-run clusters. This means that there should be a global cluster and a number of (tiered) geographical community clusters. So a country can have its own cluster, as can a state/region, a metro area, a county and a municipality. Turbase will also allow topic-based communities to run their own clusters, and companies to plug into the ecosystem by providing company-operated clusters.
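As a rough sketch of how tiered geographical clusters could be looked up, the snippet below walks from the most specific tier up to the global cluster and picks the first one that exists. The tier order, the `/`-joined cluster identifiers and the function names are all assumptions made for illustration; how the tiers actually nest is still to be worked out.

```python
from typing import Dict, List, Set

# Assumed nesting order, from broadest to most specific.
TIERS = ["country", "region", "metro", "county", "municipality"]


def candidate_clusters(location: Dict[str, str]) -> List[str]:
    """List cluster ids for a location, most specific first, ending in 'global'."""
    path: List[str] = []
    candidates = ["global"]
    for tier in TIERS:
        if tier not in location:
            break  # stop at the first tier the location does not specify
        path.append(location[tier])
        candidates.append("/".join(path))
    return list(reversed(candidates))


def resolve_cluster(location: Dict[str, str], existing: Set[str]) -> str:
    """Pick the most specific cluster that is actually being run."""
    for cluster_id in candidate_clusters(location):
        if cluster_id == "global" or cluster_id in existing:
            return cluster_id
    return "global"


location = {"country": "US", "region": "CA", "metro": "SF-Bay"}
print(candidate_clusters(location))
print(resolve_cluster(location, existing={"US", "US/CA"}))
```

Here no metro cluster is running, so the data falls back to the state/region cluster, and it would fall back to the global cluster if no geographical cluster existed at all.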