Solving Id Problems

  • By Artem V. Shamsutdinov
  • June 18th, 2022

AIRport's Id scheme has an optimization for local storage of globally unique Ids. That optimization is also a limiting factor, and it is leading to a new way of doing things.

AIRport runs both on the client and the server. On the client it runs in a separate browser tab (or as a native app). There it manages the user's data directly without any intermediate third party servers. The user can trust third party code better on their device than anywhere else. In effect they get their own server every time they use AIRport and (given that all application code can be inspected) this is as close as we can currently get to no third party interference with user data.

On the client everything is pretty straightforward: you get one local database where all your repositories are stored. The local integer Id scheme works well there because it saves space and query time. UuIds are only retrieved when you have to look objects up in other repositories.

Technically, the same thing holds true on the server. All of the integer ids are unique to the server and still work just fine. However, when one looks further into what AIRport can do, a number of new scenarios become apparent where the numeric Id scheme does not work.

Technical Problems

Problem I: Decentralized & distributed

This scheme starts to break down when you start mixing the two. AIRport applications are meant to be run with some of the data coming from centralized (though distributed) servers and some of the data being decentralized. This mix gives you the best combination of utility and privacy. But it could lead to Id conflicts in the UIs if both distributed and decentralized data need to be displayed on the same screen.

Problem II: Back-end distributed

On the server side, the main goal of AIRport is to allow application servers to load data from multiple completely independent data sources. The idea is for every jurisdiction to maintain its own data. This can be (and at least at first will be) accomplished by having geographically distributed databases like CockroachDB. But eventually AIRport application servers will support getting data from completely different databases that have the same schemas. This is particularly useful when an app server is processing data across multiple governmental agencies or when it's processing data of separate corporations that are not associated with each other.

Problem III: Performance

Finally, there is the problem of performance. In AIRport's case, wide-column databases are about an order of magnitude faster. AIRport is particularly well suited for retrieving entire Repositories from wide-column databases: the repository record schema is extremely simple, holding just the transaction log entries stored as blobs. Thus, the eventual goal of AIRport on the server side is to perform all repository-scoped SQL operations in the memory of the application server and just load the data from a wide-column storage engine, like ScyllaDB. Basically, repositories will be loaded into an in-memory SQL database in their entirety (repositories are meant to be small and focused). All of the operations that are contained within a repository will be done directly in memory. Cross-repository queries will still go to the distributed relational databases.

Performance can be further enhanced by keeping the frequently accessed repositories in memory even after their requests are completed. The app server will then query the wide-column database where the repository is archived for new transaction log entries. This is where numeric ids in records (versus string ids) can really help to store more data in app server memory, especially for smaller records. And for the Web implementation of AIRport, all repositories are stored in memory (with SqlJs).

But the numeric Ids for repositories and actors are unique only to the in-memory SQL database that is running on the application server. Thus they cannot be relied upon for persistence operations when those operations go to another application server (which can happen because UIs aren't expected to maintain server sessions).

The solution

And, as I have just realized, there is a very elegant solution to these problems. A solution that does not require storing very long global UUIDs as strings in every record (which would really matter on mobile devices):

On the client

The main problem on the client is that the UI might talk to different AIRport servers and get records from them. In turn, these records might have conflicting numeric ids (for Repositories and Actors). This would cause an incorrect display of results.
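The conflict can be illustrated with a minimal sketch. The `RepositoryRef` shape and the UuId strings are hypothetical and chosen only for illustration; they are not AIRport's actual entity definitions.

```typescript
// Hypothetical, simplified shape; not AIRport's actual Repository entity.
interface RepositoryRef {
  id: number;   // numeric Id, unique only within the database that issued it
  uuId: string; // globally unique Id
}

// Two independent servers each number their Repositories starting from 1.
const fromServerA: RepositoryRef[] = [{ id: 1, uuId: "uuid-of-repo-on-server-a" }];
const fromServerB: RepositoryRef[] = [{ id: 1, uuId: "uuid-of-repo-on-server-b" }];
const combined: RepositoryRef[] = fromServerA.concat(fromServerB);

// Indexing by numeric Id silently overwrites one Repository with the other,
// while indexing by UuId keeps both.
const byNumericId = new Map<number, RepositoryRef>();
const byUuId = new Map<string, RepositoryRef>();
for (const repository of combined) {
  byNumericId.set(repository.id, repository);
  byUuId.set(repository.uuId, repository);
}

console.log(byNumericId.size); // 1: the two Repositories collided
console.log(byUuId.size);      // 2: both Repositories are preserved
```

A UI that keys its state on the numeric Id alone would show records from one Repository as if they belonged to the other.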

The multiple server scenario is important because AIRport is designed for UIs that will be running against their own servers and the local Turbase at the same time.

"Air Bridge"

The solution is to always bundle a small AIRport library (airbridge) in your code and let it aggregate and re-Id all of the data. This implies that there will be a new (internal) format for query results that will include all of the Repository and Actor records present in the result (as a separate descriptor). The library would then simply populate those records in every record of the result set. Of course, this also helps in the standard case where you are talking to only one server.
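What such aggregation might look like can be sketched as follows. This is an assumption-laden illustration, not airbridge's actual code: the `ServerResult` format, the `aggregate` and `intern` functions, and all field names are hypothetical.

```typescript
// Hypothetical shapes for illustration only.
interface Repository { id: number; uuId: string }
interface Actor { id: number; uuId: string }

// ASSUMED internal result format: rows carry numeric ids only, and the
// referenced Repository and Actor records travel once, in a descriptor.
interface RawRow { repositoryId: number; actorId: number; data: string }
interface ServerResult {
  descriptor: { repositories: Repository[]; actors: Actor[] };
  rows: RawRow[];
}
interface LinkedRow { repository: Repository; actor: Actor; data: string }

// Returns the single shared object for a UuId, assigning a fresh
// client-side numeric id the first time that UuId is seen.
function intern<T extends { id: number; uuId: string }>(
  byUuId: Map<string, T>,
  record: T
): T {
  let existing = byUuId.get(record.uuId);
  if (!existing) {
    existing = { ...record, id: byUuId.size + 1 };
    byUuId.set(record.uuId, existing);
  }
  return existing;
}

function aggregate(results: ServerResult[]): LinkedRow[] {
  const repositoriesByUuId = new Map<string, Repository>();
  const actorsByUuId = new Map<string, Actor>();
  const linked: LinkedRow[] = [];

  for (const result of results) {
    // Per-server numeric id -> shared, re-Id'd object.
    const localRepositories = new Map<number, Repository>();
    for (const repository of result.descriptor.repositories) {
      localRepositories.set(repository.id, intern(repositoriesByUuId, repository));
    }
    const localActors = new Map<number, Actor>();
    for (const actor of result.descriptor.actors) {
      localActors.set(actor.id, intern(actorsByUuId, actor));
    }

    // Populate the shared records into every row of this server's result.
    for (const row of result.rows) {
      const repository = localRepositories.get(row.repositoryId);
      const actor = localActors.get(row.actorId);
      if (!repository || !actor) {
        throw new Error("Row references a record missing from the descriptor");
      }
      linked.push({ repository, actor, data: row.data });
    }
  }
  return linked;
}
```

In this sketch, two servers that both use numeric id 1 for different Repositories end up with distinct client-side ids, while rows that refer to the same UuId point at the same object.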

On the way back to the server the library will split and direct repository requests to the correct servers (by parsing the repository UuId and finding where the request should go). This will also save network bandwidth by passing in Repository and Actor records only once.
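The return trip can be sketched in the same spirit. The post does not say how a server is derived from a repository UuId, so the `<server>/<uuid>` format below is purely an assumption made for illustration, as are the `OutgoingRecord` shape and the `serverOf` and `splitByServer` function names.

```typescript
// Hypothetical outgoing record; only the repository UuId matters for routing.
interface OutgoingRecord { repositoryUuId: string; data: string }

// ASSUMPTION: the UuId embeds the owning server as a prefix, "<server>/<uuid>".
// The real encoding is not specified in this post.
function serverOf(repositoryUuId: string): string {
  const separator = repositoryUuId.indexOf("/");
  if (separator <= 0) {
    throw new Error(`Cannot determine server for ${repositoryUuId}`);
  }
  return repositoryUuId.slice(0, separator);
}

// Groups records into one batch per destination server.
function splitByServer(records: OutgoingRecord[]): Map<string, OutgoingRecord[]> {
  const batches = new Map<string, OutgoingRecord[]>();
  for (const record of records) {
    const server = serverOf(record.repositoryUuId);
    const batch = batches.get(server) ?? [];
    batch.push(record);
    batches.set(server, batch);
  }
  return batches;
}
```

Each batch would then be sent to its server as a separate request.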

Transparent UuId queries

This approach also takes care of the inelegant problem of having to add a 'uuId' field in the query engine, which looks really weird to the untrained eye and can hurt adoption:


this._find({
    select: {
        '*': Y,
        uuId: Y,
        anotherEntity: {}
    }
})

The natural question when one sees this syntax for the first time is: "If you are doing SELECT *, why are you also including another field?" This can throw developers off, make them think they don't understand what is going on, and cost them confidence in their ability to switch to AIRport without a learning curve.

With the new solution, AIRport will simply collect all of the numeric ids from the returned results and do two more queries:


select: {},
from: [
    r = Q.Repository
],
where: r.id.in(repositoryNumericIds)

select: {},
from: [
    a = Q.Actor
],
where: a.id.in(actorNumericIds)
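The id collection step that feeds these two queries could look roughly like the sketch below. The `ResultRow` shape (including the nested `children` field) and the `collectNumericIds` function are hypothetical stand-ins for whatever AIRport uses internally.

```typescript
// Hypothetical result shape: every entity row carries numeric Repository and
// Actor ids, and may contain nested child entities.
interface ResultRow {
  repositoryId: number;
  actorId: number;
  children?: ResultRow[];
}

// Walks the result graph and gathers each distinct numeric id once.
function collectNumericIds(
  rows: ResultRow[],
  repositoryIds: Set<number> = new Set<number>(),
  actorIds: Set<number> = new Set<number>()
): { repositoryNumericIds: number[]; actorNumericIds: number[] } {
  for (const row of rows) {
    repositoryIds.add(row.repositoryId);
    actorIds.add(row.actorId);
    if (row.children) {
      collectNumericIds(row.children, repositoryIds, actorIds);
    }
  }
  return {
    repositoryNumericIds: Array.from(repositoryIds),
    actorNumericIds: Array.from(actorIds),
  };
}
```

Because duplicates are dropped before the lookups run, each follow-up query asks for every Repository and Actor only once, no matter how many rows reference it.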

Faster than you think

This seems like a high price to pay - 3 queries instead of one. But it really only impacts performance on the most trivial of queries, where only one table is queried by Id (really fast). And the alternative would be to join the simple query to Repository and Actor entities anyway.

But, this approach actually becomes faster than the alternative with larger queries, where every entity in the select statement would have to be separately joined to Repository and Actor. Also the native result set in memory (which is always just a table with every possible combination of records) becomes much smaller, reducing both memory and CPU requirements to process it. This also enables returning the entire Repository record and the Users behind Actors and the owning User of the Repository, almost for free. This is true since the implementation of _find and _findOne methods returns interlinked object graphs where the same Repository, Actor and User objects are pointed to by any number of referencing objects.

On the server

On the server the situation is slightly different. The application server receives records from the UI and persists them. If it has a local cache and already has the passed-in Repository in it, then the Repository and Actor numeric Ids will likely be different from what is coming from the UI.

Also there may be a duality of Ids between the underlying distributed database and the in-memory database on the app server (especially if the app server is talking to multiple separate databases).

Having an internal format for passing in objects solves this issue. The UUIDs are always passed in, and numeric ids are converted to the values the app server has in its cache (and to the values in the central distributed database). A similar adapter can be used to manage multi-database (with same schemas) connections to the app server, once that becomes relevant.
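A minimal sketch of that conversion, assuming a simple in-memory cache keyed by UuId. The `LocalIdCache` class, the record shapes, and the `toLocal` function are hypothetical names used only for illustration.

```typescript
// Hypothetical wire format: records arrive carrying UuIds, not numeric ids.
interface IncomingRecord { repositoryUuId: string; actorUuId: string; data: string }
// Hypothetical local format: records are stored with this server's numeric ids.
interface LocalRecord { repositoryId: number; actorId: number; data: string }

// Maps UuIds to numeric ids that are valid only on this app server.
class LocalIdCache {
  private readonly idsByUuId = new Map<string, number>();
  private nextId = 1;

  resolve(uuId: string): number {
    let id = this.idsByUuId.get(uuId);
    if (id === undefined) {
      id = this.nextId++;
      this.idsByUuId.set(uuId, id);
    }
    return id;
  }
}

// Swaps the incoming UuIds for the numeric ids this server uses locally.
function toLocal(
  record: IncomingRecord,
  repositories: LocalIdCache,
  actors: LocalIdCache
): LocalRecord {
  return {
    repositoryId: repositories.resolve(record.repositoryUuId),
    actorId: actors.resolve(record.actorUuId),
    data: record.data,
  };
}
```

Whatever numeric ids the UI happened to hold never reach the server's storage layer in this sketch; only the UuIds cross the boundary.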