Helion Energy is working with Nucor to deploy a 500 MW fusion power plant for a steelmaking facility.

Dependency-First Queries

By Artem V. Shamsutdinov
October 15th, 2023

This morning I finally arrived at the idea of how to efficiently and effectively ensure repository Atomicity at query time.

The problem of Atomic Repository queries has to do with the duality of state of Automic Repositories. This duality arises because Autonomous Repositories carry with them a local version of all the records they depend on. This means that when an AIRport database does not have dependencies of a given Repository it can still provide full query results for that Repository by first querying against the records that are local to that Repository and subsequently checking other repository data for information.

And it is exactly this thinking that got me stuck on the solution so far. First checking local copies and then progressively loading remote data is a complex solution. It requires constructing multiple versions of queries depending on what Repositories are loaded or are missing. This is exactly why as of right now AIRport repositories rely on availability of Repository dependencies (progressively loading them and not checking local data records).

The initial solution to this complexity is to flip the problem description on its head. Instead of first checking local data, construct a join across Repository boundaries. Then check the resulting object tree against non-null constraints and see what is missing. If there is data that is missing then separate queries can be constructed against local data copies.

This leaves the problem of WHERE clause statements against Repository dependencies. In that case a query will return a result that is missing records completely. The solution for that is to run a "fully local" query, detect missing records in the original query and stub them in as necessary. This solution replaces the non-null constraint check above, since cross checking queries will both provide more accurate results and detect missing data at the same time. And the cost of doing so is fixed - AIRport has to always run 2 queries instead of potentially any number of sub-queries.

This works for both Observable and Promise queries since now instead of a single query run every query performs two runs and a combined result set is returned. And, this solution also covers the problem of having more missing data when immediate Repository dependencies are loaded. The identical "second query" process will detect and additional repositories that must also be loaded to return the fully accurate query results.

Additional bonus - simple copy creation

An additional question came up on how to maintain the "copy records". One approach is to try to update the copies with the latest available data from the related repositories. However, this approach can get out of sync on different databases that have varying copies of the related repositories (causing a potentially infinite loop of data conflict resolutions). On top of that this solution is not performant and requires additional queries, persistence operations and network round trips and storage.

The solution to this question now appears to be simple - just write the copy records once, when the repository linking (or in most cases the linked data) is created. This solves the above problems in a very elegant way - if related repositories cannot be loaded you simply see the original state of the linked data.

Update - Dec 6th, 2023

Since I wrote about copy objects last time I've been able to refine the concept further. Quite a bit of complexity can be removed if the objects (referenced via @ManyToOne()) are just copied into a Repository as is, maintaining their original IDs (that belong to another Repository). When a Repository is used in a database where the referenced Repository isn't yet loaded, when it does get loaded the synchronization code will just run UPDATES instead of INSERTS and update the present records. This increases the complexity of the synchronization code but removes the need for double queries on every read statement. This also removes the need for special "*_COPY_*" id fields on every @ManyToOne() relationship. Given that the number of read queries will be far greater than the number of synchronization updates there should be significant performance gains (as well as compartmentalization of complexity).

On a separate note, maintaining copy objects greatly helps with synchronization code since it allows Repositories to be loaded in "no particular order", irrespective of what dependencies they have for each other. This means that given two Repositories that depend on each other either one can be loaded before the other. Finally, another thing that copy objects help with is the enforcement of NOT NULL constraints, which can now be done on the database level since there will always be data at insertion time (even when it spans multiple repositories).