Tableland’s SQL is has some web3-specific intricacies and recent learnings to take note of.
What to expect when you’re expecting to write SQL on Tableland
Tableland’s SQL is has some web3-specific intricacies and recent learnings to take note of.
By @Carson Farmer
Since our first official SQL Specification announcement, we’ve learned a lot of things, largely driven by real usage on the Tableland network. For example, did you know that the full SQL spec that most folks refer to when talking about SQL is somewhere around 128 pages. Ours is much smaller, but covers a lot! Also did you know that earlier this month, the main Tableland Validator Node processed over three-hundred thousand query requests in one day?!
We also learned that our initial specification was missing some features that folks building on our network needed, and included some things that it probably shouldn’t. So this week, we’re launching our first major network upgrade, and it includes a whole slew of SQL Specification updates. Here’s a comparison of recent changes to our SQL Specification to explore. You can view them using the nice GitHub UI (check out the “Files changed” tab), or you can read on to learn more about what we’ve changed and why.
Define canonical encoding
SQL is all about text-based statements to represent actions taken on tabular data. That text-based representation is obviously pretty important. The Tableland SQL Specification document outlines what types of statements Tableland accepts, and how to use them. Internally, the statements are represented using an abstract syntax tree (AST). The internal representation of the nodes of this AST is outside the bounds of the Tableland SQL Specification, however, further details can be found in the Go Tableland SQL Parser reference implementation.
With the above caveat in mind, the Tableland SQL Specification does define a canonical string encoding of a set of (compliant) SQL statements that have passed through the Tableland SQL Parser (and have been represented via the Parser's AST). That is, our Specification now outlines — in general terms — the string encoding produced by parsing a set of Tableland SQL Specification compliant statements and re-encoding them into a canonical (string) format.
Drop REAL and ANY support
This one is a biggie! I’m actually not going to cover this one in great detail here because I’ve written a whole separate blog post about this over here. But suffice to say, our SQL Specification has been updated to no longer allow
ANY datatypes in table
CREATE statements, or anywhere else (i.e., in queries) for that matter. It turns out that floating point math is too difficult to support in blockchain based networks of distributed (and often heterogeneous) systems.
This means that the set of support data types on Tableland are:
Signed integer values, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
Text string, stored using the database encoding (UTF-8).
A blob of data, stored exactly as it was input. Useful for byte slices etc.
Anything else you’d like to store on Tableland can and should be done using the above data types. Practically speaking,
TEXT is good for things like JSON, large integers, event floating point numbers assuming you don’t want to do math on them!
INTEGER are great for things like date/time and even booleans. In fact, we have a list of common types and how to represent them in our SQL Specification, so take a look there for ideas.
Deterministic primary keys
As part of our work on reducing non-determinism in our SQL Specification (related, see section above about dropping REAL and ANY support), we also identified some peculiarities when working with integer primary keys. On Tableland (which inherits this behavior from SQLite), if a table has a single column primary key and the declared type of that column is
INTEGER (as opposed to
INT), then the column is known as an
INTEGER PRIMARY KEY, and it bestowed with special properties and behavior. You can read all about these special properties in the SQL Specification document here. One notable feature of Tableland’s implementation of integer primary keys is that here is an additional implied
AUTOINCREMENT constraint, which forces the integer primary key to behave as if it were specified with
INTEGER PRIMARY KEY AUTOINCREMENT. This is a special feature of the Tableland SQL Specification, and helps to ensure deterministic ordering of values within a table. While the
AUTOINCREMENT keyword is implied with
INTEGER PRIMARY KEY columns, the keyword itself is not allowed according to our specification. Any attempt to use the
AUTOINCREMENT keyword on any column results in an error.
Remove date-time functions
The last little bit of non-determinism that we identified this past season was around date/time functions. Like floating point math, date/time functionality is often messy and full of inconsistencies. Timezones make things complicated when doing math with dates, and more importantly, things like SQLite’s datetime functions all support the concept of “now” as a time value. “Now” will end up being lots of different things on lots of different machines in lots of difference places, so for the safety of the network, we have disabled these functions. The good news is these are all really just “helper” functions for dealing with dates and times expressed as strings. So you can still just use whatever client language you want and insert and update dates directly, so this isn’t a huge loss.
Custom transaction functions
Ok, this update is pretty cool! Wouldn’t it be handy if you could reference the transaction hash of the very transaction that is carrying your SQL statement… inside your SQL statement?! Or what about leveraging the current block number in your SQL statement? Well you’re in luck! The Tableland SQL Specification now includes several web3 native functions that simplify working with blockchain transactions. The list of custom functions may grow over time, but for now we’re starting with:
BLOCK_NUM (which inserts the number of the block that delivered the SQL event) and
TXN_HASH (which inserts the hash of the transaction that delivered the SQL event). These both work in write queries, whereas if
BLOCK_NUM is is called with an integer argument (i.e.,
BLOCK_NUM(<chain_id>)), the Validator will replace this text with the number of the last seen block for the given chain, is only works in read queries.
Support for UPSERTs
UPSERT is a clause added to
INSERT that causes the
INSERT to behave as an
UPDATE or a no-op if the
INSERT would violate a uniqueness constraint.
UPSERT is not standard SQL, but it is pretty common in SQL implementations, including SQLite and PostgreSQL. This has been a relatively long-time request both internally, and from external devs. We’ve finally added it to the Tabeland SQL Specification, and the list of use-cases for this type of syntax appears to be huge!
The simplest example of the utility of an
UPSERT is something like the following:
CREATE TABLE vocabulary(word TEXT PRIMARY KEY, count INT DEFAULT 1); INSERT INTO vocabulary(word) VALUES('jovial') ON CONFLICT(word) DO UPDATE SET count=count+1;
In that example, the statement attempts to insert the new vocabulary word "jovial", if that word is not already in the dictionary, or if it is already in the dictionary, it increments the counter. Nice! This is likely to be really handy for folks looking to conditionally update traits on their NFTs, for example.
Support for all JOIN types
Joining and relating data is a principal function of SQL databases — it’s the relational part of the whole thing! But when we first defined the Tableland SQL Specification, we weren’t sure we’d be able to support the whole gamut of possible
JOINs. Most SQL engines support
OUTER joins from
FULL, and now, so does Tableland! We even support
CROSS JOIN and compound select statements. In fact, we essentially support all of the
JOIN features supported by our underlying SQL engine, SQLite!
SELECT * FROM healthbot_80001_1, healthbot_5_1 WHERE healthbot_80001_1.counter = heathbot_5_1.counter;where we are joining between Polygon Mumbai and Goerli, but not something like this:
SELECT * FROM healthbot_80001_1, healthbot_1_1 WHERE healthbot_80001_1.counter = heathbot_1_1.counter;where we are joining between Polygon Mumbai and ETH Mainnet.
Similarly, compound select statements, which is what you get when two or more simple
SELECTstatements are connected together to form a “single” compound
SELECT using the
EXCEPT operator, have been added to the Specification. Compound select statements are super handy for creating complex queries from a series of simpler sub-queries.
Support for sub-SELECT
Speaking of sub-queries, I am pleased to report that the Tableland SQL Specification now supports the much-requested ability to use sub-selects within
INSERT statements. I anticipate this is going to be a major super-power of Tableland, for one very specific reason that I’ll illustrate below. With that in mind, there are a few caveats to note here as well, because we need to be able to maintain deterministic results across all Validators in all circumstances.
So first thing first, we only support flattened
SELECTs with direct table access within an
INSERT statement. This means no
JOINs, or further sub-queries. You are also only able to reference tables on the same chain. This is a slightly further restriction than the
JOIN stuff above, which allows cross-chain references as long as they are all testnet or mainnet tables. Additionally, we are (temporarily?) blocking the
GROUP BY clauses on
INSERT until we better understand how they behave in practice (again, avoiding indeterminate results).
Ok, but even with those limits in mind, we are still able to support one really killer use-case… data migrations/cloning/copying. This is a big win for NFT collections, folks migrating the structure of their app or service, and a whole lot more use-cases. And just look at how simple this is:
INSERT INTO table2 (column1, column2, column3, ...) SELECT column1, column2, column3, ... FROM table1 WHERE condition;
What isn’t entirely obvious here, is the huge level of compression we get from this simple statement. Rather than creating a whole new table and inserting 1000s of rows, you can create a new table, and submit a single statement to copy over those 1000s of rows for a fraction of the cost! This saves big time on gas for developers, and compute for node operators. It is pretty much win win, and I know there are a ton of use-cases this opens up.
Moved spec to GitHub
This one is pretty obvious if you’ve been following along and diving into my links throughout this post. We moved our official SQL Specification source of truth to GitHub. We’re hoping this change will make the Specification update process more transparent, and encourage external contributions from community members who want to help drive the evolution of our core specification.
So that is pretty much everything you need to know about the SQL Specification changes coming to Tableland this go around. As always, don’t hesitate to reach out if you have any questions or concerns, and now that we have our Specification on GitHub, please feel free to engage with us there, submit issues and PRs, and just otherwise help drive forward the SQL Specification for the worlds only web3 native SQL database.