A look at recent updates made to Tableland Rigs data storage
Rigs Data Storage Design Update
We take close look at changes made for optimizing image management, data updates and storage cost after experiencing difficulties with the previous design.
By @Aaron Sutula
We previously examined the design and implementation of the Rigs tables in Exploring the Rigs Tables. There, we discussed data composability and sparse vs. dense tables (data normalization), and then presented a tables design that balanced data normalization with easy queryability and minimal join clauses and subqueries needed. The main goal was to require very little external knowledge about the Rigs data model in order to understand the dataset. That design worked well for a while, but eventually presented some difficulties for us. In this post, Iβll discuss recent changes we made and why, providing some insights that may be useful for other NFT projects storing data on IPFS and Tableland.
Image Storage Management
Problem
Previously, each Rigβs imagery was stored on IPFS under its own content identifier (CID). The main Rigs table included four image-URL-related columns, and each of those columns included the imagery CID unique to that Rig:
// Schema of rigs_5_28
(
id integer primary key,
image text,
image_alpha text,
thumb text,
thumb_alpha text,
animation_url text
)
For example, selecting only the image
column (the image_alpha
, thumb
and thumb_alpha
columns are similar):
> tableland read "select image from rigs_5_28 where id = 1 or id = 2" --format pretty
βββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β (index) β image β
βββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 0 β 'ipfs://bafybeid4gwmvbza257a7rx52bheeplwlaogshu4rgse3eaudfkfm7tx2my/image.png' β
β 1 β 'ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi/image.png' β
βββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
While this made the initial generation and storage of the imagery for a single Rig quick and easy, it proved to be a challenge for managing the imagery for all Rigs later on. When storing data on IPFS, it is necessary to use βpinning servicesβ to assure your data is readily available on the network. Having 3000 unique CIDs to pin or unpin became extremely difficult and time consuming.
Solution
The fix was to store all Rigs imagery under a single CID on IPFS, using a nested directory for each Rigβs images. The resulting data in the Rigs table looks like the following, for example:
> tableland read "select image from rigs_5_28 where id = 1 or id = 2" --format pretty
βββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β (index) β image β
βββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 0 β 'ipfs://bafybeib3bbctx2gpuzicibprsa3h3zbusogxplccnlgbtmargfnh7bcria/1/image.png' β
β 1 β 'ipfs://bafybeib3bbctx2gpuzicibprsa3h3zbusogxplccnlgbtmargfnh7bcria/2/image.png' β
βββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
This made it easy to add all Rigs imagery to a new pinning service or remove a test set of imagery from a pinning service with a single request. Much easier to manage.
Rig Image Updates
Problem
One Rigs holder found and reported a glitch in the image of a Rig they own. We were able to narrow the problem down to a single layer used to render the image and worked with the artists to correct it. Thatβs all good, but I soon realized that this single layer update was going to cause thousands of row updates in the Rigs tables!
This problem existed in two places. First, with the layers dataset where all layers images were already stored under a single CID on IPFS, but that CID was listed in each and every row of the layers table. Here, we can look at just two rows to see an example:
> tableland read "select fleet, rig_attributes_value, cid from layers_5_29 where id = 1 or id = 2" --format pretty
βββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β (index) β fleet β rig_attributes_value β cid β
βββββββββββΌβββββββββββββββΌβββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 0 β 'Airelights' β 'Dawn Varray' β 'ipfs://bafybeiepkpr4g4jzqyy6cjaegcxtmngyjqnf4vhekfrhadqappl5nuxvli/Airelights/Airframe/Cloudlifter_Dawn_Varray.png' β
β 1 β 'Airelights' β 'Sunset Varray' β 'ipfs://bafybeiepkpr4g4jzqyy6cjaegcxtmngyjqnf4vhekfrhadqappl5nuxvli/Airelights/Airframe/Cloudlifter_Sunset_Varray.png' β
βββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
You can see that if a single layer image is updated, the root CID for all the layers will change, and then every single row of the layers table, all 1074 of them, would need to be updated. This is very inefficient and expensive.
A similar problem existed in the Rigs table as described in the Solution section under Image Storage Management above. You can see that since this one layer is updated, the rendered Rig images that use that layer will be updated, and therefore we get a new root CID which needs to be updated in all 3000 rows of the Rigs table.
Solution
The fix here is to remove all traces of the root CIDs from the layers and Rigs tables. Since the paths within the root CIDs are unique, those can stay, but the root CIDs were moved to a βlookupsβ table where bits of information that may change can be stored and combined with relevant data from other tables as needed:
> tableland read "select layers_cid, renders_cid from lookups_42161_10" --format pretty
βββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β (index) β layers_cid β renders_cid β
βββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 0 β 'bafybeiblrxikxpcwanbgs5g5j6yftmuhdfmesy4rohbhewn2gype3sqgue' β 'bafybeidpnfh2zc6esvou3kfhhvxmy2qrmngrqczj7adnuygjsh3ulrrfeu' β
βββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
The updated layers table now looks slightly different, storing only the path to the image within the root CID:
> tableland read "select fleet, rig_attributes_value, path from layers_42161_8 where id = 1 or id = 2" --format pretty
βββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β (index) β fleet β rig_attributes_value β path β
βββββββββββΌβββββββββββββββΌβββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 0 β 'Airelights' β 'Dawn Varray' β 'Airelights/Airframe/Cloudlifter_Dawn_Varray.png' β
β 1 β 'Airelights' β 'Sunset Varray' β 'Airelights/Airframe/Cloudlifter_Sunset_Varray.png' β
βββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
You can see how lookups_42161_10.layers_cid
can be combined with layers_42161_8.path
to create the full IPFS path needed to resolve an image on IPFS. An equivalent change was made to the Rigs table. Going forward, if a root CID changes, an update to a single column of the single row in the lookups table will be all that is needed.
Table Writes Cost
Problem
Since we were migrating the Rigs tables to Arbitrum One, it was a good opportunity to do some analysis around the cost of writing all the Rigs data to Tableland. At the time (mid November 2022), writing data to Arbitrum One was costing us about 0.029 ETH/MB (~$35 USD/MB at the time of writing this post). The Rigs data was represented by about 3 MB of SQL insert statements, so on the order of $100 USD to get all the data inserted. This isnβt too much money, but it felt like it would be nice to reduce the cost and not feel so bad if we have to re-insert or migrate the data to new tables in the future.
Solution
After making the changes discussed previously in this post (storing imagery under single root CIDs, and moving root CID values to the lookups table), the main Rigs table looked very repetitive. For example, selecting the image path and animation URL for a couple different Rigs:
> tableland read "select image, animation_url from rigs_5_28 where id = 1 or id = 2" --format pretty
βββββββββββ¬ββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββ
β (index) β image β animation_url β
βββββββββββΌββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ€
β 0 β '1/image.png' β 'https://rigs.tableland.xyz/0.html' β
β 1 β '2/image.png' β 'https://rigs.tableland.xyz/1.html' β
βββββββββββ΄ββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββ
You can see that the only thing changing from row to row was the Rig id part of each column value. This was true for all columns in the Rigs table. It seemed like a waste of money to write all this repetitive data to Tableland when we could simply look up the static/repetitive parts and combine them with the Rig id when necessary.
To achieve this, we added more columns to the lookups table. Its current schema looks like this:
// Schema of lookups_42161_10
(
renders_cid text,
layers_cid text,
image_full_name text,
image_full_alpha_name text,
image_medium_name text,
image_medium_alpha_name text,
image_thumb_name text,
image_thumb_alpha_name text,
animation_base_url text
)
Importantly, there is only one row stored in the lookups table. Given a Rig id, using the data in the lookups table, we can construct the URI for any of the six Rig images and its animation URL. Using pseudocode, hereβs how you would construct the IPFS URI for the thumbnail image for a Rig with id rig-id
:
ipfsUri = "ipfs://" + renders_cid + "/" + rig-id + "/" + image_thumb_name
Similar string building logic is used in the updated SQL query the Rigs smart contract uses to deliver Rigs NFT metadata.
This change allowed us to completely eliminate the Rigs table since the only things left in it were the Rig ids. This cut the amount of necessary data storage in half and allowed us to store all the Rigs data on Arbitrum One for ~0.044 ETH, or on the order of $50 USD.
Updated Metadata Query
The SQL query used in the metadata URI returned by the Rigs smart contract had to be adjusted to work with the new lookups table and removal of the Rigs table. You can see the new query here:
select json_object(
'name','Rig #'||rig_id,
'external_url','https://garage.tableland.xyz/rigs/'||rig_id,
'image','ipfs://'||renders_cid||'/'||rig_id||'/'||image_full_name,
'image_alpha','ipfs://'||renders_cid||'/'||rig_id||'/'||image_full_alpha_name,
'image_medium','ipfs://'||renders_cid||'/'||rig_id||'/'||image_medium_name,
'image_medium_alpha','ipfs://'||renders_cid||'/'||rig_id||'/'||image_medium_alpha_name,
'thumb','ipfs://'||renders_cid||'/'||rig_id||'/'||image_thumb_name,
'thumb_alpha','ipfs://'||renders_cid||'/'||rig_id||'/'||image_thumb_alpha_name,
'animation_url',animation_base_url||rig_id||'.html',
'attributes',json_array(
json_object(
'trait_type','status',
'value','pre-reveal'
)
)
) from rig_attributes_42161_15 join lookups_42161_10 where rig_id=<rig-id> group by rig_id;
The main change is that there is no reference to the old Rigs table, and we join the attributes table to the lookups table without specifying specific columns to join on. The end effect is that the single row that exists in the lookups table is appended onto every row in the result set. We can then reference columns from the lookups table while building string values using the SQL string concatenation operator ||
. For example, our image
JSON keyβs value is created as:
'ipfs://'||renders_cid||'/'||rig_id||'/'||image_full_name
renders_cid
and image_full_name
are values that come from the lookups table.
Summary
The previous Rigs tables design focused on minimizing required external knowledge in order to understand the Rigs dataset. It presented problems around image storage management, data updates, and storage cost. With a slight tradeoff in the amount of required external knowledge to understand the dataset β You now must know how to combine data from the lookups table with the Rig id and other information β we can now easily and cheaply address those concerns. Hopefully the solutions outlined here give you ideas about how to design your Tableland tables in a future-proof way. Happy building!