Marigold - Understand your bigmap

#Analysis

Introduction

The `Map`, also known as a `Dictionary`, is a key-value data structure widely used in all kinds of software and systems. Each key-value pair is a `record` in a `Map`, and a `value` is looked up by querying with its `key`. Michelson natively supports the `Map` type along with a full set of handy operations on it.

When the dataset is large, however, accessing data in a `Map` costs a lot of gas. This is because a `Map` is treated as one whole structure and is therefore loaded from storage in its entirety. That is inefficient when manipulating a big dataset, whether it is big in terms of *total data stored* or *the number of records*.

This is why Michelson also provides the `BigMap`, a specialized implementation of `Map` that can be loaded partially. More precisely, only the requested records are loaded. This is extremely useful when a contract needs to store a lot of data.
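To make the difference concrete, here is a toy Python model of the two access patterns. It is only a sketch under stated assumptions: `CountingStore`, `map_lookup` and `bigmap_lookup` are hypothetical names, and the number of records read stands in very roughly for gas. It is not how the Tezos protocol actually stores or charges for maps and bigmaps.

```python
from __future__ import annotations


class CountingStore:
    """Hypothetical backing storage that counts how many records are read."""

    def __init__(self, records: dict) -> None:
        self._records = dict(records)
        self.reads = 0

    def read_all(self) -> dict:
        # Loading the structure as a whole touches every record.
        self.reads += len(self._records)
        return dict(self._records)

    def read_one(self, key):
        # A targeted lookup touches a single record.
        self.reads += 1
        return self._records.get(key)


def map_lookup(store: CountingStore, key):
    """Map-like access: load everything, then look up one key."""
    return store.read_all().get(key)


def bigmap_lookup(store: CountingStore, key):
    """BigMap-like access: fetch only the requested record."""
    return store.read_one(key)


records = {i: f"value-{i}" for i in range(100_000)}
map_store, bigmap_store = CountingStore(records), CountingStore(records)
print(map_lookup(map_store, 42), bigmap_lookup(bigmap_store, 42))  # value-42 value-42
print(map_store.reads, bigmap_store.reads)  # 100000 1
```

Both lookups return the same value; only the amount of data touched differs, and in this model it grows with the size of the `Map` but stays constant for the `BigMap`.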

Note that the bigmap is not a silver bullet; it comes with a few restrictions. For example, because comparing two bigmaps would require fully loading all of their records, bigmaps cannot be compared in Michelson. Similarly, a bigmap cannot be `PUSH`ed or `PACK`ed. In addition, a few types cannot be stored inside a bigmap, such as `bigmap` itself, `contract`, `operation` and `sapling_state`. More details can be found in the Michelson documentation and the language reference.


What we can learn from the existing bigmaps

We gathered data starting from protocol Hangzhou, from level 1916929 (Dec 4, 2021) to level 2147760 (Feb 25, 2022). Overall, this covers more than 230000 blocks and includes 62745 non-empty bigmaps.

The very first observation is that almost two thirds of these bigmaps have never had their records modified after insertion. This immediately gives a strong impression of how bigmaps are currently used.

How many records?

A fair question to ask is: among all the non-empty bigmaps, how many records do they have? To find out, we grouped all bigmaps by their number of records. It turns out there are two extreme, yet interesting, results.

Over 40% of the bigmaps hold only 1 record. Among these single-record bigmaps, 21301 (about 80%) have never changed since their creation. It would be worth looking into these cases and asking whether they are actually a proper use case for a bigmap.
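The grouping described above can be sketched in Python as follows. This assumes per-bigmap statistics have already been extracted from the chain (for example via an indexer); `BigmapStats` and its fields are hypothetical, and the sample input is toy data, not the dataset behind the figures in this post.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BigmapStats:
    """Hypothetical per-bigmap summary, e.g. exported from a chain indexer."""

    bigmap_id: int
    records: int  # number of live records
    updates: int  # modifications of existing records after insertion


def records_histogram(stats: list[BigmapStats]) -> Counter:
    """Count how many non-empty bigmaps hold exactly n records."""
    return Counter(s.records for s in stats if s.records > 0)


def single_record_summary(stats: list[BigmapStats]) -> tuple[float, float]:
    """Return (share of non-empty bigmaps with one record,
    share of those single-record bigmaps that were never updated)."""
    non_empty = [s for s in stats if s.records > 0]
    single = [s for s in non_empty if s.records == 1]
    untouched = [s for s in single if s.updates == 0]
    share_single = len(single) / len(non_empty) if non_empty else 0.0
    share_untouched = len(untouched) / len(single) if single else 0.0
    return share_single, share_untouched


# Toy input, not the dataset described in the text.
sample = [
    BigmapStats(1, records=1, updates=0),
    BigmapStats(2, records=1, updates=3),
    BigmapStats(3, records=5000, updates=120),
    BigmapStats(4, records=1, updates=0),
]
print(records_histogram(sample))      # Counter({1: 3, 5000: 1})
print(single_record_summary(sample))  # (0.75, 0.6666666666666666)
```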

On the other hand, only a very small share of all bigmaps, 0.67%, hold a considerable number of records. If we look closely and pick the bigmaps with at least 100000 records, we notice that they are all NFT-related!


| Owner organization | Bigmap ID (name) | Records |
|---|---|---|
| hic et nunc | 511 (ledger) | 4994914 |
| hic et nunc | 6072 (swaps) | 932116 |
| hic et nunc | 513 (operator) | 844507 |
| fx(hash) | 22785 (ledger) | 809982 |
| hic et nunc | 514 (token_metadata) | 683132 |
| hic et nunc | 522 (royalties) | 683132 |
| fx(hash) | 22788 (token_data) | 489716 |
| fx(hash) | 22789 (token_metadata) | 489716 |
| fx(hash) | 22782 (leder_gentk) | 282139 |
| fx(hash) | 22787 (operators) | 188117 |
| PixelPotus | 5627 (ledger) | 184689 |
| objkt | 5909 (asks) | 178203 |
| fx(hash) | 22799 (token_metadata) | 173659 |
| Versum | 75550 (ledger) | 144096 |
| Kalamint | 857 (ledger) | 120968 |
| sweet | 95994 (token_metadata) | 100000 |
| sweet | 95992 (ledger) | 100000 |


What are bigmaps used for?

Given all the bigmaps we have so far, it’s natural to ask: what is the most common way of using bigmaps in the existing contracts?

There are plenty of creative ways of using bigmaps, so here we focus only on some commonly seen cases. We picked some extreme bigmaps and sorted them into two groups:

group 1: bigmaps that have a large number of records yet have never been updated
group 2: bigmaps that are updated frequently
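The two selection criteria can be sketched as simple filters over per-bigmap statistics, using the thresholds given in the group descriptions that follow (more than 10000 records for group 1, the top 30 by update rate for group 2). `BigmapStats`, `group1`, `group2` and `update_rate` are hypothetical names, and defining "update rate" as updates per block over the observation window is our assumption.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass

# Observation window from the text (protocol Hangzhou).
FIRST_LEVEL, LAST_LEVEL = 1916929, 2147760
WINDOW = LAST_LEVEL - FIRST_LEVEL + 1  # number of blocks, inclusive


@dataclass(frozen=True)
class BigmapStats:
    """Hypothetical per-bigmap summary over the observation window."""

    bigmap_id: int
    records: int  # live records at the end of the window
    updates: int  # modifications of existing records within the window


def group1(stats: list[BigmapStats], min_records: int = 10_000) -> list[BigmapStats]:
    """Bigmaps with more than `min_records` records that were never updated."""
    return [s for s in stats if s.records > min_records and s.updates == 0]


def update_rate(s: BigmapStats) -> float:
    """Updates per block; an assumed definition of "update rate"."""
    return s.updates / WINDOW


def group2(stats: list[BigmapStats], top: int = 30) -> list[BigmapStats]:
    """The `top` bigmaps with the highest update rate, highest first."""
    return heapq.nlargest(top, stats, key=update_rate)


sample = [  # toy data
    BigmapStats(1, records=20_000, updates=0),
    BigmapStats(2, records=50, updates=90_000),
    BigmapStats(3, records=10_000, updates=0),
    BigmapStats(4, records=7, updates=400),
]
print([s.bigmap_id for s in group1(sample)])         # [1]
print([s.bigmap_id for s in group2(sample, top=2)])  # [2, 4]
```

Because every bigmap is measured over the same window here, ranking by this rate is equivalent to ranking by raw update count; normalizing by each bigmap's own lifetime would be a different, equally plausible choice.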


group 1

For the first group, we picked bigmaps that have more than 10000 records yet have never been updated so far. There are only 29 such bigmaps, and most of them (24, about 82.8%) are used as `token_metadata`. The others store some kind of history data, where new records are added regularly while existing ones stay untouched. A similar pattern shows up in the table in the previous section, where several `token_metadata` bigmaps appear among the largest ones.

group 2

For the second group, we picked the top 30 bigmaps with the highest update rate over the whole period. It turns out that all the use cases in this group fall into the following categories.

[1] For transfer allowance:

This is one of the most common patterns found in existing contracts. In this use case, the bigmap is usually named `operators` and is used to implement the transfer allowance described in FA1.2 and FA2.

Whenever a contract `A` permits another agent contract `B` to spend a certain amount `n` of `A`'s tokens, a new allowance record `(A,B,n)` is added to this `operators` bigmap, and once the transfer is finished, that record is removed.

This usage leads to an interesting observation: this kind of bigmap usually contains only a few records. It might grow quickly if the token is popular, but since each allowance is removed soon after it is used, the bigmap does not actually take up much storage space.
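The allowance lifecycle described above can be sketched with plain dictionaries standing in for the `ledger` and `operators` bigmaps. This is a simplified model that follows the description in this post (insert `(A,B,n)`, remove it once the transfer is done); `Token`, `approve` and `transfer_from` are hypothetical names, and the code does not reproduce the exact entrypoint semantics of FA1.2 or FA2.

```python
from __future__ import annotations


class Token:
    """Toy token with a `ledger` and an `operators` allowance table."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.ledger: dict[str, int] = dict(balances)
        # (owner, spender) -> allowed amount, i.e. the (A, B, n) records.
        self.operators: dict[tuple[str, str], int] = {}

    def approve(self, owner: str, spender: str, amount: int) -> None:
        # Insert the allowance record (owner, spender, amount).
        self.operators[(owner, spender)] = amount

    def transfer_from(self, spender: str, owner: str, to: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.operators.get((owner, spender), 0):
            raise PermissionError("amount exceeds allowance")
        if amount > self.ledger.get(owner, 0):
            raise ValueError("insufficient balance")
        self.ledger[owner] -= amount
        self.ledger[to] = self.ledger.get(to, 0) + amount
        # The allowance record is removed once the transfer is done,
        # so the table stays small even if many transfers go through it.
        del self.operators[(owner, spender)]


token = Token({"A": 10})
token.approve("A", "B", 4)
print(len(token.operators))  # 1
token.transfer_from("B", "A", "C", 4)
print(token.ledger, len(token.operators))  # {'A': 6, 'C': 4} 0
```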

[2] For storing real-time data feeds:

Another common use of bigmaps is storing a real-time data feed, such as stock prices or even the weather. This usually serves as the on-chain part of an oracle.

In this case, the bigmap receives an extraordinary number of updates, proportional to the lifetime of its contract. Yet, unlike contracts that keep their entire history in a bigmap, this bigmap usually tracks only a fixed set of real-world values. The size of this kind of bigmap is therefore fixed, and so is the storage it uses.
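A minimal sketch of this pattern, assuming a feed that tracks a fixed list of symbols and overwrites one record per symbol on every push. `PriceFeed` and the symbol names are hypothetical; prices are modeled as integers in a fixed-point unit because Michelson has no floating-point type.

```python
from __future__ import annotations


class PriceFeed:
    """Toy on-chain side of an oracle: one record per tracked symbol."""

    def __init__(self, symbols: list[str]) -> None:
        self.tracked = frozenset(symbols)
        self.prices: dict[str, int] = {}  # symbol -> fixed-point price
        self.writes = 0  # counts every push, including the first insert

    def push(self, symbol: str, price: int) -> None:
        if symbol not in self.tracked:
            raise KeyError(f"untracked symbol: {symbol}")
        # Overwrite in place: the record count never exceeds len(self.tracked).
        self.prices[symbol] = price
        self.writes += 1


feed = PriceFeed(["XTZ/USD", "BTC/USD"])
for tick in range(1_000):
    feed.push("XTZ/USD", 3_000_000 + tick)
    feed.push("BTC/USD", 40_000_000_000 + tick)
print(feed.writes, len(feed.prices))  # 2000 2
```

The write count grows with the contract's lifetime while the number of records, and hence the storage used, stays at the size of the tracked set.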

[3] For describing a ledger:

It's natural to use a bigmap to implement a ledger in a contract, especially for contracts that define digital assets such as FA1.2 or FA2 tokens. Whenever an asset is transferred, the ledger is updated. So the more popular an asset is, the more owners its ledger holds, and thus the bigger the ledger becomes.

This is commonly seen in NFT and DeFi contracts, but in fact the situation differs between the two.

For NFTs, a multi-asset contract can hold plenty of records: its bigmap can grow big and be updated very often. On the other hand, for a single-asset contract, which is commonly used to define one NFT, the number of owners is fixed, so the bigmap is not as large as one might expect, although it may still be updated often.
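The contrast between the two NFT cases can be sketched with two hypothetical ledger layouts: a multi-asset ledger keyed by `(owner, token_id)`, and a fixed-supply NFT ledger keyed by `token_id` whose value is the current owner. Both classes are illustrative assumptions, one plausible reading of the two cases rather than the layout of any specific contract or of the FA2 standard.

```python
from __future__ import annotations


class MultiAssetLedger:
    """(owner, token_id) -> balance: one record per holder of each token."""

    def __init__(self) -> None:
        self.balances: dict[tuple[str, int], int] = {}

    def mint(self, owner: str, token_id: int, amount: int) -> None:
        key = (owner, token_id)
        self.balances[key] = self.balances.get(key, 0) + amount

    def transfer(self, src: str, dst: str, token_id: int, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        key = (src, token_id)
        if self.balances.get(key, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[key] -= amount
        if self.balances[key] == 0:
            del self.balances[key]  # keep only actual holders
        dst_key = (dst, token_id)
        self.balances[dst_key] = self.balances.get(dst_key, 0) + amount


class FixedSupplyNftLedger:
    """token_id -> owner: the record count is fixed by the supply,
    but every transfer is an update."""

    def __init__(self, supply: int, creator: str) -> None:
        self.owner_of: dict[int, str] = {t: creator for t in range(supply)}

    def transfer(self, src: str, dst: str, token_id: int) -> None:
        if self.owner_of.get(token_id) != src:
            raise PermissionError("not the owner")
        self.owner_of[token_id] = dst


multi = MultiAssetLedger()
for token_id in range(3):
    multi.mint("creator", token_id, 100)
for buyer in ("alice", "bob"):
    multi.transfer("creator", buyer, 0, 10)
print(len(multi.balances))  # 5: records grow with new holders

nft = FixedSupplyNftLedger(supply=3, creator="creator")
nft.transfer("creator", "alice", 0)
nft.transfer("alice", "bob", 0)
print(len(nft.owner_of), nft.owner_of[0])  # 3 bob
```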

For DeFi, things are different. A bigmap may be used not to store the ownership of things but to store the exchange rates between tokens, which are updated whenever an exchange happens. This is commonly found in exchange contracts (aka DEXs).
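To illustrate why such a bigmap is updated on every exchange, here is a toy model with one record per token pair. The post only says that the rate is stored and updated on each exchange; the constant-product rule (x * y = k, no fee, floor rounding) is a pricing rule swapped in purely for illustration, and the sketch stores the pair's reserves, from which the rate follows. `ToyDex` and the token names are hypothetical, and only one swap direction per pair is modeled.

```python
from __future__ import annotations

from fractions import Fraction


class ToyDex:
    """Toy exchange: one record per token pair, rewritten on every swap."""

    def __init__(self) -> None:
        # (token_in, token_out) -> (reserve_in, reserve_out)
        self.pools: dict[tuple[str, str], tuple[int, int]] = {}

    def add_pool(self, a: str, b: str, reserve_a: int, reserve_b: int) -> None:
        self.pools[(a, b)] = (reserve_a, reserve_b)

    def rate(self, a: str, b: str) -> Fraction:
        """Current spot price of `a` in units of `b`."""
        reserve_a, reserve_b = self.pools[(a, b)]
        return Fraction(reserve_b, reserve_a)

    def swap(self, a: str, b: str, amount_in: int) -> int:
        """Sell `amount_in` of `a` for `b` using x * y = k (no fee, floor rounding)."""
        if amount_in <= 0:
            raise ValueError("amount must be positive")
        reserve_a, reserve_b = self.pools[(a, b)]
        amount_out = reserve_b * amount_in // (reserve_a + amount_in)
        # The pair's record is overwritten, so the rate moves with every exchange.
        self.pools[(a, b)] = (reserve_a + amount_in, reserve_b - amount_out)
        return amount_out


dex = ToyDex()
dex.add_pool("XTZ", "USDT", 1_000, 2_000)
print(dex.rate("XTZ", "USDT"))       # 2
print(dex.swap("XTZ", "USDT", 100))  # 181
print(dex.rate("XTZ", "USDT"))       # 1819/1100
```

The number of records equals the number of listed pairs, while each swap rewrites one of them, which matches the frequent-update, bounded-size profile described above.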


Conclusion

The bigmap is a very practical structure in Michelson, and it is indeed widely used. Yet it is not a magic potion: one still has to pay gas to use it, so it pays to learn from how others use it. This post tried to offer a glimpse of what we can learn from the bigmaps that exist today. We hope it helps with your smart contract development!

Moreover, thanks to the blockchain-compatible cache introduced in protocol Hangzhou, it is now possible to reduce the gas cost of accessing bigmaps by caching the frequently used ones. This is what Marigold is working on!


If you want to know more about Marigold, please follow us on social media (Twitter, Reddit, LinkedIn)!