Use SHA-256 hash of transaction header as transactionID

Dear community,

With this email, I would like to kickoff the discussion for a LIP regarding the roadmap objective “Replace transaction ID system”.

The IDs for transactions in Lisk, namely transactionIDs, should be computed in a way such that collisions have a negligible probability. Although a collision does not pose a security risk, it is inconvenient as it corrupts their intended usage: being a unique identifier for a transaction.

The length of the currently used transactionIDs, 64 bits, does not provide a negligible probability considering the potential number of transactions that could get included in the Lisk blockchain in the next years. The expected value for the number of transactions after which there is a collision of transactionIDs is ~2^32. The proposed change regarding the block size (LIP 0002) will allow up to 128 transactions per block (assuming the current size of transaction objects). If every block contains 128 transactions, a collision is expected after approximately 11 years. Hence, 64 bits is obviously too short.

We propose to use a length of 256 bits. Assuming that each block contains 128 transaction, the probability of having a collision within the next 50 years is below 10^(-56). Moreover, we propose to use SHA3-256 as the hash function in order to use the same hash function family as proposed for the blockIDs and the network identifier (see LIP 0009).

I’m looking forward to your responses and feedback.

Andreas

···
-- Andreas Kendziorra
Cryptographer, Lightcurve

andreas.kendziorra@lightcurve.io
www.lightcurve.io

We recently concluded that it is better to use only one hash function within the Lisk protocol. Therefore, we will use SHA-256 in all cases including for transactionIDs (a similar argument was already mentioned by @Simon_Warta some time ago in a different conversation).

Here is now a complete LIP for the transactionID format that uses SHA-256 instead of SHA3-256:

LIP: <LIP number>
Title: Use full SHA-256 hash of transaction as transactionID
Author: Andreas Kendziorra <andreas.kendziorra@lightcurve.io>
Discussions-To: https://research.lisk.io/t/use-sha3-256-hash-of-transaction-header-as-transactionid/
Type: Standards Track
Created: <YYYY-MM-DD>
Updated: <YYYY-MM-DD>

Abstract

This LIP proposes to use the full SHA-256 hash of a transaction as the transactionID to avoid potential collisions of transactionIDs.

Copyright

This LIP is licensed under the Creative Commons Zero 1.0 Universal.

Motivation

The length of the currently used transactionIDs is rather short considering the potential number of transactions that could get included in the Lisk blockchain in the next years. The probability of having collisions in the upcoming years is therefore non negligible. A collision would result in a rejected transaction due to the uniqueness requirement for transactionIDs. Moreover, it would disallow to uniquely identify a transaction from the set of all transactions (transactions included in the blockchain and rejected transactions). For this reason, we propose to use a length that implies a negligible probability of collisions.

Rationale

Length

The currently used transactionIDs have a length of 64 bits. The expected value for the number of transactions after which there is a collision of transactionIDs is ~232. The proposed changes regarding the block size will allow up to 128 transactions per block (assuming the current size of transaction objects). If every block contains 128 transactions, a collision is expected after approximately 11 years.

We propose to use a length of 256 bits. This reduces the probability for a transactionID collision sufficiently. Assuming that each block contains 128 transactions, the probability of having a collision within the next 50 years is below 10-56 (the probability of a collision after n transactions can be approximated by 1-exp(-n2/(2*2256))).

Usage

On the protocol level, the new transactionID format shall be used for every transaction, including transactions that were included in the blockchain before this change is active. For the user interfaces, the old format should still be supported. That means, clients like Lisk Hub should be able to request transactions via http at least from some nodes with the old transactionID. These nodes need to maintain a mapping from the old transactionIDs to the new ones, e.g., as a database table.

Specification

Let transactionBytes denote the function that maps every transaction to the byte array, that is currently used as the input for SHA-256, from which the reversed first 8 bytes are used as the transactionID (we do not want to elaborate on how this byte array is generated, since the specification may change over time, e.g. when transaction properties are added or removed).

In the proposed protocol, the transactionID of a transaction tx is generated as follows:

transactionID = SHA-256(transactionBytes(tx)).

Usage of the New Format

The new format has to be applied to every transaction, including the transactions included in the blockchain in the past, and has to be used in every aspect of the protocol, i.e., for validating the transactionID uniqueness.

When transactionIDs are used in RPCs, then they must be represented as hexadecimal strings without the prefix “0x”. For non protocol related usages, hexadecimal strings are recommended as well (in some cases, the usage of the prefix “0x” may be advantageous, e.g., in user interfaces).

Backwards Compatibility

This proposal introduces a hard fork, whereby the causes depend on the implementation of LIP 0012:

  • If LIP 0012 is not implemented, then transaction JSON objects using the proposed transactionID format get rejected by nodes following the current protocol due to invalid transactionIDs.

  • If LIP 0012 is implemented, then from two transactions that have the same transactionID in the current format but different transactionIDs in the proposed format one is rejected by nodes following the current protocol whereas both are accepted by nodes following the proposed protocol.

Independent of LIP 0012, this change introduces the following incompatibilities:

  • RPCs that use transactionIDs lead to miscommunication between nodes. For example, the planned message postTransactionsAnnouncement in LIP 0004, in which a list of transactionIDs is announced to other nodes, is only of value for nodes using the same transactionID format. Nodes using a different format will ignore the provided transactionIDs as they appear invalid.
  • Functions of the Lisk Core API that use transactionIDs, like getTransactions, will not work as expected. For example, if a node using the current transactionID format is requested to provide a transaction for a given transactionID in the proposed format, it will not return the transaction even if the corresponding transaction is known to the node.

Appendix

Example

Consider the transaction represented by the following JSON:

defaultTransaction = {
  type: 0,
  fee: "10000000",
  amount: "9312934243",
  recipientId: "17243547555692708431L",
  timestamp: 79289378,
  asset: {},
  senderPublicKey: "0eb0a6d7b862dc35c856c02c47fde3b4f60f2f3571a888b9a8ca7540c6793243",
  senderId: "18278674964748191682L",
  signature: "2092abc5dd72d42b289f69ddfa85d0145d0bfc19a0415be4496c189e5fdd5eff02f57849f484192b7d34b1671c17e5c22ce76479b411cad83681132f53d7b309",
}

The corresponding byte array for this transaction (assuming the protocol implemented in Lisk SDK 2.0.0) is the following:

byteArray = '0x0022dcb9040eb0a6d7b862dc35c856c02c47fde3b4f60f2f3571a888b9a8ca7540c6793243ef4d6324449e824f6319182b020000002092abc5dd72d42b289f69ddfa85d0145d0bfc19a0415be4496c189e5fdd5eff02f57849f484192b7d34b1671c17e5c22ce76479b411cad83681132f53d7b309'

The transactionID is then

transactionID = SHA-256(byteArray) = '0xda63e78daf2096db8316a157a839c8b9a616d3ce6692cfe61d6d380a623a1902'

For comparison, the transactionID according to the current format is as follows:

currentID = '15822870279184933850'

I also change the title of the thread from “SHA3-256” to “SHA-256”.