How to Build Data Integrity into Blockchain Applications

Blockchain and distributed ledger technology promise to provide data integrity, but several issues must be addressed before blockchain becomes a practical and effective approach.

You would never buy a house or a car without a clear and confirmed title. Why would you let data that hasn’t been certified as unaltered into your critical business processes? Having accurate, unaltered data is key to nearly every business process.

Unfortunately, data tampering is a rapidly emerging threat. One study predicts that, by 2020, 50 percent of organizations will have suffered damage caused by fraudulent data and software. A recent CNBC story points to the increasing importance of data integrity in the wake of incidents such as an altered video spreading misinformation, overturned criminal cases, and falsified product quality, to name a few.

To ensure data integrity, teams have traditionally turned to WORM (write once, read many) storage, to a data certification service (such as a digital notary or “neutral” third party), or to sophisticated access control systems. Organizations are now beginning to explore blockchain and other distributed ledger technologies because these technologies are generally considered tamper-proof.

However, when using blockchains, teams often first assume they must write all data into blockchains to take advantage of their tamper-proof nature. Unfortunately, writing all data to a blockchain is prohibitively expensive and slow. Also, there may be reasons why one would not want to make all data visible to all parties on the blockchain, or why one might want to delete or alter data for a noble purpose (such as correcting an IoT output error).

These are some of the reasons that Deloitte recently recommended to “never store personal identification information or any other sensitive data on the blockchain; instead, consider storing a hash of encrypted data on the chain and the encrypted data either in a co-located, off-chain database or an existing source system.”

So, What Are the Options for Following This Advice?

  1. Store a hash of data directly on a public (unpermissioned) blockchain such as Bitcoin or Ethereum. This approach has two disadvantages: financial risk and cost. If coding directly, you need not only to establish and maintain a wallet but also to weather the volatility of cryptocurrency and take on the risk of having your wallet stolen or lost. In addition, the cost of putting hashes on-chain can be substantial. For example, one recent study estimated that it would cost about $30 to publish a single transaction to the Bitcoin blockchain, let alone the thousands or millions of transactions that data items more typically require.
  2. Store a hash of data onto a private (permissioned) blockchain. However, teams often do not realize that while public (unpermissioned) blockchains are virtually tamper-proof, private (permissioned) blockchains can be circumvented and their history potentially rewritten. There are also significant costs in money and staffing involved in setting up and maintaining nodes.
  3. Use a data anchoring service or software package to anchor data into blockchains. Many of these services allow developers to make calls using standard languages like Java, Go, or Python.

What Is the Process for Using a Data Anchoring Service or Anchoring Data Directly into a Blockchain?

Best practices indicate that you first need to hash the digital asset (such as a completed transaction or a file), using an algorithm from the SHA-2 or SHA-3 family. There are a variety of freely available hashing utilities to help you accomplish this.
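As a minimal sketch of this hashing step, assuming Python and its standard hashlib module, the following computes a SHA-256 (SHA-2 family) or SHA3-256 digest of raw bytes or of a file. The helper names hash_bytes and hash_file are illustrative and are not part of any anchoring service.

```python
import hashlib


def hash_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data`; use "sha3_256" for SHA-3."""
    return hashlib.new(algorithm, data).hexdigest()


def hash_file(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Only the resulting digest would leave your environment; the underlying data stays wherever it is stored today.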

Once you have created the hash, you can then submit it via the service’s API.

The time to anchor a transaction in a blockchain generally depends on a few parameters, including the “heartbeat” of the particular blockchain. The heartbeat is the time it takes to generate a full block; for example, Bitcoin’s heartbeat is usually around 10 minutes, while Ethereum’s is usually around 15 seconds. Processing time is also partly determined by how much you spend, which affects the priority of your transaction. For example, in Ethereum, the amount of “gas” consumed by a constant-time operation is fixed, but the amount a user pays per unit of gas (the gas price) is dynamic and dictated by market conditions. When users send transactions, they specify the gas price in Gwei per unit of gas, and the total fee paid equals gas_price times gas_used. Miners are paid this fee and prioritize transactions with a higher gas price, so the more you are willing to pay, the faster your transaction will be processed.
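To make the fee arithmetic concrete, here is a minimal sketch in Python. The function names and the sample numbers are illustrative assumptions, not current market values: a gas price of 20 Gwei and 21,000 gas, which is what a plain Ether transfer consumes. A transaction that carries a hash as data would consume somewhat more gas.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9   # 1 Gwei = 1,000,000,000 wei
WEI_PER_ETH = 10**18   # 1 Ether = 10^18 wei


def fee_in_wei(gas_price_gwei: int, gas_used: int) -> int:
    """Total fee = gas_price * gas_used, expressed in wei."""
    return gas_price_gwei * WEI_PER_GWEI * gas_used


def fee_in_eth(gas_price_gwei: int, gas_used: int) -> Decimal:
    """The same fee converted to Ether, using Decimal to avoid float rounding."""
    return Decimal(fee_in_wei(gas_price_gwei, gas_used)) / Decimal(WEI_PER_ETH)


# Assumed sample values: 20 Gwei per unit of gas, 21,000 units of gas.
print(fee_in_wei(20, 21_000))  # 420000000000000
print(fee_in_eth(20, 21_000))  # 0.00042
```

Doubling the gas price doubles the fee, which is the trade-off between cost and confirmation speed described above.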

Because of this added complexity, some anchoring services manage wallets, payments, and price fluctuations for you, while other services or software require you to maintain your own wallets.

For many applications, registration in a blockchain within 12 to 24 hours is sufficient, because most use cases involve sealing data either to prove at a later date to a third party (such as a government investigator or a judge) that the data has remained unaltered, or as an insurance policy against future data manipulation.

Once a blockchain confirms a particular transaction, a seal or other proof is returned to the customer for each original hash. At any later date, customers can verify the integrity of their data directly by querying the blockchains to which their seals were written. Some services also support submitting the anchor to one or more blockchains, as desired by the customer, in which case you generally would receive one seal or record for each blockchain transaction.

How Can a Third Party Validate Data Integrity?

There are usually two ways for a third party to validate that the data has remained unaltered. To validate seals through the original anchoring service, you can usually call a verification API provided by the service from your application, which allows you or any third party to check current hashes of your data against previously registered hashes or to verify a previously issued seal.

In addition, an anchoring service should give you a way to validate your transactions directly against the blockchains in question, independently of the service and involving only you and the blockchain. This allows you to use any public blockchain explorer to validate the integrity of your data or documents.
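Whichever route is used, the local half of the check is the same: recompute the hash of the data as it exists now and compare it with the hash that was registered. The sketch below assumes Python and assumes the registered hash has already been retrieved as a hex string, for example from a seal or a blockchain explorer; retrieving it is outside the scope of the sketch, and the function name matches_registered_hash is hypothetical.

```python
import hashlib
import hmac


def matches_registered_hash(data: bytes, registered_hex: str,
                            algorithm: str = "sha256") -> bool:
    """Return True if `data` still hashes to the previously registered digest."""
    current_hex = hashlib.new(algorithm, data).hexdigest()
    # Normalize case and whitespace, then compare in constant time.
    return hmac.compare_digest(current_hex, registered_hex.strip().lower())


original = b"sensor reading: 21.5 C"
registered = hashlib.sha256(original).hexdigest()  # stands in for the anchored hash

print(matches_registered_hash(original, registered))                   # True
print(matches_registered_hash(b"sensor reading: 99.9 C", registered))  # False
```

A mismatch shows that the data differs from what was anchored; it does not by itself show who changed it or when.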

The accelerating pace of in-place data hacking and manipulation has made it difficult for organizations to trust their data, rendering that data effectively “toxic” for their purposes. Blockchain and distributed ledger technology promise a much higher level of data authenticity and integrity. While storing data directly in blockchains has appeal because of the perceived immutable nature of that technology, public blockchains have significant cost and data privacy issues, while private blockchains may have compromised security and reliability. “Anchoring” data in blockchains by storing hashes there, while maintaining the data itself in a secure location, appears to provide the best solution to these impediments.

Catherine Woneis
Catherine Woneis is the head of product management at Cryptowerk, which provides enterprise-grade Data Integrity solutions that make it easy for organizations in multiple industries to securely exchange information and rapidly create and deploy blockchain-enabled applications. Catherine specializes in startups that are centered on making leading-edge technologies useful for the enterprise. Her work spans innovations in AI, machine learning, natural language processing, knowledge management and smart workflows.