
The popularity of Bitcoin, Ethereum, and altcoins has led to an explosion of demand for blockchain-related work. There are use cases in real estate, supply chains, antiques, art, and more. Yet despite this rise in popularity, it is difficult to find examples of online applications, other than blockchain explorers, that successfully incorporate data from a blockchain.

 

It’s Different

Consuming data from a blockchain is different from consuming the data sources web and mobile developers are accustomed to. Data has to be retrieved via Remote Procedure Calls (RPC) to a blockchain node, a server that the developer must set up and maintain. This is very different from the RESTful APIs and managed databases developers generally work with.

In addition to being different from broadly adopted technologies, blockchain interfaces and implementations vary quite a bit from one another. For example, Bitcoin and Ethereum work on entirely different models, and their data is consumed differently. Relational databases offer a useful comparison. PostgreSQL, SQL Server, and Oracle are all databases, and you can often execute the same query on all three, but their internal workings are different: they optimize queries differently, they have nuances in their query languages, and they have unique plugins. Blockchains differ from one another in much the same way. While Bitcoin and Ethereum both store their data in a distributed ledger, their data access and structure are vastly different.

Data Structure

Bitcoin has no concept of accounts, senders, receivers, or balances, because it uses the Unspent Transaction Output (UTXO) model. Each Bitcoin transaction contains an array of inputs and an array of outputs, and every input must be an output of an earlier transaction, signed by whoever is entitled to spend it. This can be confusing, so it is best explained by example.

Alice needs to pay Bob for renting his car, and Bob has agreed to receive 0.1 Bitcoin as payment. Both Alice and Bob have a private/public key pair that allows them to create transactions in Bitcoin. In order for Alice to send Bob Bitcoin, she needs to sign the output of another Bitcoin transaction that she is entitled to spend. Dan has previously paid Alice 0.25 Bitcoin, and she has not yet spent that output. Alice uses her private key to sign the transaction output from Dan in order to pay Bob. The outputs of this new transaction are 0.1 Bitcoin that is now signable by Bob and 0.15 Bitcoin that is signable by Alice. Think of the 0.15 Bitcoin that is signable by Alice as “change” from the transaction.
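
The Alice, Bob, and Dan example can be sketched in a few lines of Python. This is a simplified illustration, not Bitcoin's actual data format: the TxOutput class and the spend helper are hypothetical names, ownership is checked by name instead of by cryptographic signature, amounts are expressed in satoshis to avoid floating-point error, and transaction fees are ignored.

```python
from dataclasses import dataclass
from typing import List

SATOSHIS_PER_BTC = 100_000_000


@dataclass(frozen=True)
class TxOutput:
    """A simplified transaction output: an amount and who may spend it."""
    owner: str
    satoshis: int


def spend(utxo: TxOutput, sender: str, recipient: str, amount: int) -> List[TxOutput]:
    """Consume one unspent output and return the outputs of the new transaction.

    Real Bitcoin verifies a signature; this sketch only compares names
    and ignores fees.
    """
    if utxo.owner != sender:
        raise ValueError("sender is not entitled to spend this output")
    if amount <= 0 or amount > utxo.satoshis:
        raise ValueError("amount must be positive and covered by the input")
    outputs = [TxOutput(recipient, amount)]
    change = utxo.satoshis - amount
    if change:
        # Whatever is not paid out goes back to the sender as "change".
        outputs.append(TxOutput(sender, change))
    return outputs


# Dan previously paid Alice 0.25 BTC; Alice spends that output to pay Bob 0.1 BTC.
from_dan = TxOutput("Alice", SATOSHIS_PER_BTC // 4)
outputs = spend(from_dan, "Alice", "Bob", SATOSHIS_PER_BTC // 10)
print(outputs)
```

The input is consumed whole: the new transaction has one output of 10,000,000 satoshis for Bob and one of 15,000,000 satoshis back to Alice.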

In comparison, Ethereum implements the account model, which does have the idea of accounts, balances, senders, and receivers. To think of it another way, the Bitcoin and Ethereum blockchains differ in the way that relational and NoSQL databases differ: both are distributed ledgers, but with vastly different implementations and interfaces.
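
For contrast, an account-style ledger can be sketched as a mapping from account to balance. This is a toy illustration, not Ethereum's actual state format: the transfer function is a hypothetical name, the starting balances are made up, and signatures, nonces, and gas are left out.

```python
def transfer(balances: dict, sender: str, receiver: str, amount: int) -> None:
    """Move `amount` from sender to receiver by updating balances in place."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[receiver] = balances.get(receiver, 0) + amount


# Hypothetical starting balances, in the ledger's smallest unit.
balances = {"Alice": 25, "Bob": 0}
transfer(balances, "Alice", "Bob", 10)
print(balances)
```

There is no "change" here: the sender's balance is simply decremented and the receiver's incremented.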

Remote Procedure Calls

Remote Procedure Calls (RPC) let your program execute code that lives in another process or on another machine. Before SOAP and REST APIs, RPC was the go-to for retrieving data from another program or service. As with an RDBMS, you will have to run the blockchain node on its own server and communicate with its RPC API. This can quickly clutter up your application infrastructure, as you will need to maintain more servers or virtual machines to serve data for your application.
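
To make this concrete, Bitcoin-style nodes expose their RPC API as JSON-RPC over HTTP with basic authentication. The sketch below builds, but does not send, such a request using only the Python standard library. The build_rpc_request helper, the URL, and the credentials are hypothetical; getblockcount is a standard Bitcoin RPC method, though a given fork may expose a different set of methods.

```python
import base64
import json
import urllib.request


def build_rpc_request(url, user, password, method, params=None, request_id=1):
    """Build (but do not send) a JSON-RPC request for a Bitcoin-style node."""
    body = json.dumps({
        "jsonrpc": "1.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical local node and credentials.
request = build_rpc_request("http://127.0.0.1:8332", "user", "pass", "getblockcount")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```

In practice a client library hides this plumbing, but seeing the raw request shows that every call is just a method name plus a list of parameters.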

Standing Up The Chain

When bringing a blockchain node online for the first time, the entire chain, from the genesis block to the latest block, must be retrieved. At the time of our project, downloading the entire chain took around half an hour. If there is a catastrophic failure, all of the block data could be lost and would have to be retrieved again.
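
One way to tell whether a node has finished this initial sync is to compare the number of blocks it has validated with the number of headers it knows about. The sketch below assumes a result shaped like Bitcoin Core's getblockchaininfo response, which reports both as blocks and headers; a fork may name these fields differently. The function names, the tolerance, and the sample values are hypothetical.

```python
def blocks_behind(info: dict) -> int:
    """Return how many blocks the node still has to download.

    `info` is assumed to look like a getblockchaininfo result, with
    `blocks` (validated blocks) and `headers` (known block headers).
    """
    return max(info["headers"] - info["blocks"], 0)


def is_synced(info: dict, tolerance: int = 1) -> bool:
    """Treat the node as synced when it is at most `tolerance` blocks behind."""
    return blocks_behind(info) <= tolerance


# Hypothetical sample: a node that is still catching up.
sample = {"blocks": 228000, "headers": 228208}
print(blocks_behind(sample), is_synced(sample))
```

A check like this could, for instance, decide whether a freshly started node is ready to serve requests.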

What Documentation?

If the blockchain you need data from is forked from another chain, there is a high probability there won’t be much documentation available. When working with a blockchain like QTUM, which forked Bitcoin, you have to rely mostly on the Bitcoin documentation.

The Use Case

Recently we were tasked with building an investment management platform for a cryptocurrency mining service. Clients would send funds to the investment address and would receive a portion of mining rewards. The system needed to accept investments from multiple addresses for one client, show them a balance of their entire investment, allocate a portion of mining rewards to their account, and graph the balance growth over time. We needed to initially support a web interface and later support a native mobile application with wallet integration for easier investment. As always, the service needed the ability to scale to demand and be highly available.

The Solution

Given the infrastructure requirements, we immediately reached for AWS. In order to serve data to the web interface, a Serverless API written in Python on Lambda seemed like the right choice given our past experience and expertise. To keep track of app information such as user accounts and addresses we used RDS to spin up a PostgreSQL database. Since we were building an API, building a React SPA deployed to S3 with a CloudFront distribution was also an easy choice. The biggest unknown was the blockchain.

Deploying the Blockchain Nodes

Given that we needed the blockchain nodes to be highly available and to run on a server, we knew we were going to need EC2. To achieve high availability and scalability, we started with two EC2 instances behind a load balancer, managed by an Auto Scaling group. However, whenever a new instance came online, it needed to sync the entire chain. We therefore regularly stored block data in an EBS-backed AMI so that any new node could start from a relatively recent snapshot and sync only the blocks it was missing.

In addition to deploying the nodes, we needed to keep their configuration consistent. To accomplish this, we used Packer to build an AMI and launched the EC2 instances from it.

Script All the Things

Now that we knew the AWS services we were going to use, we needed a quick, easy, and repeatable way to stand up and update the infrastructure. It is entirely possible to set up everything in the AWS console, and with enough documentation that can be repeatable, but it is by no means quick or easy. AWS developed CloudFormation so that an entire infrastructure can be built as code and more easily updated and maintained. For setting everything up with CloudFormation, we chose Stacker, as it has many blueprints for the types of services we needed and would ultimately be easier to manage than a lengthy JSON file.

For deploying our Lambda functions we used Serverless as it is well documented and quickly configured. To execute our database migrations, we wrote a tiny Lambda function that we could hit to execute them for us.

With everything scripted we were able to update our infrastructure, code, and database as part of our CI process when pull requests were merged into master.

It Looked Like What?

[Figure: architecture diagram]

Everything was deployed to AWS in one region, across two availability zones for high availability.

Lessons Learned

While Lambda is a powerful tool, it can be a pain for local development. You can run your handlers and their tests locally, but there aren’t great options for running the entire stack locally due to the dependence on API Gateway. If you are used to the workflow that Elixir, Ruby, or Node.js provide, with hot reloading and quick feedback cycles, it can definitely slow you down.
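
Running a handler locally usually means calling it as a plain function with a hand-built event. The sketch below assumes an API Gateway proxy integration, where query parameters arrive in queryStringParameters and the response is a dictionary with statusCode and body. The handler itself, its /balance path, and its response shape are hypothetical and are not code from our project.

```python
import json


def handler(event, context):
    """A hypothetical Lambda handler behind an API Gateway proxy integration."""
    # API Gateway sends null (None) when the request has no query string.
    params = event.get("queryStringParameters") or {}
    address = params.get("address")
    if not address:
        return {"statusCode": 400, "body": json.dumps({"error": "address is required"})}
    return {"statusCode": 200, "body": json.dumps({"address": address})}


# A hand-built event containing only the fields this handler reads.
fake_event = {
    "httpMethod": "GET",
    "path": "/balance",
    "queryStringParameters": {"address": "INSERT ADDRESS HERE"},
}
response = handler(fake_event, None)
print(response)
```

This exercises the handler's logic, but not routing, authorization, or anything else that API Gateway does before the function is invoked.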

A side effect of this is that debugging is also difficult, since you can’t set a breakpoint and step through the deployed code one line at a time. You would think that you could get around this with verbose logging, but the CloudWatch logs leave much to be desired: they are not available as quickly as you would hope, and they are difficult to read and search.

We worked around most of these limitations by having each developer deploy their own stack, which can be expensive if everyone needs their own database deployed as well.

Another important consideration is Lambda cold starts. When a function is hit for the first time, or for the first time in a long time, it has to spin up. This can cause the first request to hang for a few seconds and lead users to think the request is failing. There are a lot of discussions online about this problem. Hitting the function periodically to keep it warm is a potential solution, but this felt messy and a bit like overkill for our use case.
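
For reference, the keep-warm approach usually pairs a scheduled rule that invokes the function every few minutes with a handler that returns early for those pings. The sketch below only illustrates the idea and is not code from our project. It relies on scheduled CloudWatch Events invocations carrying "source": "aws.events"; the function names and return values are hypothetical.

```python
import json


def is_warmup(event) -> bool:
    """Return True for invocations triggered by a scheduled rule."""
    return isinstance(event, dict) and event.get("source") == "aws.events"


def handler(event, context):
    if is_warmup(event):
        # Return early so the ping stays cheap and skips the real work.
        return {"warmed": True}
    return {"statusCode": 200, "body": json.dumps({"ok": True})}


print(handler({"source": "aws.events", "detail-type": "Scheduled Event"}, None))
print(handler({"httpMethod": "GET"}, None))
```

Note that a ping like this only keeps one execution environment warm, so concurrent requests can still hit a cold start.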

Everything Is Set Up… Now What?

To retrieve data, you connect to the node's RPC API through a client library that acts as a proxy. For our use case, we used python-bitcoinrpc, as the blockchain we needed data from was a fork of bitcore. The bitcore guide Run Bitcoin Standalone documents the methods you can use to request data. A simple request to get the transaction data for an address in Python would look like this:

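from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
import json

# rpc_user and rpc_password are set in the bitcoin.conf file
rpc_user = 'INSERT RPC USER HERE'
rpc_password = 'INSERT RPC PASSWORD HERE'
rpc = AuthServiceProxy("http://%s:%s@127.0.0.1:8332" % (rpc_user, rpc_password))

# Replace the placeholders with the address and block range to query
start_block = 0  # INSERT STARTING BLOCK HERE
end_block = 0  # INSERT ENDING BLOCK HERE
payload = {
  'addresses': ['INSERT ADDRESS HERE'],
  'start': start_block,
  'end': end_block,
}

try:
  deltas = rpc.getaddressdeltas(json.dumps(payload))
  print(deltas)
except JSONRPCException as error:
  # Raised when the node answers with a JSON-RPC error response
  print(error)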

The output would look something like the following:

[
  {
    "satoshis": 1,
    "txid": "TRANSACTION ID",
    "index": 2,
    "blockindex": 236,
    "height": 220151,
    "address": "INSERT ADDRESS HERE"
  },
  {
    "satoshis": 30000,
    "txid": "TRANSACTION ID",
    "index": 0,
    "blockindex": 165,
    "height": 228208,
    "address": "INSERT ADDRESS HERE"
  }
]
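
From deltas like these, a balance is a matter of addition. The sketch below totals the deltas per address and builds a running balance by block height, the kind of series a balance-over-time graph needs. It assumes the satoshis field is signed (positive for funds received, negative for funds spent) and that sorting by height and then blockindex gives chronological order. The function names are hypothetical, and the sample data is the output shown above.

```python
from collections import defaultdict


def balances_by_address(deltas):
    """Sum the satoshi deltas for each address."""
    totals = defaultdict(int)
    for delta in deltas:
        totals[delta["address"]] += delta["satoshis"]
    return dict(totals)


def running_balance(deltas):
    """Return (height, balance) points ordered by block height."""
    points = []
    balance = 0
    for delta in sorted(deltas, key=lambda d: (d["height"], d["blockindex"])):
        balance += delta["satoshis"]
        points.append((delta["height"], balance))
    return points


deltas = [
    {"satoshis": 1, "txid": "TRANSACTION ID", "index": 2,
     "blockindex": 236, "height": 220151, "address": "INSERT ADDRESS HERE"},
    {"satoshis": 30000, "txid": "TRANSACTION ID", "index": 0,
     "blockindex": 165, "height": 228208, "address": "INSERT ADDRESS HERE"},
]
print(balances_by_address(deltas))
print(running_balance(deltas))
```

For a client with several addresses, the per-address totals can simply be added together.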

Summary

The keys to consuming blockchain data in your web and mobile apps are understanding the chain you are working with and getting the infrastructure in place. Once that is done, retrieving data and performing calculations are tasks you already perform in your everyday development.