NFT Commands ============ NFT commands are for the purpose of crawling blockchains to store data in a database. .. warning:: Running the `crawl` or `tail` commands more than once for the same block will cause inaccuracies in the NFT data. The `load` command is the only NFT command which is idempotent. Data Version ------------ Data version is used by the NFT commands in the Block Crawl to ensure data accuracy during data loads. Data is processed by both the `crawl` and `tail` commands in an asynchronous manner and data version is one manner in which data accuracy is ensured. Common Options -------------- :--blockchain: The blockchain being processed. It must be one of the accepted values of "ethereum-mainnet" or "polygon-mainnet". This value is for identifying the blockchain in the database. :--evm-rpc-nodes: This is a repeatable option for specifying a websocket RPC API endpoint and the number of simultaneous connections to make. For example, the value of `wss://some.rpc.node 25` will operate with 25 connections to the `wss://some.rpc.node` RPC API endpoint. If you use multiple providers or multiple accounts with the same provider, include this option multiple times to fully utilize available resources. :--rpc-requests-per-second: The maximum number of requests per second per RPC node. This is not maximum across all nodes. :--dynamodb-endpoint-url: This is an override of the endpoint the AWS client will use to connect to the DynamoDB service. It is normally used only for non-standard endpoints such as locally hosted DynamoDB for testing. Normally the location of the DynamoDB database would be determined from the AWS default region as part of the AWS config. :--dynamodb-timeout: Maximum time in seconds to wait for connect or response from DynamoDB. This does not need to be altered ordinarily. The Block Crawler greatly reduces the connect and response timeout values which are defaulted to 5 minutes by the AWS client to identy connectivity issues in a timely manner. If you experience timeout issues with DynamoDB, increasing the value with this option may improve your experience. :---dynamodb-table-prefix: The table prefix for DynamoDB tables for data managed by the Block Crawler. DynamoDB table names are global and often prefixed for multiple environments or to use in access restriction. :--log-file: This is the location of a file to which the Block Crawler should log. By default, STDOUT is the only logging location. :--debug: Enable debug level logging. This will be an extremely verbose level of logging. It is not meant for long-term use in production systems. AWS Configuration ----------------- Many of the commands utilize DynamoDb and require AWS credentials setup. Instructions for configuring the credentials can be found in the `Boto3 documentation`_. Common Log Output ----------------- Many of the commands write logs which follow a standard as follows; Items with two values separated by a `/` are a count and an average time is milliseconds to execute such as `123/45.67` represents a count of 123 with an average processing time of 45.67 milliseconds. :Blocks: The start and end of the current block chunk being processed. :Conn: Connection statistics :C: Connections all time - This includes initial connections plus any reconnects :X: RPC clients reconnected :R: Connection resets from the endpoint :RPC: RPC Request Statistics :S: Requests sent :D: Requests delayed due to request limitations or delays resulting from too many request results returned by the endpoint :T: Number of "too many request" results returned by the endpoint :R: Responses received :Write: Data write statistics :D: Delayed - the number of writes delayed due to request limitations from the database :C: Collection records written :T: Token records written :TU: Token records updated :X: Token Transfer records written :O: Owner records written :OU: Owner records updated Output lines example: .. line-block:: 2023-02-15 21:18:24,975 Blocks [12,400,001:12,401,000] -- Conn [C:46 X:28 R:0] RPC [S:923,930 D:24,817 T:0 R:923,908/445] -- Write [D:3,228 C:112/153 T:83,748/7 X:252,576/10 O:151,185/9] 2023-02-15 21:27:45,802 Total Time: 8:29:19.63 -- Blocks 11,900,001 to 12,400,000 at block height 16,378,614 Load ---- The `load` command will load NFT data up to a declared block height by processing each collection as its creation is discovered while traversing the blockchain in reverse order. The specific block height is necessary to ensure each collection's data is accurate to the same block height at which time the `crawl` and `tail` commands can traverse any remaining blocks to bring the NFT data up to dat with the current block height. Processing blocks in reverse order is necessary It was created to reduce the time and number of RPC requests necessary to load NFT data from large blockchains. Arguments +++++++++ :STARTING_BLOCK: The lowest block number you wish to process in this run of the `load` command. :ENDING_BLOCK: The highest block number you wish to process in this run of the `load` command. :BLOCK_HEIGHT: The block height chosen for this data load process. This value should be consistent if the `load` command is interrupted and re-run. The command loads log entries for the collection from the creation of the collection to the block height value. As such, it must be consistent for the duration of a data load to ensure all collections are accurate to the same block height and the `crawl` or `tail` command can reliably continue after that block. Options +++++++ :--increment-data-version: Incrementing the data version should only occur for the initial execution of the `load` command for loading data. :--block-chunk-size: The number of blocks to process at one time. Restricting the number of blocks processed simultaneously provides two benefits. First, it limits the computing resources utilized for attempting to process large quantities of blocks. Second, it allows for a graceful stop at a known break point should it be necessary to stop the command. The command will wait until all blocks in the block chunk are fully processed before exiting to end in a known state in which there is no risk of processing the same block twice. :--dynamodb-parallel-batches: THe number of DynamoDB parallel batch writes to perform simultaneously. In order to maximize performance, you want to keep batches as full as possible. Tuning this value can improve data write performance accordingly. :--block-time-cache-filename: Location and filename for the block time cache. The block time cache is critical for reducing RPC calls to get block times. As the `load` command traverses the blockchain in reverse order, it stores the block time for each block it processes. To ensure any stoppage of the command does not lose the stored block times, it will store it is a CSV formatted file. It will then load the data from the file when it starts the next time. This persistence of the block times is critical to reduce the number of RPC calls to get the block time as the command must retrieve the block time from the block chain if it cannot find it in its own memory. .. warning:: Running multiple versions of the `load` command will require separate block time cache filenames lest they overwrite each other's data. Crawl ----- The `crawl` command will crawl each block of a blockchain in ascending order for NFT data. It process data in chunks of blocks. It discovers new collections, token transfers, token updates, and owner updates by processing data contained within blocks. It is faster than the `tail` command but much slower and uses considerably more RPC requests than load. The command is meant to be used after a `load` command and before a `tail` command to reduce the number of blocks the the `tail` command will have to process. Arguments +++++++++ :STARTING_BLOCK: The block at which the crawl begins :ENDING_BLOCK: The block at which the crawl ends Options +++++++ :--increment-data-version: Incrementing the data version should only occur in a scenario in which the `crawl` command will be used to re-load data in place over a previous data load from the origin block. .. note:: Due to the time and resources necessary to initiate a data load via `crawl`, it is highly encouraged that you use the `load` command to initiate any data load. :--block-chunk-size: The number of blocks to process at one time. Restricting the number of blocks processed simultaneously provides two benefits. First, it limits the computing resources utilized for attempting to process large quantities of blocks. Second, it allows for a graceful stop at a known break point should it be necessary to stop the command. The command will wait until all blocks in the block chunk are fully processed before exiting to end in a known state in which there is no risk of processing the same block twice. Tail ---- The `tail` command will continuously check for new blocks and process them in the same manner as the `crawl` command. The main differences between `crawl` and `tail` are the tail process one block at a time and persists the last block it has processed. The first time you attempt to run the `tail` command, it requires having run hte `seed` command to record the last block processed from wch the `tail` command will continue forward. Another differentiator for this command will run until it is interrupted. It is meant to be run as a service to keep the database up to date with the latest changes from the blockchain. Arguments +++++++++ There are no arguments for the command Options +++++++ :--trail-blocks: The number of blocks to trail behind the last block. This option exists for two reasons, nodes can be ad different stages of completion in with regard to the latest block. One node can be completed and list it as the latest block while another may not have completed and either error or return partial data. It's common to see nodes return a block with no transaction hashes when retrieving the incomplete blocks. The second is dealing with reorgs caused by blockchain forking. Staying far enough behind any reorg is important until the tail command is advanced enough to back out the results of reorganized blocks. :--process-interval: How often to check for new blocks. The command is currently based on polling for the current block of the blockchain to identify new blocks need to be processed. To reduce unnecessary process and cost from checking the block height, the command will not perform two subsequent checks in less than the interval specified. If processing the latest blocks exceeds the interval, it will not wait to check again and do so immediately after processing the last block it knows. Seed ---- The seed command sets the last block processed in the database utilized by the `tail` command to identify its starting point when processing. Arguments +++++++++ :LAST_BLOCK_ID: The last block processed by one of the other commands. Verify ------ Verify that the collection data stored in the database matches the data in the blockchain. Arguments +++++++++ :COLLECTION_ID: The collection ID to verify :BLOCK_HEIGHT: The block height at which to verify. Blockchain data is constantly being updated. As such, it can only be verified at a specific block height. .. _Boto3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration