NFT Commands
============

The NFT commands crawl blockchains and store the resulting NFT data in a database.

.. warning::

    Running the `crawl` or `tail` commands more than once for the same block will cause
    inaccuracies in the NFT data. The `load` command is the only NFT command which is
    idempotent.

Data Version
------------

The data version is used by the NFT commands in the Block Crawler to ensure data accuracy
during data loads. Both the `crawl` and `tail` commands process data asynchronously, and
the data version is one of the mechanisms used to keep that data accurate.


Common Options
--------------

:--blockchain: The blockchain being processed. It must be one of the accepted values
    of "ethereum-mainnet" or "polygon-mainnet". This value is for identifying the
    blockchain in the database.

:--evm-rpc-nodes: This is a repeatable option for specifying a websocket RPC API
    endpoint and the number of simultaneous connections to make. For example,
    the value of `wss://some.rpc.node 25` will operate with 25 connections to the
    `wss://some.rpc.node` RPC API endpoint. If you use multiple providers or multiple
    accounts with the same provider, include this option multiple times to fully
    utilize available resources.

:--rpc-requests-per-second: The maximum number of requests per second per RPC node. This
    is not a maximum across all nodes.

:--dynamodb-endpoint-url: This is an override of the endpoint the AWS client will use
    to connect to the DynamoDB service. It is normally used only for non-standard
    endpoints such as locally hosted DynamoDB for testing. Normally the location of the
    DynamoDB database would be determined from the AWS default region as part of the
    AWS config.

:--dynamodb-timeout: Maximum time in seconds to wait for a connection to or a response
    from DynamoDB. This does not ordinarily need to be altered. To identify connectivity
    issues in a timely manner, the Block Crawler greatly reduces the connect and response
    timeout values, which the AWS client defaults to 5 minutes. If you experience
    timeout issues with DynamoDB, increasing the value with this option may improve
    your experience.

:--dynamodb-table-prefix: The table prefix for DynamoDB tables for data managed by the
    Block Crawler. DynamoDB table names are global and are often prefixed to separate
    multiple environments or to support access restrictions.

:--log-file: This is the location of a file to which the Block Crawler should log. By
    default, STDOUT is the only logging location.

:--debug: Enable debug level logging. This will be an extremely verbose level of
    logging. It is not meant for long-term use in production systems.
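
Because `--rpc-requests-per-second` applies to each node rather than to all nodes
combined, the overall request ceiling grows with each `--evm-rpc-nodes` entry. The
following is a minimal sketch of that arithmetic in Python. The function names, and the
assumption that each node value arrives as a single `endpoint connections` string, are
illustrative only and are not taken from the Block Crawler's implementation.

.. code-block:: python

    from typing import List, Tuple


    def parse_rpc_node(value: str) -> Tuple[str, int]:
        """Split a value such as "wss://some.rpc.node 25" into (endpoint, connections)."""
        # Hypothetical parsing: the last space-separated token is the connection count.
        endpoint, connections = value.rsplit(" ", 1)
        return endpoint, int(connections)


    def aggregate_limits(node_values: List[str], requests_per_second: int) -> Tuple[int, int]:
        """Return (total connections, combined requests per second) across all nodes."""
        nodes = [parse_rpc_node(value) for value in node_values]
        total_connections = sum(connections for _, connections in nodes)
        # The limit is per node, so the combined ceiling scales with the node count.
        total_rps = requests_per_second * len(nodes)
        return total_connections, total_rps

For instance, two nodes with 25 and 10 connections and a per-node limit of 100 requests
per second would give 35 connections and a combined ceiling of 200 requests per second.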

AWS Configuration
-----------------

Many of the commands use DynamoDB and require AWS credentials to be set up. Instructions
for configuring the credentials can be found in the `Boto3 documentation`_.

Common Log Output
-----------------

Many of the commands write logs that follow a standard format.

Items with two values separated by a `/` are a count and an average time in milliseconds
to execute. For example, `123/45.67` represents a count of 123 with an average processing
time of 45.67 milliseconds.

:Blocks: The start and end of the current block chunk being processed.

:Conn: Connection statistics

       :C: Connections all time - This includes initial connections plus any reconnects

       :X: RPC clients reconnected

       :R: Connection resets from the endpoint

:RPC: RPC Request Statistics

       :S: Requests sent

       :D: Requests delayed due to request limitations or
            delays resulting from "too many requests" results returned by the endpoint

       :T: Number of "too many requests" results returned by the endpoint

       :R: Responses received

:Write: Data write statistics

       :D: Delayed - the number of writes delayed due to request limitations from the
            database

       :C: Collection records written

       :T: Token records written

       :TU: Token records updated

       :X: Token Transfer records written

       :O: Owner records written

       :OU: Owner records updated


Output lines example:

.. line-block::

    2023-02-15 21:18:24,975 Blocks [12,400,001:12,401,000] -- Conn [C:46 X:28 R:0] RPC [S:923,930 D:24,817 T:0 R:923,908/445] -- Write [D:3,228 C:112/153 T:83,748/7 X:252,576/10 O:151,185/9]
    2023-02-15 21:27:45,802 Total Time: 8:29:19.63 -- Blocks 11,900,001 to 12,400,000 at block height 16,378,614
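
The statistics line format described above can be read programmatically. The following
is a minimal Python sketch, not part of the Block Crawler, that parses a statistics line
into a dictionary. The function names and the returned structure are assumptions made
for illustration, and the parser only covers the layout shown in the example above.

.. code-block:: python

    import re
    from typing import Dict, Optional, Tuple

    # Matches a section such as "Conn [C:46 X:28 R:0]" and captures its name and body.
    SECTION = re.compile(r"(\w+) \[([^\]]*)\]")


    def parse_stat(value: str) -> Tuple[int, Optional[float]]:
        """Parse "123" or "123/45.67" into (count, average milliseconds or None)."""
        count, _, average = value.partition("/")
        parsed_average = float(average.replace(",", "")) if average else None
        return int(count.replace(",", "")), parsed_average


    def parse_log_line(line: str) -> Dict[str, dict]:
        """Parse one statistics line into a dictionary keyed by section name."""
        result: Dict[str, dict] = {}
        for name, body in SECTION.findall(line):
            if name == "Blocks":
                # The Blocks section holds the start and end of the block chunk.
                start, end = body.split(":")
                result[name] = {
                    "start": int(start.replace(",", "")),
                    "end": int(end.replace(",", "")),
                }
            else:
                stats = {}
                for item in body.split():
                    key, value = item.split(":", 1)
                    stats[key] = parse_stat(value)
                result[name] = stats
        return result

Each statistic is returned as a `(count, average)` tuple, with `None` as the average for
items that only report a count.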

Load
----

The `load` command will load NFT data up to a declared block height by processing each
collection as its creation is discovered while traversing the blockchain in reverse
order. The specific block height is necessary to ensure each collection's data is
accurate to the same block height, at which point the `crawl` and `tail` commands can
traverse any remaining blocks to bring the NFT data up to date with the current block
height. The command processes blocks in reverse order and was created to reduce the time
and number of RPC requests necessary to load NFT data from large blockchains.


Arguments
+++++++++

:STARTING_BLOCK: The lowest block number you wish to process in this run of the `load`
    command.

:ENDING_BLOCK: The highest block number you wish to process in this run of the `load`
    command.

:BLOCK_HEIGHT: The block height chosen for this data load process. This value should be
    consistent if the `load` command is interrupted and re-run. The command loads log
    entries for the collection from the creation of the collection to the block height
    value. As such, it must be consistent for the duration of a data load to ensure all
    collections are accurate to the same block height and the `crawl` or `tail` command
    can reliably continue after that block.

Options
+++++++

:--increment-data-version: Incrementing the data version should only occur for the
    initial execution of the `load` command for loading data.


:--block-chunk-size: The number of blocks to process at one time. Restricting the
    number of blocks processed simultaneously provides two benefits. First, it limits
    the computing resources utilized for attempting to process large quantities of
    blocks. Second, it allows for a graceful stop at a known break point should it
    be necessary to stop the command. The command will wait until all blocks in the
    block chunk are fully processed before exiting to end in a known state in which
    there is no risk of processing the same block twice.

:--dynamodb-parallel-batches: The number of DynamoDB batch writes to perform in
    parallel. To maximize performance, keep batches as full as possible. Tuning this
    value can improve data write performance accordingly.

:--block-time-cache-filename: Location and filename for the block time cache. The block
    time cache is critical for reducing RPC calls to get block times. As the `load`
    command traverses the blockchain in reverse order, it stores the block time for each
    block it processes. To ensure a stoppage of the command does not lose the stored
    block times, it writes them to a CSV-formatted file and loads the data from that
    file the next time it starts. Persisting the block times matters because the command
    must retrieve the block time from the blockchain whenever it cannot find it in its
    own memory.

    .. warning::

        Running multiple instances of the `load` command at the same time requires a
        separate block time cache filename for each; otherwise they will overwrite each
        other's data.
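
The block time cache described under `--block-time-cache-filename` can be pictured as an
in-memory mapping that is persisted to a CSV file. The following is a minimal Python
sketch of that idea. The class name, the two-column `block_id,timestamp` layout, and the
`fetch` callback standing in for an RPC call are all assumptions for illustration; the
actual file format used by the Block Crawler may differ.

.. code-block:: python

    import csv
    from pathlib import Path
    from typing import Callable, Dict


    class BlockTimeCache:
        """Hypothetical CSV-backed cache mapping block numbers to block timestamps."""

        def __init__(self, filename: str) -> None:
            self._path = Path(filename)
            self._times: Dict[int, int] = {}

        def load(self) -> None:
            """Read previously stored block times, if the cache file exists."""
            if not self._path.exists():
                return
            with self._path.open(newline="") as handle:
                for block_id, timestamp in csv.reader(handle):
                    self._times[int(block_id)] = int(timestamp)

        def save(self) -> None:
            """Write all known block times so a later run can reuse them."""
            with self._path.open("w", newline="") as handle:
                csv.writer(handle).writerows(sorted(self._times.items()))

        def get(self, block_id: int, fetch: Callable[[int], int]) -> int:
            """Return the block time, calling fetch only on a cache miss."""
            if block_id not in self._times:
                self._times[block_id] = fetch(block_id)
            return self._times[block_id]

Only a cache miss reaches `fetch`, which is why reloading the file after an interrupted
run avoids repeating those RPC requests.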


Crawl
-----

The `crawl` command will crawl each block of a blockchain in ascending order for NFT data.
It processes data in chunks of blocks. It discovers new collections, token transfers,
token updates, and owner updates by processing data contained within blocks. It is
faster than the `tail` command but much slower than the `load` command, and it uses
considerably more RPC requests than `load`. The command is meant to be used after a
`load` command and before a `tail` command to reduce the number of blocks the `tail`
command will have to process.


Arguments
+++++++++

:STARTING_BLOCK: The block at which the crawl begins

:ENDING_BLOCK: The block at which the crawl ends

Options
+++++++

:--increment-data-version: Incrementing the data version should only occur in a
    scenario in which the `crawl` command will be used to re-load data in place
    over a previous data load from the origin block.

    .. note::
        Due to the time and resources necessary to initiate a data load via `crawl`,
        it is highly encouraged that you use the `load` command to initiate any data
        load.

:--block-chunk-size: The number of blocks to process at one time. Restricting the
    number of blocks processed simultaneously provides two benefits. First, it limits
    the computing resources utilized for attempting to process large quantities of
    blocks. Second, it allows for a graceful stop at a known break point should it
    be necessary to stop the command. The command will wait until all blocks in the
    block chunk are fully processed before exiting to end in a known state in which
    there is no risk of processing the same block twice.
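
The chunking behavior described for `--block-chunk-size` can be sketched as follows. This
is a minimal Python illustration and not the Block Crawler's implementation. The function
names, the inclusive chunk boundaries, and the `process_chunk` and `should_stop`
callbacks are assumptions made for the example.

.. code-block:: python

    from typing import Callable, Iterator, Tuple


    def block_chunks(
        starting_block: int, ending_block: int, chunk_size: int
    ) -> Iterator[Tuple[int, int]]:
        """Yield inclusive (first, last) block ranges in ascending order."""
        if chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
        chunk_start = starting_block
        while chunk_start <= ending_block:
            chunk_end = min(chunk_start + chunk_size - 1, ending_block)
            yield chunk_start, chunk_end
            chunk_start = chunk_end + 1


    def run_in_chunks(
        starting_block: int,
        ending_block: int,
        chunk_size: int,
        process_chunk: Callable[[int, int], None],
        should_stop: Callable[[], bool],
    ) -> int:
        """Process whole chunks and return the last block that was fully processed."""
        last_processed = starting_block - 1
        for chunk_start, chunk_end in block_chunks(starting_block, ending_block, chunk_size):
            # A stop request is only honored between chunks, never inside one.
            if should_stop():
                break
            process_chunk(chunk_start, chunk_end)
            last_processed = chunk_end
        return last_processed

Because the stop check happens only between chunks, the returned block number marks a
clean boundary from which a later run can resume without processing a block twice.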

Tail
----

The `tail` command will continuously check for new blocks and process them in the same
manner as the `crawl` command. The main differences between `crawl` and `tail` are that
`tail` processes one block at a time and persists the last block it has processed. The
first time you run the `tail` command, you must already have run the `seed` command
to record the last block processed, from which the `tail` command will continue forward.
Another difference is that this command will run until it is interrupted. It is meant
to be run as a service to keep the database up to date with the latest changes from the
blockchain.

Arguments
+++++++++

There are no arguments for the command.

Options
+++++++

:--trail-blocks: The number of blocks to trail behind the last block. This option
    exists for two reasons. The first is that nodes can be at different stages of
    completion with regard to the latest block. One node can have completed a block and
    list it as the latest block while another may not have completed it and either
    return an error or return partial data. It's common to see nodes return a block with
    no transaction hashes when retrieving incomplete blocks. The second is dealing with
    reorgs caused by blockchain forking. Staying far enough behind any reorg is important
    until the `tail` command is advanced enough to back out the results of reorganized
    blocks.

:--process-interval: How often to check for new blocks. The command is currently based
    on polling for the current block of the blockchain to identify new blocks that need
    to be processed. To reduce unnecessary processing and cost from checking the block
    height, the command will not perform two consecutive checks in less than the interval
    specified. If processing the latest blocks takes longer than the interval, the command
    will not wait and will check again immediately after processing the last block it
    knows of.
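
The interaction between `--trail-blocks` and `--process-interval` can be sketched with
two small helper functions. This is a minimal Python illustration under stated
assumptions: the function names are hypothetical, and the rule that the highest block
eligible for processing is the current block height minus the trail value is an
interpretation of the option description rather than confirmed behavior.

.. code-block:: python

    from typing import Optional, Tuple


    def next_blocks_to_process(
        last_processed: int, current_height: int, trail_blocks: int
    ) -> Optional[Tuple[int, int]]:
        """Return the inclusive (first, last) range ready for processing, or None."""
        # Stay trail_blocks behind the reported height to avoid incomplete or reorged blocks.
        highest_safe = current_height - trail_blocks
        if highest_safe <= last_processed:
            return None
        return last_processed + 1, highest_safe


    def seconds_until_next_check(process_interval: float, elapsed: float) -> float:
        """Return how long to wait before polling the block height again."""
        # If processing took longer than the interval, poll again immediately.
        return max(0.0, process_interval - elapsed)

With a last processed block of 100, a current height of 110, and a trail of 5 blocks,
the first helper returns the range 101 to 105 and leaves the five newest blocks for a
later check.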


Seed
----

The `seed` command sets the last block processed in the database, which the `tail`
command uses to identify its starting point when processing.

Arguments
+++++++++

:LAST_BLOCK_ID: The last block processed by one of the other commands.

Verify
------

Verify that the collection data stored in the database matches the data in the
blockchain.

Arguments
+++++++++

:COLLECTION_ID: The collection ID to verify

:BLOCK_HEIGHT: The block height at which to verify. Blockchain data is constantly being
    updated. As such, it can only be verified at a specific block height.

.. _Boto3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration