Data Onboarding Guide

Participating in the Banyan interchange allows financial service providers to access SKU-level resolution on their clients' transactions. This provides a better customer experience across online banking, fraud detection, budgeting, and many other use cases.

Integration Methods

Banyan supports both batch and API methods for sending transaction data to be enriched by the Banyan matching algorithm. If a transaction you send corresponds to an in-network merchant, there will be a subsequent match record to retrieve. How to retrieve match records and receipt item information is covered in this section of our guide.

API Integration
Using standard RESTful practices, transactions can be sent to Banyan one by one. We plan to support parallelized ingestion in the near future. Explanations of authentication, payloads, responses, and error codes can be found here.

Batch Integration

Batch Push

Daily batches are the most common cadence for sending data to Banyan. Sending complete data, with no gaps and no missing transactions, matters more than pushing data more frequently. A common practice is to set a cutoff time for each day's transactions. For example, a daily file uploaded at 11pm UTC on June 1, 2021 will contain transactions through 5pm UTC (the cutoff time) on June 1, 2021. The June 2, 2021 file will then contain all transactions from 5:00:01pm UTC on June 1 through 5:00pm UTC on June 2.
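The cutoff rule can be sketched in Python. This is a minimal illustration, assuming the 5pm UTC cutoff from the example; `batch_window` and `belongs_to_file` are hypothetical helper names and not part of any Banyan tooling. The window is treated as (previous cutoff, cutoff], which matches the 5:00:01pm to 5:00pm range at one-second resolution and also keeps transactions with sub-second timestamps from falling between files.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

# Assumed cutoff: 5pm UTC, taken from the example in this guide.
CUTOFF_UTC = time(17, 0)


def batch_window(file_date: date, cutoff: time = CUTOFF_UTC) -> Tuple[datetime, datetime]:
    """Return (previous_cutoff, cutoff) for the daily file dated file_date.

    The start is exclusive and the end is inclusive.
    """
    end = datetime.combine(file_date, cutoff, tzinfo=timezone.utc)
    return end - timedelta(days=1), end


def belongs_to_file(ts: datetime, file_date: date) -> bool:
    """True if a timezone-aware timestamp falls in file_date's window."""
    start, end = batch_window(file_date)
    return start < ts <= end


if __name__ == "__main__":
    start, end = batch_window(date(2021, 6, 2))
    print("after", start.isoformat(), "through", end.isoformat())
```

Timestamps passed to `belongs_to_file` need to be timezone-aware, since Python refuses to compare naive and aware datetimes.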

Our pipeline does not reject data based on date of transaction. If there are interruptions in your data feed to Banyan, simply send all missed data alongside your current day's data. It will be processed without issue.

Banyan's data lake is built on Google Cloud Platform using Google Cloud Storage. Each client will have a dedicated bucket that will be provisioned for read and write access using Google Service Accounts and Roles. If your data is already staged in a different environment such as Amazon S3, we will take care of porting it over to the Banyan GCS data lake.

Banyan Preferred Method: GCS

During the data onboarding process, we will provide you with a GCP service account that has the permissions necessary to push data into the bucket. The bucket contains an "input" folder into which batch files can be uploaded. Any errors we encounter will be placed in an "errors" folder in the bucket, which is viewable by the provided service account. We'll work with you to resolve any errors encountered in the data.
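As a rough sketch of the push flow, the following uses the `google-cloud-storage` Python client to upload a file into the "input" folder and list anything under "errors". The bucket name, key file path, and file names are placeholders, and `object_name`, `upload_batch`, and `list_errors` are hypothetical helpers; the actual bucket and credentials come from your onboarding partner.

```python
from pathlib import Path
from typing import List

INPUT_PREFIX = "input"    # folder named in this guide
ERRORS_PREFIX = "errors"  # folder named in this guide


def object_name(local_path: str) -> str:
    """Build the destination object name under the input folder."""
    return f"{INPUT_PREFIX}/{Path(local_path).name}"


def upload_batch(bucket_name: str, local_path: str, key_file: str) -> str:
    """Upload one batch file using a service account key file."""
    # Imported here so object_name() works without the dependency installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_file)
    blob = client.bucket(bucket_name).blob(object_name(local_path))
    blob.upload_from_filename(local_path)
    return blob.name


def list_errors(bucket_name: str, key_file: str) -> List[str]:
    """List object names currently under the errors folder."""
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_file)
    blobs = client.list_blobs(bucket_name, prefix=f"{ERRORS_PREFIX}/")
    return [blob.name for blob in blobs]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(object_name("/data/out/transactions_2021-06-01.parquet"))
```

Calling `upload_batch("your-bucket", "transactions_2021-06-01.parquet", "service-account.json")` would then place the file at `input/transactions_2021-06-01.parquet`, assuming the service account has write access to that bucket.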

Batch Pull

We can also pull data from you in batch. This can be done via FTP/SFTP, or through direct GCS or S3 bucket access. Our preference is for data to be pushed to us, since a partner (you) will have greater visibility into when data is ready and complete.

Preferred Data Formats

Parquet (compressed with snappy)
Avro
TSV/CSV (compressed with gzip/lzo)
JSON

Our preference is that files in these formats not exceed 5GB uncompressed. If this is unacceptable or difficult, please talk to your onboarding partner to determine the best resolution.
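One way to stay under the size limit is to roll over to a new output file once the uncompressed byte count would exceed the cap. The sketch below does this for gzip-compressed TSV lines. It assumes 5GB means 5 * 1024^3 bytes, uses a made-up file naming scheme, and omits header handling; `write_parts` is a hypothetical helper.

```python
import gzip
from typing import Iterable, List

# Assumption: "5GB" is interpreted as 5 GiB of uncompressed data.
MAX_UNCOMPRESSED_BYTES = 5 * 1024**3


def write_parts(lines: Iterable[str], prefix: str,
                max_bytes: int = MAX_UNCOMPRESSED_BYTES) -> List[str]:
    """Write lines to gzip files, starting a new part before the cap is exceeded.

    Each line should already end with a newline. Returns the paths written.
    """
    paths: List[str] = []
    out = None
    size = 0  # uncompressed bytes written to the current part
    try:
        for line in lines:
            data = line.encode("utf-8")
            # Roll over only if the current part already holds data, so a
            # single oversized line never leaves an empty part behind.
            if out is None or (size > 0 and size + len(data) > max_bytes):
                if out is not None:
                    out.close()
                path = f"{prefix}-part{len(paths):05d}.tsv.gz"
                out = gzip.open(path, "wb")
                paths.append(path)
                size = 0
            out.write(data)
            size += len(data)
    finally:
        if out is not None:
            out.close()
    return paths
```

The cap is measured on the bytes before compression, which matches the "uncompressed" wording of the limit; the compressed files on disk will typically be much smaller.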

