The Banyan data lake is built on Google Cloud Storage (GCS). Each Financial Services Provider brought onto the network will be provisioned its own GCS bucket(s) that only it will be allowed to access. The data within the bucket(s) will be encrypted at rest.
Each bucket will contain 5 folders:
| Folder Name | Description | FinX Access |
| --- | --- | --- |
| Input | Where FinX partners place their transaction data. | Read/Write |
| Output | Where FinX partners retrieve their enriched transaction data. Data will be written in Avro format. | Read |
| Error | If data anomalies are found, those specific records will be written to files and placed in the Error folder for examination. | Read |
| Historical | Once input data has been loaded, an automated Banyan ETL process moves it into the Historical folder. The data is not transformed in any way; its presence here indicates that it has been processed. | None |
| Staging | Transient folder used during the ingestion process. It will only be present in the bucket while ingestion is active and files are placed in it. Can be ignored. | None |
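A typical partner workflow against these folders might look like the sketch below. The bucket name, file name, and local paths are hypothetical, and the helper simply builds the GCS object paths described in the table; actual uploads and downloads would use your own tooling (for example the `gsutil` CLI or a GCS client library).

```python
# Sketch of the folder layout from the table above.
# Assumptions: bucket name "finx-partner-example" and the file name
# "transactions_2023-01-15.csv.gz" are made up for illustration.

BUCKET = "finx-partner-example"

def object_path(folder: str, file_name: str = "") -> str:
    """Build a gs:// path for one of the five provisioned folders."""
    allowed = {"Input", "Output", "Error", "Historical", "Staging"}
    if folder not in allowed:
        raise ValueError(f"unknown folder: {folder}")
    base = f"gs://{BUCKET}/{folder}/"
    return base + file_name

# Partners write transaction data here (Read/Write access):
print(object_path("Input", "transactions_2023-01-15.csv.gz"))
# Enriched Avro output is retrieved from here (Read access):
print(object_path("Output"))
# Rejected records land here (Read access):
print(object_path("Error"))
```

Historical and Staging are listed for completeness, but since FinX access to both is None, partner tooling never needs to read from or write to them.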
Input files should be chunked to less than 2 GB, whether compressed or uncompressed. File names should include a date, or a date + hour if you are uploading data more than once per day.
Example input file name:
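The naming convention above can be sketched as follows. The `transactions` prefix, `.csv` extension, and exact timestamp format are illustrative assumptions; the doc only requires that a date (or date + hour) appear in the name and that files stay under 2 GB.

```python
import os
from datetime import datetime, timezone

MAX_BYTES = 2 * 1024**3  # input files must stay under 2 GB

def input_file_name(prefix: str, when: datetime, hourly: bool = False) -> str:
    """Build a dated input file name; include the hour when uploading
    more than once per day. Prefix/extension are hypothetical."""
    stamp = when.strftime("%Y-%m-%d-%H" if hourly else "%Y-%m-%d")
    return f"{prefix}_{stamp}.csv"

def within_size_limit(path: str) -> bool:
    """True if the file on disk is under the 2 GB limit."""
    return os.path.getsize(path) < MAX_BYTES

# One upload per day:
print(input_file_name("transactions", datetime(2023, 1, 15, tzinfo=timezone.utc)))
# -> transactions_2023-01-15.csv
# Multiple uploads per day:
print(input_file_name("transactions", datetime(2023, 1, 15, 9, tzinfo=timezone.utc), hourly=True))
# -> transactions_2023-01-15-09.csv
```

Files larger than the limit should be split into multiple chunks, each named this way, before being placed in the Input folder.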
Banyan wants to be transparent about any records that don't meet our validation requirements. If any records are rejected, we will write them to the Error folder once per day. We will also monitor the volume of records written to this folder and reach out if there are any sudden spikes.