Batch: Google Cloud Storage (GCS)
The Banyan data lake is built on Google Cloud Storage (GCS). Each Financial Services Provider brought onto the network is provisioned its own GCS bucket(s), which only that provider is allowed to access. Data within the bucket(s) is encrypted at rest.
Folder Structure and File Naming Formats
Each bucket will contain 5 folders:
| Folder Name | Description | FinX Access |
|---|---|---|
| Input | Where FinX partners place their transaction data. | Read/Write |
| Output | Where FinX partners retrieve their enriched transaction data. Data is written in Avro format. | Read |
| Error | If data anomalies are found, the affected records are written to files in the Error folder for examination. | Read |
| Historical | Once data is loaded into the Input folder, Banyan kicks off an automated ETL process and the data is moved into the Historical folder. The data is not transformed in any way; its presence here indicates that it has been processed. | None |
| Staging | Transient folder used during the ingestion process. It is present in the bucket only while ingestion is active and files have been placed in it. Partners can ignore it. | None |
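Partner-side upload tooling can encode the FinX Access column directly, so a script refuses to write anywhere other than Input. The following is a minimal Python sketch, assuming that object names in the bucket are prefixed with the folder name exactly as written in the table (for example `Input/...`). The names `FOLDER_ACCESS`, `can_access`, and `object_name` are hypothetical helpers, not part of any Banyan API.

```python
# Hypothetical encoding of the folder table above. Assumes GCS object names
# start with the folder name exactly as it appears in the table.
FOLDER_ACCESS = {
    "Input": {"read", "write"},
    "Output": {"read"},
    "Error": {"read"},
    "Historical": set(),
    "Staging": set(),
}


def can_access(folder: str, operation: str) -> bool:
    """Return True if a FinX partner may perform `operation` ("read" or "write") on `folder`."""
    return operation in FOLDER_ACCESS.get(folder, set())


def object_name(folder: str, file_name: str, operation: str) -> str:
    """Build an object name, refusing operations the access table does not permit."""
    if not can_access(folder, operation):
        raise PermissionError(f"{operation} is not permitted on {folder}/")
    if "/" in file_name:
        raise ValueError("expected a bare file name, not a path")
    return f"{folder}/{file_name}"
```

The resulting name could then be passed to a GCS client, for example `bucket.blob(name).upload_from_filename(local_path)` with the `google-cloud-storage` Python library. Checking permissions locally only gives an earlier, clearer error; the bucket's own access controls remain the authority.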
File Naming Formats
Input files should be split into chunks smaller than 2 GB, whether compressed or uncompressed. File names should include a date, or a date and hour if you upload data more than once per day.
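The chunking and naming rules can be scripted before upload. The following is a minimal Python sketch, assuming newline-delimited input (so chunks split only on record boundaries) and reading "2 GB" conservatively as 2,000,000,000 bytes. The name pattern `<prefix>_<YYYY-MM-DD>[_<HH>]_<part>.csv` and the helpers `chunk_name` and `split_file` are hypothetical; adjust the pattern to whatever naming Banyan confirms for your integration.

```python
from datetime import datetime
from pathlib import Path

# Chunks are at most this many bytes, i.e. strictly under 2 GB (decimal reading,
# which is the stricter of 2 GB and 2 GiB).
MAX_CHUNK_BYTES = 2_000_000_000 - 1


def chunk_name(prefix: str, stamp: datetime, part: int, hourly: bool, ext: str = "csv") -> str:
    """Hypothetical pattern: date, plus hour when uploading more than once per day."""
    date_part = stamp.strftime("%Y-%m-%d_%H" if hourly else "%Y-%m-%d")
    return f"{prefix}_{date_part}_{part:04d}.{ext}"


def split_file(src: Path, out_dir: Path, prefix: str, stamp: datetime,
               hourly: bool = False, max_bytes: int = MAX_CHUNK_BYTES) -> list[Path]:
    """Split a newline-delimited file into chunks of at most max_bytes,
    never breaking a record (line) across two chunks."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    out = None
    written = 0
    try:
        with src.open("rb") as f:
            for line in f:
                if len(line) > max_bytes:
                    raise ValueError("a single record exceeds max_bytes")
                # Start a new chunk if this line would push the current one over the limit.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    path = out_dir / chunk_name(prefix, stamp, len(chunks) + 1, hourly)
                    chunks.append(path)
                    out = path.open("wb")
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return chunks
```

This splits the uncompressed data. If you upload compressed files, check the size of each compressed chunk against the limit as well.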
Example input file name:
Banyan aims to be transparent about any records that don't meet our validation requirements. If any records are rejected, we write them to the Error folder once per day. We also monitor the volume of records written to the Error folder and will reach out if there is a sudden spike.
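Partners may want to run a similar check on their own side. The following is a minimal Python sketch of one simple approach: flag a day whose rejected-record count exceeds a multiple of the trailing average. The function `is_spike` and its `window`, `multiplier`, and `floor` values are hypothetical and are not Banyan's actual monitoring thresholds.

```python
from statistics import mean


def is_spike(history: list[int], today: int, window: int = 7,
             multiplier: float = 3.0, floor: int = 100) -> bool:
    """Flag today's rejected-record count as a spike.

    Hypothetical rule: a spike is a count above `floor` that also exceeds
    `multiplier` times the mean of the last `window` daily counts.
    """
    recent = history[-window:]
    if not recent:
        return today > floor  # no baseline yet; fall back to the floor alone
    baseline = mean(recent)
    return today > floor and today > multiplier * baseline
```

The `floor` keeps small absolute counts from being flagged just because the baseline is near zero.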