Enriched Transaction Egress
Overview
Now that you have successfully sent Banyan your transaction data, we will produce enrichments (eTx) and store them within our cloud environments. Batches of enrichments can be shared with you through S3 or GCS, whichever you prefer. The S3 or GCS bucket will be a Banyan-owned bucket, and we will provision the appropriate access for you through IAM.
NOTE: If you are also sending us transactions via batch, we will be creating output directories within the bucket you already have access to.
Even though this is a batch egress process, there is no schedule on which data will be sent to your output folder. This is due to the enrichment of transactions being a streaming process. As soon as merchant data enters our platform, it is sent to our matching algorithm and saved with its most likely match. When that match is logged, it is also sent to any buckets that belong to those who have requested batch egress.
While merchants typically share data with Banyan on a consistent cadence, that cadence varies merchant by merchant. Therefore, there is no time of day at which you can assume all data has finished publishing. We recommend pulling data from our cloud storage environment multiple times per day to take advantage of a lower-latency integration.
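As a rough illustration of pulling several times per day, the sketch below keeps track of which object keys have already been processed so that each poll only returns new files. The `list_keys` callable, the `fake_listing` helper, and the key names are all hypothetical stand-ins for whatever S3 or GCS listing call your integration uses; none of them come from Banyan.

```python
from typing import Callable, Iterable, List, Set


def new_object_keys(list_keys: Callable[[], Iterable[str]], seen: Set[str]) -> List[str]:
    """Return keys that have not been processed yet and remember them in `seen`."""
    fresh = sorted(set(list_keys()) - seen)
    seen.update(fresh)
    return fresh


# Hypothetical listing function standing in for an S3 or GCS "list objects" call.
def fake_listing() -> List[str]:
    return ["output/part-0002.avro", "output/part-0001.avro"]


seen_keys: Set[str] = set()
first_poll = new_object_keys(fake_listing, seen_keys)   # both files are new
second_poll = new_object_keys(fake_listing, seen_keys)  # nothing new since the last poll
```

In a real integration the set of seen keys would need to be persisted between runs, for example in a database table, so that a restart does not reprocess every file.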
Output Details
Format
Data will be written in Avro format, which means each file will have one OCF definition. In addition, Banyan will be sending the full enriched transaction schema for each record regardless of the vertical that transaction's merchant is in. For example, if the transaction was for a retail client like Costco,
Compatibility
Data will have backward transitive compatibility guarantees. This means the following will be true:
- Any schema from the latest file will be supported by earlier versions of the data.
- Adding new fields will mean that the field will be included in older output, if requested, with a null value.
- If we ever had the need to release a breaking change, we would create a new "v2" path and would write to both outputs until you were able to switch to the new version.
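To make the null-default behavior concrete, here is a simplified sketch of how a reader using a newer schema can still consume an older record: any field the record lacks is filled from the field's declared default. This mimics Avro schema resolution in plain Python for illustration only; Avro libraries normally do this for you. The field `new_optional_field` and the function name are hypothetical.

```python
from typing import Any, Dict, List


def resolve_record(record: Dict[str, Any], reader_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Project a decoded record onto the reader's fields, filling in defaults."""
    resolved: Dict[str, Any] = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added after this record was written: use its default.
            resolved[name] = field["default"]
        else:
            raise KeyError(f"record is missing required field {name!r}")
    return resolved


# Newer reader schema: one existing field plus one hypothetical new nullable field.
reader_fields = [
    {"name": "byn_etx_id", "type": ["null", "string"], "default": None},
    {"name": "new_optional_field", "type": ["null", "string"], "default": None},
]

old_record = {"byn_etx_id": "etx-1"}  # written before the new field existed
resolved = resolve_record(old_record, reader_fields)
```

Here `resolved` contains `new_optional_field` with a value of `None`, which is the Python equivalent of the null value described above.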
File Size
Data will be sent after a certain file size is hit or after a certain amount of time has passed since data was last written to the /output path. Our current thresholds are a 10MB file size or 10 minutes. Scenarios below:
- Enrichments start streaming at 00:00 and accumulate through 00:45 resulting in a 5MB file. If no new enrichments get written for you until 00:55, you will see a 5MB file in your bucket.
- Enrichments start streaming at 00:00 and accumulate through 00:45 resulting in a 5MB file. If more enrichments then arrive at 00:50 and bring the total size to 15MB, a 10MB file will be present in the bucket, with the other 5MB waiting either 10 more minutes or for more enrichments to bring it to 10MB.
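The two scenarios above can be reproduced with a small simulation. This is only one possible reading of the thresholds, not Banyan's implementation: it assumes times are in minutes, that the 10-minute timer restarts whenever new enrichments arrive, and that a full 10MB file is written as soon as the buffer reaches that size. The function name and the arrival pattern are made up for illustration.

```python
from typing import List, Tuple

MAX_FILE_MB = 10.0     # size threshold stated in the text
MAX_IDLE_MINUTES = 10  # time threshold stated in the text


def simulate_flushes(arrivals: List[Tuple[int, float]], end_minute: int) -> List[Tuple[int, float]]:
    """Return (minute, file_size_mb) for every file written up to end_minute.

    arrivals: (minute, megabytes) pairs sorted by minute.
    Assumption: the idle timer restarts each time enrichments arrive.
    """
    files: List[Tuple[int, float]] = []
    pending = list(arrivals)
    buffered = 0.0
    last_arrival = None
    for minute in range(end_minute + 1):
        while pending and pending[0][0] == minute:
            buffered += pending.pop(0)[1]
            last_arrival = minute
        # Size threshold: write full files while enough data is buffered.
        while buffered >= MAX_FILE_MB:
            files.append((minute, MAX_FILE_MB))
            buffered -= MAX_FILE_MB
        # Time threshold: write whatever is left after 10 idle minutes.
        idle = last_arrival is not None and minute - last_arrival >= MAX_IDLE_MINUTES
        if buffered > 0 and idle:
            files.append((minute, buffered))
            buffered = 0.0
    return files


# 0.5MB every 5 minutes from minute 0 through minute 45, 5MB in total.
steady = [(m, 0.5) for m in range(0, 50, 5)]

scenario_one = simulate_flushes(steady, end_minute=60)
scenario_two = simulate_flushes(steady + [(50, 10.0)], end_minute=60)
```

Under these assumptions, the first scenario yields a single 5MB file at minute 55, and the second yields a 10MB file at minute 50 followed by a 5MB file at minute 60.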
Heartbeat Files
So that you can tell Banyan's system is not down or in a degraded state, we send schema-only heartbeat files throughout the day. This will happen at least 4 times per day (and up to 15 times), depending on how often our services are restarted on that day. These files are easily recognizable as they will all be the same file size. Please ensure your process is aware of these files.
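One way to make a process aware of heartbeat files is to treat any file that decodes to zero records as a heartbeat and keep it out of the data path. The sketch below assumes the files have already been decoded with an Avro library of your choice into lists of records; the function name and the object keys are hypothetical.

```python
from typing import Dict, List, Tuple


def split_heartbeats(decoded_files: Dict[str, List[dict]]) -> Tuple[List[str], List[str]]:
    """Split object keys into (data_keys, heartbeat_keys) by record count."""
    data_keys: List[str] = []
    heartbeat_keys: List[str] = []
    for key, records in decoded_files.items():
        if records:
            data_keys.append(key)
        else:
            # Schema-only file: no records, so treat it as a heartbeat.
            heartbeat_keys.append(key)
    return data_keys, heartbeat_keys


# Hypothetical decoded files: one with a record, one schema-only.
decoded = {
    "output/part-0001.avro": [{"byn_etx_id": "etx-1"}],
    "output/part-0002.avro": [],
}
data_keys, heartbeat_keys = split_heartbeats(decoded)
```

The heartbeat keys can still be useful for monitoring, for example to record when a file from Banyan was last seen.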
Duplicates
Banyan does not have a uniqueness guarantee for our streaming-to-batch data. It is very common for streaming services to restart and resend data they have previously sent. We highly recommend building your integration with deduplication downstream. Duplicates will most likely not occur over a period of days, but will happen frequently within the same day.
Data Updates
There is a potential for a better receipt enrichment to be discovered after we have already matched your transaction. Banyan will only send an improvement to a previous match. If you find duplicates in your data, Banyan advises you to use the enrichment with the greatest match_ts. There are a few IDs within the enrichment that will afford you the opportunity to remove duplicates in your process:
- byn_transaction_id: This ID is assigned by Banyan using three input fields from the transaction itself: purchase_ts, partner_id, and amount. Even if we assign a new receipt to your transaction, this ID will remain the same.
- finx_transaction_id: This ID is the one provided by you to Banyan and is your unique ID for the transaction. This will also be included in the body of the enrichment and can be used for deduplication purposes.
- byn_etx_id: This ID is assigned by Banyan to every eTx generated, including a new receipt for a previously matched transaction.
- byn_match_id: This is similar to the byn_etx_id in that it will also be unique for each enrichment created.
"fields": [
  {
    "default": null,
    "name": "byn_etx_id",
    "type": [
      "null",
      "string"
    ]
  },
  {
    "default": null,
    "name": "byn_match_id",
    "type": [
      "null",
      "string"
    ]
  },
  {
    "name": "transaction",
    "type": {
      "fields": [
        {
          "name": "byn_transaction_id",
          "type": "string"
        },
        {
          "default": null,
          "name": "finx_transaction_id",
          "type": [
            "null",
            "string"
          ]
        }
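The deduplication advice above can be sketched as follows: group enrichments by byn_transaction_id and keep the one with the greatest match_ts. The nesting of byn_transaction_id under transaction follows the schema excerpt; the location of match_ts at the top level of the record and its numeric type are assumptions made for this example, as are the function name and sample values.

```python
from typing import Any, Dict, Iterable, List


def latest_enrichments(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one enrichment per transaction: the one with the greatest match_ts."""
    best: Dict[str, Dict[str, Any]] = {}
    for rec in records:
        key = rec["transaction"]["byn_transaction_id"]
        current = best.get(key)
        # Assumption: match_ts is a top-level field with comparable values.
        if current is None or rec["match_ts"] > current["match_ts"]:
            best[key] = rec
    return list(best.values())


# Made-up sample: a resend of etx-1, then an improved match etx-2 for txn-A.
sample = [
    {"byn_etx_id": "etx-1", "match_ts": 100, "transaction": {"byn_transaction_id": "txn-A"}},
    {"byn_etx_id": "etx-1", "match_ts": 100, "transaction": {"byn_transaction_id": "txn-A"}},
    {"byn_etx_id": "etx-3", "match_ts": 150, "transaction": {"byn_transaction_id": "txn-B"}},
    {"byn_etx_id": "etx-2", "match_ts": 200, "transaction": {"byn_transaction_id": "txn-A"}},
]
deduped = latest_enrichments(sample)
kept_ids = [rec["byn_etx_id"] for rec in deduped]
```

Keying on finx_transaction_id instead would work the same way if you prefer to deduplicate on your own identifier. Exact resends share a byn_etx_id and match_ts, so the first copy is kept and later copies are ignored.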
Billing
Banyan will be using the usage data delivered to your cloud storage bucket to calculate your monthly invoice. Banyan will only invoice for a transaction a single time even if data is sent more than once or updated with a new enrichment.
Contact Us!
When you decide that you would like to receive data from Banyan via batch, please reach out to your account representative with your choice of S3 or GCS. They will help set up the configuration on our side and reach back out with any keys or setup details you will need to get started. For more detailed information, please see our sections on S3 or GCS delivery.