Data Flow
For BeanHub Import to import transactions, it first needs to deal with the fact that CSV files and other transaction files come in many different shapes and formats. To overcome this problem, we built an open-source library, beanhub-extract. Its job is simple: it reads all kinds of CSV and transaction files and provides a standardized transaction data structure for BeanHub Import to consume.
Currently, we only support a handful of financial institutions. The Plaid CSV file format is the most important one for our customers because it covers all the major financial institutions through BeanHub's Connect and Direct Connect features, which let BeanHub users automatically pull transaction CSV files from banks in a standard format. By supporting the Plaid CSV file format, we effectively cover all the major banks Plaid supports.
Although BeanHub's Connect and Direct Connect features are only available to our paid users, that doesn't mean we only support the Plaid format. On the contrary, we hope the BeanHub open-source ecosystem will also be useful for non-BeanHub users, and we will do our best to add support for different formats. You can let us know which format you would like us to support by opening a GitHub issue, or open a pull request for the format yourself.
Here are the currently available fields in the Transaction data structure beanhub-extract provides:
- extractor - name of the extractor
- file - the filename of the import source
- lineno - the entry line number of the source file
- reversed_lineno - the entry line number of the source file in reverse order. Comes in handy for CSV files in descending datetime order
- transaction_id - the unique id of the transaction
- date - date of the transaction
- post_date - date when the transaction posted
- timestamp - timestamp of the transaction
- timezone - timezone of the transaction, needs to be one of the timezone values supported by pytz
- desc - description of the transaction
- bank_desc - description of the transaction provided by the bank
- amount - transaction amount
- currency - ISO 4217 currency symbol
- category - category of the transaction, like Entertainment, Shopping, etc.
- subcategory - subcategory of the transaction, like Entertainment, Shopping, etc.
- pending - pending status of the transaction
- status - status of the transaction
- type - type of the transaction, such as Sale, Return, Debit, etc.
- source_account - source account of the transaction
- dest_account - destination account of the transaction
- note - note or memo for the transaction
- reference - reference value
- payee - payee of the transaction
- gl_code - General Ledger Code
- name_on_card - name on the credit/debit card
- last_four_digits - last four digits of the credit/debit card
- extra - all the columns not handled and put into Transaction's attributes by the extractor go here
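To get a feel for what this standardized record looks like in code, here is a minimal sketch of such a data structure. Note that this is an illustrative simplification with only a subset of the fields, not the actual Transaction dataclass shipped in beanhub-extract:

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified stand-in for beanhub-extract's standardized transaction record.
# Only a few of the fields listed above are included here for illustration.
@dataclass(frozen=True)
class SimpleTransaction:
    extractor: str
    file: Optional[str] = None
    lineno: Optional[int] = None
    date: Optional[str] = None
    desc: Optional[str] = None
    amount: Optional[str] = None
    currency: Optional[str] = None
    # Columns the extractor doesn't handle end up in `extra`.
    extra: dict = field(default_factory=dict)

txn = SimpleTransaction(
    extractor="mercury",
    file="import-data/mercury/2024.csv",
    lineno=2,
    date="2024-03-01",
    desc="GUSTO",
    amount="-45.00",
    currency="USD",
    extra={"unmapped_column": "value"},
)
```

Because every extractor emits the same structure, downstream tools like beanhub-import never need to know which bank or file format a transaction originally came from.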
Import flow
Now, with beanhub-extract, we can easily extract transaction data from different sources as a standard data structure. Next, it's the job of beanhub-import to look at the transactions provided by beanhub-extract, see which rules they match, and then generate the corresponding Beancount transactions for you. Unlike most Beancount or other plaintext accounting importing tools, beanhub-import not only generates the transactions for you but is also smart enough to look at your existing Beancount transactions and update them for you. Here's how it works:
Step 1. Match input CSV files
Input rules are defined as shown in this example:
inputs:
- match: "import-data/mercury/*.csv"
config:
extractor: mercury
default_file: "books/{{ date.year }}.bean"
prepend_postings:
- account: Assets:Bank:US:Mercury
amount:
number: "{{ amount }}"
currency: "{{ currency | default('USD', true) }}"
First, we must find all the CSV files matched by the rule. For example, the import-data/mercury/*.csv rule will match files such as
- import-data/mercury/2023.csv
- import-data/mercury/2024.csv
- import-data/mercury/2025.csv
If no extractor is provided, we will detect the extractor automatically.
Now that we know which CSV files to extract transactions from, the next step is to use beanhub-extract to do so.
Step 2. Match transactions
We will go through all the matching rules defined in the imports
section like this:
imports:
- name: Gusto fees
match:
extractor:
equals: "mercury"
desc: GUSTO
actions:
# ...
- name: DoorDash
match:
extractor:
equals: "mercury"
desc: DoorDash
actions:
# ...
Each transaction from the input step flows through all of these match statements to see if it matches.
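The evaluation loop can be sketched roughly as follows. The field names and the exact matching semantics here (simple substring containment) are illustrative assumptions, not the precise behavior of beanhub-import's matchers:

```python
from typing import Optional

# Illustrative rules mirroring the YAML example above.
rules = [
    {"name": "Gusto fees", "match": {"extractor": "mercury", "desc": "GUSTO"}},
    {"name": "DoorDash", "match": {"extractor": "mercury", "desc": "DoorDash"}},
]

def first_matching_rule(txn: dict, rules: list) -> Optional[dict]:
    """Return the first rule whose criteria all appear in the transaction."""
    for rule in rules:
        criteria = rule["match"]
        if all(value in str(txn.get(key, "")) for key, value in criteria.items()):
            return rule
    return None

txn = {"extractor": "mercury", "desc": "GUSTO PAYROLL"}
rule = first_matching_rule(txn, rules)
```

Because the rules are checked in order, the first rule that matches decides which actions run for the transaction.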
Step 3. Perform actions
If there is a match, the corresponding actions, usually adding a transaction, will be performed. If there are multiple matches, we only perform the actions of the first one.
imports:
- name: Gusto fees
match:
extractor:
equals: "mercury"
desc: GUSTO
actions:
# This action will be performed
- txn:
narration: "Gusto subscription fee"
postings:
- account: Expenses:Office:Supplies:SoftwareAsService
amount:
number: "{{ -amount }}"
currency: "{{ currency | default('USD', true) }}"
# by default, add_txn action type will be used if not provided, so we can omit here
# type: add_txn
The matched CSV transaction attributes will be provided as the values to render the Jinja2 template of the Beancount transaction.
If it's an add_txn action and the input file has prepend_postings or append_postings defined, we will combine those postings into the generated transactions.
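The template rendering works the way standard Jinja2 rendering does: the matched transaction's attributes become the template context. A small sketch (the context values are made up for illustration):

```python
from jinja2 import Template

# Attributes of a hypothetical matched CSV transaction.
context = {"amount": -45.0, "currency": ""}

# "{{ -amount }}" negates the CSV amount, so a -45.0 charge becomes a
# positive 45.0 expense posting.
number = Template("{{ -amount }}").render(**context)

# default('USD', true) falls back to USD even when currency is an
# empty string, because the second argument treats falsy values as missing.
currency = Template("{{ currency | default('USD', true) }}").render(**context)
```

This is why the YAML examples above can write posting amounts like `{{ -amount }}` and `{{ currency | default('USD', true) }}` directly in the import rules.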
Step 4. Collect existing Beancount transactions
To avoid generating duplicate transactions in the Beancount file, we need to traverse the Beancount folder and find all the existing transactions that were previously imported.
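One simple way to make previously imported transactions recognizable is to index them by a unique import identifier. The "import-id" key and its format below are assumptions for illustration, not necessarily the exact metadata beanhub-import stores:

```python
# Hypothetical transactions previously written into the Beancount files,
# each carrying a unique id recorded at import time.
existing = [
    {"import-id": "mercury:2024.csv:2", "narration": "Gusto subscription fee"},
    {"import-id": "mercury:2024.csv:3", "narration": "DoorDash"},
]

# Index by id so newly generated transactions can be looked up in O(1)
# when computing the change sets in the next step.
existing_by_id = {txn["import-id"]: txn for txn in existing}
```

With this index in hand, deciding whether a generated transaction is new or already present becomes a dictionary lookup.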
Step 5. Compute change sets and apply changes
Now, with the generated transactions from the import rules and the existing Beancount transactions we previously inserted into the Beancount files, we can compare the two sets and compute the changes required to bring everything up-to-date.
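The comparison can be sketched as a three-way diff keyed by a unique id: ids only in the generated set are additions, ids in both sets with differing content are updates, and ids only in the existing set are deletions. This is an illustrative sketch, not beanhub-import's actual implementation:

```python
def compute_changes(generated: dict, existing: dict) -> dict:
    """Diff generated transactions against existing ones, keyed by id."""
    return {
        "add": [g for gid, g in generated.items() if gid not in existing],
        "update": [g for gid, g in generated.items()
                   if gid in existing and existing[gid] != g],
        "delete": [e for eid, e in existing.items() if eid not in generated],
    }

generated = {"a": {"narration": "Gusto fee"}, "b": {"narration": "New vendor"}}
existing = {"a": {"narration": "Gusto"}, "c": {"narration": "Old vendor"}}
changes = compute_changes(generated, existing)
```

Here "a" was generated with different content, so it becomes an update; "b" is new, so it becomes an addition; and "c" is no longer generated, so it becomes a deletion.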
Finally, with the change sets generated from the previous step, we use our beancount-parser to parse the existing Beancount files as syntax trees, transform them accordingly, and then write them back with our beancount-black formatter.
And that's it! Now, all the imported transactions are up-to-date.