Bank run

Even if one's bank offers a CSV, parsing statements and allotting individual expenses can still be a pain for shared accounts. One hacky method follows. Uses BSD (MacOS) bash tools and xsv. Adjust to taste.

Assumed input

A tab separated stream of plaintext with three columns:

  1. TEXT: Entity name
  2. NUMBER: Debit
  3. NUMBER: Credit (for refunds)

Sum columns

Much like a GROUP BY statement in SQL, it's best to accumulate all charges from the same entity into one row. I find awk ugly, but it does make this easy. This script is doubly nasty because it must inject a bash argument into an awk command. Since the argument contains literal column references like "$2", it cannot be passed in with -v; it must be manually interpolated.

#!/bin/sh

expr="$1"

awk -F '\t' "{sum[\$1] += $expr} END {for (key in sum) print key \"\\t\" sum[key]}"

I saved this as /usr/local/bin/sum-columns and did chmod +x. This command will be used twice, which is why it's dynamic at the expense of readability.

Select relevant columns

Assumes your CSV has headers. You probably don't want more than name, debit, and credit. I spit them headerless to stdout, separated by tabs. The delimiter swap may not be necessary but this seems safer to me.

xsv select "name,debit,credit" $file | xsv fmt -t $'\t' | tail -n +2

Example.

$ xsv select "name,debit,credit" transactions.csv | xsv fmt -t $'\t' | tail -n +2 | head -n 3
AAA                145.99
BBB        -68.42
AAA        -5.00

It's not immediately clear, but the first line is a refund, and lives in column $3, while the other two are expenses, and live in $2.

Grouping expenses

Just sum $2 and $3. Since debits are negative and credits are positive, it's all good. awk also knows to treat empty columns as 0.

$ xsv select "name,debit,credit" transactions.csv | xsv fmt -t $'\t' | tail -n +2 | sum-columns '$2 + $3' | sort
AAA 140.99
BBB -8

You can see the literal awk expression injected as a raw string.

Now you're left with two columns: an entity name and a monetary amount. Both AAA rows have been added together for a new total. This is already nice enough to do some manual calculations on, but I prefer a more interactive workflow.

Tagging expenses

This two column stream must be turned back into a three column one, with a new first column containing an arbitrary string identifying the owner of the expense. Something like

j   AAA 140.99
n   BBB -8

This will allow one more round of sum-columns to say who owes what. You could just do this tagging in a text editor or Excel or whatever, but I find working with tabs to be error prone, and I obviously don't bother with Microsoft products. I've instead written another script that reads a line of stdin, stops for a user input tag, and then prints the tagged output. If you get anything wrong, you can then re-open the output in a text editor.

#!/bin/bash

while IFS= read -r r; do
    read -p "$r: " i </dev/tty
    echo "$i    $r"
done

I saved this one as tag-transactions. The pipe now looks something like

$ xsv select "name,debit,credit" transactions.csv | xsv fmt -t $'\t' | tail -n +2 | sum-columns '$2 + $3' | sort | tag-transactions

You can output this to a file with > to double check everything, but if you just wanted a breakdown by tag, it'd be piped into another sum-columns.

$ xsv select "name,debit,credit" transactions.csv | xsv fmt -t $'\t' | tail -n +2 | sum-columns '$2 + $3' | sort | tag-transactions | sum-columns '$3'
j       -983.31
n       -1803.15
b       -1101.03

These tags are arbitrary, so they don't have to be tied to a specific person. One could represent expenses that are split 50-50, another for something that was 80-20, etc.

I'm sure there's a subtle risk somewhere in this pipeline, but it's handy enough for personal accounting.