Disclaimer: The main idea of this post might only be interesting for ten other people in the entire world but the general notion of keeping a double-entry booking journal might also appeal to you. Note though that I am not a finance expert, so your mileage may vary.
Double-entry bookkeeping with ledger
For a very long time, I have not taken particular interest in my finances and my main strategy has always been spending less than what I earned. This has worked quite well, however, with regular income and a large variety of expenses such as insurances, retirement funds, daily things etc., I got overwhelmed a bit and felt losing control. To get rid of this uneasy feeling, I had to record whatever I am gaining or spending. As a layperson one has two options to record this: keep running totals in a spreadsheet – also known as single-entry bookkeeping – or use a double-entry bookkeeping software to model the flow of money between different accounts representing income, expenses, assets, liabilities and equities. Although the former is probably easier to grasp at the beginning, you will run into limitations such as difficult error detection soon enough. As a CLI person, I was thus soon attracted to the ledger-like family of double-entry bookkeeping systems.
The idea of ledger is simple: you record transactions between accounts in a simple journal text file and then let ledger report balances. Unlike GUI programs such as GnuCash, ledger never touches your data. This separation of concerns eases editing (e.g. Vim), revision control (e.g. Git) and greatly improves interoperability. The latter is also the main reason for the many different ledger clones of which I want to highlight hledger. Contrary to ledger that is written in C++, hledger is based on Haskell and is in my experience a lot more stable. To get more information about accounting with ledger-likes, please refer to the very extensive documentation of both ledger and hledger.
Data import with hledger
Now, here’s the actual piece covering the title of this post. I have a checking
account with Deutsche Bank1 that provides export of all transactions in CSV
format. As with probably all corporate data formats, the CSV transaction format
has many quirks and pecularities that need to be fixed before appending
everything to a ledger journal. I used to use a homegrown Python script to fix
up the CSV format and output ledger format, however hledger provides a fantastic
csv command that already does the last step perfectly fine. So if you are in
the same boat and want to import Deutsche Bank data into your ledger journal
here is what you need to do:
tail -n +6 $1 | head -n -1 | \ iconv -f ISO8859-1 -t utf8 | \ sed -E 's/([0-9]+)\.([0-9]+),([0-9]+)/\1\2\.\3/' | \ sed -E 's/([0-9]+),([0-9]+)/\1\.\2/' | \ sed -E 's/"//g' | \ sed -E 's/([A-Z][a-z]+), ([A-Z][a-z]+)/\2 \1/' | \ sed -E 's/,/./g' | \ sed -E 's/;/,/g' | \ hledger -f - --rules-file=db.csv.rules print | \ sed -E '/^$/d' | \ sed -E 's/(.*) ; (.*)/\1\n ; \2/' | \ sed -E 's/^(20[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]) (.*)/\1 \2/' | \ sed -E 's/ *$//'
Let’s go through this line-by-line in case you want to add your own adaptions.
First we skip the funny six line header and remove the totaling sum at the
bottom. Then we need to convert the encoding from ISO 8859 to UTF-8, otherwise
hledger would give up reading the data. In the next lines, we convert the amount
format from German 1234.00,50 EUR to 1234.50 EUR, remove quotes, replace “Doe,
John” with “John Doe”, commas with dots and semicolons with commas. Then we push
the cleaned data to hledger which uses a
db.csv.rules file to interpret the
CSV file and determine account transformations based on some heuristics. Here
is the shortened
db.csv.rules file that should get you running:
account1 Assets:Checking account2 ReplaceMe fields date,,,description,type,,,,,,,,,,out,in, date-format %d.%m.%Y comment %type amount %in%out EUR # ... patterns to determine accounts
Because I like to specify amounts as “1234 EUR” rather than “EUR1234”, I have to
construct them manually with the
amount statement. After letting hledger do
its job the shell script takes over for final post-processing, including removal
of empty lines between transactions2, date formatting and trailing white
space removal. Quite a bit of upfront work, but tremendously helpful for