Ledger import of Deutsche Bank data

Disclaimer: The main idea of this post might only be interesting for ten other people in the entire world but the general notion of keeping a double-entry booking journal might also appeal to you. Note though that I am not a finance expert, so your mileage may vary.

Double-entry bookkeeping with ledger

For a very long time, I have not taken particular interest in my finances and my main strategy has always been spending less than what I earned. This has worked quite well, however, with regular income and a large variety of expenses such as insurances, retirement funds, daily things etc., I got overwhelmed a bit and felt losing control. To get rid of this uneasy feeling, I had to record whatever I am gaining or spending. As a layperson one has two options to record this: keep running totals in a spreadsheet – also known as single-entry bookkeeping – or use a double-entry bookkeeping software to model the flow of money between different accounts representing income, expenses, assets, liabilities and equities. Although the former is probably easier to grasp at the beginning, you will run into limitations such as difficult error detection soon enough. As a CLI person, I was thus soon attracted to the ledger-like family of double-entry bookkeeping systems.

The idea of [ledger][] is simple: you record transactions between accounts in a simple journal text file and then let ledger report balances. Unlike GUI programs such as GnuCash, ledger never touches your data. This separation of concerns eases editing (e.g. Vim), revision control (e.g. Git) and greatly improves interoperability. The latter is also the main reason for the many different ledger clones of which I want to highlight [hledger][]. Contrary to ledger that is written in C++, hledger is based on Haskell and is in my experience a lot more stable. To get more information about accounting with ledger-likes, please refer to the very extensive documentation of both ledger and hledger.

Data import with hledger

Now, here’s the actual piece covering the title of this post. I have a checking account with Deutsche Bank1 that provides export of all transactions in CSV format. As with probably all corporate data formats, the CSV transaction format has many quirks and pecularities that need to be fixed before appending everything to a ledger journal. I used to use a homegrown Python script to fix up the CSV format and output ledger format, however hledger provides a fantastic csv command that already does the last step perfectly fine. So if you are in the same boat and want to import Deutsche Bank data into your ledger journal here is what you need to do:

tail -n +6 $1 | head -n -1 | \
iconv -f ISO8859-1 -t utf8 | \
sed -E 's/([0-9]+)\.([0-9]+),([0-9]+)/\1\2\.\3/' | \
sed -E 's/([0-9]+),([0-9]+)/\1\.\2/' | \
sed -E 's/"//g' | \
sed -E 's/([A-Z][a-z]+), ([A-Z][a-z]+)/\2 \1/' | \
sed -E 's/,/./g' | \
sed -E 's/;/,/g' | \
hledger -f - --rules-file=db.csv.rules print | \
sed -E '/^$/d' | \
sed -E 's/(.*) ; (.*)/\1\n    ; \2/' | \
sed -E 's/^(20[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]) (.*)/\1  \2/' | \
sed -E 's/ *$//'

Let’s go through this line-by-line in case you want to add your own adaptions. First we skip the funny six line header and remove the totaling sum at the bottom. Then we need to convert the encoding from ISO 8859 to UTF-8, otherwise hledger would give up reading the data. In the next lines, we convert the amount format from German 1234.00,50 EUR to 1234.50 EUR, remove quotes, replace “Doe, John” with “John Doe”, commas with dots and semicolons with commas. Then we push the cleaned data to hledger which uses a db.csv.rules file to interpret the CSV file and determine account transformations based on some heuristics. Here is the shortened db.csv.rules file that should get you running:

account1 Assets:Checking
account2 ReplaceMe

fields date,,,description,type,,,,,,,,,,out,in,
date-format %d.%m.%Y

comment %type
amount %in%out EUR

# ... patterns to determine accounts

Because I like to specify amounts as “1234 EUR” rather than “EUR1234”, I have to construct them manually with the amount statement. After letting hledger do its job the shell script takes over for final post-processing, including removal of empty lines between transactions2, date formatting and trailing white space removal. Quite a bit of upfront work, but tremendously helpful for day-to-day operations.


Yes, yes, I know … 2: They just look plain wrong with vim-ledger’s collapsed folds. [ledger]: http://ledger-cli.org/ [hledger]: http://hledger.org