Thanks to John O’Gorman for converting the NAMA PDF of foreclosed properties to spreadsheet format. The spreadsheet is of the latest NAMA foreclosure list published last week and has been uploaded to Google docs here. The spreadsheet format should allow you to sort the data and make more sense out of what NAMA is doing.
What do we learn? Out of the 1,054 properties that have been foreclosed, 668 are currently not for sale. We see that accountancy firms that did very well indeed out of the boom are also doing very well during the bust, with KPMG involved in 133 receiverships, PwC in 108 and Ernst and Young in 34. Of the 1,054 properties foreclosed, 341 are multiple units, which might mean two apartments or an estate of 100 homes, so it is difficult to see how many properties NAMA now controls via receivers. We can see NAMA has appointed receivers to 33 farms (or more accurately, property termed “agricultural land”), 24 hotels, a staggering 52 pubs and 73 retail properties, of which 34 are “multiple units”. Here are four selected analyses from the data, which are only possible because the data can now be sorted.
Firstly, we have the type of property and whether the property is a single or multiple unit.
Secondly, the location of the property – this will be updated later when the Northern Ireland data has been cleaned more thoroughly.
Thirdly the receivers appointed to the property.
And lastly the sales agents.
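For anyone who would rather script these four tallies than sort the spreadsheet by hand, here is a minimal sketch in Python. It is not how the analyses above were produced. The column names (`asset_type`, `units`, `county`, `receiver`, `sales_agent`, `for_sale`) are assumptions and would need to be matched to the real spreadsheet headers, and the rows are placeholders for illustration only, not entries from the NAMA list.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; these are NOT taken from the NAMA list.
# The column names are assumptions and must be matched to the real spreadsheet.
SAMPLE_CSV = """asset_type,units,county,receiver,sales_agent,for_sale
Hotel,Single,County A,Receiver X,Agent P,Yes
Retail,Multiple,County A,Receiver X,,No
Pub,Single,County B,Receiver Y,Agent Q,Yes
Retail,Single,County B,Receiver X,,No
"""


def tally(rows, *columns):
    """Count rows for each distinct combination of the given columns."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

by_type_and_units = tally(rows, "asset_type", "units")  # first analysis
by_location = tally(rows, "county")                     # second analysis
by_receiver = tally(rows, "receiver")                   # third analysis
by_agent = tally(rows, "sales_agent")                   # fourth analysis

# Properties not currently marked as for sale.
not_for_sale = sum(1 for row in rows if row["for_sale"] != "Yes")

# Print receivers from most to fewest appointments.
for (receiver,), count in by_receiver.most_common():
    print(receiver, count)
```

To run it against a spreadsheet exported as CSV, the `io.StringIO(SAMPLE_CSV)` part would be replaced with an opened file, and the column names adjusted to whatever the export actually contains.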
It is hoped that future foreclosure lists from NAMA will be converted to spreadsheet data as the lists are published. NAMA had said it had intended publishing the data in manipulable format seven months ago.
UPDATE: 7th February, 2012. The data in the spreadsheet has been “cleaned” further, so that Northern Ireland entries no longer have the asset description appended to the country. The updated version of the spreadsheet is here. And this is an extract of the data.
Good work guys. If you wish, I can get cracking with some streetview links too.
@2pack, the work is all John O’Gorman’s, though there was some checking (and nit-picking!) at this end. The approach at present is to convert the entire NAMA foreclosure list because there will be additions AND deletions each month. It would be terrific to have streetviews, though I’m not sure how you would deal with property whose most granular address is “Listowel”! Also, NAMA doesn’t provide precise addresses, as we recently saw with the occupation of the Great Strand Street property – no building number, and indeed NAMA got its Dublin postal number wrong. But having said all of that, a streetview link might help in a lot of cases. Perhaps you can send me an email, 2pack, and if John is amenable the two of you might talk – there’s nothing precious about the ownership of the data at this end!
There are ways to do this, but in my experience it takes a bit of trial and error.
http://www.economicsnetwork.ac.uk/tips/pdf2excel.htm
Hi KOR,
The link you post is generally useful but not specifically in this case. The normal way to take data out of a PDF in tabular format is to use the copy table option on the context click of the table. It’s a few minutes’ work.
However, the table in the NAMA document is not marked as a table by the authors with the appropriate tag, and as such the copy table option is not available from the right click.
Not marking the table as a table also removes other tools from the equation, such as the useful PDF2Table utility which can, as the name suggests, convert from a PDF table to a structured HTML table.
So the result is that a copy select gets converted into a continuous column of data, with each cell of data and each line within that cell getting its own line in the document you copy to. Thus the data becomes a continuous stream, and since the number of columns is variable on each row, the table is not recoverable from this process.
In addition, the authors have chosen to create blank cells (essentially gaps in the table) for some fields and to use empty cells (table still exists, just no data) for other gaps in the data.
When combined with the fact that some of the cells are merged cells and not really separate (e.g. when a GB address 3 is over a certain length it merges with the country column), extracting the data becomes “fun”.
On the scale of 1 to 10 this document is a 9 for recovering structured data from.
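The copy-select problem described above can be shown with a small sketch in Python. It is an illustration of the general issue only, not the method used to convert the NAMA list, and the cell values and function names are made-up placeholders. Once every cell lands on its own line, two differently shaped tables can produce exactly the same stream, so the row boundaries cannot be recovered unless every row is known to have the same number of cells.

```python
def flatten(rows):
    """Mimic copy-select: each cell gets its own line and row boundaries are lost."""
    return [cell for row in rows for cell in row]


def rebuild_fixed(stream, width):
    """Rebuild rows, assuming every row has exactly `width` cells."""
    if len(stream) % width != 0:
        raise ValueError("stream length is not a multiple of the column count")
    return [stream[i:i + width] for i in range(0, len(stream), width)]


# Placeholder cells. table_a has a variable number of cells per row
# (as when some gaps are simply missing); table_b is a regular two-column table.
table_a = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
table_b = [["a1", "a2"], ["a3", "b1"], ["b2", "c1"]]

stream_a = flatten(table_a)
stream_b = flatten(table_b)

# Both tables collapse to the same stream, so the stream alone
# cannot tell us which table it came from.
print(stream_a == stream_b)

# With a fixed, known column count the regular table comes back intact.
print(rebuild_fixed(stream_b, 2) == table_b)
```

In practice this means some extra signal is needed to find the row breaks, for example a cell whose format is recognisable as always starting or ending a row, which is an assumption that has to be checked against the document in question.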
One can also use Softi Free OCR. Eats PDFs. Especially the grubby FOI type that was deliberately printed and then scanned.
http://softi-freeocr.en.softonic.com/