The web offers many interesting documents in Adobe PDF format, and quite often these documents contain nice tables of data. To visualize this data the way I like to, I first need to extract it from the PDF document and store it in a processable format, for example Excel or comma-separated values (CSV). Here is an example of how I did this.
The wrong way… at least for big tables
The document I used contains one big table spread over more than 100 pages. My first thought was to copy and paste the data from the table manually. But when I used select all, copy, and paste to move the data into an OpenOffice spreadsheet (or Excel), I lost all formatting of the table, so that was not an option. Adobe Reader's menu option File – Save as Text gave the same problem. For a small table this method works fine, of course, because you can quickly reformat the data by hand.
How I succeeded for big tables
In two steps I managed to get all the data (about 2000 rows) into one spreadsheet. First I used an online PDF-to-XLS conversion tool. The result was an Excel document with 118 sheets, one sheet for every page of the PDF document. Of course I did not feel like spending my time on the boring job of copying and pasting 118 times. So, second, I took an Excel Visual Basic macro from the web and used it to combine all the sheets into one sheet automatically.
In both steps I ran into a minor problem. The PDF conversion service works with a huge delay: it took many hours before I received the email with the result attached. The Excel Visual Basic macro did not work perfectly right away either. The problem seemed to be the active cell on each of the individual sheets. After I selected all sheets together and made A1 the active cell on all of them, the macro worked perfectly.
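For readers who prefer a script to a macro, the sheet-combining step could be sketched in Python along these lines. This is not the macro I used, only a minimal illustration. It assumes the rows of each sheet have already been read into lists, and that each page may repeat the same header row. The names combine_sheets and to_csv and the sample data are made up for the example.

```python
import csv
import io


def combine_sheets(sheets, has_header=True):
    """Merge per-page tables into one list of rows.

    sheets: a list of tables, each a list of rows (lists of cell values).
    With has_header=True, the first row of the first non-empty sheet is
    kept once, and an identical first row on later sheets is dropped.
    """
    combined = []
    header = None
    for rows in sheets:
        # Drop rows in which every cell is empty or whitespace.
        rows = [r for r in rows if any(str(c).strip() for c in r)]
        if not rows:
            continue
        if has_header:
            if header is None:
                header = rows[0]
                combined.append(header)
                rows = rows[1:]
            elif rows[0] == header:
                rows = rows[1:]
        combined.extend(rows)
    return combined


def to_csv(rows):
    """Return the rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample: three "sheets", the last one empty.
pages = [
    [["country", "value"], ["NL", "1"], ["BE", "2"]],
    [["country", "value"], ["DE", "3"]],
    [],
]
combined = combine_sheets(pages)
print(to_csv(combined))
```

Because the sketch works on plain lists, it does not depend on the active cell of any sheet, which was the stumbling block with the macro. Reading the real sheets out of the Excel file would need an extra library and is left out here.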