Data transformation is the heart
Data transformation is the second step after data retrieval. It's about data translation, categorization, merging and statistics. For most people it's not the sexiest part of Data Alchemy, but I do consider it the heart of Data Alchemy: it pumps the data from the source to the destination. Without data transformation, there are no new insights.
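To make the four operations a bit more concrete, here is a minimal sketch in plain Python. It takes a few retrieved records and runs them through translation, categorization, merging and statistics. The sales records, the region table, the code-to-name lookup and the 100.0 threshold are all hypothetical illustration data. This is not how any particular tool does it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input, as it might come out of the retrieval step.
SALES = [
    {"country": "NL", "amount": 120.0},
    {"country": "DE", "amount": 80.0},
    {"country": "NL", "amount": 40.0},
    {"country": "BE", "amount": 300.0},
]

# Hypothetical second source, keyed by full country name instead of code.
REGIONS = {"Netherlands": "Benelux", "Belgium": "Benelux", "Germany": "DACH"}

# Hypothetical lookup table used for translation.
COUNTRY_NAMES = {"NL": "Netherlands", "DE": "Germany", "BE": "Belgium"}


def translate(record):
    """Translation: replace the country code with the name the other source uses."""
    return {**record, "country": COUNTRY_NAMES[record["country"]]}


def categorize(record, threshold=100.0):
    """Categorization: label each sale 'small' or 'large' (threshold is arbitrary)."""
    size = "large" if record["amount"] >= threshold else "small"
    return {**record, "size": size}


def merge(record, regions):
    """Merging: enrich the record with the region from the second source."""
    return {**record, "region": regions.get(record["country"], "unknown")}


def average_by(records, key):
    """Statistics: mean amount for each distinct value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["amount"])
    return {k: mean(v) for k, v in groups.items()}


def transform(sales, regions):
    """Run every record through the pipeline, then summarize per region."""
    records = [merge(categorize(translate(s)), regions) for s in sales]
    return records, average_by(records, "region")


if __name__ == "__main__":
    records, stats = transform(SALES, REGIONS)
    for r in records:
        print(r)
    print(stats)
```

Each step returns a new dict rather than mutating its input. That keeps the steps independent, so they can be reordered or tested one at a time, which is roughly what the tools below let you do at a larger scale.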
Overview of tools
- Google Refine: a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases (note: the application was named Freebase Gridworks before Metaweb, the company behind Freebase, was acquired by Google).
- Talend: open-source data integration, data quality and master data management solutions.
- Spreadsheets: common spreadsheet applications such as Microsoft Excel or OpenOffice Calc.
- R: a language and environment for statistical computing and graphics.