Data Cleansing / Integration
He lists five available technologies for smarter data preparation work:
- Data Tamer, which focuses on integration and is still being developed at MIT.
- OpenRefine, formerly Google Refine, which helps with clean-up.
- Data Wrangler, a cleaning and transformation tool developed by Stanford.
- reshape2, an R package that lets you restructure and aggregate data.
- plyr, an R package that implements the split-apply-combine strategy.
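
The last two items name general techniques (restructuring data and split-apply-combine) that can be sketched outside of R. The following is a minimal illustration in plain Python, not the reshape2 or plyr API; the function names `melt` and `split_apply_combine`, the sample records, and their field names are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical "wide" records: one row per plant, one column per measurement.
wide_rows = [
    {"species": "a", "length": 1.0, "width": 3.0},
    {"species": "a", "length": 2.0, "width": 5.0},
    {"species": "b", "length": 5.0, "width": 4.0},
    {"species": "b", "length": 7.0, "width": 6.0},
]


def melt(records, id_vars):
    """Restructure wide records into long form: one row per (id, variable, value)."""
    long_rows = []
    for record in records:
        ids = {name: record[name] for name in id_vars}
        for name, value in record.items():
            if name not in id_vars:
                long_rows.append({**ids, "variable": name, "value": value})
    return long_rows


def split_apply_combine(records, keys, value, func):
    """Split records by the key fields, apply func to each group, combine into a dict."""
    groups = defaultdict(list)
    for record in records:
        group_key = tuple(record[k] for k in keys)   # split
        groups[group_key].append(record[value])
    return {k: func(v) for k, v in groups.items()}   # apply + combine


long_rows = melt(wide_rows, id_vars=["species"])
summary = split_apply_combine(long_rows, ["species", "variable"], "value", mean)

for (species, variable), average in sorted(summary.items()):
    print(species, variable, average)
```

Here the data is first melted into long form and then averaged per species and measurement, which is roughly the restructure-then-aggregate workflow those two packages are commonly used for.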
http://www.cloveretl.com/ - CloverETL® is a data integration platform that scales from an open source desktop version to a commercial cloud cluster.
It's a Java-based open platform that helps design, automate, and monitor data integration processes.