Data Cleansing / Integration

He lists five available technologies for smarter data preparation work:

    • Data Tamer, which focuses on integration and is still being developed at MIT.
    • OpenRefine, formerly Google Refine, which helps with clean-up.
    • Data Wrangler, a cleaning and transformation tool developed by Stanford.
    • Reshape2, an R package that lets you restructure and aggregate data.
    • Plyr, an R package built around the split-apply-combine strategy.
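
To make the split-apply-combine idea from the last item concrete, here is a minimal sketch in plain Python. It is not Plyr's API (Plyr is an R package); the function name split_apply_combine and the sample rows are invented for illustration. The pattern is: split the rows into groups by a key, apply a summary function to each group, and combine the per-group results into one structure.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, apply):
    """Group rows by `key`, run `apply` on each group, collect the results.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Split: bucket the rows by the value of the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Apply and combine: summarize each group, gather into one dict.
    return {k: apply(group) for k, group in groups.items()}


# Sample data, made up for this example.
rows = [
    {"city": "Boston", "temp": 10.0},
    {"city": "Boston", "temp": 14.0},
    {"city": "Austin", "temp": 25.0},
]

# Mean temperature per city.
mean_temp = split_apply_combine(
    rows, "city", lambda group: mean(r["temp"] for r in group)
)
print(mean_temp)
```

The same three steps underlie most group-wise summaries; only the grouping key and the function passed as apply change.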

Syncsort DMX-h (both ETL and sort)

CloverETL® is a data integration platform that scales from an open source desktop tool to a commercial cloud cluster.

It's a Java-based open platform that helps design, automate, and monitor data integration processes.