1- Hadoop : Hadoop is basically 2 things, a distributed file system (HDFS) which constitutes Hadoop's storage layer and a distributed computation framework(MapReduce) which constitutes the processing layer. You should go for Hadoop if your data is very huge and you have offline, batch processing kinda needs. Hadoop is not suitable for real time stuff. You setup a Hadoop cluster on a group of commodity machines connected together over a network(called as a cluster). You then store huge amounts of data into the HDFS and process this data by writing MapReduce programs(or jobs). Being distributed, HDFS is spread across all the machines in a cluster and MapReduce processes this scattered data locally by going to each machine, so that you don't have to relocate this gigantic amount of data.