
For MapReduce specifically, one of the big issues was the speed at which you could read data from an HDD and transfer it across the network. The MapReduce paper's benchmarks were done on machines with 160 GB HDDs (so 3x smaller than a typical NVMe SSD today) which had sequential reads of maybe 40 MB/s (100x slower than an NVMe drive today!) and random reads of <1 MB/s (also very much slower than an NVMe drive today).

On the other hand, they had 2 GHz Xeon CPUs!

Table 1 in the paper suggests that average read throughput per worker for typical jobs was around 1 MB/s.
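
To put those numbers in perspective, here's a rough back-of-the-envelope calc in Python (the 1 TB input size and the 1000-disk count are illustrative, not numbers from the paper): at ~40 MB/s a single disk needs hours to scan a terabyte, while the same scan spread across a thousand disks reading in parallel finishes in under a minute.

  # Rough scan-time estimate (illustrative numbers, not from the paper):
  # one 2004-era disk at ~40 MB/s sequential vs. many disks in parallel.
  TB = 1024 ** 4                      # bytes in 1 TiB
  seq_bytes_per_s = 40 * 1024 ** 2    # ~40 MB/s sequential read
  disks = 1000                        # hypothetical number of disks

  single_disk_s = TB / seq_bytes_per_s
  parallel_s = single_disk_s / disks

  print(f"one disk:    {single_disk_s / 3600:.1f} hours")   # ~7.3 hours
  print(f"{disks} disks: {parallel_s:.0f} seconds")         # ~26 seconds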



Maybe more like 80 MB/s? But yeah, good point: sequential reads were many times faster than random access, yet on a single disk the increases in sequential transfer rate were still not keeping up with the increases in storage capacity, nor with CPU speed. MapReduce/Hadoop gave you a way to have lots of disks operating sequentially.
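
A minimal sketch of that idea in Python, for anyone who hasn't seen it (not the paper's or Hadoop's actual implementation, just the shape of it): each worker streams one input shard sequentially (map), and a single merge step combines the partial results (reduce). The shard paths and the 8-way split are placeholders.

  # Minimal sketch of "many disks scanning sequentially": each worker streams
  # one shard sequentially, emits partial word counts, and one reduce step
  # merges them. Shard paths below are hypothetical placeholders.
  from collections import Counter
  from multiprocessing import Pool

  SHARDS = [f"/data/disk{i}/shard-{i:05d}.txt" for i in range(8)]  # placeholders

  def map_shard(path):
      """Sequentially scan one shard and count words (the 'map' phase)."""
      counts = Counter()
      with open(path) as f:
          for line in f:            # sequential read of the whole shard
              counts.update(line.split())
      return counts

  def reduce_counts(partials):
      """Merge per-shard counts into one result (the 'reduce' phase)."""
      total = Counter()
      for c in partials:
          total.update(c)
      return total

  if __name__ == "__main__":
      with Pool(len(SHARDS)) as pool:          # one process per shard/disk
          totals = reduce_counts(pool.map(map_shard, SHARDS))
      print(totals.most_common(10))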



