http://ift.tt/1hkADB0




Hadoop Join

http://ift.tt/1drbDce

http://ift.tt/11mLshv

A map-side join requires the input datasets to be partitioned and sorted in the same way.

A given key must fall in the same partition of every dataset, so that the partitions that can hold that key are joined together. For this to work, all datasets must be partitioned with the same partitioner, and every dataset must have the same number of partitions.
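As an illustration of the partitioning requirement, here is a minimal Python sketch (not Hadoop code). The names partition_for and partition_dataset, the CRC32-based hash, and the sample records are all assumptions made for this example. It shows that when two datasets use the same partitioner and the same partition count, every key lands at the same partition index in both.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical deterministic partitioner: CRC32 of the key modulo the partition count.

    CRC32 is used instead of Python's built-in hash(), which is randomized
    per process for strings and would not be stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_dataset(records, num_partitions):
    """Split (key, value) records into num_partitions lists using partition_for."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

# Illustrative sample data (assumed, not from the original post).
users = [("u1", "Ann"), ("u2", "Bob"), ("u3", "Cy")]
orders = [("u2", "book"), ("u1", "pen"), ("u3", "ink"), ("u1", "cup")]

NUM_PARTITIONS = 3  # must be identical for both datasets
user_parts = partition_dataset(users, NUM_PARTITIONS)
order_parts = partition_dataset(orders, NUM_PARTITIONS)

def key_to_partition(parts):
    """Map each key to the index of the partition that holds it."""
    return {key: index for index, part in enumerate(parts) for key, _ in part}

user_index = key_to_partition(user_parts)
order_index = key_to_partition(order_parts)
for key in sorted(user_index):
    print(key, "users partition:", user_index[key], "orders partition:", order_index[key])
```

If either the hash function or NUM_PARTITIONS differed between the two datasets, the same key could end up at different indices, and partition i of one dataset could no longer be joined against partition i of the other.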



The data in each dataset must be sorted in the same order, which means all datasets must be sorted using the same comparator.
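To see why the comparator matters, consider this small Python sketch. The numeric_comparator function and the sample keys are assumptions for illustration only. The same keys come out in a different order depending on which comparator is used, so two datasets sorted with different comparators would not line up during a merge.

```python
from functools import cmp_to_key

def numeric_comparator(a: str, b: str) -> int:
    """Hypothetical comparator: order string keys by their integer value."""
    return (int(a) > int(b)) - (int(a) < int(b))

keys = ["10", "9", "2"]

# Default string comparison is lexicographic, character by character.
lexicographic_order = sorted(keys)

# The same keys sorted with the numeric comparator.
numeric_order = sorted(keys, key=cmp_to_key(numeric_comparator))

print(lexicographic_order)  # ['10', '2', '9']
print(numeric_order)        # ['2', '9', '10']
```

A merge that walks both inputs in step assumes that a smaller key never appears after a larger one under one shared ordering; mixing the two orders above would break that assumption.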



Each map task takes the matching partitions as its input and evaluates the join before calling the map function. The join is performed in memory, incurs no additional I/O beyond reading the inputs, and its results are passed to the map function.
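The per-task join can be pictured with the following Python sketch. It assumes the join is an inner merge over two partitions that are already sorted by key with the same ordering; merge_join, map_fn, and the sample partitions are hypothetical names and data for this example, not Hadoop APIs.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs sorted by key in the same order.

    Yields (key, (left_value, right_value)) for every matching pair.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1  # key only on the left side, skip it
        elif left_key > right_key:
            j += 1  # key only on the right side, skip it
        else:
            # Find the run of records sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Emit every left/right combination for the shared key.
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    yield left_key, (left_value, right_value)
            i, j = i_end, j_end

def map_fn(key, joined):
    """Hypothetical map function: it receives records that are already joined."""
    user_name, item = joined
    return f"{user_name} bought {item}"

# Matching partitions from two datasets, both sorted by key (assumed data).
users_partition = [("u1", "Ann"), ("u3", "Cy")]
orders_partition = [("u1", "cup"), ("u1", "pen"), ("u2", "book"), ("u3", "ink")]

results = [map_fn(key, joined) for key, joined in merge_join(users_partition, orders_partition)]
print(results)  # ['Ann bought cup', 'Ann bought pen', 'Cy bought ink']
```

Because both inputs are sorted the same way, the merge needs only one forward pass over each partition, which is why the partitioning and sorting requirements above are preconditions for this kind of join.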



http://ift.tt/1hkBwJP


