Hadoop Join
http://ift.tt/1drbDce
http://ift.tt/11mLshv
A map-side join requires the input datasets to be partitioned and sorted in the same way.
A given key must fall into the same partition in each dataset, so that the partitions that
can hold that key are joined together. For this to work, all datasets must be partitioned
with the same partitioner, and the number of partitions in each dataset must be identical.
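As an illustration of why the same partitioner and partition count matter, here is a minimal sketch mirroring the logic of Hadoop's default HashPartitioner (the class and method names are illustrative, not Hadoop's own): a key's partition index depends only on its hash and the partition count, so two datasets that share both will place any given key at the same index.

```java
public class PartitionSketch {
    // Mirrors Hadoop's default HashPartitioner logic: the partition index
    // depends only on the key's hash and the number of partitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit (as HashPartitioner does), then take the modulus.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Both datasets use the same function and partition count, so the
        // key "user42" lands at the same partition index in each of them.
        System.out.println(partitionFor("user42", numPartitions)
                == partitionFor("user42", numPartitions)); // prints "true"
    }
}
```

If one dataset used a different partition count, the modulus would change and the same key could land in different partitions, breaking the join.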
The sort order of the data in each dataset must also be identical, which requires that all
datasets be sorted with the same comparator.
Taking the relevant partitions as its input, each map task performs the join before calling
the map function. The merge is conducted in memory, incurs no I/O cost beyond reading the
inputs, and its results are presented to the map function.
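The per-task merge described above can be sketched as a standard sorted-merge join: because both partitions arrive sorted by the same comparator, a single forward pass pairs up matching keys with no further disk access. A minimal sketch under those assumptions (the record layout, class name, and unique-keys-per-side simplification are illustrative, not Hadoop's actual CompositeInputFormat machinery):

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSketch {
    // Join two partitions that are each sorted by key with the same comparator.
    // Each record is a {key, value} pair; keys are assumed unique per side
    // to keep the equi-join sketch short.
    static List<String> mergeJoin(List<String[]> left, List<String[]> right) {
        List<String> joined = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++;            // left key is smaller: advance the left side
            } else if (cmp > 0) {
                j++;            // right key is smaller: advance the right side
            } else {
                // Keys match: emit the joined record, advance both sides.
                joined.add(left.get(i)[0] + "\t" + left.get(i)[1]
                        + "\t" + right.get(j)[1]);
                i++;
                j++;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Two partitions sorted by the same (lexicographic) comparator.
        List<String[]> users = List.of(
                new String[]{"u1", "alice"},
                new String[]{"u2", "bob"},
                new String[]{"u4", "dave"});
        List<String[]> orders = List.of(
                new String[]{"u1", "order-9"},
                new String[]{"u3", "order-7"},
                new String[]{"u4", "order-5"});
        // Only u1 and u4 appear on both sides, so only they are joined.
        System.out.println(mergeJoin(users, orders));
    }
}
```

The single pass is what keeps the merge cheap: each record on each side is examined exactly once, so the cost is linear in the size of the two partitions.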
http://ift.tt/1hkBwJP
from Google Plus RSS Feed for 101157854606139706613 http://ift.tt/1drbDce
via LifeLong Community