
Learning Big Data
Hive(SQL), Pig(Script)
Data Ingestion:
Sqoop: transfers bulk data between Hadoop and relational databases
--Sqoop2 also exposes a REST API
Flume: Event-based data collection and processing
http://ift.tt/142ovvQ
Agent, source, sink, channel
an agent's sources and sinks, connected by channels, transport data into Hadoop (HDFS) or HBase
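The source → channel → sink model above can be sketched as a toy pipeline. This is an illustrative Python sketch of Flume's moving parts, not Flume's actual API; the names (`MemoryChannel`, `source`, `sink`, the `hdfs` list) are all stand-ins:

```python
from queue import Queue

# Toy sketch of Flume's model (illustrative only, not Flume's API):
# a source produces events, a channel buffers them, a sink drains them.

class MemoryChannel:
    """Buffers events between source and sink, like Flume's memory channel."""
    def __init__(self):
        self._q = Queue()
    def put(self, event):
        self._q.put(event)
    def take(self):
        return self._q.get()
    def empty(self):
        return self._q.empty()

def source(lines, channel):
    """Source: turns incoming records into events on the channel."""
    for line in lines:
        channel.put({"body": line})

def sink(channel, store):
    """Sink: drains the channel into a destination (stand-in for HDFS/HBase)."""
    while not channel.empty():
        store.append(channel.take()["body"])

# An "agent" wires source -> channel -> sink.
channel = MemoryChannel()
hdfs = []  # stand-in for the real destination
source(["evt1", "evt2", "evt3"], channel)
sink(channel, hdfs)
print(hdfs)  # -> ['evt1', 'evt2', 'evt3']
```

The channel is the decoupling point: the source and sink never talk to each other directly, which is what lets a real Flume agent buffer bursts and survive slow sinks.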

Oozie: workflow engine
Point-to-point workflow
Fan-out workflow: fork
Capture-and-decide Workflow: decision, switch
Coordinator: frequency-based scheduling
dataset triggers (run when input data becomes available)
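The three workflow shapes above can be sketched as plain functions. This is an illustrative Python sketch, not Oozie's XML workflow language; the function names are hypothetical:

```python
# Toy sketch of the three Oozie workflow shapes (illustrative, not Oozie's API):
# point-to-point, fan-out (fork/join), and capture-and-decide (decision/switch).

def run_point_to_point(actions, data):
    """Point-to-point: actions run one after another."""
    for action in actions:
        data = action(data)
    return data

def run_fan_out(actions, data):
    """Fan-out (fork): independent actions run on the same input, then join."""
    return [action(data) for action in actions]

def run_capture_and_decide(capture, cases, data):
    """Capture-and-decide: a decision node switches on a captured value."""
    key = capture(data)
    return cases[key](data)

# Example: a tiny capture-and-decide flow.
result = run_capture_and_decide(
    capture=lambda d: "big" if d > 100 else "small",
    cases={"big": lambda d: d / 10, "small": lambda d: d * 10},
    data=5,
)
print(result)  # -> 50
```

In real Oozie the same shapes are spelled with `fork`/`join` and `decision`/`switch` nodes in the workflow XML.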

Impala: heavy use of memory for low-latency queries

Sqoop: efficiently transfer bulk data between Hadoop and db
http://ift.tt/1J48Tze

Kafka
HBase
Streaming: Samza, Storm, and Spark Streaming (time windows)
Samza (milliseconds): near-real-time processing
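The time-window idea noted for Spark Streaming can be sketched as grouping timestamped events into fixed-length (tumbling) windows. This is an illustrative Python sketch, not Spark's API:

```python
from collections import defaultdict

# Toy sketch of time-window streaming (illustrative; Spark Streaming
# groups records into fixed-length windows and computes per-window results).

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed windows and count per key."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in windows.items()}

events = [(0, "a"), (1, "a"), (2, "b"), (10, "a"), (11, "b")]
print(tumbling_window_counts(events, window_seconds=10))
# -> {0: {'a': 2, 'b': 1}, 10: {'a': 1, 'b': 1}}
```

The window length is the latency knob: Samza processes events one at a time (milliseconds), while windowed/batched systems trade latency for throughput.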

Trident
exactly-once processing semantics
micro-batching: streams are handled as batches of tuples
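The two Trident ideas above fit together: because each micro-batch carries an id, a replayed batch can be detected and skipped, which is how exactly-once state updates fall out of at-least-once delivery. An illustrative Python sketch (not Trident's API; names are hypothetical):

```python
# Toy sketch of Trident-style micro-batching with exactly-once updates
# (illustrative; Trident itself tracks a transaction id per batch so that
# replayed batches do not double-count).

def micro_batches(stream, batch_size):
    """Split a stream of tuples into numbered batches."""
    for i in range(0, len(stream), batch_size):
        yield i // batch_size, stream[i:i + batch_size]

class ExactlyOnceCounter:
    """Applies each batch at most once by remembering the last batch id."""
    def __init__(self):
        self.count = 0
        self.last_batch_id = -1
    def apply(self, batch_id, batch):
        if batch_id <= self.last_batch_id:
            return  # replayed batch: skip, so counts are not duplicated
        self.count += len(batch)
        self.last_batch_id = batch_id

counter = ExactlyOnceCounter()
batches = list(micro_batches(["t1", "t2", "t3", "t4", "t5"], batch_size=2))
for batch_id, batch in batches:
    counter.apply(batch_id, batch)
counter.apply(0, batches[0][1])  # a replayed batch changes nothing
print(counter.count)  # -> 5
```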

File Formats:

Extract, Transform, Load (ETL):
Extract data from sources such as ERP or CRM applications
Transform that data into a common format that fits the other data in the warehouse
Load the data into the data warehouse for analysis
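The three steps above can be sketched end to end. This is an illustrative Python sketch; the field names and the two source feeds are invented for the example:

```python
# Toy sketch of the three ETL steps (illustrative names throughout):
# extract from source systems, transform to a common shape, load into a store.

def extract():
    """Extract: pull records from sources such as ERP or CRM systems."""
    erp = [{"cust": "ACME", "amount_usd": 100}]
    crm = [{"customer_name": "acme", "value": 250}]
    return erp, crm

def transform(erp, crm):
    """Transform: map both feeds onto one common record format."""
    common = []
    for r in erp:
        common.append({"customer": r["cust"].lower(), "amount": r["amount_usd"]})
    for r in crm:
        common.append({"customer": r["customer_name"].lower(), "amount": r["value"]})
    return common

def load(records, warehouse):
    """Load: write the transformed records into the warehouse for analysis."""
    warehouse.extend(records)

warehouse = []  # stand-in for a real data warehouse table
load(transform(*extract()), warehouse)
print(warehouse)
# -> [{'customer': 'acme', 'amount': 100}, {'customer': 'acme', 'amount': 250}]
```

The transform step is where the "common format" lives: once both feeds share one schema, queries in the warehouse no longer care which system a row came from.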


from Public RSS-Feed of Jeffery yuan