Learning Big Data
Hive(SQL), Pig(Script)
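As a quick illustration of the two front ends (the table and field names here are hypothetical), the same aggregation in HiveQL and in Pig Latin:

```sql
-- HiveQL: SQL-like aggregation over an HDFS-backed table
SELECT category, COUNT(*) AS cnt
FROM page_views
GROUP BY category;
```

```pig
-- Pig Latin: the same aggregation written as a dataflow script
views  = LOAD 'page_views' AS (url:chararray, category:chararray);
groups = GROUP views BY category;
counts = FOREACH groups GENERATE group AS category, COUNT(views) AS cnt;
DUMP counts;
```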
Data Ingestion:
Sqoop: transfer data between Hadoop and relational databases
Sqoop 2 also exposes a REST API
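A typical Sqoop 1 import from the command line looks like the following sketch (the JDBC URL, credentials, table, and target directory are made-up placeholders):

```shell
# Import one MySQL table into HDFS, splitting the work across 4 map tasks
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```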
Flume: Event-based data collection and processing
http://ift.tt/142ovvQ
Agent, source, sink, channel
transports data into HDFS or HBase
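A minimal sketch of a single-agent Flume configuration wiring those pieces together (the agent/component names a1, r1, c1, k1 and all paths are arbitrary):

```properties
# One agent: exec source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/flume/events/%Y-%m-%d
a1.sinks.k1.channel = c1
```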
Oozie: workflow engine
Point-to-point workflow
Fan-out workflow: fork
Capture-and-decide workflow: decision, switch
Coordinator: frequency (time-based) scheduling
dataset availability triggers
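A sketch of a workflow combining the fan-out and capture-and-decide patterns above (the action names and the decision predicate are invented for illustration):

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="fanOut"/>
  <!-- fan-out: fork runs both paths in parallel -->
  <fork name="fanOut">
    <path start="loadA"/>
    <path start="loadB"/>
  </fork>
  <action name="loadA">
    <fs><mkdir path="${nameNode}/tmp/a"/></fs>
    <ok to="joinLoads"/>
    <error to="fail"/>
  </action>
  <action name="loadB">
    <fs><mkdir path="${nameNode}/tmp/b"/></fs>
    <ok to="joinLoads"/>
    <error to="fail"/>
  </action>
  <join name="joinLoads" to="checkOutput"/>
  <!-- capture-and-decide: decision node with a switch -->
  <decision name="checkOutput">
    <switch>
      <case to="end">${fs:dirSize('/tmp/a') gt 0}</case>
      <default to="end"/>
    </switch>
  </decision>
  <kill name="fail"><message>Load failed</message></kill>
  <end name="end"/>
</workflow-app>
```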
Impala: heavy use of memory, low-latency queries
Sqoop: efficiently transfer bulk data between Hadoop and databases
http://ift.tt/1J48Tze
Kafka
HBase
Streaming: Samza, Storm, and Spark (time windows)
Samza (millisecond latency): near-real-time processing
Trident
exactly-once semantics
micro-batching: streams are handled as batches of tuples
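The micro-batching idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not Trident's actual API; all names below are made up:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group a stream of tuples into fixed-size batches, the way
    micro-batching engines process small batches rather than one
    tuple at a time."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process(batches, seen, state):
    """Each batch gets an id; skipping ids already seen lets a
    replayed batch update state exactly once (simplified sketch
    of the exactly-once idea)."""
    for batch_id, batch in enumerate(batches):
        if batch_id in seen:        # replayed batch: already applied
            continue
        state += sum(v for _, v in batch)
        seen.add(batch_id)
    return state

events = [("click", 1), ("click", 2), ("view", 3), ("click", 4), ("view", 5)]
total = process(micro_batches(events, 2), set(), 0)
print(total)  # 15
```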
File Formats:
Extract, Transform, Load (ETL):
Extract data from sources such as ERP or CRM applications
Transform that data into a common format that fits other data in the warehouse
Load the data into the data warehouse for analysis
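A toy Python sketch of those three steps; the source records and the in-memory "warehouse" are stand-ins for real ERP/CRM exports and a warehouse table:

```python
def extract():
    # Extract: pull raw records from source systems (hard-coded
    # stand-ins for ERP and CRM exports with different field names)
    erp = [{"cust": "A", "amount_usd": "100.50"}]
    crm = [{"customer_id": "B", "total": "30"}]
    return erp, crm

def transform(erp, crm):
    # Transform: map both sources onto one common schema
    common = []
    for r in erp:
        common.append({"customer": r["cust"], "amount": float(r["amount_usd"])})
    for r in crm:
        common.append({"customer": r["customer_id"], "amount": float(r["total"])})
    return common

def load(rows, warehouse):
    # Load: append the conformed rows to the warehouse table
    warehouse.setdefault("sales", []).extend(rows)
    return warehouse

warehouse = load(transform(*extract()), {})
print(warehouse["sales"])  # two rows in one common schema
```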
from Public RSS-Feed of Jeffery yuan. Created with the PIXELMECHANICS 'GPlusRSS-Webtool' at http://gplusrss.com http://ift.tt/1F9GDFO