Data
Types:
- structured (fixed schema; goes into a table)
- unstructured (no predefined format; loose files)
- semi-structured (no fixed schema, but tags or delimiters give some structure; e.g. log files, CSV, XML)
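
To make the difference concrete, here is a minimal sketch that reads the same kind of records from CSV and from XML. The sample data and the helper names `csv_records` and `xml_records` are made up for illustration. The point is that every CSV row gets the same keys from the header, while XML elements carry their own tags and may leave fields out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs, invented for this example.
CSV_TEXT = "id,name\n1,Ada\n2,Grace\n"
XML_TEXT = "<users><user id='1'><name>Ada</name></user><user id='2'/></users>"


def csv_records(text):
    """Every row gets the same keys, taken from the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def xml_records(text):
    """Each element describes itself with tags; fields may be missing."""
    records = []
    for user in ET.fromstring(text).iter("user"):
        record = dict(user.attrib)       # attributes such as id
        for child in user:               # nested tags such as <name>
            record[child.tag] = child.text
        records.append(record)
    return records
```

In this sample the second XML record has no `name` field at all, which is the kind of irregularity a fixed table schema would not allow without a placeholder value.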
Batch Data
- Often easier to implement
- When efficient pre-processing is needed
- When combining multiple sources
vs. Streaming Data
- When near real-time processing is needed
- When data velocity is high
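
The contrast between the two modes can be sketched with a simple average. This is only an illustration under assumed names (`batch_average`, `streaming_average`) and made-up sensor readings, not a production pipeline: the batch version needs the whole dataset before it answers, while the streaming version updates its answer as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: the full dataset is available, so compute the result once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated result after every incoming record."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count  # running average so far


# Hypothetical sensor values, invented for this example.
READINGS = [10.0, 20.0, 30.0]
```

Calling `batch_average(READINGS)` gives one final value, whereas iterating over `streaming_average(READINGS)` yields an intermediate value per record. Keeping only a running count and total is what lets the streaming version work on input that never ends.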