HDInsight
- Similar to Azure Data Lake Analytics
- Built on open-source frameworks, which are free and community supported
- Includes Apache Hadoop, Spark, and Kafka
- Fully managed service that provides open-source analytics capabilities on cloud Hadoop clusters in Azure
- HDInsight allows you to use HDFS, YARN, and MapReduce to analyze batch data
- HDInsight offers multiple cluster types, such as Spark, Kafka, Storm, and HBase, and supports multiple programming languages
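The MapReduce batch model mentioned above can be illustrated with a small word-count sketch. This is plain Python that runs locally and only mimics the map, shuffle, and reduce phases; it does not call any HDInsight or Hadoop API, and all function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory batch of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample batch; on a real cluster the input would be read from HDFS.
    sample = ["spark kafka storm", "kafka hbase", "spark spark"]
    print(word_count(sample))
```

On a real cluster, the mappers and reducers would run in parallel across nodes, with YARN scheduling the work and HDFS holding the input and output; the sketch only shows the data flow on a single machine.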
Advantages over Hadoop
- Low Cost
- Automated Cluster Creation
- Managed Hardware and Configuration
- Simplified Version Management
- Security and Compliance
- Integration with other Azure Services:
  - Blob Storage
  - Cosmos DB
  - Azure Data Factory, etc.
Features of HDInsight
- Easy to Spin Up Clusters
- Reduced Costs
- Enterprise-grade security
- Optimized Components
Open-source Components
Create Clusters with Open-source Frameworks:
- Spark
- Hadoop
- Storm
- Kafka, etc.
Clusters come with default components like:
- Avro
- YARN
- ZooKeeper
- Tez
- Oozie
- Sqoop, etc.