HDInsight

  • Serves a similar big-data analytics role to Azure Data Lake Analytics

  • Built on open-source frameworks, which are free and community supported (the HDInsight clusters themselves are billed)

  • Includes Apache Hadoop, Spark and Kafka

  • Fully managed cloud service for running open-source analytics frameworks on Hadoop clusters in Azure

  • HDInsight allows you to use HDFS, YARN, and MapReduce to analyze batch data

  • HDInsight offers multiple cluster types, such as Spark, Kafka, Storm, and HBase, and supports multiple programming languages
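The MapReduce model mentioned above can be sketched locally in plain Python. This is a single-process simulation of the map, shuffle, and reduce phases for a word count. It assumes nothing about HDInsight's own APIs, and the function names are hypothetical. On a real cluster, YARN would schedule the mappers and reducers across nodes, and HDFS (or Azure Storage) would hold the input.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical, single-process simulation of MapReduce (not an HDInsight API).


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in order over an in-memory batch of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "Hadoop on HDInsight"]
    print(word_count(sample))
    # {'spark': 1, 'and': 1, 'hadoop': 2, 'on': 1, 'hdinsight': 1}
```

On HDInsight, the same logic would typically be submitted as a Hadoop streaming or MapReduce job, or rewritten for Spark. This sketch only illustrates how data flows between the phases.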

Advantages over On-premises Hadoop

  • Low Cost
  • Automated Cluster Creation
  • Managed Hardware and Configuration
  • Simplified Version Management
  • Security and Compliance
  • Integration with other Azure Services
    • Blob Storage
    • Cosmos DB
    • Azure Data Factory, etc.

Features of HDInsight

  • Easy to Spin Up Clusters
  • Reduced Costs
  • Enterprise-grade security
  • Optimized Components

Open-source Components

Create Clusters with Open-source Libraries:

  • Spark
  • Hadoop
  • Storm
  • Kafka, etc.

Clusters come with default components like:

  • Avro
  • YARN
  • Zookeeper
  • Tez
  • Oozie
  • Sqoop, etc.
