What is Azure Data Factory?
An Azure data factory is composed of the following components:
- Linked services: Connectors to the various storage and compute services. For example, a pipeline can use the following artifacts:
  - HDInsight cluster on demand: Access to the HDInsight compute service to run a Hive script that uses HDFS external storage
  - Azure Blob storage/SQL Azure: As the Hive job runs, this retrieves the data from Azure Blob storage and copies it to a SQL Azure database
- Datasets: These are representations of the data used in pipelines. A dataset points to the data through a linked service.
- Pipelines: A pipeline is the link between all the datasets. It contains activities that initiate data movements and transformations. It is the engine of the factory; without pipelines, nothing moves in the factory.
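Each of these components is authored as a JSON document. The sketch below shows, under the classic JSON authoring model, roughly how a Blob storage linked service, a dataset that uses it, and a pipeline that references the dataset fit together; all names, paths, and connection-string values here are illustrative placeholders, not values from a real factory.

```json
{
  "linkedService": {
    "name": "MyBlobStorageLinkedService",
    "properties": {
      "type": "AzureStorage",
      "typeProperties": {
        "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
      }
    }
  },
  "dataset": {
    "name": "MyInputBlobDataset",
    "properties": {
      "type": "AzureBlob",
      "linkedServiceName": "MyBlobStorageLinkedService",
      "typeProperties": {
        "folderPath": "mycontainer/input/"
      }
    }
  },
  "pipeline": {
    "name": "MyCopyPipeline",
    "properties": {
      "activities": [
        {
          "name": "CopyBlobToSql",
          "type": "Copy",
          "inputs": [ { "name": "MyInputBlobDataset" } ],
          "outputs": [ { "name": "MySqlOutputDataset" } ]
        }
      ]
    }
  }
}
```

In practice these are three separate definitions (wrapped in one object here only for readability): the dataset refers to the linked service by name, and the pipeline's activities refer to datasets by name, which is how the factory wires storage, data, and processing together.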