Mirroring in Microsoft Fabric is Not Free
Author: Ryan Shiva
26 August, 2024
Mirror, Mirror on the Wall: Microsoft, this feature isn’t free at all
At Xorbix, we thoroughly research the technologies we recommend and integrate for our clients, so that they get the best, most cost-effective solutions available on the market.
Recently, we took a closer look at Microsoft’s Fabric Mirroring for one of our integrations, and here is what we found.
Ingestion Methods for Transactional Data
Organizations store critical data, like inventory and sales, in transactional databases such as Azure SQL. There are three primary methods to ingest this data for analytics:
- Query Federation – allows you to query the transactional data without replication for exploratory analysis.
- Batch Ingestion – replicates the entire transactional database at once, often on an hourly or daily schedule.
- Change Data Capture (CDC) – incrementally replicates transactional data as real-time changes occur.
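To make the difference between batch ingestion and CDC concrete, here is a minimal Python sketch. It is an illustration only: the `Change` record, the function names, and the in-memory dictionaries standing in for a database and its replica are all hypothetical, and no Fabric or Azure API is involved. Query federation is left out because it moves no data at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Change:
    """One hypothetical change event emitted by the source database."""
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[Row] = None  # new row image; None for deletes


def batch_ingest(source: Dict[int, Row]) -> Dict[int, Row]:
    """Batch ingestion: copy every row, regardless of what changed."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(replica: Dict[int, Row], changes: List[Change]) -> int:
    """CDC: apply only the changed rows to an existing replica, in order."""
    for change in changes:
        if change.op == "delete":
            replica.pop(change.key, None)
        else:  # inserts and updates both carry the full new row image
            replica[change.key] = dict(change.row or {})
    return len(changes)


# Initial load: a full copy of the (toy) source table.
source = {1: {"sku": "A", "qty": 10}, 2: {"sku": "B", "qty": 5}}
replica = batch_ingest(source)

# The source then changes: one update, one insert, one delete.
changes = [
    Change("update", 1, {"sku": "A", "qty": 7}),
    Change("insert", 3, {"sku": "C", "qty": 1}),
    Change("delete", 2),
]
rows_moved = apply_changes(replica, changes)
```

In this sketch a second batch run would copy the whole table again, while the CDC path only touches the three rows that changed. Real CDC tools read these events from the database's transaction log rather than from a hand-built list.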
Comparison of Ingestion Options
In the Azure ecosystem, Microsoft Fabric, Azure Databricks, and Azure Synapse Analytics address these ingestion options in varying capacities. This blog focuses on how Microsoft Fabric handles them.
Ingestion Type | Microsoft Fabric | Azure Synapse Analytics | Azure Databricks |
Query Federation | ✘ | ✘ | ✔ Lakehouse Federation |
Batch Ingestion | ✔ Fabric Data Factory – Subset of batch ingestion from ADF | ✔ Synapse Pipelines – Subset of batch ingestion from ADF | ✔ Lakeflow Connect |
Change Data Capture | ✔ Mirroring – Subset of ADF CDC; no Private Link | ✔ Synapse Pipelines CDC – 9 supported sources; Private Link | ✔ Lakeflow Connect – Azure SQL CDC; Private Link |
Microsoft Fabric does not currently support query federation. It supports a subset of Azure Data Factory (ADF) batch ingestion through Fabric Data Factory, and a subset of ADF’s CDC capabilities through Fabric Mirroring.
Hidden Costs and Missing Features
Though Fabric Mirroring is marketed as innovative, it’s built on decades-old Change Data Capture (CDC) technology that began with IBM DB2 in the 1990s. Let’s separate the marketing hype from the reality of Fabric Mirroring and focus on the hidden costs and missing features.
The True Costs of Fabric Mirroring
Microsoft advertises Mirroring as “free”, but hidden costs from OneLake and the capacity model tell a different story. Here’s a breakdown of these costs:
Cost | What you pay ($ = charged, ✔ = not charged, ~ = conditionally charged) |
Export/Backup | $ Costs incurred on source system from export or backup. |
Data Lake Write | ✔ No capacity units (CUs) consumed. |
Licensing Fees | $ Fabric capacity licensing fees are charged even when replication is not running. |
Storage at Rest | ~ Storage at rest is charged once the discount is exhausted or while the capacity is paused. |
Merge/Deduplication | $ Not part of Fabric Mirroring. Additional complexity and cost (CUs) are required to reflect the source system. |
Data Access Fee | $ 3X storage transaction costs to access from other tools. |
The table above highlights that Fabric Mirroring’s costs are anything but “free”.
Export/Backup: All CDC systems incur costs on the source system when they query or back it up (e.g., SQL Server, Cosmos DB). Fabric Mirroring is no exception, whether the source is Azure SQL, Cosmos DB, or Snowflake.
Data Lake Write: Typically, CDC systems charge for writing data to a data lake. Fabric doesn’t charge for this, encouraging data storage in OneLake—a move that locks your data into the Fabric ecosystem, making external access or export more costly.
Licensing Fees: Third-party CDC tools like Fivetran and Qlik charge licensing fees, but Azure Databricks and Azure Synapse Analytics do not. Microsoft Fabric, a SaaS tool, requires a licensing fee in the form of capacity units, which you pay even when mirroring is not running. If you pause the capacity to save costs, your mirrored data becomes inaccessible and your progress is lost.
Merge/Deduplication: While other CDC tools offer deduplication and merge capabilities, Fabric does not support merge. Therefore, multiple operations are needed for deduplication, which incurs additional costs.
Data Access Fee: While writing data to OneLake is free, Microsoft’s “open APIs” come with a 3X upcharge when accessing this data outside Fabric or migrating to another platform, locking you into their capacity fees. CDC tools using ADLS Gen2 avoid these surcharges, offering more flexibility.
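To see how the 3X factor plays out, here is a small back-of-the-envelope calculation in Python. Only the 3X multiplier comes from the discussion above. The read volume and the per-transaction rate are made-up placeholders, not published prices, so substitute your own numbers before drawing conclusions.

```python
def access_cost(transactions: int, rate_per_10k: float, multiplier: float = 1.0) -> float:
    """Cost of a number of storage transactions at a given rate and multiplier."""
    return transactions / 10_000 * rate_per_10k * multiplier


# Hypothetical inputs for illustration only; these are NOT real prices.
READS_PER_MONTH = 50_000_000   # read transactions issued by external tools
BASE_RATE_PER_10K = 0.005      # placeholder currency units per 10,000 reads

baseline = access_cost(READS_PER_MONTH, BASE_RATE_PER_10K)
surcharged = access_cost(READS_PER_MONTH, BASE_RATE_PER_10K, multiplier=3.0)
extra = surcharged - baseline  # what the multiplier adds each month

print(f"baseline:   {baseline:.2f}")
print(f"with 3X:    {surcharged:.2f}")
print(f"difference: {extra:.2f}")
```

Whatever rate you plug in, the surcharged figure is three times the baseline, so the gap grows linearly with how often other tools read the mirrored data.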
Missing Features of Fabric Mirroring
Supported Connectors: Fabric Mirroring supports only three cloud sources—Azure SQL, Cosmos DB, and Snowflake—compared to nine sources in ADF CDC and even more in tools like Fivetran and Qlik.
Security and Governance: While ADF CDC, Azure Databricks Lakeflow Connect, and other CDC tools support Private Link to connect to your source systems through a private IP, Fabric Mirroring does not support Private Link at this time.
Upsert/Merge: While Fabric Mirroring helps customers benefit from incrementally replicating data, a complete CDC system also requires an upsert statement to avoid re-processing duplicate data. Fabric does not currently support upserts (T-SQL MERGE), which means additional costs from doing this deduplication downstream.
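For readers unfamiliar with what this downstream step involves, the following Python sketch shows the general idea of deduplicating a change feed and then upserting it into a target table. It is a generic illustration, not Fabric’s behavior or API: the record layout `(key, sequence, operation, payload)` and both function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# (primary key, sequence number, operation, payload); a hypothetical layout
ChangeRecord = Tuple[int, int, str, dict]


def latest_per_key(records: Iterable[ChangeRecord]) -> Dict[int, ChangeRecord]:
    """Deduplicate: keep only the highest-sequence record for each key."""
    latest: Dict[int, ChangeRecord] = {}
    for record in records:
        key, seq = record[0], record[1]
        # A re-delivered record has the same sequence number and is ignored.
        if key not in latest or seq > latest[key][1]:
            latest[key] = record
    return latest


def upsert(target: Dict[int, dict], records: Iterable[ChangeRecord]) -> Dict[int, dict]:
    """Merge: insert new keys, overwrite existing ones, remove deleted ones."""
    for key, _seq, op, payload in latest_per_key(records).values():
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = payload
    return target


# A change feed with a duplicate delivery (key 1, sequence 2 appears twice).
feed = [
    (1, 1, "insert", {"qty": 10}),
    (1, 2, "update", {"qty": 7}),
    (1, 2, "update", {"qty": 7}),
    (2, 3, "insert", {"qty": 5}),
    (2, 4, "delete", {}),
]
table = upsert({3: {"qty": 1}}, feed)
```

A single MERGE statement performs both steps inside the engine. Without it, each step is a separate operation that you build, run, and pay for yourself.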
Mirror, Mirror, on the Wall
Based on our testing, we advise customers to be cautious of products that claim to be “free” to bring their data in. With the rise of open table formats and lakehouse architecture, organizations are increasingly moving away from traditional vendor-locked systems to maintain flexibility and reduce future migration costs. Although Fabric Mirroring might seem like the latest innovation, it’s essentially a re-packaging of decades-old technology designed to lock you into Fabric’s licensing fees, making switching and integrating with the rest of your data stack difficult and expensive.