26 August, 2024
At Xorbix, we research and vet the technologies we recommend and integrate for our clients, so that they get the best, most cost-effective solutions available on the market.
Recently, we took a closer look at Microsoft’s Fabric Mirroring for one of our integrations, and here is what we found.
Organizations store critical data, like inventory and sales, in transactional databases such as Azure SQL. There are three primary methods to ingest this data for analytics: query federation, batch ingestion, and change data capture.
In the Azure ecosystem, Microsoft Fabric, Azure Databricks, and Azure Synapse Analytics address these ingestion options in varying capacities. This blog focuses on how Microsoft Fabric handles them.
Ingestion Type | Microsoft Fabric | Azure Synapse Analytics | Azure Databricks |
Query Federation | ✘ | ✘ | ✔ Lakehouse Federation |
Batch Ingestion | ✔ Fabric Data Factory – Subset of batch ingestion from ADF | ✔ Synapse Pipelines – Subset of batch ingestion from ADF | ✔ Lakeflow Connect |
Change Data Capture | ✔ Mirroring – Subset of ADF CDC, no Private Link | ✔ Synapse Pipelines CDC – 9 supported sources, Private Link | ✔ Lakeflow Connect – Azure SQL CDC, Private Link |
Microsoft Fabric does not currently support query federation. Through Fabric Data Factory, it supports a subset of Azure Data Factory (ADF) batch ingestion, and through Fabric Mirroring, it supports a subset of ADF’s change data capture (CDC) capabilities.
Though Fabric Mirroring is marketed as innovative, it’s built on decades-old Change Data Capture (CDC) technology that began with IBM DB2 in the 1990s. Let’s separate the marketing hype from the reality of Fabric Mirroring and focus on the hidden costs and missing features.
Microsoft advertises Mirroring as “free”, but hidden costs from OneLake and the capacity model tell a different story. Here’s a breakdown of these costs:
Cost | Charge and Notes |
Export/Backup | $ Costs incurred on source system from export or backup. |
Data Lake Write | ✔ No capacity units (CUs) consumed. |
Licensing Fees | $ Fabric capacity licensing fees are charged even when replication is not running. |
Storage at Rest | ~ Storage at rest charged after discount exhausted or capacity is paused. |
Merge/Deduplication | $ Not part of Fabric Mirroring. Additional complexity and cost (CUs) are required to reflect the source system. |
Data Access Fee | $ 3X storage transaction costs to access from other tools. |
The table above shows that Fabric Mirroring is anything but “free”.
Export/Backup: All CDC systems incur costs when querying or backing up the source (e.g., SQL Server, Cosmos DB). Fabric Mirroring is no exception; see Microsoft’s mirroring documentation for Azure SQL, Cosmos DB, and Snowflake.
Data Lake Write: Typically, CDC systems charge for writing data to a data lake. Fabric doesn’t charge for this, encouraging data storage in OneLake—a move that locks your data into the Fabric ecosystem, making external access or export more costly.
Licensing Fees: Third-party CDC tools like Fivetran and Qlik charge licensing fees, but Azure Databricks and Azure Synapse Analytics do not. Microsoft Fabric, a SaaS tool, requires a licensing fee in the form of capacity units, which you pay even when mirroring is not running. If you pause the capacity to save costs, your mirrored data becomes inaccessible and your replication progress is lost.
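To make the always-on nature of capacity billing concrete, here is a minimal arithmetic sketch. The function name and the hourly rate are hypothetical placeholders, not Fabric pricing; substitute the rate for your own capacity size and region.

```python
def split_capacity_bill(hourly_rate: float, hours_billed: float,
                        hours_replicating: float) -> dict:
    """Split a capacity bill into replication hours and idle hours.

    Assumes the capacity is billed for every hour it is running,
    whether or not replication is active during that hour.
    """
    if hours_replicating > hours_billed:
        raise ValueError("replication hours cannot exceed billed hours")
    total = hourly_rate * hours_billed
    active = hourly_rate * hours_replicating
    return {"total": total, "active": active, "idle": total - active}


# Hypothetical inputs: 0.5 currency units per hour, a 720-hour month,
# and replication that is only busy for 120 of those hours.
bill = split_capacity_bill(hourly_rate=0.5, hours_billed=720,
                           hours_replicating=120)
print(bill)
```

With these placeholder inputs, most of the bill falls on hours in which nothing was being replicated, which is the effect described above.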
Merge/Deduplication: While other CDC tools offer deduplication and merge capabilities, Fabric does not support merge. Therefore, multiple operations are needed for deduplication, which incurs additional costs.
Data Access Fee: While writing data to OneLake is free, Microsoft’s “open APIs” come with a 3X upcharge when accessing this data outside Fabric or migrating to another platform, locking you into their capacity fees. CDC tools using ADLS Gen2 avoid these surcharges, offering more flexibility.
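As a rough illustration of what a 3X transaction multiplier means for an external reader, the sketch below compares the same workload at a base rate and at three times that rate. The per-10,000-operation rate is a made-up placeholder, not a published price; only the 3X factor comes from the discussion above.

```python
def transaction_cost(operations: int, rate_per_10k: float,
                     multiplier: float = 1.0) -> float:
    """Cost of a number of storage operations billed per 10,000 operations."""
    return operations / 10_000 * rate_per_10k * multiplier


# Hypothetical workload: one million read operations from an external tool,
# at a placeholder base rate of 0.25 currency units per 10,000 operations.
base = transaction_cost(1_000_000, rate_per_10k=0.25)
surcharged = transaction_cost(1_000_000, rate_per_10k=0.25, multiplier=3)
print(f"base: {base:.2f}, with 3X multiplier: {surcharged:.2f}")
```

Because the multiplier applies per operation, the gap grows linearly with how often other tools read the data.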
Beyond cost, Fabric Mirroring is also missing features that other CDC tools provide.
Supported Connectors: Fabric Mirroring supports only three cloud sources (Azure SQL, Cosmos DB, and Snowflake), compared to nine sources in ADF CDC and even more in tools like Fivetran and Qlik.
Security and Governance: While ADF CDC, Azure Databricks Lakeflow Connect, and other CDC tools support Private Link to connect to your source systems through a private IP, Fabric Mirroring does not support Private Link at this time.
Upsert/Merge: While Fabric Mirroring helps customers benefit from incrementally replicating data, a complete CDC system also requires an upsert statement to avoid re-processing duplicate data. Fabric does not currently support upserts (T-SQL MERGE), which means additional costs from doing this deduplication downstream.
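The downstream work that an upsert would otherwise handle can be sketched in plain Python. This is an illustration of the general deduplicate-then-merge technique, not Fabric’s API or any vendor’s implementation; the change-record fields (`key`, `seq`, `op`, `row`) are hypothetical names chosen for the example.

```python
from typing import Any


def latest_change_per_key(changes: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Deduplicate a change feed, keeping the highest-sequence change per key."""
    latest: dict[Any, dict[str, Any]] = {}
    for change in changes:
        key = change["key"]
        if key not in latest or change["seq"] > latest[key]["seq"]:
            latest[key] = change
    return latest


def apply_changes(target: dict[Any, Any],
                  changes: list[dict[str, Any]]) -> dict[Any, Any]:
    """Return a copy of target with the deduplicated changes merged in."""
    result = dict(target)
    for key, change in latest_change_per_key(changes).items():
        if change["op"] == "delete":
            result.pop(key, None)       # remove rows deleted at the source
        else:
            result[key] = change["row"]  # insert or update (upsert)
    return result


target = {1: {"sku": "A", "qty": 5}}
changes = [
    {"key": 1, "seq": 10, "op": "update", "row": {"sku": "A", "qty": 4}},
    {"key": 1, "seq": 11, "op": "update", "row": {"sku": "A", "qty": 3}},
    {"key": 2, "seq": 12, "op": "insert", "row": {"sku": "B", "qty": 7}},
    {"key": 2, "seq": 12, "op": "insert", "row": {"sku": "B", "qty": 7}},  # duplicate delivery
    {"key": 3, "seq": 13, "op": "insert", "row": {"sku": "C", "qty": 1}},
    {"key": 3, "seq": 14, "op": "delete", "row": None},
]
print(apply_changes(target, changes))
```

Every step here (scanning the feed, picking the latest change per key, rewriting the target) is compute that runs after replication, which is where the additional cost comes from.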
Based on our testing, we advise customers to be cautious of products that claim to bring their data in for “free”. With the rise of open table formats and lakehouse architecture, organizations are increasingly moving away from traditional vendor-locked systems to maintain flexibility and reduce future migration costs. Although Fabric Mirroring might seem like the latest innovation, it is essentially a repackaging of decades-old technology designed to lock you into Fabric’s licensing fees, making it difficult and expensive to switch platforms or integrate with the rest of your data stack.
Get in touch today to discover how our Databricks expertise can transform your data ingestion, processing, and analytics, empowering your success in today’s data landscape.