8 May, 2024
In today’s fast-paced business world, organizations are constantly seeking ways to harness the power of their data to drive insights and make informed decisions. SQL Server has long been a reliable workhorse for storing and managing data, but as data volumes grow and the need for real-time analytics intensifies, many are turning to more advanced platforms like Databricks. In this step-by-step guide, we’ll explore how Databricks can supercharge your data analytics capabilities, allowing you to seamlessly import data from SQL Server and create dynamic, interactive dashboard visualizations that bring your data to life.
Databricks offers a host of advantages over traditional SQL Server setups, from its ability to handle massive volumes of structured and unstructured data, to its powerful data processing and machine learning capabilities. With Databricks, you can easily create foreign catalogs to read external data without the need for ingestion and use CTAS statements to efficiently save data into optimized Delta tables. We’ll walk through an example using bike sales data, showing you how to leverage your Delta tables to power stunning dashboard visualizations that provide at-a-glance insights into your business performance.
Whether you’re a data analyst, business intelligence professional, or simply someone who wants to get more out of their data, this guide will equip you with the knowledge and skills you need to take your analytics to the next level with Databricks.
Databricks and SQL Server are both powerful tools for managing and analyzing data, but Databricks has a number of key advantages over SQL Server. Let’s take a closer look at some of the benefits Databricks offers over traditional SQL Server setups:
Of course, SQL Server still has its place and may be the right choice for certain use cases, particularly those involving heavily transactional workloads or legacy systems. But for organizations looking to modernize their data analytics and take advantage of the latest advances in big data processing and machine learning, Databricks offers a compelling alternative that can help them stay ahead of the curve.
One of the most powerful features of Databricks is its ability to read external data without the need to ingest it into the platform. This is made possible through the use of foreign catalogs, which allow Databricks to query data stored in external systems like SQL Server. By creating a foreign catalog, data teams can:
Creating a foreign catalog in Databricks is a straightforward process. First, users must define the external data source they wish to connect to, specifying details like the server name, database name, and authentication credentials. This can be done using the Databricks UI or through a SQL command.
Here, we will walk through the steps of setting up a foreign catalog to read bike sales data from SQL Server into Databricks:
That’s it! You now have a foreign catalog in Databricks which is configured to read data from your SQL Server database. You may query the catalog with SQL or Python the same way you would with a standard catalog stored in Databricks.
This approach offers significant benefits in terms of flexibility and efficiency. Rather than having to move large volumes of data into Databricks, users can leave the data where it is and simply query it as needed. This not only saves time and resources but also allows organizations to maintain a single source of truth for their data, reducing the risk of inconsistencies or errors.
The bike sales data we’ll be working on within this guide is a dataset that captures various aspects of a bike retailer’s sales operations. At its core, the data includes detailed information on each individual bike sale, such as the date of purchase, the model & category of the product, and the price paid by the customer. This granular, transaction-level data allows for the analysis of sales patterns and trends over time.
This information can help the retailer optimize their product mix, identify popular or slow-moving items, and ensure they have the right stock on hand to meet customer demand. By analyzing this product data alongside the sales and customer data, the retailer can make data-driven decisions to improve their overall business performance.
Now let’s dive into how we can use tables to power our dashboard visualizations in Databricks. Our aim is to use SQL Query to pull data and create some insightful visualizations. In our case, we are going to create two charts: one showing total sales per year and another showing daily sales for product categories.
To create a new Dashboard, navigate to “Dashboards” in the navigation Drawer and then click the “Create Dashboard” button.
The first chart is an “Area” type visualization that displays total sales for each calendar year. The X-axis represents the year, and the Y-axis represents the total sales for the year.
SELECT YEAR(soh.OrderDate) AS Year,
SUM(sod.LineTotal) AS TotalSales
FROM bikeworkscatalog.saleslt.salesorderheader soh
JOIN bikeworkscatalog.saleslt.salesorderdetail sod
ON soh.SalesOrderID = sod.SalesOrderID
GROUP BY YEAR(soh.OrderDate)
ORDER BY YEAR(soh.OrderDate);
This query groups sales based on the year of the order date sums up the total sales for each group, and orders the result by year. In the Databricks dashboard, you simply need to set ‘Year’ as the X-axis and ‘SUM(TotalSales)’ as the Y-axis. Select the dataset you created from the SQL Query. Then, select ‘Area’ as the chart type to get a graphical representation of the total sales for each calendar year.
In the second chart, we focus on daily sales for each product category. This chart employs a date picker function that allows the user to select a date range for the order dates to display on the chart. This chart is a line graph where each line represents daily sales for a single category. The user can select which categories they want to be displayed on the chart by clicking one or more product categories listed to the right of the visualization. The X-axis represents the Date, and the Y-axis shows the Daily Sales.
Below is the SQL Query for this visualization, configured to display data for the month of April 2024:
WITH SalesByDay AS (
SELECT pc.Name AS CategoryName,
DATE(soh.OrderDate) AS SalesDate,
SUM(sod.LineTotal) AS TotalSales
FROM bikeworkscatalog.saleslt.salesorderheader soh
JOIN bikeworkscatalog.saleslt.salesorderdetail sod ON soh.SalesOrderID = sod.SalesOrderID
JOIN bikeworkscatalog.saleslt.product p ON sod.ProductID = p.ProductID
JOIN bikeworkscatalog.saleslt.productcategory pc ON p.ProductCategoryID = pc.ProductCategoryID
GROUP BY pc.Name, DATE(soh.OrderDate)
)
SELECT CategoryName,
SalesDate,
SUM(TotalSales) AS TotalSales
FROM SalesByDay
GROUP BY CategoryName, SalesDate
ORDER BY SalesDate, CategoryName;
This query first creates a temporary view of sales by each day and for each product category. It then groups these results by sales date and category, sums up the total sales for each group, and orders the output by sales date and category name.
On the Databricks dashboard, you’ll need to set ‘SalesDate’ as the X-axis and ‘SUM(TotalSales)’ as the Y-axis. Then, choose ‘Line’ as the chart type. To allow selection of the product category for the line chart, assign ‘CategoryName’ to “Color/Group by”.
We’ve seen how Databricks’ scalability, flexibility, and advanced analytics capabilities make it a compelling alternative to traditional SQL Server setups, particularly for organizations dealing with large volumes of diverse data. Through our example using bike sales data, we’ve demonstrated how you can use your data to create dynamic, interactive dashboards that provide valuable insights at a glance.
But perhaps most importantly, we’ve shown how Databricks can empower you to take control of your data and make better, more informed decisions for your business. Whether you’re a seasoned data professional or just starting out on your analytics journey, Databricks provides the tools and capabilities you need to succeed in today’s data-driven world.
So why wait? Start exploring the possibilities of Databricks today and discover how it can help you unlock the full potential of your data. With its intuitive interface, powerful processing capabilities, and rich ecosystem of tools and integrations, Databricks is the ultimate platform for anyone looking to take their analytics to the next level. Happy data wrangling!
Xorbix Technologies, a partner of Databricks, can guide you through every step of your journey. Contact us today to explore how Databricks can revolutionize your analytics capabilities and drive your business forward. Connect now!
Discover how our expertise can drive innovation and efficiency in your projects. Whether you’re looking to harness the power of AI, streamline software development, or transform your data into actionable insights, our tailored demos will showcase the potential of our solutions and services to meet your unique needs.
Connect with our team today by filling out your project information.
802 N. Pinyon Ct,
Hartland, WI 53029
(866) 568-8615
info@xorbix.com