Query insights - Microsoft Community Hub
Question: How can I identify unused data in a modern data platform built with Azure Synapse and the medallion architecture using Log Analytics?

I'm working with a client who has built a modern data platform based on the medallion architecture, using Azure Synapse and Azure Storage accounts. Users access the data within Synapse workspaces in several ways: some through Python scripts, others through serverless SQL endpoints, and others through dedicated SQL pools (using views and stored procedures). We log a significant amount of information to Log Analytics, so essentially every SELECT statement executed against the data is logged. The client now wants to identify which data is not actively used, so that unused datasets can be removed to reduce storage costs. In a traditional SQL data warehouse, Query Store could be used for this, but on this platform the only usage data we have is what is stored in Log Analytics.
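One way to approximate Query Store-style usage data from Log Analytics is to take the logged query text and extract the objects each statement reads. The sketch below assumes the SELECT statements have already been exported from Log Analytics as plain strings (the export itself is not shown). `referenced_objects` is a hypothetical helper, and its regular expressions are a heuristic rather than a T-SQL parser: they miss dynamic SQL and can pick up CTE names.

```python
import re

# Hypothetical helper; these patterns are a heuristic, not a T-SQL parser.
# Object names after FROM/JOIN, optionally [bracketed] and schema-qualified.
# OPENROWSET is excluded here because serverless queries name a storage path instead.
_OBJECT_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+(?!OPENROWSET\b)((?:\[?\w+\]?\.){0,2}\[?\w+\]?)",
    re.IGNORECASE,
)
# Storage paths read by serverless SQL via OPENROWSET(BULK '...')
_BULK_RE = re.compile(r"BULK\s+N?'([^']+)'", re.IGNORECASE)


def referenced_objects(sql: str) -> set[str]:
    """Return lower-cased object names and BULK paths referenced by one query."""
    refs = set()
    for match in _OBJECT_RE.finditer(sql):
        # Strip brackets so [gold].[Sales] and gold.sales compare equal
        refs.add(match.group(1).replace("[", "").replace("]", "").lower())
    for match in _BULK_RE.finditer(sql):
        refs.add(match.group(1).lower())
    return refs
```

Queries that go through views or stored procedures only name the view or procedure, so the tables underneath would still have to be resolved from the object definitions before they can be counted as used.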
How can we use the logs in Log Analytics to determine which data (tables, views, etc.) is processed through the layers of the medallion architecture but never actually used? The goal is to remove that unused data to save costs. Any suggestions or insights would be greatly appreciated!
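Once each logged read has been mapped to a dataset, finding unused data is mostly a set difference, with one medallion-specific twist: bronze and silver data are rarely queried by users directly, but they are still needed if they feed a gold object that is. Here is a minimal sketch under those assumptions. `find_unused`, the `feeds` lineage map, and the 90-day window are all hypothetical. The lineage would have to come from the pipeline definitions, because it is not in the query logs. Reads made by Python scripts directly against the storage account only count if they are captured in logs as well.

```python
from datetime import datetime, timedelta


def find_unused(
    datasets: list[str],
    accesses: list[tuple[datetime, str]],
    feeds: dict[str, list[str]],
    now: datetime,
    max_idle_days: int = 90,
) -> list[str]:
    """Return datasets with no recent reads, directly or via downstream datasets.

    datasets: every dataset identifier in the platform (all layers)
    accesses: (timestamp, dataset) pairs for user-facing reads from the logs
    feeds:    upstream dataset -> datasets built from it (hypothetical lineage)
    ASSUMPTION: identifiers are normalized consistently across all three inputs.
    """
    cutoff = now - timedelta(days=max_idle_days)
    # Datasets read by users since the cutoff
    used = {name for ts, name in accesses if ts >= cutoff}

    # Reverse the lineage so we can walk from each used dataset to its sources
    sources: dict[str, list[str]] = {}
    for upstream, downstreams in feeds.items():
        for d in downstreams:
            sources.setdefault(d, []).append(upstream)

    # Anything upstream of a used dataset is also needed
    stack = list(used)
    while stack:
        current = stack.pop()
        for src in sources.get(current, []):
            if src not in used:
                used.add(src)
                stack.append(src)

    return sorted(d for d in datasets if d not in used)
```

For example, if only `gold.sales` was queried recently, its upstream `silver.orders` and `bronze.orders` are kept, while an unqueried `gold.returns_kpi` is reported together with the silver and bronze tables that feed only it.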
Written by moritzelsaesser1 on September 16, 2024