Hello,
I'm writing to see if anyone can help me understand why my Fabric capacity is showing an explosion of Current Storage in the Fabric Capacity Metrics App. Our data footprint should only be about 0.39 GB, and Current Storage was sitting around 0.39 GB until the last few days, when it ballooned to over 6,000 GB. Yes, you read that correctly: 6 TB of Current Storage. While that only costs about $150 a month to store, our storage still shouldn't be anywhere near that amount, and I'm hoping someone can help me understand why.
There is a Lakehouse in the workspace that contains about 60 tables; the largest table is about 12 million rows, so nothing crazy. A pipeline with many Copy data activities loads the 60 NetSuite tables over ODBC three times daily, with the Overwrite option. The largest table, as I said, has 12 million rows, and its parquet file is about 200,000 KB, so still not that large. I understand that parquet files accumulate with each run, so storage will slowly grow over time, but to 6 TB in 3 days' time? Help me understand this, or how I can troubleshoot where this so-called "Current Storage" actually is, because it's very confusing to me.
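For context, here is roughly how I've been sanity-checking what the live version of a table actually holds, from a Fabric notebook attached to the Lakehouse (a sketch only; table name taken from above, and DESCRIBE DETAIL reports just the files referenced by the current Delta version, not any older parquet left on disk):

```python
# Sketch: size of the current Delta version of one table (name from the post).
# numFiles / sizeInBytes cover only the files the latest version references;
# parquet files left behind by earlier overwrites are not counted here.
from pyspark.sql import functions as F

detail = spark.sql("DESCRIBE DETAIL Transaction")
detail.select(
    "name",
    "numFiles",
    F.round(F.col("sizeInBytes") / 1024**3, 3).alias("size_gb"),
).show(truncate=False)
```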
I've even navigated to the workspace and Lakehouse in the OneLake file explorer, and it only shows me 1.84 GB of storage, which seems much more in line with what I would expect. Can someone explain why the Metrics App would show 6,014 GB? Where is the other 6,012 GB that I am supposedly storing and about to pay for?
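To compare the Metrics App number against what is physically in OneLake, I've also been totalling the bytes under each table folder from a notebook. A rough sketch (it assumes the default Lakehouse is attached; mssparkutils is the standard Fabric notebook utility, so adjust the paths if your setup differs):

```python
# Sketch: sum the physical bytes under each folder in Tables/ of the
# default Lakehouse, including old parquet versions and _delta_log files.
from notebookutils import mssparkutils  # pre-installed in Fabric notebooks

def folder_size(path: str) -> int:
    """Recursively sum file sizes (bytes) under a OneLake path."""
    total = 0
    for item in mssparkutils.fs.ls(path):
        total += folder_size(item.path) if item.isDir else item.size
    return total

for tbl in mssparkutils.fs.ls("Tables"):
    print(f"{tbl.name:40s} {folder_size(tbl.path) / 1024**3:10.3f} GB")
```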
Thanks!
I just don't understand how, with each load/overwrite of the tables, it could possibly grow from 0.5 GB for all 60 tables to 6 TB of data in a matter of a few days. This is my DESCRIBE HISTORY on my Transaction table, the second-largest table at about 500,000 rows, second only to the TransactionLine table.
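For anyone who wants to run the same check, this is essentially the DESCRIBE HISTORY call from a notebook (a sketch; the operationMetrics column shows how many files and bytes each overwrite actually wrote):

```python
# Sketch: Delta history for the Transaction table, newest version first.
history = spark.sql("DESCRIBE HISTORY Transaction")
(history
    .select("version", "timestamp", "operation", "operationMetrics")
    .orderBy("version", ascending=False)
    .show(20, truncate=False))
```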
Here are two examples, one for the Transaction table and one for the Customer table. The Transaction table is loaded in overwrite mode via a Dataflow Gen2 (DFG2), while the Customer table uses a Copy data activity with the Overwrite option on. Both histories show both an Update and a Replace Table operation each time the pipeline or DFG2 runs.
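To see how much all of those operations add up to in total, something like this rough aggregation over the history metrics works (a sketch only; operationMetrics keys vary by operation type, so missing values are treated as zero):

```python
# Sketch: total bytes written per operation type across all Delta versions.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

hist = DeltaTable.forName(spark, "Transaction").history()
(hist
    .withColumn(
        "out_bytes",
        F.coalesce(F.col("operationMetrics")["numOutputBytes"].cast("long"), F.lit(0)),
    )
    .groupBy("operation")
    .agg(
        F.count("*").alias("runs"),
        F.round(F.sum("out_bytes") / 1024**3, 3).alias("total_gb_written"),
    )
    .show(truncate=False))
```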
I am using an Invoke Pipeline activity from a "Main Pipeline" that handles more of the orchestration than just copying the tables from NetSuite into Fabric. I did start using that Invoke Pipeline feature (in preview) about 3-4 days ago, which is when Current Storage blew up, so maybe there is a bug with it at the moment?
Not sure what else I could be doing wrong here; my pipeline is extremely simple, just Copy data activities that run a SQL statement over ODBC to NetSuite to fetch each table and load it into the Lakehouse with Overwrite on. I've checked the Transaction table and there are no duplicate primary keys, so it's not appending, and even if it were, that shouldn't end up writing 6 TB of data in 3 days.
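For reference, the duplicate check was just a grouped count on the primary key; "id" below is a placeholder for the actual NetSuite key column:

```python
# Sketch: any primary key value appearing more than once would show up here.
spark.sql("""
    SELECT id, COUNT(*) AS cnt
    FROM Transaction
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC
""").show()
```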
The only thing I can think of is that there is corruption somewhere along the line, but I am at a loss for how to troubleshoot it. The pipeline has only been online for about 7 days and has only run maybe 15-20 times. Each run should simply overwrite the existing table with the query results from a fresh call to NetSuite with the updated rows. Most tables are very small, maybe 40 rows in a dimension.
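If the history really does show nothing but the expected overwrites, then the extra space is presumably the old, unreferenced parquet files that Delta keeps around for time travel, which VACUUM can clean up. A sketch, assuming time travel beyond the retention window isn't needed (DRY RUN first to preview what would be deleted):

```python
# Sketch: preview, then remove, data files no longer referenced by any Delta
# version newer than the retention window (default 7 days / 168 hours).
spark.sql("VACUUM Transaction RETAIN 168 HOURS DRY RUN").show(truncate=False)
spark.sql("VACUUM Transaction RETAIN 168 HOURS")

# Retaining less than 168 hours requires disabling Delta's safety check first:
# spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
```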
Is there no way to dig through the OneLake file explorer and find which files may be contributing to 6 TB of Current Storage? It seems like a huge miss on Microsoft's end not to let us dig into the details here and find the cause of the problem, or maybe they just want to collect storage costs by billing us for 6 TB of temporary files when our data footprint is 2 GB 😂
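In the meantime, one way to "dig through" OneLake from a notebook rather than the file explorer is to walk the Lakehouse and list the largest individual files (old parquet versions, _delta_log checkpoints, and so on). A rough sketch, again assuming the default Lakehouse is attached:

```python
# Sketch: recursively collect every file under Tables/ and Files/ and print
# the 25 largest, to see exactly what is eating the space.
from notebookutils import mssparkutils  # pre-installed in Fabric notebooks

def walk(path: str, out: list) -> None:
    for item in mssparkutils.fs.ls(path):
        if item.isDir:
            walk(item.path, out)
        else:
            out.append((item.size, item.path))

files = []
for root in ("Tables", "Files"):
    walk(root, files)

for size, path in sorted(files, reverse=True)[:25]:
    print(f"{size / 1024**2:12.1f} MB  {path}")
```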
Hi there,
It might be one of those reasons,