Question:
You have a Fabric workspace.
You have semi-structured data.
You need to read the data by using T-SQL, KQL, and Apache Spark. The data will only be written by using Spark.
What should you use to store the data?
Solved! Go to Solution.
I was going to say that the mention of KQL in this sentence, "You need to read the data by using T-SQL, KQL, and Apache Spark," is the clue. How the data is written is set in stone: it will be Spark and nothing else.
So the real question is how we want to be able to read the data. You "cannot" read from a lakehouse with KQL, at least not directly. To be more specific: the Fabric Lakehouse engine doesn't spin up a Kusto cluster for direct KQL. Instead, the Lakehouse files are "virtualized" into your KQL database via external tables or shortcuts, so it's not as direct; there is extra setup involved, whereas with an eventhouse you don't need to perform any of these extra steps.
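To make the "extra setup" concrete: once a OneLake shortcut to a Lakehouse table has been created inside a KQL database, the data is queried through the `external_table()` function rather than like a native table. A minimal sketch (the shortcut name and columns here are hypothetical):

```kusto
// Query Lakehouse data from a KQL database via a OneLake shortcut.
// "SalesShortcut" is a hypothetical shortcut created beforehand
// in the KQL database, pointing at a Lakehouse table.
external_table("SalesShortcut")
| where EventTime > ago(1d)
| summarize EventCount = count() by Category
```

With an eventhouse as the storage layer, the table is native to the KQL database and none of this indirection is needed.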
If the workload is predominantly batch analytics, machine learning, or BI with Power BI Direct Lake, the Lakehouse remains the right choice. But as soon as "read with KQL" becomes a core requirement, the Eventhouse is the storage layer you want.
At first I wanted to answer with eventhouse because of KQL. But then I read back through the documentation: https://learn.microsoft.com/en-us/fabric/onelake/create-onelake-shortcut
Microsoft Fabric allows you to create shortcuts in a KQL database that reference data stored in a Lakehouse. This setup enables you to query the Lakehouse data using KQL without duplicating the data.
Read the paragraph you quote again: "... as the source for your shortcut."
This is talking about creating a OneLake shortcut INSIDE a Fabric lakehouse. It is not the lakehouse that is the data source. The source is the artifact to which the shortcut points.
In other words, the shortcut is a proxy for the data source. So, if the shortcut points to a lakehouse, the source is the lakehouse. If it points to a data warehouse, then the data warehouse is the source. And if it points to a KQL DB, then the KQL DB is the source.
The paragraph you quote does NOT state that the shortcut can be created inside a KQL DB.
Putting it all together, therefore, your persistence layer should be an eventhouse. You can then create a shortcut inside a lakehouse that points to the eventhouse, and use the lakehouse SQL analytics endpoint to query the eventhouse with T-SQL through that shortcut. But not vice versa.
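As an illustration of that last step: after a shortcut pointing at an eventhouse table has been created in the lakehouse, the lakehouse SQL analytics endpoint exposes it like any other table, so plain T-SQL works. A hedged sketch (the lakehouse, schema, shortcut, and column names are hypothetical):

```sql
-- Query eventhouse data with T-SQL through the lakehouse SQL
-- analytics endpoint. "Telemetry" is a hypothetical shortcut in
-- the lakehouse that points at an eventhouse table.
SELECT TOP (100) *
FROM [MyLakehouse].[dbo].[Telemetry]
WHERE EventTime >= DATEADD(day, -1, GETDATE());
```

So all three read paths are covered: KQL natively in the eventhouse, T-SQL through the shortcut, and Spark against the OneLake data.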
Hi @samir023041,
Thank you for reaching out to Microsoft Fabric Community.
Thank you @lbendlin and @Element115 for the prompt response.
As we haven't heard back from you, we wanted to kindly follow up and check whether the solution provided by the super users resolved your issue.
If the super users' response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.
Thanks and regards,
Anjan Kumar Chippa
This is the clue: "The data will only be written by using Spark."