You are implementing a new data analytics solution using Microsoft Fabric. Your team consists of data engineers skilled in PySpark and SQL, and data analysts who primarily use Microsoft Power BI. The data to be ingested includes structured, semi-structured, and unstructured formats from various sources.
Select a data store that accommodates diverse data formats and supports both PySpark and SQL operations for data transformation and analysis.
Which two data stores provide a complete solution?
Azure Synapse Analytics is suitable because it supports all data formats and allows operations using T-SQL and Spark, accommodating the team's diverse skill set and data needs. Microsoft Lakehouse is also appropriate as it supports structured, semi-structured, and unstructured data formats and allows operations using both PySpark and SQL, aligning with the team's skills and data requirements. In contrast, Microsoft Fabric Eventhouse is designed for real-time analytics and supports diverse data formats but is not optimized for batch processing or large-scale data transformation using PySpark and SQL, which are required in this scenario. Azure Data Lake Storage Gen2, while capable of storing diverse data formats, does not natively support PySpark and SQL operations, making it unsuitable for this scenario.
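To make the Lakehouse point concrete, here is a minimal sketch of the same transformation done both ways. This assumes a Microsoft Fabric notebook with a default lakehouse attached (where Fabric provides the `spark` session); the table and column names are made up for illustration and the snippet is not runnable outside such an environment.

```python
# Hedged sketch: assumes a Fabric notebook attached to a lakehouse, where
# `spark` is the provided SparkSession. Table/column names are hypothetical.

# Data engineers: transform a Delta table with the PySpark DataFrame API.
df = spark.read.table("sales")                  # Delta table in the lakehouse
summary = (df.filter(df.amount > 0)
             .groupBy("region")
             .sum("amount"))
summary.write.mode("overwrite").saveAsTable("sales_by_region")

# Analysts / SQL-first users: the same aggregation expressed in Spark SQL.
spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    WHERE amount > 0
    GROUP BY region
""").show()
```

Both snippets operate on the same Delta table, which is the sense in which one store serves both skill sets.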
How can Azure Synapse Analytics be a data store? Doesn't it use ADLS (Azure Data Lake Storage) as its underlying data store?
Hi @tan_thiamhuat ,
I’d encourage you to submit your detailed feedback and ideas via Microsoft's official feedback channels, such as the Microsoft Fabric Ideas. Feedback submitted here is often reviewed by the product teams and can lead ...
Thanks,
Prashanth Are
MS Fabric community support
This is equally puzzling: how can Data Flows be used for real-time scenarios?
@tan_thiamhuat, thanks for actively participating in the MS Fabric community. Your inputs are truly a valuable resource to the community and to Fabric aspirants.
Thanks,
Prashanth Are
MS Fabric community support
I would like others to comment on whether I am correct. There are so many answers that I think are wrong; I just want some verification.
Your company uses a lakehouse architecture with Microsoft Fabric for analytics. Data is ingested and transformed using Microsoft Data Factory pipelines and stored in Delta tables.
You need to enhance data transformation efficiency and reduce loading time.
What should you do?
(a) Configure Spark pool with more worker nodes. (wrong)
(b) Use session tags to reuse Spark sessions.
This answer is correct.
Using session tags to reuse existing Microsoft Spark sessions minimizes startup time and enhances the efficiency of data transformation processes. Configuring the Microsoft Spark pool to use more worker nodes might seem beneficial for handling more tasks but does not directly improve transformation efficiency.
--> I don't find the answer above entirely correct. My version is below:
(a) Configure Spark pool with more worker nodes.
More worker nodes = more compute resources.
This allows Spark to process partitions in parallel, reducing the overall time it takes to transform and load data.
Especially helpful for large datasets stored in Delta tables.
(b) Use session tags to reuse Spark sessions.
Session reuse via tags helps reduce startup time for small or interactive workloads.
But it does not improve the actual data transformation performance or reduce load time significantly for heavy ETL workloads.
It is more about efficiency of session management, not compute scale.
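A back-of-the-envelope model makes this trade-off concrete. This is plain Python, not Spark, and the durations are illustrative assumptions: extra workers shrink the parallelizable transformation work, while session reuse only removes the fixed startup cost.

```python
def job_seconds(startup: float, work: float, workers: int) -> float:
    """Toy model of a Spark job: fixed session startup time plus
    perfectly parallel transformation work split across workers."""
    return startup + work / workers

# Heavy ETL job: 3600s of transformation work, ~120s session startup (assumed).
heavy_4 = job_seconds(startup=120, work=3600, workers=4)   # 1020.0
heavy_8 = job_seconds(startup=120, work=3600, workers=8)   # 570.0
# Doubling workers nearly halves the total: scaling out dominates here.

# Small interactive job: only 30s of work.
small_cold = job_seconds(startup=120, work=30, workers=4)  # 127.5
small_warm = job_seconds(startup=0,   work=30, workers=4)  # 7.5
# Startup dominates, so reusing a warm session (startup ~ 0) is the big win.

print(heavy_4, heavy_8, small_cold, small_warm)
```

Under this (admittedly idealized, linear-scaling) model, more worker nodes are what reduce transformation and load time for heavy ETL, while session reuse mainly helps short or interactive workloads, which matches the reasoning above.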
Your company uses Microsoft Fabric to manage a data warehouse that supports various analytical workloads. Users report slow query performance, especially during the initial execution of queries.
You need to optimize query performance for the initial execution without relying on automatic statistics.
Each correct answer presents part of the solution. Which two actions should you take?
Manually creating statistics for frequently queried tables enhances the query optimizer's ability to select efficient execution plans, which is crucial for improving initial query performance. Using Direct Lake mode in Power BI reduces latency by allowing direct data access from the lakehouse, thus improving query performance. Enabling query parallelism might seem beneficial, but it does not specifically address the issue of initial query execution performance without relying on automatic statistics.
--> Direct Lake is a Power BI feature that enhances performance for Power BI visuals, but it doesn't directly affect the initial execution performance of queries in the data warehouse itself, does it?
-->
Query caching helps speed up repeated queries by storing the result set of a query so subsequent executions return faster. It also benefits initial performance if the same query is run multiple times, even by different users.
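As a rough analogy (plain Python memoization, not the warehouse's actual result-set cache), caching keyed on the query text means only the first execution pays the full cost; identical repeats are served from the cache:

```python
from functools import lru_cache

EXECUTIONS = {"count": 0}  # track how often the "engine" actually runs

@lru_cache(maxsize=128)
def run_query(sql: str):
    """Stand-in for the warehouse engine: expensive only on a cache miss."""
    EXECUTIONS["count"] += 1
    # ... pretend this scans tables and builds a result set ...
    return ("result-set-for", sql)

q = "SELECT region, SUM(amount) FROM sales GROUP BY region"
run_query(q)  # executes: cache miss
run_query(q)  # cache hit, engine not invoked
run_query(q)  # cache hit

print(EXECUTIONS["count"])  # the engine ran once; repeats came from the cache
```

The analogy also shows the limitation raised above: a cache only helps queries that are repeated verbatim, so the very first execution of any new query still runs at full cost.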