Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Ask the Fabric Databases & App Development teams anything! Live on Reddit on August 26th. Learn more.

support openrowset/polybase

Data providers typically deliver data in formats such as parquet, JSON, or CSV. In traditional data warehouse scenarios, these files often land in a persistent staging area. Many organize these staging folders in a structured manner such as year/month/week/day. Currently, these files need to be converted to delta format with an additional workload before they can be consumed in the data warehouse.

The support of openrowset/create external table, as with parquet file globbing, would greatly simplify the load and transform to final delta files in the warehouse workload without the need for additional workloads/datalflow/pipelines complexity.

Status: Completed

OPENROWSET is available in Fabric DW and can read CSV, Parquet, and JSONL files. With the views on OPENROWSET you can reference external files.

 

If you need external tables vote for separate item: Support external tables for parquet/csv in Fabric ... - Microsoft Fabric Community

Comments
admin176
New Member

we use this pattern extensively at work, ADF, azure storage, staging table in sql server, our data engineer dismissed Fabric because of the lack of this feature.

vasu_n
New Member

Hello,


Currently we are serving many BI workloads(Power BI Reports, Excel Tables, etc) through Synapse Serverless SQL openrowset with data from datalake in different formats and from different layers (in medallion architecture).


We are trying to move to Fabric now, but all our data is in parquet format in ADLS2 and existing pipelines are also ingesting the data into ADLS2 (external data, export setupts, etc).


Shortcut is good but it can't be served to BI through fabric without loading it into delta tables. So there needs another ETL to load the ADLS2 data into Delta Tables through spark ETL which is killing mosquito with hammer.


Hope to see OPENROWSET in fabric soon.


Thanks.

vasu_n
New Member

Hello Admin, is this being considered?


Thanks.

a_portoles
New Member

He use OpenRowset in all our projects.


Allow us

  • Implement data Virtualization, and avoid Staging tables
  • Grant access to data that is in the LZ but not in the User SQL tables with Views_LZ


Important to have

  • The ALDS / OneLake must be under firewall ==> Private endpoint compatible (**)
  • Use SAS or Acces Keys . . .., But compatible with AAD (User MI or System MI) (**)
  • Most of the Cases we use Authentication SQL <==> Storage: Delegation / Impersonate
  • In some cases for audit purpolses we need use Pass-Through Authentication, thus the AAD SQL user will appear on ADLS logs
  • The T-SQL Openrowset must be able to compile on Azure Data studio & Azure DevOps, currently is not able to do it and block all CI/CD progress (**)
  • Format files (FMT) are really obsolete, need maintenance and are not part of Database Project / GIT, need to find a 2024 replacement (**)
  • We hope that Select disctinct (col_1) [for Dim Tables] On parquet will be faster than on CSV, but sometimes is not (**)
  • This Synapse "sugar" is very appreciatee: '*/*/*/myfile*.parquet'
  • Currently select from openrowset INTO #Temptable is not supported (**)
  • All the others: AGG pushdown, Metadata, Vertipaq / V-Order, PurView , , . . . future??


(**) Hugo Queiroz knows about these issues and some of them are ACK by Azure support

Jugi
Microsoft Employee

I think this idea asks for a similar approach https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=6bd2453c-3eed-ee11-a73d-000d3a7c5b9a

Sanjay_Rathod1
New Member

Hello Jovan Popovic


Am using synapse serverless pool for converting JSON files on the fly to rows and columns


So openrowset is definitely required in Fabric


Thanks

Sanjay

Maxime_Garneau_
New Member

I agree with folks; this is a must-have feature in a lakehouse: the ability to easily separate the physical layer from the abstract layer without constantly copying or materializing data.


Use case examples:

  • Rapidly profile and explore raw data (CSV, JSON, etc.) in the lakehouse/files using T-SQL/SQL OPENROWSET in SQL ENDPOINT.
  • MVVM style design pattern: Implement an interface (view) on top of raw data files (CSV, JSON, etc.) in SQL ENDPOINT using OPENROWSET to facilitate downstream data flows/ETL
  • de-serialize and standardize json schema easily in a view using OPENROWSET and OPENJSON
  • expose data in a "fast track mode" to consumer


So basically, just reply with the Synapse Serverless feature. 😉


Furthermore, it says that this idea is in "Planned" status, so can we get an ETA on it? This information could be very helpful in preparing our migration plan from Synapse to Fabric (or not).



Thanks! 🙂

CharlesWebbMSFT
Microsoft Employee
Folks, please feel free to share more details about the scenarios this would unblock for you.
fbcideas_migusr
New Member
Status changed to: Planned
 
jovanpop-msft
Microsoft Employee
Status changed to: Completed

OPENROWSET is available in Fabric DW and can read CSV, Parquet, and JSONL files. With the views on OPENROWSET you can reference external files.

 

If you need external tables vote for separate item: Support external tables for parquet/csv in Fabric ... - Microsoft Fabric Community