This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreDid you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now
Data ingestion is one of the most important actions in the Data Warehouse solutions. In Microsoft Fabric Data Warehouse, the OPENROWSET function provides a powerful and flexible way to read data from files stored in Fabric OneLake or external Azure Storage accounts. Whether you’re working with Parquet, CSV, TSV, or JSONL files, the OPENROWSET function makes it easy to bring your data into your warehouse.
The OPENROWSET function is not just for reading data—when combined with an INSERT statement, it becomes a powerful option for file ingestion.
In this article, we’ll explore how to use OPENROWSET within an INSERT statement to load data, and we’ll highlight the key differences between this approach and the standard COPY INTO statement.
OPENROWSET is a versatile function that allows you to query the contents of files stored in a data lake. It returns the file data in a relational, tabular format, making it easy to work with. By simply providing the file’s URI to OPENROWSET, you can retrieve its contents as structured rows and columns:
Ingest_files_into_your_Fabric_Data_Warehouse_using_the_OPENROWSET_function
OPENROWSET is more than just a file reader—it’s a feature-rich T-SQL function designed for flexibility and easy to use.
This flexibility makes OPENROWSET ideal for ETL pipelines, where you need to extract and transform data before loading it into your warehouse.
In ETL or ELT processes, OPENROWSET represents the Extract (E) phase. When combined with the INSERT statement, it becomes a powerful tool for loading data into Fabric warehouse tables.
Typically, loading data involves two steps:
Before we start data ingestion, we need to create a target table in Fabric Data Warehouse. We can use CTAS (CREATE TABLE AS SELECT) with schema inference to create table based on sample data:
CREATE TABLE [dbo].[bing_covid-19_data] AS SELECT TOP 0 * FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet' )
We can use the source file intended for ingestion as a template to define the table structure.
In this example, I will use the schema as-is without any changes.
Once the table is created, we can ingest the Parquet file. Here’s an example using INSERT and OPENROWSET to load the Bing COVID-19 dataset:
INSERT INTO [dbo].[bing_covid-19_data] SELECT * FROM OPENROWSET(BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet') WHERE country_region <> 'Worldwide'
This statement reads the file content, filters out all rows where country_region is 'Worldwide', and inserts the remaining rows into the target table.
Here is the summary of OPENROWSET and COPY INTO differences:
| OPENROWSET | COPY INTO | |
| Transformations | Yes (during the query execution) | No (must be done after load) |
| Schema inference | Based on source files | Based on target table |
| Best for | ETL with filtering or minor changes | 1:1 bulk ingestion in ELT |
Let’s recap the scenarios where OPENROWSET is a good choice for data ingestion:
In summary, understanding the strengths of both COPY INTO and INSERT ... OPENROWSET is key to building efficient ingestion workflows. COPY INTO excels in high-volume, production-grade loads with robust error handling and performance optimizations, while INSERT ... OPENROWSET offers flexibility for ad-hoc ingestion, schema inference, and quick data exploration. By combining these approaches, you can select the right tool for each scenario—ensuring your data pipelines are both adaptable and optimized for your specific needs.
To learn more about the ingestion with OPENROWSET function, refer to the documentation.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.