topic Re: Spark SQL query data in one datalake and store transformed data in another data lake in Data Engineering

Spark SQL query data in one datalake and store transformed data in another data lake

SergioTorrinha — Fri, 21 Feb 2025 11:10:32 GMT

Hi everyone!

Hope the question I bring today has a simple solution, as I'm trying to figure out the best way of performing data transformations across medallion architecture layers.

So, suppose you have two lakehouses, resembling a medallion architecture:

- bronze_lh
  - tableA
- silver_lh

My question is:
Using a notebook with Spark SQL, how would you write a Spark SQL query that fetches data from tableA in bronze_lh, transforms it, and then stores the result as tableA_silver in silver_lh?
I know this is possible with PySpark; it's what I've been using so far. But how is it done correctly using Spark SQL?

The reason I ask is that I'm far more comfortable with SQL than Python, especially for data transformations that don't involve pivoting, but I feel Spark SQL lacks the capability to do operations like the one I'm describing here. Please prove me wrong 🙂
Furthermore, I believe using a warehouse and pure SQL notebooks is not an option, as the data consists of many millions of records with a fair number of columns, and I'm unsure how well SQL would perform in that scenario.

If the question is somewhat ambiguous, please feel free to ask for specific details. I'm glad to provide them, as long as they stay within what I can reasonably share.
I appreciate your feedback in advance.

Re: Spark SQL query data in one datalake and store transformed data in another data lake

nilendraFabric — Fri, 21 Feb 2025 12:29:03 GMT

Hello @SergioTorrinha

Assume you’ve created a shortcut named bronze_tableA in your Silver lakehouse that points to the original tableA in Bronze. Then your notebook cell could look like this:

%%sql
CREATE OR REPLACE TABLE tableA_silver AS
SELECT
col1,
col2,
col4,
col5
FROM bronze_tableA
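If you want to generate this kind of statement from Python rather than typing it by hand, here is a minimal sketch. `build_ctas` is a hypothetical helper, not a Fabric or Spark API; it only assembles the SQL text, which you would then pass to `spark.sql(...)` in a PySpark notebook cell. It assumes plain identifiers (letters, digits, underscores) so that no quoting is needed.

```python
import re

# Assumption: only plain identifiers are allowed, which keeps the generated
# SQL free of quoting issues and rejects anything that looks like injection.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_name(name: str) -> str:
    """Validate a possibly dotted name such as 'bronze_tableA' or 'lh.table'."""
    for part in name.split("."):
        if not _IDENT.match(part):
            raise ValueError(f"unsupported identifier: {name!r}")
    return name


def build_ctas(target: str, source: str, columns: list[str]) -> str:
    """Return a CREATE OR REPLACE TABLE ... AS SELECT statement as text."""
    if not columns:
        raise ValueError("at least one column is required")
    cols = ",\n    ".join(_check_name(c) for c in columns)
    return (
        f"CREATE OR REPLACE TABLE {_check_name(target)} AS\n"
        f"SELECT\n    {cols}\n"
        f"FROM {_check_name(source)}"
    )


if __name__ == "__main__":
    stmt = build_ctas("tableA_silver", "bronze_tableA",
                      ["col1", "col2", "col4", "col5"])
    print(stmt)
    # In a Fabric PySpark cell you would then run: spark.sql(stmt)
```

Keeping the column list as data makes it easy to reuse the same projection for several tables, while the transformation logic itself stays plain SQL.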

Re: Spark SQL query data in one datalake and store transformed data in another data lake

SergioTorrinha — Fri, 21 Feb 2025 13:02:39 GMT

Hi @nilendraFabric !

Thanks for your input.
Do I really need to create shortcuts for this kind of task, even if the lakehouses are within the same tenant/domain/workspace?
Imagine I have multiple tables in bronze_lh on which I want to perform the same sort of operations. Would I need to create a shortcut for each of them in silver_lh? How would one automate such a task?

Thank you.

Re: Spark SQL query data in one datalake and store transformed data in another data lake

nilendraFabric — Fri, 21 Feb 2025 13:02:30 GMT

Hello @SergioTorrinha

In your notebook, you need access to the Bronze table. If both lakehouses are in the same workspace, you can reference the table directly using a qualified name (for example, lakehouse.table). Otherwise, create a shortcut in your Silver lakehouse that points to the Bronze table.

%%sql
-- Using a two-part name (lakehouse.table)
CREATE OR REPLACE TABLE silver_lh.tableA_silver AS
SELECT
Cola
FROM bronze_lh.tableA
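For the earlier question about applying the same operation to many tables, a minimal sketch is to generate one statement per table in a loop. `promote_statements` is a hypothetical helper; the default names `bronze_lh`, `silver_lh` and the `_silver` suffix are taken from this thread as assumptions, and `SELECT *` stands in for whatever per-table transformation you need. It uses the same two-part `lakehouse.table` form as the reply above.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: str) -> str:
    """Reject anything that is not a plain identifier (no dots, spaces, quotes)."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def promote_statements(tables, source_lh="bronze_lh", target_lh="silver_lh",
                       suffix="_silver"):
    """Build one CTAS statement per table, copying source_lh.<t> into
    target_lh.<t><suffix> using two-part (lakehouse.table) names."""
    statements = []
    for table in tables:
        src = f"{_ident(source_lh)}.{_ident(table)}"
        dst = f"{_ident(target_lh)}.{_ident(table + suffix)}"
        # SELECT * is a placeholder; put the per-table transformation here.
        statements.append(f"CREATE OR REPLACE TABLE {dst} AS SELECT * FROM {src}")
    return statements


if __name__ == "__main__":
    # Hypothetical table list, for illustration only.
    for stmt in promote_statements(["tableA", "tableB"]):
        print(stmt)
        # spark.sql(stmt)  # run each statement from a Fabric PySpark cell
```

In a notebook, the table list could come from `[t.name for t in spark.catalog.listTables("bronze_lh")]`, assuming bronze_lh is visible as a database in the Spark session catalog.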

Re: Spark SQL query data in one datalake and store transformed data in another data lake

SergioTorrinha — Fri, 21 Feb 2025 13:05:08 GMT

OK, now this makes more sense; I'll have to try this one out.

Thank you for your help!

Re: Spark SQL query data in one datalake and store transformed data in another data lake

SergioTorrinha — Thu, 27 Feb 2025 09:51:50 GMT

Hi again @nilendraFabric !

Sorry to bring this topic up again, but after testing I didn't quite get the results I was expecting from your input. I feel this has a really simple solution, but apparently I can't get there alone.

The following code:

%%sql
CREATE OR REPLACE TABLE lh_silver.dbo.test_table_pls_drop_me AS
SELECT * FROM dbo.bronze_table LIMIT 10

Throws the following error message:
[REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `lh_silver`.`dbo`.

Knowing that:

- the notebook I am running this code in is associated with lh_bronze

- both lh_bronze and lh_silver are located in the same workspace

What am I doing wrong?
I appreciate your help in advance.
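The error text states the rule that was broken: the target name was read as the namespace `lh_silver`.`dbo` plus a table name, and spark_catalog accepts only a one-part namespace. The sketch below mirrors that rule in plain Python. `split_table_name` and `check_spark_catalog_name` are hypothetical helpers that do not call Spark, so they only show which name shapes pass the namespace check, not whether a table actually resolves. Under that assumption, a two-part name such as `lh_silver.test_table_pls_drop_me` passes the check; whether Fabric then resolves it depends on how lh_silver is registered in the session catalog (for example, whether schemas are enabled on the lakehouse), which this thread does not confirm.

```python
def split_table_name(name: str) -> tuple[list[str], str]:
    """Split 'a.b.c' into (namespace parts, table name)."""
    parts = name.split(".")
    return parts[:-1], parts[-1]


def check_spark_catalog_name(name: str) -> str:
    """Mirror the rule quoted in the error message: spark_catalog accepts at
    most one namespace part (database.table), not database.schema.table."""
    namespace, _table = split_table_name(name)
    if len(namespace) > 1:
        quoted = ".".join(f"`{p}`" for p in namespace)
        raise ValueError(f"multi-part namespace {quoted} is not allowed here")
    return name


if __name__ == "__main__":
    # Two-part name: passes the single-part-namespace check.
    print(check_spark_catalog_name("lh_silver.test_table_pls_drop_me"))
    # Three-part name from the failing statement: rejected, like the error.
    try:
        check_spark_catalog_name("lh_silver.dbo.test_table_pls_drop_me")
    except ValueError as err:
        print(err)
```

Note that the source `dbo.bronze_table` in the failing statement already has a single-part namespace, which is consistent with the error naming only `lh_silver`.`dbo`.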

Re: Spark SQL query data in one datalake and store transformed data in another data lake

SergioTorrinha — Fri, 28 Feb 2025 10:23:55 GMT

Hi everyone!

Can someone from the support team help with this one, please?
Much appreciated.

Re: Spark SQL query data in one datalake and store transformed data in another data lake

SergioTorrinha — Mon, 03 Mar 2025 09:10:01 GMT

Hi everyone!

I keep facing this issue, can someone help with this one, please?
Thank you.

Re: Spark SQL query data in one datalake and store transformed data in another data lake

v-prasare — Mon, 10 Mar 2025 21:39:58 GMT

@SergioTorrinha, as we haven't heard back from you, we wanted to follow up and check whether the solution provided for your issue worked. Please let us know if you need any further assistance here.

@nilendraFabric, thanks for your prompt response.

Thanks,

Prashanth Are

MS Fabric community support

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly and give Kudos if helped you resolve your query

Re: Spark SQL query data in one datalake and store transformed data in another data lake

v-prasare — Mon, 17 Mar 2025 09:09:50 GMT

Hi @SergioTorrinha, hope you're doing well.

In this scenario, I suggest raising a support ticket so that the support team can assist you with the issue you are facing. Please follow the link below on how to raise a support ticket:

How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn

Thanks,

Prashanth Are

MS Fabric community support
