Hi everyone!
I hope the question I bring today has a simple solution, as I'm trying to figure out the best way to implement data transformations across medallion architecture layers.
So, suppose you have 2 lakehouses, resembling a medallion architecture:
bronze_lh
tableA
silver_lh
The question here is:
Using a notebook with Spark SQL, how would you write a query that fetches data from tableA in bronze_lh, transforms it, and then stores the resulting data in a table tableA_silver in silver_lh?
I know this is possible with PySpark; it's what I've been using so far. But how is it done correctly in Spark SQL?
The reason I ask is that I'm far more comfortable with SQL than with Python, especially for data transformations that don't involve pivoting the data, but I feel Spark SQL lacks the capabilities for operations like the ones I'm describing here. Please prove me wrong 🙂
Furthermore, using a warehouse and pure SQL notebooks is not an option, I believe, as the data consists of many millions of records with a fair number of columns/data fields, and I'm unsure how well SQL would perform in this scenario.
If the question is somewhat ambiguous, please feel free to ask for specific details. I'm glad to provide them as long as they stay within what's reasonably possible to share.
I appreciate your feedback in advance.
Hi @SergioTorrinha, hope you're doing well.
In this scenario I suggest you raise a support ticket so that the support team can assist you in addressing the issue you are facing. Please follow the link below on how to raise a support ticket:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Thanks,
Prashanth Are
MS Fabric community support
If this post helps, please consider accepting it as the solution to help other members find it more quickly, and give Kudos if it helped you resolve your query.
@SergioTorrinha, as we haven't heard back from you, we wanted to kindly follow up to check whether the solution provided for your issue worked. Please let us know if you need any further assistance here.
@nilendraFabric, thanks for your prompt response.
Thanks,
Prashanth Are
Hello @SergioTorrinha
Assume you’ve created a shortcut named bronze_tableA in your Silver lakehouse that points to the original tableA in Bronze. Then your notebook cell could look like this:
%%sql
CREATE OR REPLACE TABLE tableA_silver AS
SELECT
col1,
col2,
col4,
col5
FROM bronze_tableA
Hi @nilendraFabric !
Thanks for your input.
Do I really need to create shortcuts for this kind of task, even if these lakehouses are within the same tenant/domain/workspace?
Imagine I have multiple tables in bronze_lh on which I would like to do the same sort of operations. Would I need to create those shortcuts in silver_lh? How does one automate such a task?
Thank you.
Hello @SergioTorrinha
In your notebook, you need access to the Bronze table. If both lakehouses share the same workspace, you can reference the table directly (for example, by qualifying it with the lakehouse name). Otherwise, create a shortcut in your Silver lakehouse that points to the Bronze table.
-- Using a qualified name (lakehouse.table)
CREATE OR REPLACE TABLE silver_lh.tableA_silver AS
SELECT
Cola
FROM bronze_lh.tableA
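For the multi-table case raised earlier in the thread, one possible approach is to generate the same CREATE OR REPLACE TABLE ... AS SELECT statement for each table and run them in a loop. The following is only a minimal sketch, assuming the two-part lakehouse.table names above resolve in your notebook. The helper name build_ctas, the "_silver" suffix default, and the table list are hypothetical and used for illustration only. The spark.sql call is commented out because it needs a Fabric notebook session.

```python
def build_ctas(table, source_lh="bronze_lh", target_lh="silver_lh", suffix="_silver"):
    """Return a Spark SQL CTAS statement that copies one bronze table into silver.

    Hypothetical helper: replace SELECT * with the transformation you need.
    """
    return (
        f"CREATE OR REPLACE TABLE {target_lh}.{table}{suffix} AS\n"
        f"SELECT *\n"
        f"FROM {source_lh}.{table}"
    )


# Hypothetical list of bronze tables to process.
tables = ["tableA", "tableB"]

statements = [build_ctas(t) for t in tables]

for stmt in statements:
    print(stmt)
    # In a Fabric notebook, the statement would be executed with:
    # spark.sql(stmt)
```

Keeping statement generation separate from execution makes it easy to inspect the SQL before running it against the lakehouses.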
Hi again @nilendraFabric !
Sorry to bring this topic up again, but after testing I didn't quite get the results I was expecting from your input. I feel this has a really simple solution, but apparently I can't get there alone.
The code below:
%%sql
CREATE OR REPLACE TABLE lh_silver.dbo.test_table_pls_drop_me AS
SELECT
*
FROM dbo.bronze_table
LIMIT 10
Throws the following error message:
[REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `lh_silver`.`dbo`.
Knowing that:
- the notebook I am running this code in is attached to lh_bronze
- both lh_bronze and lh_silver are located in the same workspace
what am I doing wrong?
I appreciate your help in advance.
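One way to read the error message: everything before the table name is the namespace, and spark_catalog reports that it accepts only one namespace part, while lh_silver.dbo.test_table_pls_drop_me supplies two (lh_silver and dbo). The sketch below only illustrates that reading by splitting the identifier. It assumes the lakehouses are not schema-enabled, which the thread does not state, so dropping the schema part to get a lakehouse.table name is a guess rather than a confirmed fix. The function names are hypothetical, and the plain split does not handle backtick-quoted names that contain dots.

```python
def namespace_parts(identifier):
    """Return the namespace parts of a dotted identifier (all but the table name)."""
    return identifier.split(".")[:-1]


def to_two_part(identifier):
    """Collapse lakehouse.schema.table to lakehouse.table; leave other shapes unchanged.

    Assumption: the target lakehouse is not schema-enabled.
    """
    parts = identifier.split(".")
    if len(parts) == 3:
        lakehouse, _schema, table = parts
        return f"{lakehouse}.{table}"
    return identifier


target = "lh_silver.dbo.test_table_pls_drop_me"
print(namespace_parts(target))  # two parts, which is what the error complains about
print(to_two_part(target))
```

If the lakehouses are schema-enabled, the three-part name may be the intended form and the cause would lie elsewhere, for example in how the notebook's default lakehouse is configured.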
Hi everyone!
Can someone from the support team help with this one, please?
Much appreciated.
Hi everyone!
I keep facing this issue. Can someone help with this one, please?
Thank you.
Ok, now this is making more sense, I'll have to try this one out.
Thank you for your help!