Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
Miguel_Rojo
Helper I
Helper I

Power Query - Anomaly - Transformations truncate my dataset

Hello Community!
I am currently facing anomalous behavior when trying to perform transformations in Power Query. I attach a description of the phenomenon, hoping I can receive support from you to solve it:

 

  1. Using a query to a PostreSQL database, I extract a table with user data and additional metadata. As seen in the following image, the table has a total of 2.8 million rows:

Picture_1.png

As you can see, I can import it successfully.

 

  1. However, when I perform some transformation in Power Query to this imported Dataset (change the Data Type, or add a second conditional column, for example), after clicking “Close and Apply”, the tool “reduces/cuts my amount of data ”:
    • Instead of applying changes made in Power Query to all data; of the almost 3 Million rows; It only loads 50,000 rows (see attached image).
    • The transformations I requested in Power Query are performed, but my total Dataset is truncated to only 50,000 rows

Picture_2.png

 

Do you know why this happens, and what I can do to fix it? Could it be that my “Native Query” is incompatible with Power Query? Is there any way to apply the extra transformations that I need, using Power Query?

  • I attach example of the Query used:

 

 

SELECT DISTINCT platform_contact._contoso_company.marketingAutomation.contactIdExt platform_contactid -- Platform Contact ID

,platform_contact.workEmail.address platform_contact_email_address

,platform_contact._contoso_company.marketingAutomation.createdDate platform_contact_createdate-- Platform Contact Create Date

,platform_contact._contoso_company.marketingAutomation.webInteractionDetails.firstWebContactUpdateDate

,platform_contact._contoso_company.marketingAutomation.contactSourceFirst

,platform_contact._contoso_company.marketingAutomation.webInteractionDetails.lastWebContactUpdateDate

,platform_contact._contoso_company.marketingAutomation.webInteractionDetails.lastWebLogonDate

,platform_contact._contoso_company.marketingAutomation.lastLeadMktDetails.stdLeadEligible

,platform_contact._contoso_company.marketingAutomation.extendedWorkDetails.stdPrimaryBusiness

,platform_contact._contoso_company.marketingAutomation.lastLeadMktDetails.funnelLastAQLDate

,platform_contact._contoso_company.marketingAutomation.contactDateTimeLastModified

,platform_contact._contoso_company.marketingAutomation.webInteractionDetails.firstWebContactUpdateSiteLanguage

,platform_contact._contoso_company.marketingAutomation.contactSourceFirstTime

,platform_contact._contoso_company.marketingAutomation.webInteractionDetails.lastWebLogonSiteLanguage

,platform_contact._contoso_company.marketingAutomation.leadExceptionsBitmask1

,platform_contact._contoso_company.marketingAutomation.stdDateLastNormalized

,platform_contact._contoso_company.marketingAutomation.contactMIQ

,platform_contact._contoso_company.marketingAutomation.contactMIQRuleset

FROM platform_contact

,(SELECT platform_contact._contoso_company.marketingAutomation.contactIdExt -- Platform Contact ID

,MAX(platform_contact._contoso_company.marketingAutomation.contactDateTimeLastModified) elqLastRecord

FROM platform_contact

GROUP BY contactIdExt)elqcontact_lastmodified

WHERE platform_contact._contoso_company.marketingAutomation.contactIdExt  = elqcontact_lastmodified.contactIdExt

AND platform_contact._contoso_company.marketingAutomation.contactDateTimeLastModified = elqcontact_lastmodified.elqLastRecord

AND platform_contact._contoso_company.marketingAutomation.contactDateTimeLastModified >= TIMESTAMP '2023-01-01 00:00:00' 

LIMIT 100000000

 

 

 

 

1 ACCEPTED SOLUTION
v-jingzhan-msft
Community Support
Community Support

Hi @Miguel_Rojo 

 

First you can try using a tool to see the actual query submitted to the PostgreSQL server. Check if there is any filtering rows statement received there. 

Is there a PostgreSQL equivalent of SQL Server profiler? - Stack Overflow

 

Then you may try removing the transformation steps made by Power Query and check if this can bring all rows back.

 

Can you share the Power Query M code that's from the Advanced Editor after removing sensitive information? 

 

Best Regards,
Jing

View solution in original post

2 REPLIES 2
v-jingzhan-msft
Community Support
Community Support

Hi @Miguel_Rojo 

 

First you can try using a tool to see the actual query submitted to the PostgreSQL server. Check if there is any filtering rows statement received there. 

Is there a PostgreSQL equivalent of SQL Server profiler? - Stack Overflow

 

Then you may try removing the transformation steps made by Power Query and check if this can bring all rows back.

 

Can you share the Power Query M code that's from the Advanced Editor after removing sensitive information? 

 

Best Regards,
Jing

Hi @v-jingzhan-msft ,

 

Thanks for the advice!,

The problem was caused by the "Query Folding" feature. 

I haven't realized it was ebabled. I proceeded to desable this feature, and all the data was correctly loaded:

Query_Folding.png

 

I'll accept your advide as solution; but I'd like to make a final question (hope you can help me):

  • Scenario
    1. Once I connect to the PostgreSQL database, I successfully load the information I need (having disabled "Query Folding", I am able to load the 3.3 million rows).

    2. Subsequently, I perform a series of transformations (screenshot attached), until I reach a specific one: I perform a LEFT JOIN, in which I add to my main table (the one that contains 3.3 million rows) two extra columns: "Date" and " City" (these last two columns are extracted from another table, and a "Merge" is carried out of those that match by email:

                                 Table_3.3_Million_Rows.png

                                 Merging_Process.png

The LEFT JOIN increases the dataset of my "3.3 Millions Table" to 3.5 Millions aprox; but Power Query has an error and I can not perform a refresh, because different errors appears:

 

Load_Errors.png

Do you know if Power Query can not perform MERGES with high volume of data?, or if another issue is causing the error?

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

AugPowerBI_Carousel

Power BI Monthly Update - August 2024

Check out the August 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

Sept NL Carousel

Fabric Community Update - September 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors