Solved: Re: Power Query has fewer rows than power bi datam...

ppvinsights · ‎08-26-2024

Hi community,

I have a postgreSQL-database with orderItems, orders and invoices. I need to join the orderitems with the order and afterwards the result with the invoice - nothing special.

But:

I receive duplicate rows in power bi. I tested every relationship in the database - everything is fine.

I tested the number of rows in power query before loading into the datamodel - PQ says 9.145. But in Power BI I receive more than 10.000.

I thought that if I use the transformation "Table.RowCount()", there shouldn't be a difference between preview and "real" data.

So to sum up: In the query which is loaded to the datamodel the transformation "Table.RowCount" says "9.145". But after loading the query in power bi I receive 10.531 rows.

I checked the database and 9.145 is the correct value...

I

Any hint is appreciated!

Holger

ppvinsights · ‎08-30-2024

Hello again,

I stripped my problem down to the (I think) smallest piece of data an M-code. I attach all necessary files in a ZIP-file. It woulld be so nice if someone with a running postgreSQL instance can reproduce my problem - and if you can: someone can tell me where to send a bug report to microsoft?

The file contains the following files:

1) createDatatype: This creates an enumeration datatype. I need to select this column to reporoduce the error.

2) the ddl of the table with the data

3) the insert statements (about 23.000 rows with just 3 Columns)

and for sure: the power bi file. You just have to change the database server and the database.

I found these obervations:

- If there is no index on the "id" column, the duplication does not happen

- The transformation "select columns" is necessary. Otherwise no duplication happens

- you need to select the enumeration column. Otherwise no duplication shappens.

- the sort is necessary. Otherwise no dulication happens

I think the reason is hidden behind

- query folding

- paging with using the index

- the enumeration

Thanks everybody for your support!

Holger

PS: I was not able to insert attachments to this post - I always get exceptions like "txt is not supported, pbix is not supported,...". So here is a dropbox link:

https://www.dropbox.com/scl/fi/klcq6rusf83pgu7wj6ku6/postgreSQL_RowDuplication.zip?rlkey=imu2dp8qxlc...

View solution in original post

ppvinsights · ‎08-27-2024

I do not have any merge. I have nexted joins - but I deleted everything to find the root cause. Not I do not have any joins anymore. But: Right now I have a query, which loads just a table called "costitems". I have another query, which is a reference to the first one. In this second query I have a sort criteria. And here it happends: If I sort by the primary key asc, the rows are doubled. But not all rows. I looked every doubled row and took the id from the biggest one. I counted all rows with id lower than this biggest id. Resukt: 4096...

I tried to "sniff" the sql queries via "pg_stat_activity" - and I found:

select

"$Ordered"."id"

, "$Ordered"."itemnumber"

, ......

from "public"."costitems" "$Ordered"

limit 9223372036854775807

offset 4096

The "offset" is interesting... Any clue? Seems like the problem is this first "page"

v-xingshen-msft · ‎08-27-2024

Hi @ppvinsights ,
Based on past experience, if you have a large amount of data, Power BI may automatically page the data. When paging, OFFSET and LIMIT clauses are used to fetch data in batches. However, if there is any problem in the paging logic (such as overlapping paging boundaries or unstable ORDER BY), it may cause some rows to be repeated in different paging results.
If you are facing the problem of duplicate data, you can use remove duplicate values in power query to fix this error, hope my method will help you !

Hope it helps!

Best regards,
Community Support Team_ Tom Shen

If this post helps then please consider Accept it as the solution to help the other members find it more quickly.

ppvinsights · ‎08-28-2024

Sorry, but I have to come back with my problem:

Especially when it comes to duplicates (due to a buggy data connector...?) the solution cannot be to just use "Remove Duplicates". Because that means, that each and every row is loaded to memory - especially (!) when it comes to big datasets this cannot be the solution ?!

Here is my M-Code:

Query no1 (base query - not loaded):

############################

let
    Quelle = PostgreSQL.Database("localhost", "postgres", [CreateNavigationProperties=false]),
    public_costitems = Quelle{[Schema="public",Item="costitems"]}[Data]
in
    public_costitems

Query no 2 (loaded)

########

let
    Quelle = _costitems,
    #"Andere entfernte Spalten" = Table.SelectColumns(Quelle,{"id", "kind"}),
    #"Sortierte Zeilen" = Table.Sort(#"Andere entfernte Spalten",{{"id", Order.Ascending}})
in
    #"Sortierte Zeilen"

In Power BI I filter for an element with id=1495: the row is doubled. For no reason!

I have a guess - not really an explanation, but kind of a reason:

I logged all SQL-statements. I saw to sql-statements (pls ignore the quotation marks - I copied the statements out of the log files):

SELECT 
    ""$Ordered"".""id"", 
    ""$Ordered"".""itemnumber"", 
    ""$Ordered"".""overwrittenvalue"", 
    [...]
from ""public"".""costitems"" ""$Ordered"" 
order by ""$Ordered"".""id"" 
limit 4096"

And a second one which is nearly the same - onyl the limit/offset changed:

select

    [...]

from ""public"".""costitems"" ""$Ordered"" 
limit 9223372036854775807 offset 4096"

==>Why does it break the fold? Nothing special in the the query?

I looked at the datatypes of the columns of the tablle and I found, that there is a special datatype used for the column [kind] which is an "enumeration" in postgres. When I changed the selected columns and load e.g. the [orderid] instead of the column [kind], the rows and not doubled anymore. I can take an arbitrary set of columns of this table - as long as I do not use the field [kind] the query is folded and everything works fine. Only if I take the field [kind] in the selection, some rows are doubled.

But: I need this column. And it seems like a bug in the data connector. Anyone who can help me or give me a hint, where I can post the problem, if this forum is the wrong one?

Thanks

Holger

ppvinsights · ‎08-30-2024

Hello again,

I stripped my problem down to the (I think) smallest piece of data an M-code. I attach all necessary files in a ZIP-file. It woulld be so nice if someone with a running postgreSQL instance can reproduce my problem - and if you can: someone can tell me where to send a bug report to microsoft?

The file contains the following files:

1) createDatatype: This creates an enumeration datatype. I need to select this column to reporoduce the error.

2) the ddl of the table with the data

3) the insert statements (about 23.000 rows with just 3 Columns)

and for sure: the power bi file. You just have to change the database server and the database.

I found these obervations:

- If there is no index on the "id" column, the duplication does not happen

- The transformation "select columns" is necessary. Otherwise no duplication happens

- you need to select the enumeration column. Otherwise no duplication shappens.

- the sort is necessary. Otherwise no dulication happens

I think the reason is hidden behind

- query folding

- paging with using the index

- the enumeration

Thanks everybody for your support!

Holger

PS: I was not able to insert attachments to this post - I always get exceptions like "txt is not supported, pbix is not supported,...". So here is a dropbox link:

https://www.dropbox.com/scl/fi/klcq6rusf83pgu7wj6ku6/postgreSQL_RowDuplication.zip?rlkey=imu2dp8qxlc...

ppvinsights · ‎09-03-2024

Made a new topic with the problem

v-xingshen-msft · ‎08-26-2024

Hi All
Firstly @amitchandak thank you for your solution!
And @ppvinsights ,Based on your description, I just tested it in my environment. The data is also realisable in Power Query and BI, I think your problem may be that after loading into Power BI, there is an issue with the model relationships causing the data to reoccur.
Can you post a table structure with some sample data that reproduces the problem? It would be better to share the example pbix file. This will be more helpful for us to help you to solve the problem, I hope my answer can help you, here are the cases I found for you which are related to you, I hope it will be helpful for you.

Solved: Re: Expanding columns after a Query Merge increase... - Microsoft Fabric Community

Best regards,
Community Support Team_ Tom Shen

If this post helps then please consider Accept it as the solution to help the other members find it more quickly.

ppvinsights · ‎09-02-2024

Hi,

@v-xingshen-msft

I am not sure if you got notified by my answers. I tried to strip my problem down to the smallest set of data and M-code. You find it in my posts in this thread. Are you able to reproduce my problem?

Thanks

Holger

amitchandak · ‎08-26-2024

@ppvinsights , do you have any transformation steps like merge? Without merging it seems like a strange issue. Please share all the steps you have in the power query

if you have the Merge step, then check the rows post that.

Full Power BI Video 20 Hours YouTube
Microsoft Fabric Series 60+ Videos YouTube
Microsoft Fabric Hindi End to End YouTube

Power Query has fewer rows than power bi datamodel? #postgreSQL

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - January 2025

Fabric Community Update - January 2025

How to Get Your Question Answered Quickly

New Offer! Become a Certified Fabric Data Engineer

Power Query has fewer rows than power bi datamodel? #postgreSQL

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - January 2025

Fabric Community Update - January 2025

How to Get Your Question Answered Quickly