Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Element115
Power Participant
Power Participant

BUG::percentage data type error

ISSUE:  

Let's say you have an on-prem DB and a table with a numeric type column: NUMERIC(3,2), which represents percentages.  You also have 2 DFg2s in a chain, DF1-->DF2, so the 2nd DFg2 ingests from the 1st DFg2, ie the 1st DFg2 is the source of the 2nd DFg2. And the source of the 1st DFg2 is the on-prem DB.

 

In the 1st DFg2, you change the column type to Percentage and publish.  There is only one query and Enable staging is ON.

 

In the 2nd DFg2, that same column of percentages comes through as datatype decimal. So the change type operation has been lost.  You set this column again to datatype Percentage in the 2nd DFg2.  There is only one query and Enable staging is OFF.

 

You set the data destination to a new table in a user created lakehouse.  And you publish.  

 

Publish op terminates successfully. Refresh op starts and terminates with the following error:

 

 

104100 Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: Expression.Error: Error in replacing table's content with new data in a version: #{0}., InnerException: The number is out of range of a 64 bit integer value., Underlying error: The number is out of range of a 64 bit integer value. Details: Reason = Expression.Error;Message = The number is out of range of a 64 bit integer value.;Detail = 0.95;Message.Format = The number is out of range of a 64 bit integer value.;Microsoft.Data.Mashup.Error.Context = User

 

 

Now, you will notice that since we are dealing with values representing percentages, there are all within the range [0..1], or after conversion, [0..100]. There are no values that can blow up a 64 bit integer type. Besides, why are we even dealing with an int type here, since at the source we are already in the decimal domain?

 

Finally, the query outputs only about 2.2+ million rows. 

1 ACCEPTED SOLUTION

It seems we are dealing with a transient bug of some sort... now I can't reproduce it either! Unreal.  

 

re native queries vs folding, if I remember correctly, I opted to offload the sort to the DB engine rather than do it in the PQ engine because isn't it more efficient to have the source DB do as much of the heavy lifting as possible rather than PQ in the cloud, especially for a M script as this one that does nothing but navigate to the data and one type cast?

View solution in original post

19 REPLIES 19
miguel
Community Admin
Community Admin

It appears as if your column expects integers, but you are using a decimal during the output destination settings. Can you check if the data type in the data destination settings matches the data type that your table in your destination expects?

 

The Currency and Percentage data types act primarily as a data format in your Power Query editor, but they're both still saving things as decimals behind the scenes as there are no percentage data types in a Lakehouse or SQL Server; its just representations of decimals.

One more thing, after I change the type from percentage back to decimal, which I just did, all DFg2 execute without error.

 

So this raises an interesting question:  Why even have a percentage type in Power Query if it's only for display purposes?

Power Query is a product that is integrated in a wide range of other products such as Excel, Power BI, PowerApps, Power Automate and more. There are certain use cases and requirements for certain users where there is a need to see the data in a particular format and to display information in a particular way for better interpretation of the data.

 

For example, if you try the code below you'll see how they are effectively the same value but one gets truncated and the other two only have differences in data format inside of the data preview:

let
    Source = #table( type table [A = Percentage.Type, B= Decimal.Type, C=  Int64.Type ], {  {Percentage.From(0.2), Decimal.From(0.2), Int64.From(0.2)}})
in
    Source

miguel_0-1709003507099.png

Therefore, if you try loading a Percentage (which is effectively a decimal) to a column in your destination that expects integers then the operation will fail as you can't insert decimal values to a column that expects integers.

 

I still haven't been able to replicate the behavior that you originally mentioned, but if you can share how your data destination settings looks like with a screenshot or share a sample of what you see in the data preview of the column thats raising the error that could help us determine what could be happening. 

So, the second DF takes the previous DF as Source:

 

 

let
  Source = PowerPlatform.Dataflows([]),
  #"Navigation 1" = Source{[Id = "Workspaces"]}[Data],
  #"Navigation 2" = #"Navigation 1"{[workspaceId = "142e0ce2-c040-436d-8eb8-58aea822ed45"]}[Data],
  #"Navigation 3" = #"Navigation 2"{[dataflowId = "e233d8a0-0664-410b-89db-04d7d2f8937d"]}[Data],
  Navigation = #"Navigation 3"{[entity = "FACT_RWS_15M", version = ""]}[Data]
in
  Navigation

 

and the screenshot shows that the column has reverted back to decimal type

Screenshot 2024-02-26 223154.jpg

 

so using the UI, I change the type back to Percentage:

 

let
  Source = PowerPlatform.Dataflows([]),
  #"Navigation 1" = Source{[Id = "Workspaces"]}[Data],
  #"Navigation 2" = #"Navigation 1"{[workspaceId = "142e0ce2-c040-436d-8eb8-58aea822ed45"]}[Data],
  #"Navigation 3" = #"Navigation 2"{[dataflowId = "e233d8a0-0664-410b-89db-04d7d2f8937d"]}[Data],
  Navigation = #"Navigation 3"{[entity = "FACT_RWS_15M", version = ""]}[Data],
  #"Changed column type" = Table.TransformColumnTypes(Navigation, {{"RELATIVE_HUMIDITY", Percentage.Type}})
in
  #"Changed column type"

 

and I create a new table in the lakehouse like so:

Screenshot 2024-02-26 223518.jpg

and this time there is no error!!!  I'll be damned!  This is what the lakehouse table column looks like:

Screenshot 2024-02-26 223950.jpg

Ok so this is from DF1:

 

let
  Source                  = Sql.Database(
                                          "SERVER_NAME", 
                                          "ONPREM_DB_NAME", 
                                          [
                                            Query = "SELECT *#(lf)FROM vAll_ESS_RWS_15M#(lf)ORDER BY [DATE] ASC, M15 ASC;", 
                                            CreateNavigationProperties = false
                                          ]
                                        ),
  #"Changed column type"  = Table.TransformColumnTypes(
                                                        Source, 
                                                        {
                                                          {"DATE", type date}, 
                                                          {"M15", type datetime}, 
                                                          {"HOUR", type datetime},
                                                          {"ATMOSPHERIC_PRESSURE", Int64.Type}, 
                                                          {"BATTERY_STATUS", Int64.Type}, 
                                                          {"PAVEMENT_CHEMICAL_CONCENTRATION", Int64.Type}, 
                                                          {"PAVEMENT_BLOCK", Int64.Type}, 
                                                          {"PRECIPITATION_SITUATION", Int64.Type}, 
                                                          {"RELATIVE_HUMIDITY", Percentage.Type},  // do not use, causes INT64 overflow bug at random
                                                          {"VISIBILITY", Int64.Type}, 
                                                          {"WIND_GUST_DIRECTION", Int64.Type}, 
                                                          {"WIND_GUST_SPEED", Int64.Type}, 
                                                          {"SOLAR_RADIATION", Int64.Type}
                                                        }
                                                      ),
  #"Added index"          = Table.AddIndexColumn(
                                                  #"Changed column type", 
                                                  "Index", 
                                                  1, 
                                                  1, 
                                                  Int64.Type
                                                )
in
  #"Added index"

 

and screenshot

Screenshot 2024-02-26 222604.jpg

This DF is the source for the next DF. Next post is for next DF.

It is recommended that you do not use the native query approach when you're only connecting to a view so you can leverage all the capabilities of query folding.

 

If you absolutely need to use a native query, I'd highly recommend using the approach suggested in the article below with Value.NativeQuery to get the best performance possible by leveraging as much as possible the query folding capabilities:

Import data from a database using native database query - Power Query | Microsoft Learn

 

I tried doing this. Loaded the data using a dataflow to the dataflow storage. Then consumed such data using another Dataflow Gen2 and the Dataflow connector to only then load the data to a new table in my Lakehouse (staging disabled), but I'm still unable to repro the issue.

So it just occurred to me, why not ask Copilot! 😀  And this is the answer I got:

 

Based on the web page context, the best answer is:

- **Use a native SQL statement to query a database view and sort it in the database engine**. This approach is more efficient and faster than using Power Query to sort the data, because it leverages the capabilities of the database engine and reduces the amount of data that needs to be transferred and processed by Power Query. According to the web page, "When you use a native query, the query is executed by the data source, and only the results are returned to Power Query. This can improve performance and reduce memory consumption, especially for large data sets."

- **Not use a native SQL statement and do the sort with Power Query**. This approach is less optimal and slower than using a native SQL statement, because it requires Power Query to fetch all the data from the database view and then sort it locally. This can increase the network traffic and the memory consumption of Power Query, and potentially cause performance issues or errors. According to the web page, "When you don't use a native query, Power Query retrieves all the data from the data source and then applies any filters or transformations. This can result in transferring more data than necessary and consuming more memory resources."

: [Use native database queries in Power Query](https://docs.microsoft.com/en-us/power-query/connectors/native-database-queries)

 

Since I am not an optimization expert yet, what do you think? Is Copilot hallucinating 😉 or correct?  The issue is that although a small dataset now (2.2 million rows, it's gonna grow to become huge.  So if you don't have the right Fabric capacity, I am not sure how well a sort would fare in the cloud engine of PQ.

It seems we are dealing with a transient bug of some sort... now I can't reproduce it either! Unreal.  

 

re native queries vs folding, if I remember correctly, I opted to offload the sort to the DB engine rather than do it in the PQ engine because isn't it more efficient to have the source DB do as much of the heavy lifting as possible rather than PQ in the cloud, especially for a M script as this one that does nothing but navigate to the data and one type cast?

You can decide what works best for you and your workflow. We do highly recommend to use the approach that I shared in my previous reply should you ever want to consider to go the route of a native query (like SQL) instead of leveraging the direct connection to a view or another object and let query folding take care of everything else.

 

I would also highly recommend the article below to better understand how things get evaluated in Power Query:

https://learn.microsoft.com/en-us/power-query/query-folding-basics

I see... didn't realize the PQ engine was this sophisticated.  I got a question: Since the user can specify a native SQL query when using the Get Data dialog, in what kind of scenario would specifying a SQL statement to get data be useful sinceit  seems that the PQ engine can fold pretty much anything when it talks with as SQL Server?

It would be on a case by case basis, but if you have a really complex native query that you'd rather not recreate in Power Query (using the Power Query UI) then perhaps using that native query statement would be better. But even then, there's a conversation to have if perhaps "packaging" that as a view and consuming a view from your source is a better approach than using a Native Query.

Can you explain why it is recommended to use query folding when the source is a view?  And if I understand your statement from yesterday correctly, it also implies that if the source is a table and not a view, the native SQL queries are OK instead of query folding.  But why is query folding better for views?

It boils down to how things will be evaluated and how much of the evaluation can be pushed to your source. 

Here's a few other articles that explain better what query folding is and how important it can be so you can even try to see how it might impact your queries:

Query folding examples in Power Query - Power Query | Microsoft Learn

Query folding indicators in Power Query - Power Query | Microsoft Learn

Query plan - Power Query | Microsoft Learn

 

For more questions around query folding and query evaluation, I'd recommend creating another topic / thread in the forum.

I created a query using the code above and set my destination to be a new table in a Lakehouse:

miguel_0-1709004092487.png

When I published and refreshed the dataflow, it succeeded and I can see my new table and the data inside of it:

miguel_1-1709004256159.png

Do you think that you could share a set of repro steps that I could try to repro the issue? You can use the previous sample code or perhaps another sample data source like Northwind (which uses anonymous authentication services.odata.org/v4/northwind/northwind.svc/)

 

Hold on... The only diff between what you are doing and what I am doing is that I am casting from decimal to percentage.  I suspect internally there is somethign funky with this cast op.  Will post the result of the second DF now.

I shared the screenshot... and now I will revert back to percentage in both DFg2s... It's gonna take some time... 

I don't see how my column would expect integers as the source, an on-prem SQL Server view, has the column defined as a NUMERIC.  So it comes in with the decimal data type in the first DFg2.  And when the second DFg2 ingests from the first one, the data type also shows as decimal. 

 

Then, after having deleted the lakehouse table, I publish the second DFg2 to that same lakehouse in append mode and with create new table. 

 

It was working like this until a couple of hours ago when I thought to try to change the data type in the chain from decimal to percentage and that's when it all went wrong.

 

Therefore I will maintain we are dealing with a bug here. 😁

I haven't been able to repro this.

Could share a screenshot of how your data destination settings looks like? (specifically the section that showcases the mapping of columns and data types)

Here we go. Forgot to scroll down the dialog. It used to look like this:

 

Screenshot 2024-02-26 220714.jpg

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

March 2024 FBC Gallery Image

Fabric Monthly Update - March 2024

Check out the March 2024 Fabric update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.