DennesTorres
Post Prodigy

Group by on eventstream's lakehouse target doesn't aggregate numeric fields

Hi,

The problem is: the fields are not available for an aggregation function. I can do a count, but I can't do a sum, avg, or any other aggregation.

 

This was with the NYTaxi sample data, close to the existing real-time walkthrough on the Microsoft portal, except that I included a lakehouse target (Kusto and lakehouse, two targets), and this happened on the transformation for the lakehouse.

Is this a bug, or did I miss something? The image below illustrates the problem. The only field that appears is "partitionId":

[Screenshot: DennesTorres_0-1696351596900.png]


Kind Regards,

 

Dennes


9 REPLIES
v-nikhilan-msft
Community Support

Hi @DennesTorres,
Thanks for using the Fabric Community.

For the Yellow Taxi sample data you are using, the incoming event stream has only VARCHAR columns, except for partition_ID, which is of type INTEGER. An aggregate function can only run on integers or decimals, which is why an aggregate can be applied only to partition_ID. I have attached a screenshot for your reference.

Hope this helps. Do let us know if you have any further queries.

 

 

[Screenshot: vnikhilanmsft_0-1696418621736.png]
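To make the type issue concrete, here is a minimal Python sketch (illustrative only, not the eventstream API; the field names and values are invented) of a Yellow Taxi event in which every payload field arrives as text and only the service-assigned partitionId is already numeric, which is why only that field is offered for sum/avg-style aggregations:

```python
import json

# Hypothetical event payload: the taxi fields arrive as plain text, while
# partitionId is attached by the service as an integer.
event = json.loads(
    '{"partitionId": 3, "vendorID": "2", "fareAmount": "12.5", "tripDistance": "3.2"}'
)

for field, value in event.items():
    print(f"{field}: {type(value).__name__}")
# Prints:
#   partitionId: int   <- the only field a sum/avg aggregate can target
#   vendorID: str
#   fareAmount: str
#   tripDistance: str
```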

Hi,

In the image below, you can see three fields of type real and one of type long in the Yellow Taxi sample data.

I even converted one of the real fields to integer.

 

[Screenshot: DennesTorres_0-1696539672712.png]

 


So the explanation that "all the fields are string" would not be possible, unless the bug with the lakehouse target is much bigger than I thought and each target is recognizing different data types from the same source, which should not happen at all.

Did I miss something, or is this a bug in how the targets recognize the data types?

Kind Regards,

Dennes

Hi @DennesTorres ,

As I understand it, you are facing an issue applying aggregate operations on the columns in the destination. There are two cases here.

 

1) In the first case, the eventstream destination is a Lakehouse table. The sample data you are using (Yellow Taxi) is read from a CSV file, which carries no type information, so by default all the columns are converted to string. For the lakehouse the columns are therefore string by default, but you can change the type using the event processor (no-code editor).

 

2) In the second case, the destination is a KQL database. Kusto performs type conversion by default, so the data types are not changed and remain the same as the source data types, as you have shown in the screenshot.

 

So if you want to use a Lakehouse destination for the eventstream and apply aggregate operations on your columns, you have to change the data type of those columns first; an aggregate operation cannot be applied to a string column. This can be done using the event processor (no-code editor). I have created a repro on my end and am attaching the screenshots for your reference; I am able to apply aggregate operations on the other columns as well.

Hence it is not a bug.
Please refer to this link for changing the data types of the columns: link
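As a rough analogue of what the type change in the event processor accomplishes, here is a small pandas sketch (illustrative only; the column names are hypothetical and this is not Fabric or eventstream code) showing that a count works on string columns, while sum/avg only become possible after casting to a numeric type:

```python
import io
import pandas as pd

# Two hypothetical Yellow Taxi rows arriving as untyped CSV text.
raw = io.StringIO("partitionId,fareAmount,tripDistance\n1,12.5,3.2\n1,7.0,1.1\n")

# dtype=str mimics a destination that lands every column as string.
df = pd.read_csv(raw, dtype=str)
print(df.dtypes)  # every column has dtype 'object' (string)

# A count works regardless of the column type...
print(df.groupby("partitionId")["fareAmount"].count())

# ...but sum/avg need numbers, so cast first -- the analogue of changing the
# data type in the event processor before adding the aggregate operation.
df["fareAmount"] = pd.to_numeric(df["fareAmount"])
print(df.groupby("partitionId")["fareAmount"].agg(["sum", "mean"]))
```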


[Screenshot: vnikhilanmsft_1-1696606198753.png]

[Screenshot: vnikhilanmsft_0-1696606143514.png]

[Screenshot: vnikhilanmsft_2-1696606252556.png]

Hope this helps. Do let us know if you have any further queries. Glad to help.

Hi,

 

This explains it, and it works. But it doesn't change the fact that it's a terrible idea to have transformations and types completely tied to the target.

It's bad for us, the people who will use the technology, because it tends to make maintenance more difficult. It's bad for the developers because it multiplies the development effort and creates the requirement to keep the implementation in sync in two different places - and it's not in sync now: the targets don't have the same features.

This problem wasn't there before. With streaming dataflows, as you can see in the image below, the transformations are done by the dataflow, independent of the target. I could add two targets receiving the same transformation result.

[Screenshot: DennesTorres_0-1697128897111.png]


With Stream Analytics, the transformations were done by the Stream Analytics query language, and we could drop the result into multiple targets.

[Screenshot: DennesTorres_1-1697129062556.png]
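For comparison, the pattern described here - define the group-by/aggregation once and feed the same result to more than one destination - looks roughly like the following PySpark Structured Streaming sketch (illustrative only, not eventstream or Stream Analytics syntax; the source, paths, and column names are made up):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("one-transform-two-sinks").getOrCreate()

# The built-in "rate" source stands in for the incoming event stream.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# The transformation (including the aggregation) is defined once,
# independent of any destination.
aggregated = (
    events.withColumn("bucket", F.col("value") % 4)
          .groupBy("bucket")
          .agg(F.count("*").alias("events"), F.avg("value").alias("avg_value"))
)

# foreachBatch receives each result and writes the same data to two targets.
def write_to_both(batch_df, batch_id):
    batch_df.write.mode("overwrite").parquet("/tmp/agg_sink_a")  # hypothetical target 1
    batch_df.write.mode("overwrite").parquet("/tmp/agg_sink_b")  # hypothetical target 2

query = (
    aggregated.writeStream
              .outputMode("complete")
              .foreachBatch(write_to_both)
              .start()
)
query.awaitTermination(30)  # run briefly for the sketch, then return
```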

Keeping the transformations and even the data types tied to the target doesn't seem like a good idea at all, and the previous technologies were not like this.

I hope this feedback reaches the development team.

Kind Regards,

Dennes



Thank you for sharing this feedback. We hear you. The team is working on bringing the event processing logic design to the main canvas, which will enable common transformation logic to be reused by multiple destinations.

Hi @DennesTorres ,
Thanks for your valuable feedback. We will certainly keep an eye on the upvotes it receives.

We would appreciate it if you could share the feedback on our feedback channel, which is open for the user community to upvote and comment on. This allows our product teams to effectively prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

Hi @DennesTorres ,
We haven't heard from you since the last response and were just checking back to see if your issue got resolved. Please let us know if you have any further queries.

Hi @DennesTorres ,
Glad that your query got resolved. Please continue using Fabric Community for any help regarding your queries.
