DennesTorres
Post Prodigy

Group by on eventstream's lakehouse target doesn't aggregate numeric fields

Hi,

The problem is: the fields are not available for an aggregation function. I can do a count, but I can't do a sum, avg, or any other aggregation.

 

This was with the NYTaxi sample data, close to the existing real-time walkthrough on the Microsoft portal, except that I included a lakehouse target (Kusto and lakehouse, two targets), and this happened on the transformation for the lakehouse.

Is this a bug, or did I miss something? The image below illustrates the problem. The only field that appears is "partitionId":

[Screenshot: DennesTorres_0-1696351596900.png]


Kind Regards,

 

Dennes


9 REPLIES
v-nikhilan-msft
Community Support

Hi @DennesTorres,
Thanks for using the Fabric Community.

For the Yellow Taxi sample data you are using, the incoming event stream has only VARCHAR columns, except for partition_ID, which is of type INTEGER. An aggregate function can only run on integers or decimals, which is why an aggregate can be applied only to partition_ID. I have attached a screenshot for your reference.

Hope this helps. Do let us know if you have any further queries.

 

 

[Screenshot: vnikhilanmsft_0-1696418621736.png]
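To make the type issue concrete, here is a minimal Python sketch (illustrative only, not the eventstream API; the field names and values are invented) of a Yellow Taxi event in which every payload field arrives as text and only the service-assigned partitionId is already numeric, which is why only that field is offered for sum/avg-style aggregations:

```python
import json

# Hypothetical event payload: the taxi fields arrive as plain text, while
# partitionId is attached by the service as an integer.
event = json.loads(
    '{"partitionId": 3, "vendorID": "2", "fareAmount": "12.5", "tripDistance": "3.2"}'
)

for field, value in event.items():
    print(f"{field}: {type(value).__name__}")
# Prints:
#   partitionId: int   <- the only field a sum/avg aggregate can target
#   vendorID: str
#   fareAmount: str
#   tripDistance: str
```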

Hi,

In the image below, you can see three fields of type real and one of type long in the Yellow Taxi sample data.

I even converted one of the real fields to integer.

 

[Screenshot: DennesTorres_0-1696539672712.png]

 


So the explanation that "all the fields are string" would not be possible, unless the bug with the lakehouse target is much bigger than I thought and each target is recognizing different data types from the same source, which should not happen at all.

Did I miss something, or is this a bug in how the targets recognize the data types?

Kind Regards,

Dennes

Hi @DennesTorres ,

As I understand it, you are facing an issue applying aggregate operations on the columns in the destination. There are two cases here.

 

1) In the first case, the eventstream destination is a Lakehouse table. The sample data you are using (Yellow Taxi) is read from a CSV file, which carries no type information, so by default all the columns are converted to string. For the lakehouse the columns are therefore string by default, but you can change the type using the event processor (no-code editor).

 

2) In the second case, the destination is a KQL database. Kusto performs type conversion by default, so the data types are not changed and remain the same as the source data types, as you have shown in the screenshot.

 

So if you want to use a Lakehouse destination for the eventstream and apply aggregate operations on your columns, you have to change the data type of those columns first; an aggregate operation cannot be applied to a string column. This can be done using the event processor (no-code editor). I have created a repro on my end and am attaching the screenshots for your reference; I am able to apply aggregate operations on the other columns as well.

Hence it is not a bug.
Please refer to this link for changing the data types of the columns: link
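As a rough analogue of what the type change in the event processor accomplishes, here is a small pandas sketch (illustrative only; the column names are hypothetical and this is not Fabric or eventstream code) showing that a count works on string columns, while sum/avg only become possible after casting to a numeric type:

```python
import io
import pandas as pd

# Two hypothetical Yellow Taxi rows arriving as untyped CSV text.
raw = io.StringIO("partitionId,fareAmount,tripDistance\n1,12.5,3.2\n1,7.0,1.1\n")

# dtype=str mimics a destination that lands every column as string.
df = pd.read_csv(raw, dtype=str)
print(df.dtypes)  # every column has dtype 'object' (string)

# A count works regardless of the column type...
print(df.groupby("partitionId")["fareAmount"].count())

# ...but sum/avg need numbers, so cast first -- the analogue of changing the
# data type in the event processor before adding the aggregate operation.
df["fareAmount"] = pd.to_numeric(df["fareAmount"])
print(df.groupby("partitionId")["fareAmount"].agg(["sum", "mean"]))
```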


[Screenshot: vnikhilanmsft_1-1696606198753.png]

[Screenshot: vnikhilanmsft_0-1696606143514.png]

[Screenshot: vnikhilanmsft_2-1696606252556.png]

Hope this helps. Do let us know if you have any further queries. Glad to help.

Hi,

 

This explains it, and it works. But it doesn't change the fact that it's a terrible idea to have transformations and types completely tied to the target.

It's bad for us, the people who will use the technology, because it tends to make maintenance more difficult. It's bad for the developers because it multiplies the development effort and creates the requirement to keep the implementation in sync in two different places - and it's not in sync now: the targets don't have the same features.

This problem wasn't there before. With streaming dataflows, as you can see in the image below, the transformations are done by the dataflow, independent of the target. I could add two targets receiving the same transformation result.

[Screenshot: DennesTorres_0-1697128897111.png]


With Stream Analytics, the transformations were done by the Stream Analytics query language, and we could drop the result into multiple targets.

[Screenshot: DennesTorres_1-1697129062556.png]
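For comparison, the pattern described here - define the group-by/aggregation once and feed the same result to more than one destination - looks roughly like the following PySpark Structured Streaming sketch (illustrative only, not eventstream or Stream Analytics syntax; the source, paths, and column names are made up):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("one-transform-two-sinks").getOrCreate()

# The built-in "rate" source stands in for the incoming event stream.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# The transformation (including the aggregation) is defined once,
# independent of any destination.
aggregated = (
    events.withColumn("bucket", F.col("value") % 4)
          .groupBy("bucket")
          .agg(F.count("*").alias("events"), F.avg("value").alias("avg_value"))
)

# foreachBatch receives each result and writes the same data to two targets.
def write_to_both(batch_df, batch_id):
    batch_df.write.mode("overwrite").parquet("/tmp/agg_sink_a")  # hypothetical target 1
    batch_df.write.mode("overwrite").parquet("/tmp/agg_sink_b")  # hypothetical target 2

query = (
    aggregated.writeStream
              .outputMode("complete")
              .foreachBatch(write_to_both)
              .start()
)
query.awaitTermination(30)  # run briefly for the sketch, then return
```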

Keeping the transformations and even the data types tied to the target doesn't seem like a good idea at all, and the previous technologies were not like this.

I hope this feedback reaches the development team.

Kind Regards,

Dennes



Thank you for sharing this feedback. We hear you. The team is working on bringing the event processing logic design to the main canvas, which will enable common transformation logic to be reused by multiple destinations.

Hi @DennesTorres ,
Thanks for your valuable feedback. We will certainly keep an eye on the upvotes it receives.

We would appreciate it if you could share the feedback on our feedback channel, which is open for the user community to upvote and comment on. This allows our product teams to effectively prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

Hi @DennesTorres ,
We haven't heard from you since the last response and were just checking back to see if your issue got resolved. Please let us know if you have any further queries.

Hi @DennesTorres ,
Glad that your query got resolved. Please continue using Fabric Community for any help regarding your queries.
