<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using aifunc in notebook to extract csv data in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5155046#M15935</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/802637"&gt;@Ira_27&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;The behavior you're describing is expected. By default, ai.extract returns a list for each label, even if only one value is found. That's why you see ["ABC123"] instead of ABC123. It's not a bug; it's how the function works internally.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The fix is max_items=1 (&lt;A href="https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/extract?tabs=labels#return" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/extract?tabs=labels#return&lt;/A&gt;). That parameter tells the function to return a scalar instead of a list. Give this a try:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;labels=[
    aifunc.ExtractLabel(
        "Name",
        description="Return only the Name without brackets or quotes.",
        max_items=1,
        type="string"
    ),
    aifunc.ExtractLabel("Age", max_items=1, type="integer"),
    aifunc.ExtractLabel("Total %", max_items=1, type="number"),
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the other hand, if your data starts at row 10, it's worth filtering out the preceding rows before calling ai.extract. Otherwise the model will try to extract values from lines that aren't actual data, which can produce empty or incorrect results for those rows.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If my comment helped solve your question, it would be great if you could like the comment and mark it as the accepted solution. It helps others with the same issue and also motivates me to keep contributing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot, I really appreciate it.&lt;/P&gt;</description>
    <pubDate>Mon, 20 Apr 2026 21:31:39 GMT</pubDate>
    <dc:creator>arabalca</dc:creator>
    <dc:date>2026-04-20T21:31:39Z</dc:date>
    <item>
      <title>Using aifunc in notebook to extract csv data</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5154772#M15932</link>
      <description>&lt;P class=""&gt;&lt;SPAN&gt;Hi Microsoft Fabric Community,&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;I am testing the new AI Functions in Microsoft Fabric notebooks and using &lt;/SPAN&gt;&lt;SPAN&gt;df.ai.extract()&lt;/SPAN&gt;&lt;SPAN&gt; with &lt;/SPAN&gt;&lt;SPAN&gt;aifunc.ExtractLabel()&lt;/SPAN&gt;&lt;SPAN&gt; to extract values from a csv file that has comma seperated value where data starts from 10th row.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;My code looks like this:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from synapse.ml.spark import aifunc
df = spark.read.text("Files/xxx.csv")
dfcsv = df.ai.extract(
    labels=[
        aifunc.ExtractLabel(
            "Name",
            description="Return only the Name without brackets or quotes."
        ),
        "Name",
        "Age",
        "Total %"
    ],
    input_col="value"
)&lt;/LI-CODE&gt;&lt;P class=""&gt;&lt;SPAN&gt;The extraction works, but the returned values often come back like:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;["ABC123"]
"ABC123"
['ABC123']&lt;/LI-CODE&gt;&lt;P class=""&gt;&lt;SPAN&gt;Instead of plain text: &lt;STRONG&gt;"ABC123"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;My Questions:&lt;/SPAN&gt;&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN&gt;Is this expected behavior for &lt;/SPAN&gt;&lt;SPAN&gt;aifunc.extract()&lt;/SPAN&gt;&lt;SPAN&gt;?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Is there a parameter or setting to force scalar/plain text output instead of arrays or quoted strings?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;What is the recommended production approach for cleaning or normalizing these outputs in Fabric notebooks?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;If multiple values are detected, how does the function decide array vs string return type?&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;SPAN&gt;I’d appreciate any best practices or examples from others using AI Functions in Fabric.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 16:08:17 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5154772#M15932</guid>
      <dc:creator>Ira_27</dc:creator>
      <dc:date>2026-04-20T16:08:17Z</dc:date>
    </item>
    <item>
      <title>Re: Using aifunc in notebook to extract csv data</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5155046#M15935</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/802637"&gt;@Ira_27&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;The behavior you're describing is expected. By default, ai.extract returns a list for each label, even if only one value is found. That's why you see ["ABC123"] instead of ABC123. It's not a bug; it's how the function works internally.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The fix is max_items=1 (&lt;A href="https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/extract?tabs=labels#return" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/extract?tabs=labels#return&lt;/A&gt;). That parameter tells the function to return a scalar instead of a list. Give this a try:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;labels=[
    aifunc.ExtractLabel(
        "Name",
        description="Return only the Name without brackets or quotes.",
        max_items=1,
        type="string"
    ),
    aifunc.ExtractLabel("Age", max_items=1, type="integer"),
    aifunc.ExtractLabel("Total %", max_items=1, type="number"),
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the other hand, if your data starts at row 10, it's worth filtering out the preceding rows before calling ai.extract. Otherwise the model will try to extract values from lines that aren't actual data, which can produce empty or incorrect results for those rows.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If my comment helped solve your question, it would be great if you could like the comment and mark it as the accepted solution. It helps others with the same issue and also motivates me to keep contributing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot, I really appreciate it.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 21:31:39 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5155046#M15935</guid>
      <dc:creator>arabalca</dc:creator>
      <dc:date>2026-04-20T21:31:39Z</dc:date>
    </item>
    <item>
      <title>Re: Using aifunc in notebook to extract csv data</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5155098#M15937</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/802637"&gt;@Ira_27&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Microsoft documents this behavior of ai.extract as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;"The default return type is a list of strings for each label. When&amp;nbsp;max_items&amp;nbsp;isn't specified, multiple matches are returned as a list."&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/extract?tabs=labels" target="_blank"&gt;Use ai.extract with PySpark - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Within the function you can use max_items=1 to force it to return a single-element list rather than multiple values. You can then convert the array to a scalar value with either&amp;nbsp;&lt;EM&gt;element_at(col("Name"), 1)&amp;nbsp;or&amp;nbsp;col("Name")[0]&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The quotes surrounding the extracted values are simply the model choosing to return the string that way. This is common behaviour.&amp;nbsp;You can trim whitespace and remove the wrapping quotes as a final clean-up step, as appropriate.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;These are some of the approaches you can use if you're planning to run this in production.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 00:09:10 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Using-aifunc-in-notebook-to-extract-csv-data/m-p/5155098#M15937</guid>
      <dc:creator>deborshi_nag</dc:creator>
      <dc:date>2026-04-21T00:09:10Z</dc:date>
    </item>
  </channel>
</rss>

