<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic RunMultiple DAG and tricks in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3878869#M1237</link>
    <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;In summary, my main question is:&lt;BR /&gt;&lt;BR /&gt;If I use RunMultiple to call an external API multiple times and collect the JSON results, are there specific tricks for extracting each API result from the DAG result?&lt;BR /&gt;&lt;BR /&gt;Let me explain why I'm asking:&lt;BR /&gt;&lt;BR /&gt;I have a notebook that makes these calls sequentially, and I want to improve performance by switching to parallel execution.&lt;BR /&gt;&lt;BR /&gt;In the original notebook, this is the piece of code that calls the API and starts processing the result:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;    result = requests.post(function_url, data=body, headers=headers)
    
    if result.status_code!=200:
        result.raise_for_status()
    
    data=json.loads(result.text)

    if data is None:
        continue

    for serviceLine in data:&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From this point forward, everything works well.&lt;BR /&gt;&lt;BR /&gt;With RunMultiple, I created a parameterized notebook that makes a single call and returns the result. This is how I return the result:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;result = requests.post(function_url, data=body, headers=headers)

if result.status_code!=200:
    result.raise_for_status()
    
data=json.loads(result.text)

mssparkutils.notebook.exit(data)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem is that everything arrives in a different form, and I'm having to make more changes to the processing code (not shown in the code blocks above) than I expected.&lt;BR /&gt;&lt;BR /&gt;First, to extract the JSON from the DAG result, I had to apply some transformations to it:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;def prepareJSON(jsonValue):
    jsonValue = jsonValue.replace("\'", "\"")
    jsonValue = jsonValue.replace('None','[]')
    jsonValue = jsonValue.replace('True','true')
    jsonValue = jsonValue.replace('False','false')
    return jsonValue

    exitVal=result[account_number]['exitVal']

    if exitVal is None:
        continue

    exitVal = prepareJSON(exitVal)
    row=json.loads(exitVal)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the code above, "account_number" is the value used as the name of the activity in the DAG.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If it had stopped here, I would not be so concerned. But at the end of the processing, it fails because the results can't be joined: some columns have mixed data types.&lt;BR /&gt;&lt;BR /&gt;Back to the question: are there any special tricks or suggestions for extracting a value from the DAG result without having to change my original processing code so much?&lt;BR /&gt;&lt;BR /&gt;Kind Regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dennes&lt;/P&gt;</description>
    <pubDate>Tue, 30 Apr 2024 10:01:22 GMT</pubDate>
    <dc:creator>DennesTorres</dc:creator>
    <dc:date>2024-04-30T10:01:22Z</dc:date>
    <item>
      <title>RunMultiple DAG and tricks</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3878869#M1237</link>
      <pubDate>Tue, 30 Apr 2024 10:01:22 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3878869#M1237</guid>
      <dc:creator>DennesTorres</dc:creator>
      <dc:date>2024-04-30T10:01:22Z</dc:date>
    </item>
    <item>
      <title>Re: RunMultiple DAG and tricks</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3879042#M1238</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/599172"&gt;@DennesTorres&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Thanks for using Fabric Community.&lt;BR /&gt;At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them. &lt;BR /&gt;Thanks&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2024 10:44:17 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3879042#M1238</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-04-30T10:44:17Z</dc:date>
    </item>
    <item>
      <title>Re: RunMultiple DAG and tricks</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3886771#M1239</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/599172"&gt;@DennesTorres&lt;/a&gt;&lt;BR /&gt;The internal team replied as follows:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;Just a suggestion, but it seems like this is simply a problem with the JSON -&amp;gt; string -&amp;gt; JSON conversion. Perhaps try json.dumps at notebook exit to safely convert the JSON to a string, and then json.loads once you've combined the results? Also, remember that the new soft limit for runMultiple is 50 by default, so it may be better to switch to Python multithreading. Note that this will only run on the driver node, so use a single-node pool.&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;Hope this helps. Please let me know if you have any further questions.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 17:04:40 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3886771#M1239</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-05-02T17:04:40Z</dc:date>
    </item>
    <item>
      <title>Re: RunMultiple DAG and tricks</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3887110#M1240</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;Thank you. Yes, I found a similar solution.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The problem was the json.loads call. Calling json.loads before returning the data converts the string into a dictionary. When that dictionary is returned through the DAG, the dictionary -&amp;gt; string conversion doesn't produce the same string.&lt;BR /&gt;&lt;BR /&gt;The solution was to avoid calling json.loads inside the parallel execution. I returned the original string, retrieved it outside, and called json.loads after the parallel execution. It works perfectly.&lt;BR /&gt;&lt;BR /&gt;This is the same as the recommendation.&lt;BR /&gt;&lt;BR /&gt;I'm aware of the runMultiple limit. Python multithreading is something I still need to explore.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Kind Regards,&lt;BR /&gt;&lt;BR /&gt;Dennes&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2024 19:58:09 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/RunMultiple-DAG-and-tricks/m-p/3887110#M1240</guid>
      <dc:creator>DennesTorres</dc:creator>
      <dc:date>2024-05-02T19:58:09Z</dc:date>
    </item>
  </channel>
</rss>

