
DennesTorres
Post Prodigy

RunMultiple DAG and tricks

Hi,

In summary, my main question is:

If I need to use RunMultiple to call an external API multiple times and get the JSON results, are there specific tricks for extracting the API result from inside the DAG result?

Let me explain better why I'm asking:

I have a notebook which makes these calls sequentially. I want to improve performance by switching to parallel execution.

In the original notebook, when I call the API, this is the piece of code that starts processing the result:

    # This excerpt sits inside a loop over accounts (hence the `continue`)
    result = requests.post(function_url, data=body, headers=headers)

    if result.status_code != 200:
        result.raise_for_status()

    data = json.loads(result.text)

    if data is None:
        continue

    for serviceLine in data:

 

From this point forward, everything works well.

When using RunMultiple, I created a parameterized notebook to make one call and return the result. This is how I'm returning the result:

result = requests.post(function_url, data=body, headers=headers)

if result.status_code != 200:
    result.raise_for_status()

data = json.loads(result.text)

mssparkutils.notebook.exit(data)
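For context, a caller along these lines is presumably driving the parameterized notebook. This is only a sketch: the notebook path, timeout, and argument names are placeholders; the relevant part is that each activity is named after the account number, which is how the results are keyed later.

```python
# Hypothetical caller sketch: build a runMultiple DAG with one activity
# per account, using the account number as the activity name.
account_numbers = ["1001", "1002", "1003"]

dag = {
    "activities": [
        {
            "name": acct,                      # activity name == account number
            "path": "CallApiNotebook",         # placeholder notebook name
            "timeoutPerCellInSeconds": 120,
            "args": {"account_number": acct},  # placeholder parameter name
        }
        for acct in account_numbers
    ]
}

# In the Fabric notebook this would then be:
# result = mssparkutils.notebook.runMultiple(dag)
# result[acct]["exitVal"] holds each activity's exit value.
```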

 

The problem is that the results arrive in a different shape, and I'm having to change the processing code (not shown in the blocks above) more than I expected.

First, to extract the JSON from the DAG result, I had to apply some transformations to it:


def prepareJSON(jsonValue):
    jsonValue = jsonValue.replace("\'", "\"")
    jsonValue = jsonValue.replace('None', '[]')
    jsonValue = jsonValue.replace('True', 'true')
    jsonValue = jsonValue.replace('False', 'false')
    return jsonValue


# Inside the loop that collects each activity's result:
exitVal = result[account_number]['exitVal']

if exitVal is None:
    continue

exitVal = prepareJSON(exitVal)
row = json.loads(exitVal)

 

In the code above, "account_number" is the value used as the name of the activity in the DAG.
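The reason those string replacements are needed appears to be that the DAG hands back the Python repr of the dictionary rather than JSON. A minimal illustration of the difference (the payload is made up):

```python
import json

data = {"service": "x", "active": True, "note": None}

# str() on a dict produces a Python literal, not JSON: single quotes,
# and True/None instead of true/null.
as_repr = str(data)
# json.loads(as_repr) would raise json.JSONDecodeError here.

# json.dumps produces valid JSON that round-trips cleanly.
as_json = json.dumps(data)
assert json.loads(as_json) == data
```

Note that blanket `.replace` calls like those in `prepareJSON` are lossy: they would also mangle any value that legitimately contains the substrings `None`, `True`, or a single quote.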

 

If it had stopped there, I would not be so concerned. But at the end of the processing it generated errors because it can't join the results: some columns have mixed data types.

Back to the question: Are there any special tricks or suggestions for extracting a value from inside the DAG result without having to change my original processing code so much?

Kind Regards,

 

Dennes


3 REPLIES
v-nikhilan-msft
Community Support

Hi @DennesTorres 
Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them.
Thanks 

Hi @DennesTorres
The internal team replied as follows:

Just a suggestion, but it seems like it's simply a problem with json -> string -> json conversion. Perhaps try json.dumps at notebook exit to safely convert the JSON to a string, and then json.loads once you've combined the results together. Also, remember that the new soft limit for runMultiple is 50 by default, so it may be better to convert to Python multithreading; note that this will only run on the driver node, so use a single-node pool.

Hope this helps. Please let me know if you have any further questions.
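A minimal sketch of the suggested round trip, independent of Fabric (the payload is illustrative):

```python
import json

payload = {"account": "1001", "lines": [{"ok": True, "note": None}]}

# In the child notebook: serialize before exiting, so the DAG carries
# a plain JSON string rather than a dict coerced through repr().
exit_val = json.dumps(payload)
# mssparkutils.notebook.exit(exit_val)

# In the parent, after runMultiple: parse the string back.
restored = json.loads(exit_val)
assert restored == payload
```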

Hi,

Thank you. Yes, I found a similar solution. 

The problem was the load. Calling json.loads before returning the data converts string -> dictionary. When the dictionary is returned through the DAG, the conversion from dictionary -> string doesn't produce the same string.

The solution was not to use json.loads inside the parallel execution. I returned the original string, retrieved it outside, and applied json.loads outside the parallel execution. It works perfectly.

It's the same as the recommendation.

About the runMultiple limit, I'm aware of it. Python multithreading is something I need to explore.

Kind Regards,

Dennes
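As for the Python multithreading route the internal team mentioned, a sketch along these lines could replace runMultiple entirely. `fetch_all` and the stand-in fetch function are illustrative; in the real notebook, the fetch function would wrap the `requests.post(function_url, ...)` call and return `result.text`, leaving `json.loads` for after collection:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(fetch, account_numbers, max_workers=8):
    """Call fetch(account) concurrently; results keyed by account number."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(account_numbers, pool.map(fetch, account_numbers)))

# Stand-in fetch for illustration; a real one would call the API and
# return the raw response text.
results = fetch_all(lambda acct: f"payload-for-{acct}", ["1001", "1002"])
```

This runs only on the driver node, which matches the internal team's advice to use a single-node pool for this pattern.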
