I am trying to get data from an XML-based API into a data lake. Using a Web activity I succeeded in retrieving the data, but I couldn't find a way to load that output into the data lake. I tried a Copy data activity with a REST connection as the source, but this resulted in a 500 error. Most of the solutions I could find rely on options I don't see in my UI (I will attach some screenshots).
The most important thing is just to get the data into the data lake; it doesn't matter which activities I use. If possible, I'd like to use the output of the Web activity (which, as mentioned, does retrieve the data).
I solved the Copy data step by using an HTTP connector with basic auth instead of a REST connection.
This was not obvious because the API uses tokens for auth, and an anonymous HTTP connection failed.
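For anyone comparing the two, the difference can be reproduced outside the pipeline with a short requests snippet. This is only a sketch: the URL is a placeholder for the actual API, the credentials are dummies, and the status codes are what I would expect rather than verified output.
import requests

url = 'https://example.com/api/endpoint'  # placeholder for the actual API endpoint

# Anonymous call - mirrors the anonymous HTTP connection that failed
anonymous = requests.get(url)
print(anonymous.status_code)  # expected: 401 Unauthorized

# Same call with HTTP Basic auth - mirrors the working Basic auth connector
authed = requests.get(url, auth=('userName', 'password'))
print(authed.status_code)  # expected: 200 OK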
Hi @Jorn_vb
To retrieve XML data from an API and load it into a Microsoft Fabric Lakehouse, you can use a combination of Web Activity and Copy Activity in your data pipeline. Here's a step-by-step approach to accomplish this:
Please accept this solution and give kudos if this helps.
Thanks
Can you explain "Choose REST as the source type"? How can I set the request body for a GET request? It only shows up for me when I switch it to POST.
The screenshots on this page do not look like my UI (see the screenshot in my post). From what I know, it does not make sense to put the response (the data I want) into the body of a request. But that is probably just because what I see mismatches what you see. Do you know anything that can help me?
The screenshots do not match what I see either, and I was at a loss on how to create a dummy request; I am new to this. I ended up using a notebook with the requests and pandas libraries to import the XML API data into my Lakehouse tables.
Here is what I did to test that I could get it to work.
from io import StringIO # https://docs.python.org/3/library/io.html#io.StringIO
import pandas as pd # https://pandas.pydata.org/docs/reference/index.html
import requests # https://requests.readthedocs.io/en/latest/
# Set up request variables
base_url = 'https://vsac.nlm.nih.gov/vsac/svs/'
end_point = 'RetrieveMultipleValueSets'
parameters = {'id': '2.16.840.1.113883.3.117.1.7.1.226', 'release': 'eCQM Update 2020-05-07'}
# Request - Basic Authentication values have been removed
response = requests.get(f"{base_url}{end_point}", params=parameters, auth=('userName', 'password'))
response.raise_for_status()  # fail fast on an HTTP error instead of parsing an error page
# pandas warns when read_xml is given a raw string, so wrap the response text in StringIO as suggested
xml_io_concept = StringIO(response.text)
# Define namespaces
namespaces = {
'ns0': 'urn:ihe:iti:svs:2008',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance'
}
# Parse the XML and extract Concept elements
pandas_df = pd.read_xml( # https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
xml_io_concept,
xpath=".//ns0:Concept",
namespaces=namespaces
)
# Convert pandas to PySpark
spark_df = spark.createDataFrame(pandas_df)
# Write to a Delta table in the default Lakehouse
spark_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("Tables/test_value_set")
# Query the Lakehouse table (ValueSet_Lake is the name of the attached Lakehouse)
df = spark.sql("SELECT * FROM ValueSet_Lake.test_value_set LIMIT 1000")
display(df)
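If the goal is only to land the data in the lake, the raw XML response can also be written straight to the Files section of the Lakehouse and parsed later. A minimal sketch, assuming a default Lakehouse is attached to the notebook (Fabric mounts it at /lakehouse/default/) and reusing base_url, end_point, and parameters from above; the raw folder and file names are just examples.
import os
import requests

# Assumption: a default Lakehouse is attached, mounted by Fabric at /lakehouse/default/
raw_dir = '/lakehouse/default/Files/raw'  # 'raw' is an arbitrary example folder
os.makedirs(raw_dir, exist_ok=True)

response = requests.get(f"{base_url}{end_point}", params=parameters, auth=('userName', 'password'))
response.raise_for_status()

# Save the unparsed XML; it lands in OneLake even if the parsing step changes later
with open(f"{raw_dir}/test_value_set.xml", 'w', encoding='utf-8') as f:
    f.write(response.text)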