I am trying to get data from an XML-based API into a data lake. Using Web activities I succeeded in retrieving the data, but I couldn't find a way to load that output into the data lake. I tried a Copy Data activity with a REST connection as the source, but that resulted in a 500 error. Most of the solutions I could find rely on options I don't see in my UI (I will attach some screenshots).
The most important thing is just to get the data into the data lake; it doesn't matter which activities I use. If possible, I'd like to use the output of a Web activity (which, as mentioned, does retrieve the data).
Here is what I did to test that I could get it to work.
from io import StringIO # https://docs.python.org/3/library/io.html#io.StringIO
import pandas as pd # https://pandas.pydata.org/docs/reference/index.html
import requests # https://requests.readthedocs.io/en/latest/
# Setup Request Variables.
base_url = 'https://vsac.nlm.nih.gov/vsac/svs/'
end_point = 'RetrieveMultipleValueSets'
parameters = {'id':'2.16.840.1.113883.3.117.1.7.1.226','release':'eCQM Update 2020-05-07'}
# Request - Basic Authentication values have been removed
response = requests.get(f"{base_url}{end_point}", params=parameters, auth=('userName', 'password'))
# pandas warns about passing response.text directly, so wrap it in StringIO as suggested
xml_io_concept = StringIO(response.text)
# Define namespaces
namespaces = {
'ns0': 'urn:ihe:iti:svs:2008',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance'
}
# Parse the XML and extract Concept elements
pandas_df = pd.read_xml( # https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
xml_io_concept,
xpath=".//ns0:Concept",
namespaces=namespaces
)
# Convert Pandas to PySpark
spark_df = spark.createDataFrame(pandas_df)
# Write to table in default Lakehouse
spark_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("Tables/test_value_set")
# Query Lakehouse Table
df = spark.sql("SELECT * FROM ValueSet_Lake.test_value_set LIMIT 1000")
display(df)
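If you want to sanity-check the namespace/XPath extraction step without API credentials, here is a minimal, self-contained sketch. It uses an invented SVS-style XML snippet (element names mirror the real response, but the values are made up) and the stdlib ElementTree parser in place of pd.read_xml:

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Invented sample mimicking the RetrieveMultipleValueSets response shape
sample_xml = """<?xml version="1.0"?>
<ns0:RetrieveMultipleValueSetsResponse xmlns:ns0="urn:ihe:iti:svs:2008">
  <ns0:DescribedValueSet ID="2.16.840.1.113883.3.117.1.7.1.226">
    <ns0:ConceptList>
      <ns0:Concept code="1234" codeSystem="2.16.840.1.113883.6.96" displayName="Example concept"/>
      <ns0:Concept code="5678" codeSystem="2.16.840.1.113883.6.96" displayName="Another concept"/>
    </ns0:ConceptList>
  </ns0:DescribedValueSet>
</ns0:RetrieveMultipleValueSetsResponse>"""

# Same namespace mapping as in the notebook above
namespaces = {"ns0": "urn:ihe:iti:svs:2008"}

# Find every Concept element and collect its attributes into rows
root = ET.fromstring(sample_xml)
records = [concept.attrib for concept in root.findall(".//ns0:Concept", namespaces)]
df = pd.DataFrame(records)
print(df)
```

Once this parses correctly, the same xpath and namespaces values should carry over to the pd.read_xml call against the live response.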
I solved the Copy Data issue by using an HTTP connector with basic auth instead of a REST connection.
This was not obvious, because the API uses tokens for auth and an anonymous HTTP connection failed.
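For context on why the anonymous connection failed: "basic auth" just means the client sends a base64-encoded username:password pair in the Authorization header on every request, which the anonymous connection omits. A quick illustration of the header the connector builds (the credentials here are placeholders, not real values):

```python
import base64

# Placeholder credentials -- substitute your own
user, password = "userName", "secret"

# Basic auth header: "Basic " + base64("user:password")
token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
auth_header = {"Authorization": f"Basic {token}"}
print(auth_header)
```

This is also exactly what requests.get(..., auth=(user, password)) does under the hood in the notebook solution above.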
Hi @Jorn_vb
To retrieve XML data from an API and load it into a Microsoft Fabric Lakehouse, you can use a combination of Web Activity and Copy Activity in your data pipeline. Here's a step-by-step approach to accomplish this:
Please accept this solution and give kudos if this helps.
Thanks
Can you explain the step "Choose 'REST' as the source type"? How can I set the request body for a GET request? It only shows up for me when I switch the method to POST.
The screenshots on this page do not look like my UI (see the screenshot in my post). From what I know, it does not make sense to put the response (the data I want) into the body of a request, but that is probably just because what I see mismatches what you see. Do you know anything that can help me?
The screenshots do not match what I see either, and I was at a loss on how to create a dummy request; I am new to this. I ended up using a notebook with the requests and pandas libraries to import the XML API data into my lakehouse tables.