This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreDid you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now
As you may have seen, AI Skill in Microsoft Fabric is publicly available, offering exciting new possibilities for users to create customized, data-driven generative AI experts within Microsoft Fabric. This post will demonstrate how you can extend the capabilities of Fabric AI Skill in Microsoft Fabric notebooks to deliver richer and more comprehensive responses using additional Large Language Model (LLM) queries. By leveraging these techniques, you can enhance the depth and context of AI-generated answers from your data that lives in Microsoft Fabric. You can also access and run the Notebook here.
The AI Skill in Microsoft Fabric allows you to build a Q&A chatbot that draws insights from selected data sources. By simply configuring the relevant tables in Fabric, you can ask questions and receive detailed, data-backed responses. This blog will guide you on how to configure the AI Skill by providing additional context and details, in order to receive more comprehensive and contextually rich responses. For example, if you inquire about the top-selling product in a retail dataset for a specific year, you can leverage the AI Skill to return not just the best-seller, but also deeper insights such as how the top products from that year compare to previous years.
This enriched Q&A experience provides a more detailed look at trends and patterns, which can be essential for data analysis.
By integrating the AI Skill with supplementary LLM calls, you can elevate the Q&A experience. Follow the steps below to configure this advanced setup:
Harness_Microsoft_Fabric_AI_Skill_to_Unlock_Context-Rich_Insights_from_Your_Data
Here's a breakdown of the process to enrich your AI Skill responses.
Create a Lakehouse and then open a Fabric notebook. Attach the Lakehouse that you created to the notebook and run the following code to add all tables to your Lakehouse. Note that you will be using the AdventureWorks dataset which includes retail data with tables for sales, products, customers, promotions, etc.
import pandas as pd
from tqdm.auto import tqdm
base = "https://synapseaisolutionsa.blob.core.windows.net/public/AdventureWorks"
# load list of tables
df_tables = pd.read_csv(f"{base}/adventureworks.csv", names=["table"])
for table in (pbar := tqdm(df_tables['table'].values)):
pbar.set_description(f"Uploading {table} to lakehouse")
# download
df = pd.read_parquet(f"{base}/{table}.parquet")
# save as lakehouse table
spark.createDataFrame(df).write.mode('overwrite').saveAsTable(table)
Go to the Data Science experience and select AI Skill. Enter a name to create an AI Skill. Then, select the Lakehouse you created above and select Connect.
You must then select the tables for which you want the AI Skill to have available access. From the left panel, select the tables dimproduct, dimpromotion, factinternetsales as these are the only needed tables.
To improve the quality of the query generated by the AI Skill, provide the following instructions as parts of the Notes to model.
dimproduct in the answer.No Discount appearing in dimpromotion is not actually a promotion. Always filter out the No Discount if user asks about promotion.Click on the publish button to publish the AI Skill. Once published, navigate to the Settings, then click on Publishing to copy the provided URL to the cell below. You will use the URL to query the AI skill by making calls to the AI skill API. Learn more about how to use AI Skill programmatically.
Harness_Microsoft_Fabric_AI_Skill_to_Unlock_Context-Rich_Insights_from_Your_Data
Harness_Microsoft_Fabric_AI_Skill_to_Unlock_Context-Rich_Insights_from_Your_Data
Harness_Microsoft_Fabric_AI_Skill_to_Unlock_Context-Rich_Insights_from_Your_Data
# AI Skill published URL
aiskill_url = ""
The first stop is to transform your initial question into several related questions. By rephrasing the original question in different ways, you can gather more comprehensive information and provide a deeper understanding of the topic at hand.
Below is a sample prompt that demonstrates how to generate multiple relevant questions based on your initial question. In the following function, you will see how to call a specific LLM model from OpenAI. The parameters, deployment ID, messages, temperature, and seed are part of the experimental setup.
In this case, the prompt and question are passed directly into the messages, the temperature is set to 0, the seed is fixed, and the deployment ID is set to one of the supported models.
For information on switching models, please refer to Fabric AI Services.
prompt_extract_questions = """You are a helpful analyst. You are given a user query, using the query create 3 relevant questions that will give more information around the *question being asked*.
For example: If the question is, what is the top selling product in 2019? The rephrased questions could be:
what were the top selling 3 products in 2019? what were the top 3 products sold in 2018? or what were the top 3 best-selling products across all years?
Note that these questions will be independently used to query a SQL table. Don't enumerate the questions and put each question in a new line."""import openai
# Given a user question, generates 3 more questions that will be used to obtain more context about the original question
def openAI_service_multiple_questions(query):
response = openai.ChatCompletion.create(
deployment_id='gpt-4-32k',
messages=[
{"role": "system", "content": prompt_extract_questions},
{"role": "user", "content": query},
],
temperature=0,
seed=40,
)
return response
You can now provide your own prompt, which will be fed as input to the openAI_service_multiple_questions function defined above to generate relevant questions. This allows you to customize the process and obtain more targeted information from the AI Skill.
query = "What is the top selling product in 2013?"
response = openAI_service_multiple_questions(query)
answer = response.choices[0].message.content
print(f"{answer}")
Once you have seen the generated questions by the openAI_service_multiple_questions function, you can break it into separate lines which helps to extract each question separately for easier processing later.
questions = answer.split("\n")
print(questions)
After you have generated relevant questions, move to the next step to define the function aiSkill which would take three inputs:
context: the notes for the model in the AI Skillquestion: user's questionaiskill_url: the URL of the published AI Skillfrom synapse.ml.mlflow import get_mlflow_env_config
from tenacity import retry, stop_after_attempt
import requests
import json
@retry(stop=stop_after_attempt(3)) # Retry up to 3 times
def aiSkill(context, question, aiskill_url):
configs = get_mlflow_env_config()
headers = {
"Authorization": f"Bearer {configs.driver_aad_token}",
"Content-Type": "application/json; charset=utf-8"
}
text = {
'userQuestion': question,
'modelBehavior' : {
'enableBlockAdditionalContextByLength': False,
},
'additionalContext': context # notes to the model
}
response = requests.post(aiskill_url, headers=headers, data = json.dumps(text))
response = json.loads(response.content)
# If the AISkill didn't generate a good SQL query, it will throw an error
if "ResultRows" not in response.keys():
raise ValueError(response["message"])
return response
You will then provide relevant context or notes to the AI Skill. Note that this additional information, which is usually provided by the subject matter expert who has good knowledge of the data helps the AI Skill determine the appropriate columns to use and generate accurate and relevant outputs.
notes = """ \
- When answering about a product, make sure to include the Product Name in dimproduct in the answer.
- Best selling product should be determined by sales volume, not sales amount.
- If you answer questions about quantities, make sure to include the quantity.
- If the user asks about promotion, note that "No Discount" appearing in dimpromotion is not actually a promotion. Always filter out the "No Discount" if user asks about promotion.
- If the question is generic and involves no finite entities such as "top 5", use a reasonable number less than 10, so that answer is not too large.
"""
Consolidate all outputs from the AI Skill by combining the results generated for each rephrased question into a single, unified set of answers. This ensures that you have a comprehensive view of all the responses and can analyze or utilize them collectively.
# Run a loop over the generated questions and get an answer from AI Skill
def get_context_from_aiSkill(questions):
ai_skill_response = []
ai_skill_sql = []
for question in questions:
response = aiSkill(notes, question, aiskill_url)
ai_skill_response.append(f"headers: {response['ResultHeaders']}, rows: {response['ResultRows']}")
ai_skill_sql.append(response['executedSQL'])
return ai_skill_response, ai_skill_sqlquestions.append(query)
ai_skill_response, ai_skill_sql = get_context_from_aiSkill(questions)
You can review the SQL query generated for the each of the rephrased question. This allows you to verify and understand the SQL code created by the AI Skill in response to each specific question.
import pandas as pd
# Adjust Pandas display options to prevent truncation of long strings
pd.set_option('display.max_colwidth', None) # Set to 'None' to display all contents without truncation
def display_ai_skill_data(questions, ai_skill_sql, ai_skill_response):
"""
This function takes questions, AI Skill SQL queries, and AI Skill responses, and
displays them in a table with three columns: 'Question', 'SQL Query', and 'AI Skill Response'.
Args:
questions (list): List of questions.
ai_skill_sql (list): List of SQL queries generated by AI Skill.
ai_skill_response (list): List of responses generated by AI Skill.
Returns:
DataFrame: A pandas DataFrame containing the questions, SQL queries, and responses.
"""
# Create DataFrame
df = pd.DataFrame({
'Question': questions,
'SQL Query': ai_skill_sql,
'AI Skill Response': ai_skill_response
})
# Display the DataFrame
return df
Next, integrate the questions and answers from the AI Skill into the context.
def get_context_to_answer_with_llm(query, rephrased_query, ai_skill_response):
context = ""
for index, question in enumerate(questions):
context += f"rephrased question: {question} \n" + f"header and rows of the table: {ai_skill_response[index]}\n"
context += f"user query: {query}"
return context
context = get_context_to_answer_with_llm(query, questions, ai_skill_response)
Consolidating the initial question, all additional questions, and the AI Skill response into the final LLM-generated answer. This process results in a cohesive, comprehensive answer produced by LLM, integrating all relevant information into a final, unified response.
prompt_final_answer = """You are an expert analyst. Given a user query, rephrased query asking more details, and the relevant context that is from an output of a SQL
table presented as headers and corresponding rows, answer the user question with details.
Formulate your response to directly answer the original user query, while adding details with the answers to the rephrased queries.
Do not mention things like 'Based on the data provided' or anything about the data, just answer the user question."""def openAI_service_generate_answer(query, rephrased_query, ai_skill_response):
context = get_context_to_answer_with_llm(query, rephrased_query, ai_skill_response)
response = openai.ChatCompletion.create(
deployment_id='gpt-4-32k',
messages=[
{"role": "system", "content": prompt_final_answer },
{"role": "user", "content": f"Use the information provided in the context headers and rows: {context} and write a descriptive report assessing all the table information, then concluding with the answer to the user query." },
],
temperature=0)
return response
response = openAI_service_generate_answer(query, questions, ai_skill_response)
answer = response.choices[0].message.content
print(f"{answer}")
Now that you have a clear understanding of how the framework operates, you can apply it to your own specific examples. This will help you see how the framework handles various scenarios and further clarify its functionality.
Question: What is the top selling product in 2013?
Answer: The top selling products in 2013 were the Water Bottle - 30 oz., Patch Kit/8 Patches, and Mountain Tire Tube, with total quantities of 4080, 3026, and 2926 respectively. In 2012, the top selling product was the Mountain-200 Black, 46.
When we look at the top selling products across all years, the Water Bottle - 30 oz. leads the pack with a total quantity of 4244. Following closely are the Patch Kit/8 Patches and Mountain Tire Tube with total quantities of 3191 and 3095 respectively. Other notable top sellers include the Road Tire Tube and Sport-100 Helmet, Red, with total quantities of 2376 and 2230 respectively.
In conclusion, the top selling product in 2013 was the Water Bottle - 30 oz. with a total quantity of 4080.
The Fabric AI Skill offers powerful capabilities to create customized AI-driven experiences within Microsoft Fabric. By integrating additional LLM calls, you can create a powerful, customized AI Skill to enhance the depth and context of the responses, providing a more robust data-driven experience. Whether you're analyzing product trends or uncovering hidden insights, these steps will help you make the most of the Fabric AI Skill in your environment.
Post Author(s):
Amir Jafari - Senior Product Manager in Azure Data.
Aydan Aksoylar - Senior AI Engineer in Azure Data.
Sandip Kulkarni - Principal AI Engineer in Azure Data.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.