jehebr1

PII Detection and Redaction with Fabric AI Functions

Introduction

 

In a previous post, we explored how to use the Presidio library with PySpark on Microsoft Fabric to detect and anonymize PII. While powerful, that approach requires managing external dependencies and custom logic. In this follow-up, we’ll explore a more native and streamlined alternative: the Fabric AI functions library.

With just a few lines of code, Fabric’s built-in AI functions like ai.extract and ai.generate_response allow you to identify and redact PII directly within your data pipelines—no external libraries required.

 

Why Consider Fabric AI Functions over Presidio?

 

Fabric AI functions, powered by sophisticated language models, offer several advantages:

  • Native to Fabric: No need to install or manage external libraries.
  • Simplified Syntax: One-liner functions that integrate with pandas and PySpark.
  • LLM-Powered: The functions now run on GPT-4o mini, which lowers cost while still delivering high-quality entity recognition and response generation.
  • Flexible Use Cases: From entity extraction to redaction, summarization, and even sentiment analysis.
  • Flexible entity types: You can prompt the model to extract custom or domain-specific PII entities beyond the standard set.
  • Contextual understanding: AI models can interpret the surrounding context, distinguishing between sensitive and innocuous usage of similar words (e.g., “Apple” the company vs. “apple” the fruit).

Let’s dive into how the Fabric AI functions library enables these capabilities.

 

Getting Started with Fabric AI Functions

 

The Fabric AI functions library is designed to be both intuitive and powerful, exposing a rich set of APIs for text analysis and transformation. Two functions are particularly relevant for privacy workflows:

  • ai.extract – Extract entities (like names, dates, emails, phone numbers, etc.) from text using AI-driven entity recognition.
  • ai.generate_response – Generate customized responses to structured or unstructured text, including redaction and rewriting with privacy in mind.

For complete documentation, see the AI functions overview page. Now let's look at each function in detail with practical examples.

 

Example 1: PII Detection with ai.extract

 

Suppose you receive a dataset containing free-form customer feedback, and you want to flag all records containing PII. With Presidio, you would configure recognizers and patterns for each entity type; with Fabric AI functions, the process is more dynamic.

from synapse.ml.spark.aifunc.DataFrameExtensions import AIFunctions
from synapse.ml.services.openai import OpenAIDefaults

# Use GPT-4o mini as the default model for AI function calls.
defaults = OpenAIDefaults()
defaults.set_deployment_name("gpt-4o-mini")

# Sample free-form text containing names, emails, an address, and phone numbers.
data = [
    ("Contact John Doe at john.doe@example.com or 555-123-4567.",),
    ("Jane Smith, 123 Main St, NY, can be reached at jane.smith@email.com.",),
    ("Call 800-555-6789 for support.",),
]
df = spark.createDataFrame(data, ["text"])

# Ask the model to extract the listed entity types from the "text" column.
pii_extracted = df.ai.extract(
    input_col="text",
    labels=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"],
)

display(pii_extracted)

The output looks like this:

[Figure: A Fabric notebook Python cell showing the results of executing the ai.extract function]

 

With the PII entities identified, you could take it a step further by using built-in PySpark regular expression functions to redact or mask those values in the original text. Next, let’s look at an alternative approach that accomplishes this in a single step.
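The masking idea can be sketched in plain Python before moving it into Spark. This is a minimal illustration, not part of the AI functions library: the helper name mask_entities, the [REDACTED] placeholder, and the assumption that the extracted values for a row are available as a list of strings are all hypothetical.

```python
import re


def mask_entities(text, entities, placeholder="[REDACTED]"):
    """Replace each extracted entity value found in `text` with `placeholder`."""
    # Drop empty values and try longer strings first, so a short value that
    # is part of a longer one does not leave fragments behind.
    values = sorted({e for e in entities if e}, key=lambda v: (-len(v), v))
    if not values:
        return text
    # re.escape makes characters such as "." or "+" in emails and phone
    # numbers match literally instead of acting as regex operators.
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return pattern.sub(placeholder, text)


text = "Contact John Doe at john.doe@example.com or 555-123-4567."
found = ["John Doe", "john.doe@example.com", "555-123-4567"]  # assumed extraction result
masked = mask_entities(text, found)
print(masked)  # Contact [REDACTED] at [REDACTED] or [REDACTED].
```

In a Fabric notebook, a helper like this could be wrapped in a PySpark UDF and applied to the text column together with the extracted columns. The exact shape of those columns may differ from the list assumed here, so adapt the input handling accordingly.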

 

Example 2: Redaction and Anonymization with ai.generate_response

 

Let’s say you want not only to detect, but also to redact sensitive information in text. With ai.generate_response, you can prompt the AI to both identify and redact PII in one step.

Here’s an example using the same sample data we defined above:

redaction_prompt = """Redact all PII from this text.
    Input: {text}
    Expected output: The original text but with PII replaced with [REDACTED] and no extra words added.
"""

redacted_df = df.ai.generate_response(
    prompt=redaction_prompt,
    is_prompt_template=True,
    output_col="redacted_text",
)
display(redacted_df)

 

The result would look like this:

[Figure: A Fabric notebook Python cell showing the results of executing the ai.generate_response function]

 

This approach is very flexible. You can customize the LLM instructions for anonymization, pseudonymization, or data masking, such as replacing names with generic labels (“NAME_1”) or random values.
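One way to organize such variations is to keep the instruction for each strategy in a small lookup and build the prompt template from it. The sketch below is illustrative only: the strategy names, the instruction wording, and the build_prompt helper are assumptions, and the prompts have not been evaluated against any model.

```python
INPUT_COLUMN = "text"  # column name used in the sample DataFrame

# Hypothetical instruction wording for each strategy; tune for your data.
_INSTRUCTIONS = {
    "redact": "Replace every PII value in this text with [REDACTED].",
    "pseudonymize": (
        "Replace each person's name in this text with a generic label such as "
        "NAME_1, NAME_2, reusing the same label when the same name appears again."
    ),
    "mask": "Replace every character of each PII value in this text with '*'.",
}


def build_prompt(strategy, input_col=INPUT_COLUMN):
    """Return a prompt template containing a {column} placeholder."""
    if strategy not in _INSTRUCTIONS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return (
        f"{_INSTRUCTIONS[strategy]}\n"
        # The triple braces emit a literal "{text}" placeholder for the column.
        f"Input: {{{input_col}}}\n"
        "Expected output: The original text with only those replacements "
        "and no extra words added."
    )


pseudonymize_prompt = build_prompt("pseudonymize")
print(pseudonymize_prompt)
```

The resulting string can be passed as the prompt argument to df.ai.generate_response with is_prompt_template set to True, in the same way as redaction_prompt in Example 2.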

 

Best Practices and Considerations

 

  • Prompt engineering matters. Clear, precise instructions yield better results. Experiment with different phrasing for extraction and redaction.
  • Validate outputs. Always validate AI-generated outputs, particularly before sharing or deploying sensitive data. AI models can sometimes make mistakes or hallucinate entities, so human-in-the-loop review is recommended for high-stakes data.
  • Performance. AI functions are currently limited to 1,000 requests per minute, so account for this limit when processing large datasets.
  • Combine approaches. Consider hybrid pipelines: use Presidio for batch processing and AI functions for edge cases or unstructured data. Alternatively, use AI functions for custom entity recognition and Presidio for the redaction step.
  • Language Support: Optimized for English, though multilingual support is evolving.
  • Monitor and iterate. Continually monitor model outputs and update prompts or data flows as your data and privacy needs evolve.
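As one possible form of the output validation recommended above, a lightweight regex pass can flag redacted rows that still appear to contain an email address or phone number, so they can be routed to human review. This is a rough sketch under stated assumptions: the patterns are deliberately simple and will miss many real-world formats, and the names residual_pii, EMAIL_RE, and PHONE_RE are hypothetical.

```python
import re

# Deliberately simple patterns; real data will need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def residual_pii(redacted_text):
    """Return the kinds of PII-like patterns still present in the text."""
    findings = []
    if EMAIL_RE.search(redacted_text):
        findings.append("EMAIL_ADDRESS")
    if PHONE_RE.search(redacted_text):
        findings.append("PHONE_NUMBER")
    return findings


rows = [
    "Contact [REDACTED] at [REDACTED] or [REDACTED].",
    "Call 800-555-6789 for support.",  # simulates a row the model left untouched
]

# Keep only rows with at least one finding, paired with what was found.
needs_review = [(row, residual_pii(row)) for row in rows if residual_pii(row)]
print(needs_review)
```

A check like this does not replace human review. It only catches obvious misses cheaply, and it cannot detect names or hallucinated content.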

 

Conclusion

 

Microsoft Fabric’s AI Functions library offers a powerful, low-code alternative to traditional PII detection and anonymization approaches like Presidio. By leveraging ai.extract and ai.generate_response, data teams can build privacy-preserving workflows with minimal setup, all within the Fabric ecosystem. This approach simplifies development and aligns with privacy-by-design principles, supporting compliance while preserving data utility for analytics and AI applications.

 

For more details on AI Functions, check out the Microsoft Fabric documentation.

Comments

Great article. With the new product improvements, this is the way to go for data redaction and other uses of AI Functions.