jainr

From restricted to AI-ready: Preparing unstructured data directly in Microsoft Fabric with Tonic Textual (Generally Available)

If you haven’t already, check out Arun Ulag’s hero blog “FabCon and SQLCon 2026: Unifying databases and Fabric on a single, complete platform” for a complete look at all of our FabCon and SQLCon announcements across both Fabric and our database offerings. 


AI teams need to move quickly, but most of the data they need isn’t ready for AI. In fact, Gartner predicts that through 2026, organizations will abandon 60% of AI projects that are not supported by AI-ready data.

A large portion of enterprise information exists in unstructured text. This includes support tickets, contracts, call transcripts, documentation, and internal communications. These sources often contain information that can help train or evaluate AI systems. However, they also include sensitive data such as personal identifiers, financial details, or confidential business information. Because of this, access to these datasets is often restricted.

Tonic Textual is now generally available as a workload in the Microsoft Fabric Workload Hub. The workload helps teams detect sensitive information in text and prepare datasets that can be used in AI development workflows inside Fabric. For the official announcement, read the press release: Tonic.ai Announces General Availability of Tonic Textual for Microsoft Fabric.

Unlocking unstructured data for AI development

Many organizations already manage structured data for analytics in Fabric. Unstructured text can also provide useful information for AI systems.

For example, in healthcare, clinical notes, discharge summaries, physician documentation, and patient communications often contain context that structured datasets do not capture. These sources can support use cases such as knowledge retrieval, clinical documentation analysis, or internal search tools. However, they may include protected health information, which limits how the data can be used in development environments. Preparing this data before it is used in AI systems allows organizations to reduce privacy risk while still using the information contained in these documents.

The Tonic Textual workload provides tools to identify and transform sensitive information so that unstructured datasets can be used more safely in AI workflows within Fabric.

Preparing unstructured data inside Fabric

Tonic Textual runs as a workload within Microsoft Fabric. It scans unstructured files stored in Microsoft OneLake and detects sensitive entities such as names, identifiers, or financial information.

Organizations can configure how these entities are handled. Options include:

  • Redacting sensitive values
  • Masking specific identifiers
  • Replacing values with synthetic data
  • Applying custom rules for specific entity types
Prepared datasets can then be used in AI development workflows, including model training, evaluation, and application development with Microsoft Foundry.

Running these steps inside Fabric allows teams to prepare text data without moving it outside the platform.
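To make the handling options above concrete, here is a minimal, self-contained sketch of a per-entity-type policy. It is not the Tonic Textual API: the regex detectors, the `PATTERNS` and `POLICY` tables, and the `redact`, `mask`, `synthesize`, and `apply_policy` helpers are all hypothetical names chosen for illustration, and a production detector would rely on trained models rather than two regular expressions.

```python
import re
from typing import Callable, Dict

# Hypothetical, deliberately simple detectors used only for illustration.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

Handler = Callable[[str, str], str]


def redact(entity_type: str, value: str) -> str:
    """Replace the value with a placeholder naming its entity type."""
    return f"[{entity_type}]"


def mask(entity_type: str, value: str) -> str:
    """Hide every letter or digit except those in the last four characters."""
    head, tail = value[:-4], value[-4:]
    return re.sub(r"\w", "*", head) + tail


def synthesize(entity_type: str, value: str) -> str:
    """Swap in a fixed stand-in value (a real tool would generate varied ones)."""
    return "user@example.com"


# Custom rule per entity type; any type not listed falls back to redaction.
POLICY: Dict[str, Handler] = {"EMAIL": synthesize, "SSN": mask}


def apply_policy(text: str, policy: Dict[str, Handler]) -> str:
    """Apply the configured handler to every detected entity in the text."""
    for entity_type, pattern in PATTERNS.items():
        handler = policy.get(entity_type, redact)
        text = pattern.sub(
            lambda m, h=handler, t=entity_type: h(t, m.group(0)), text
        )
    return text


sample = "Contact jane.doe@contoso.com, SSN 123-45-6789."
print(apply_policy(sample, POLICY))  # Contact user@example.com, SSN ***-**-6789.
print(apply_policy(sample, {}))      # Contact [EMAIL], SSN [SSN].
```

Keeping the policy as a table separate from the detectors means the same detection pass can feed different handling rules for different teams or datasets.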

Preparing text for AI workflows

When sensitive entities are transformed, it is important that the text remains usable for downstream tasks. Tonic Textual preserves document structure and surrounding context when transformations are applied.

For example:

  • Support conversations can keep their dialogue format.
  • Contracts can keep their structure and clauses.
  • Documentation can retain technical context.
This helps teams prepare datasets that can be used in training, testing, or retrieval scenarios.

Applying the same transformation rules across datasets also helps organizations use consistent privacy policies across development workflows.
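One way to picture consistent rules across datasets is keyed, deterministic replacement: the same original value always maps to the same stand-in, whichever document it appears in, and only the matched span changes. The sketch below illustrates that idea under stated assumptions. It does not describe how Tonic Textual works internally; `SECRET_KEY`, `DETECTED_NAMES`, `pseudonym`, and `transform` are hypothetical, and the hard-coded name list stands in for the output of a real entity detector.

```python
import hashlib
import hmac
import re

# Hypothetical key; a real deployment would load this from a secret store.
SECRET_KEY = b"demo-key-change-me"

# Stand-in for names that an entity detector would return.
DETECTED_NAMES = ["Maria Lopez", "John Smith"]


def pseudonym(value: str) -> str:
    """Map a value to a stable token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(
        SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"Person_{digest[:8]}"


def transform(text: str) -> str:
    """Replace only the detected spans, leaving all other text untouched."""
    pattern = re.compile("|".join(re.escape(name) for name in DETECTED_NAMES))
    return pattern.sub(lambda m: pseudonym(m.group(0)), text)


ticket = (
    "Agent: Hello, am I speaking with Maria Lopez?\n"
    "Customer: Yes, this is Maria Lopez."
)
contract = "This agreement is between Maria Lopez and John Smith."

# The dialogue keeps its speaker labels and line breaks, and the same
# person receives the same token in both documents.
print(transform(ticket))
print(transform(contract))
```

Because the token depends only on the key and the original value, separate runs over different datasets stay aligned as long as they share the key.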

Example workflow in Fabric

Teams can use Tonic Textual in Fabric as part of their existing data workflows.

1. Add the workload: Install Tonic Textual from the Microsoft Fabric Workload Hub.

Figure: Tonic Textual in Fabric Workload Hub.

2. Select data in OneLake: Choose document collections, transcripts, or other text datasets stored in OneLake.

Figure: Select source folder in OneLake.

3. Select an output location: Choose where the prepared dataset should be written in OneLake. This allows teams to keep the original source data unchanged while creating a separate AI-ready dataset.

Figure: Select destination folder in OneLake.

4. Detect sensitive entities: Tonic Textual scans the dataset and identifies sensitive information.

Figure: Scan the files for sensitive text.

5. Configure transformations: Select how each entity type should be handled.

Figure: Configure de-identification preference.

6. Use the prepared data: The resulting dataset remains in Fabric and can be used in downstream pipelines such as training or retrieval systems.

This approach allows teams to prepare text datasets before they are used in prompts or models. For a quick end‑to‑end walkthrough of this flow in action, check out the Tonic Textual on Microsoft Fabric video.
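The source-folder and destination-folder pattern in steps 2 and 3 can be sketched in a few lines. This is a hedged illustration only: ordinary local directories stand in for OneLake folders, the helper name `prepare_folder` is hypothetical, and the `transform` argument is whatever de-identification function a team supplies. It is not how the workload itself is implemented.

```python
from pathlib import Path
from typing import Callable, List


def prepare_folder(
    source: Path, destination: Path, transform: Callable[[str], str]
) -> List[Path]:
    """Write transformed copies of every .txt file under source into
    destination, mirroring the folder layout. Source files are only read."""
    src_root = source.resolve()
    dst_root = destination.resolve()
    # Refuse layouts where the output could overwrite or re-ingest the input.
    if dst_root == src_root or src_root in dst_root.parents:
        raise ValueError("destination must be outside the source folder")

    written: List[Path] = []
    for src_file in sorted(source.rglob("*.txt")):
        target = destination / src_file.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        original = src_file.read_text(encoding="utf-8")
        target.write_text(transform(original), encoding="utf-8")
        written.append(target)
    return written
```

A call such as `prepare_folder(Path("raw"), Path("prepared"), my_transform)` leaves `raw` untouched and produces a parallel `prepared` tree, which mirrors the reason step 3 asks for a separate output location.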

Available now

Tonic Textual for Microsoft Fabric can be added directly from the Fabric Workload Hub. If you are building AI with unstructured data and need a secure way to prepare sensitive text for production, you can start using the Tonic Textual workload for Fabric now.