<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Matching script using python in Fabric - running using pandas vs Spark data frame in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4795583#M11702</link>
    <description></description>
    <pubDate>Wed, 13 Aug 2025 09:48:49 GMT</pubDate>
    <dc:creator>HarishKM</dc:creator>
    <dc:date>2025-08-13T09:48:49Z</dc:date>
    <item>
      <title>Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4788323#M11516</link>
      <description>&lt;P&gt;Hello, I am trying to compare 2 datasets to determine whether any customers appear in both and are likely matches. We use 2 methodologies to decide whether there is a match. We combine the customer name and customer address into a single string and compare it between the data frames. We also combine the customer name and customer postal code into a single string and compare that.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've run this script successfully in the past, but we recently changed Fabric capacities and now I am having some issues. I've tested a few different things without success, and I am hoping to get some guidance.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Some context:&lt;/P&gt;&lt;P&gt;- df1_clean is a Spark data frame with ~52k rows&lt;/P&gt;&lt;P&gt;- df2_clean is a pandas data frame with ~3k rows&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(scripts referenced are pasted below)&lt;/P&gt;&lt;P&gt;1) The first script uses RapidFuzz and pandas to compare the 2 dataframes. It technically runs but will take 20 hours, which seems really long for such a small dataset.&lt;/P&gt;&lt;P&gt;2) The second script uses Spark and RapidFuzz to compare the 2 dataframes, and it runs successfully and quickly. However, the resulting dataframe can't be written to a table, exported to a CSV, or shown with display() or show(). I get an error that says "SparkSession should only be created and accessed on the driver." 
because&amp;nbsp;Fabric restricts access to the SparkSession from worker nodes, which is what happens when the UDF tries to use&amp;nbsp;fuzz.ratio(...)&amp;nbsp;inside the distributed Spark job.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would welcome workarounds for the Spark issue or a suggestion on a different approach to achieve the desired result.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SCRIPT 1:&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;from tqdm import tqdm&lt;/DIV&gt;&lt;DIV&gt;import pandas as pd&lt;/DIV&gt;&lt;DIV&gt;from rapidfuzz import fuzz&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;def find_fuzzy_matches_with_combined_fields(df1_clean, df2_clean, customer_number_column, customer_name_column, threshold=70):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; matches = []&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; processed = set()&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; # Wrap the outer loop with tqdm for progress tracking&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; for i, row_i in tqdm(df1_clean.iterrows(), total=len(df1_clean), desc="Matching"):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if row_i['Combined_Field_Main'] in processed or row_i['Combined_Field_Delivery'] in processed:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; continue&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for j, row_j in df2_clean.iterrows():&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if row_i[customer_number_column] == row_j[customer_number_column]:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; continue&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; score_main = fuzz.ratio(row_i['Combined_Field_Main'], row_j['Combined_Field_Main'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; score_delivery = fuzz.ratio(row_i['Combined_Field_Delivery'], 
row_j['Combined_Field_Delivery'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; name_zip_score_main = fuzz.ratio(row_i['Name_Zip_Field_Main'], row_j['Name_Zip_Field_Main'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; name_zip_score_delivery = fuzz.ratio(row_i['Name_Zip_Field_Delivery'], row_j['Name_Zip_Field_Delivery'])&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; matched_main = score_main &amp;gt;= threshold or name_zip_score_main &amp;gt;= threshold&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; matched_delivery = score_delivery &amp;gt;= threshold or name_zip_score_delivery &amp;gt;= threshold&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if matched_main or matched_delivery:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if matched_main:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; key_address_number = row_i.get('Address_number', 'N/A')&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; duplicate_address_number = row_j.get('Address_number', 'N/A')&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; key_customer_address = [row_i[f'Main_Address_{x}'] for x in range(1, 5)]&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; duplicate_customer_address = [row_j[f'Main_Address_{x}'] for x in range(1, 5)]&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; elif 
matched_delivery:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; key_address_number = row_i.get('Delivery_Address_Num', 'N/A')&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; duplicate_address_number = row_j.get('Delivery_Address_Num', 'N/A')&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; key_customer_address = [row_i[f'Delivery_Address_{x}'] for x in range(1, 5)]&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; duplicate_customer_address = [row_j[f'Delivery_Address_{x}'] for x in range(1, 5)]&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; match_category = categorize_match(max(score_main, score_delivery))&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; name_zip_category = categorize_match(max(name_zip_score_main, name_zip_score_delivery))&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; matches.append({&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Customer Number': row_i[customer_number_column],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Customer Number': row_j[customer_number_column],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Customer Name': row_i[customer_name_column],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Customer Name': row_j[customer_name_column],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Main Addr Match Value': row_i['Combined_Field_Main'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Delivery Addr Match Value': row_i['Combined_Field_Delivery'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Main Addr Match Value': row_j['Combined_Field_Main'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Delivery Addr Match Value': row_j['Combined_Field_Delivery'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Score Main': score_main,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Score Delivery': score_delivery,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'AddressMatch_Category': match_category,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Main Zip Match Value': row_i['Name_Zip_Field_Main'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Delivery Zip Match Value': row_i['Name_Zip_Field_Delivery'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Main Zip Match Value': 
row_j['Name_Zip_Field_Main'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Delivery Zip Match Value': row_j['Name_Zip_Field_Delivery'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Name_Zip Score Main': name_zip_score_main,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Name_Zip Score Delivery': name_zip_score_delivery,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'ZipMatch_Category': name_zip_category,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Customer Address Number': key_address_number,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Customer Address Number': duplicate_address_number,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Match Address Line 1': key_customer_address[0],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Match Address Line 2': key_customer_address[1],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Match Address Line 3': key_customer_address[2],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Match Address Line 4': key_customer_address[3],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Main Postal Code': row_i['Main_Postal_Code'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Key Cust Delivery Postal Code': row_i['Delivery_Postal_Code'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Match Address Line 1': duplicate_customer_address[0],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Match Address Line 2': duplicate_customer_address[1],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Match Address Line 3': str(duplicate_customer_address[2]),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Match Address Line 4': duplicate_customer_address[3],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Main Postal Code': str(row_j['Main_Postal_Code']),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Cust Delivery Postal Code': str(row_j['Delivery_Postal_Code']),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate?': '',&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Comment': ''&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; })&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if score_main == 100 or score_delivery == 100:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; processed.add(row_i['Combined_Field_Main'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; processed.add(row_i['Combined_Field_Delivery'])&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; unique_matches = {frozenset(d.items()): d for d in matches}.values()&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return list(unique_matches)&lt;/DIV&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;DIV&gt;# Check the DataFrame types and convert if necessary&lt;/DIV&gt;&lt;DIV&gt;if isinstance(df1_clean, F.DataFrame):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; df1_pandas = df1_clean.toPandas()&lt;/DIV&gt;&lt;DIV&gt;else:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; df1_pandas = df1_clean&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;if isinstance(df2_clean, F.DataFrame):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; df2_pandas = df2_clean.toPandas()&lt;/DIV&gt;&lt;DIV&gt;else:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; df2_pandas = df2_clean&lt;/DIV&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; # Run the fuzzy matching process for customers&lt;/DIV&gt;&lt;DIV&gt;matches = find_fuzzy_matches_with_combined_fields(df1_pandas, df2_pandas, 'Customer_number', 'Customer_name')&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;# Convert the results to a Pandas DataFrame&lt;/DIV&gt;&lt;DIV&gt;df_matches = pd.DataFrame(matches)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;SCRIPT 2:&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;from pyspark.sql.functions import udf, col, lit&lt;/DIV&gt;&lt;DIV&gt;from pyspark.sql.types import ArrayType, StructType, StructField, StringType, IntegerType&lt;/DIV&gt;&lt;DIV&gt;from rapidfuzz import fuzz&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;# Broadcast 
df2_clean&lt;/DIV&gt;&lt;DIV&gt;df2_broadcast = spark.sparkContext.broadcast(df2_clean.to_dict(orient='records'))&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;# Define UDF to compare each row in df1_clean to all rows in df2_clean&lt;/DIV&gt;&lt;DIV&gt;def find_matches_udf(row_combined_main, row_combined_delivery, row_name_zip_main, row_name_zip_delivery, row_cust_num, threshold=70):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; matches = []&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; for row_j in df2_broadcast.value:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if row_cust_num == row_j['Customer_number']:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; continue&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; score_main = fuzz.ratio(row_combined_main, row_j['Combined_Field_Main'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; score_delivery = fuzz.ratio(row_combined_delivery, row_j['Combined_Field_Delivery'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; name_zip_score_main = fuzz.ratio(row_name_zip_main, row_j['Name_Zip_Field_Main'])&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; name_zip_score_delivery = fuzz.ratio(row_name_zip_delivery, row_j['Name_Zip_Field_Delivery'])&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if max(score_main, score_delivery, name_zip_score_main, name_zip_score_delivery) &amp;gt;= threshold:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; matches.append({&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Duplicate Customer Number': row_j['Customer_number'],&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Score Main': score_main,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; 'Score Delivery': score_delivery,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Name_Zip Score Main': name_zip_score_main,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'Name_Zip Score Delivery': name_zip_score_delivery&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; })&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return matches&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;# Register UDF&lt;/DIV&gt;&lt;DIV&gt;schema = ArrayType(StructType([&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; StructField("Duplicate Customer Number", StringType()),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; StructField("Score Main", IntegerType()),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; StructField("Score Delivery", IntegerType()),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; StructField("Name_Zip Score Main", IntegerType()),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; StructField("Name_Zip Score Delivery", IntegerType())&lt;/DIV&gt;&lt;DIV&gt;]))&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;match_udf = udf(find_matches_udf, schema)&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;# Apply UDF to df1_clean&lt;/DIV&gt;&lt;DIV&gt;df_matches = df1_clean.withColumn(&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; "Matches",&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; match_udf(&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col("Combined_Field_Main"),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col("Combined_Field_Delivery"),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col("Name_Zip_Field_Main"),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col("Name_Zip_Field_Delivery"),&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; col("Customer_number")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; 
)&lt;/DIV&gt;&lt;DIV&gt;).filter(col("Matches").isNotNull())&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
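Editor's note on the question above: a back-of-the-envelope sketch of how much work Script 1's nested loops do, using only the sizes and the runtime quoted in the post. The variable names are illustrative, and the figure is an upper bound because the `processed` set skips some rows.

```python
# Rough cost of Script 1's nested loops, using the numbers quoted in the post.
rows_df1 = 52_000        # ~52k rows in df1_clean
rows_df2 = 3_000         # ~3k rows in df2_clean
ratios_per_pair = 4      # four fuzz.ratio calls per pair in the inner loop
runtime_hours = 20       # runtime reported for Script 1

pairs = rows_df1 * rows_df2
ratio_calls = pairs * ratios_per_pair
pairs_per_second = pairs / (runtime_hours * 3600)

print(f"pairs compared:   {pairs:,}")
print(f"fuzz.ratio calls: {ratio_calls:,}")
print(f"pairs per second: {pairs_per_second:,.0f}")
```

At roughly two thousand pairs per second, the runtime is probably dominated by the per-row overhead of iterrows() and Series lookups rather than by the string comparisons themselves. That is an inference from the arithmetic, not a measurement.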
      <pubDate>Tue, 05 Aug 2025 22:52:36 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4788323#M11516</guid>
      <dc:creator>fo6168</dc:creator>
      <dc:date>2025-08-05T22:52:36Z</dc:date>
    </item>
    <item>
      <title>Re: Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4794481#M11676</link>
      <description>&lt;P&gt;Pandas data frames have additional options that are not available in Spark data frames. The performance differences should be minimal.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2025 12:34:54 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4794481#M11676</guid>
      <dc:creator>Thomaslleblanc</dc:creator>
      <dc:date>2025-08-12T12:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4795097#M11694</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/895941"&gt;@fo6168&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is how it works:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First, Python (pandas DataFrame) --&amp;gt; Apache Spark (Data Lake):&amp;nbsp; &amp;nbsp;sdf = spark.createDataFrame(df)&amp;nbsp; --&amp;gt; Delta Lake (Data Lakehouse):&amp;nbsp; &amp;nbsp;sdf.write.mode("overwrite").format("delta").saveAsTable("DimTable")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Python --&amp;gt; Parquet Table --&amp;gt; Bronze Tables --&amp;gt; Silver Tables --&amp;gt; Gold (In-Memory Engine).&lt;/P&gt;</description>
      <pubDate>Wed, 13 Aug 2025 04:54:19 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4795097#M11694</guid>
      <dc:creator>BhaveshPatel</dc:creator>
      <dc:date>2025-08-13T04:54:19Z</dc:date>
    </item>
    <item>
      <title>Re: Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4795583#M11702</link>
      <description>&lt;P&gt;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/895941"&gt;@fo6168&lt;/a&gt;&amp;nbsp;Hey,&lt;BR /&gt;I would follow the steps below to troubleshoot the issue.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Script 1: pandas with RapidFuzz&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;1) Consider using &lt;STRONG&gt;multiprocessing&lt;/STRONG&gt; to parallelize the computation across the cores of the machine running the notebook.&lt;/P&gt;
&lt;P&gt;Alternatively, use the &lt;STRONG&gt;Dask library&lt;/STRONG&gt; for parallelized dataframe operations.&lt;/P&gt;
&lt;P&gt;2)&amp;nbsp;Replace the inner-loop comparison with vectorized operations where possible, using &lt;STRONG&gt;NumPy and pandas&lt;/STRONG&gt;, though this may require significant changes to how the comparisons are performed.&lt;/P&gt;
&lt;P&gt;3)&amp;nbsp;&lt;SPAN&gt;Use similarity heuristics or initial filters to reduce the number of candidates before performing the detailed fuzzy matching.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Script 2: Spark with RapidFuzz&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Suggestions:&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;1) Instead of fuzzy matching every record against every other record, join the two datasets on a comparable column and apply fuzzy matching only to the resulting candidate pairs. Because df2_clean is a pandas DataFrame in your post, convert it to a Spark DataFrame first.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;df2_spark = spark.createDataFrame(df2_clean)&lt;BR /&gt;df_candidate_pairs = df1_clean.join(df2_spark, df1_clean['some_column'] == df2_spark['some_column'], 'inner')&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) Plain Python UDFs can slow Spark down because they often run slower than built-in functions. Consider the &lt;STRONG&gt;built-in Spark SQL functions&lt;/STRONG&gt; (for example levenshtein) or a &lt;STRONG&gt;pandas_udf.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3)&amp;nbsp;Keep anything that needs the SparkSession on the driver: convert the resulting Spark DataFrame to pandas with toPandas() and do those final operations there, which avoids the worker restriction.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4) I would use a pandas_udf. A pandas UDF receives each batch of rows as pandas Series, transferred through Apache Arrow, and runs in the Python worker processes rather than inside the JVM. This usually reduces the per-row serialization overhead of a plain UDF. In the sketch below, schema is the ArrayType defined in your Script 2, and the loop body is a placeholder for your matching logic.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;import pandas as pd&lt;BR /&gt;from pyspark.sql.functions import pandas_udf, col&lt;BR /&gt;&lt;BR /&gt;@pandas_udf(schema)&lt;BR /&gt;def fuzzy_match_udf(combined_main: pd.Series) -&amp;gt; pd.Series:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; matches = []&lt;BR /&gt;&amp;nbsp; &amp;nbsp; for value in combined_main:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; # Perform the same logic as defined in your function&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; # and append this row's list of matches (placeholder: empty list)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; matches.append([])&lt;BR /&gt;&amp;nbsp; &amp;nbsp; return pd.Series(matches)&lt;BR /&gt;&lt;BR /&gt;df_matches = df1_clean.withColumn("Matches", fuzzy_match_udf(col("Combined_Field_Main")))&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Ensure df2_clean is as small as possible and optimized for broadcasting, minimizing its memory footprint.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Harish M&lt;/P&gt;&lt;P&gt;Kindly accept it as the solution if it solved your problem. Kindly give kudos.&lt;/P&gt;</description>
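Editor's note on the reply above: the "initial filters" and "join on comparable columns" suggestions both amount to blocking, that is, scoring only the pairs that share a cheap key. A minimal pure-Python sketch follows. It assumes the postal code is a usable blocking key, uses the standard library's difflib.SequenceMatcher as a stand-in for RapidFuzz's fuzz.ratio (the scores are not identical), and the record fields id, name and zip are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in for rapidfuzz's fuzz.ratio: a 0-100 similarity score."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def blocked_matches(left, right, threshold=70.0):
    """Score only the pairs that share a postal code (the blocking key)."""
    # Index the smaller dataset by blocking key once.
    by_zip = defaultdict(list)
    for rec in right:
        by_zip[rec["zip"]].append(rec)

    matches = []
    for rec_l in left:
        for rec_r in by_zip.get(rec_l["zip"], []):
            if rec_l["id"] == rec_r["id"]:
                continue  # same customer number, skipped as in the original scripts
            score = similarity(rec_l["name"], rec_r["name"])
            if score >= threshold:
                matches.append((rec_l["id"], rec_r["id"], round(score)))
    return matches


# Hypothetical sample records; the field names are illustrative only.
left = [
    {"id": "A1", "name": "acme supply co", "zip": "30301"},
    {"id": "A2", "name": "globex corporation", "zip": "60601"},
]
right = [
    {"id": "B1", "name": "acme supply company", "zip": "30301"},
    {"id": "B2", "name": "initech", "zip": "30301"},
    {"id": "B3", "name": "globex corp", "zip": "94105"},
]

result = blocked_matches(left, right)
print(result)
```

Blocking trades recall for speed: in this sample, A2 and B3 are never compared because their postal codes differ, so the key should be chosen with the data's error patterns in mind.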
      <pubDate>Wed, 13 Aug 2025 09:48:49 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4795583#M11702</guid>
      <dc:creator>HarishKM</dc:creator>
      <dc:date>2025-08-13T09:48:49Z</dc:date>
    </item>
    <item>
      <title>Re: Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4796715#M11737</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/895941"&gt;@fo6168&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for reaching out to the Microsoft Fabric Forum Community, and special thanks to&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/291692"&gt;@HarishKM&lt;/a&gt;,&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/10288"&gt;@BhaveshPatel&lt;/a&gt;,&amp;nbsp;and&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/498348"&gt;@Thomaslleblanc&lt;/a&gt;&amp;nbsp;for their prompt and helpful responses.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just following up to see if the responses provided by community members were helpful in addressing the issue.&lt;/P&gt;
&lt;P&gt;If one of the responses helped resolve your query, please consider marking it as the Accepted Solution. Feel free to reach out if you need any further clarification or assistance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best regards,&lt;BR /&gt;Prasanna Kumar&lt;/P&gt;</description>
      <pubDate>Thu, 14 Aug 2025 07:14:09 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4796715#M11737</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2025-08-14T07:14:09Z</dc:date>
    </item>
    <item>
      <title>Re: Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4801793#M11827</link>
      <description>&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/895941"&gt;@fo6168&lt;/a&gt;,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Just following up to see if the response provided was helpful in resolving your issue. Please feel free to let us know if you need any further assistance.&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Best regards,&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Prasanna Kumar&lt;/P&gt;</description>
      <pubDate>Wed, 20 Aug 2025 06:23:17 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4801793#M11827</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2025-08-20T06:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: Matching script using python in Fabric - running using pandas vs Spark data frame</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4804350#M11880</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/895941"&gt;@fo6168&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just following up to see if the responses provided by community members were helpful in addressing the issue.&lt;/P&gt;
&lt;P&gt;If one of the responses helped resolve your query, please consider marking it as the Accepted Solution. Feel free to reach out if you need any further clarification or assistance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best regards,&lt;BR /&gt;Prasanna Kumar&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 03:07:17 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Matching-script-using-python-in-Fabric-running-using-pandas-vs/m-p/4804350#M11880</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2025-08-22T03:07:17Z</dc:date>
    </item>
  </channel>
</rss>

