I am attempting a variation on this post:
Solved: PySpark Notebook Using Structured Streaming with D... - Microsoft Fabric Community
I'm trying to use a Spark Job definition in Microsoft Fabric to run Structured Streaming from Lakehouse Files into a Lakehouse Table.
Running the script as a standalone notebook works fine; however, as a Spark Job, no data gets populated. The delta table and the checkpoint location get created, but no data is written to the table.
Any suggestions would be much appreciated.
Here is the working notebook example script:
and here is the non-populating Spark Job definition:
@BilalBobat, let's try this simpler version. It works fine for me:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder \
    .appName("Stream CSV to Delta Table") \
    .getOrCreate()

userSchema = StructType().add("name", "string").add("sales", "integer")

streamingDF = spark.readStream \
    .schema(userSchema) \
    .option("maxFilesPerTrigger", 1) \
    .csv("Files/Streaming/")  # Replace with the actual path to your streaming CSV files

query = streamingDF.writeStream \
    .trigger(processingTime='5 seconds') \
    .outputMode("append") \
    .format("delta") \
    .option("checkpointLocation", "Tables/Streaming_Table_test/_checkpoint") \
    .start("Tables/Streaming_Table_test")  # Replace with the path where you want to save the Delta table

query.awaitTermination()
Note: you'll need to fix the indentation of the code I shared, since Python is sensitive to indentation. When I paste it here it loses the indentation.
Or you can download testjob.py from here:
https://github.com/puneetvijwani/fabricNotebooks
Perfect, that works. Thanks again for your extensive, speedy help on this, much appreciated. 🙏
Have a super day.
Thanks for your response, not late at all. I appreciate your help on this; it's been extremely helpful.
Tried your code, but unfortunately no data is being populated.
Here is a sample of the CSV file I am loading from Files, if that helps:
alpha,100
beta,200
charlie,300
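For anyone trying to reproduce this, here is a minimal stdlib sketch that generates sample CSV files matching the two-column (name, sales) layout above. The output directory and file names are illustrative, not the actual Lakehouse path; with maxFilesPerTrigger set to 1, each generated file would be picked up as one micro-batch:

```python
import csv
import os

def write_sample_files(out_dir, batches):
    """Write each batch of (name, sales) rows to its own CSV file.

    Structured Streaming with maxFilesPerTrigger=1 picks up one
    file per trigger, so each file acts as one micro-batch.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, rows in enumerate(batches):
        path = os.path.join(out_dir, f"batch_{i}.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths

# Example: two files, matching the sample rows from the thread
paths = write_sample_files(
    "streaming_input",  # illustrative local directory, not the Lakehouse Files path
    [
        [("alpha", 100), ("beta", 200)],
        [("charlie", 300)],
    ],
)
```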
@BilalBobat
Sorry for the late response. Can you try this code and let me know if it's working for you?
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

if __name__ == "__main__":
    try:
        spark = SparkSession.builder.appName("MyApp").getOrCreate()
        spark.sparkContext.setLogLevel("DEBUG")

        # Define the schema
        userSchema = StructType().add("name", "string").add("sales", "integer")

        # Read the CSV files into a streaming DataFrame and write to a Delta table
        query = (spark.readStream
                 .schema(userSchema)
                 .option("maxFilesPerTrigger", 1)
                 .csv("Files/streamingdata/streamingfiles")
                 .writeStream
                 .trigger(processingTime='5 seconds')  # Added a time-based trigger
                 .format("delta")
                 .outputMode("append")
                 .option("checkpointLocation", "Files/_checkpoint/Struc_streaming_csv_data")
                 .toTable("Struc_streaming_csv_data"))

        query.awaitTermination()
    except Exception as e:
        print(f"An error occurred: {e}")