<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Structured Streaming for multiple queries are not picking the next batch after processing the fi in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4244275#M4634</link>
    <description>&lt;P&gt;Hi @Anonymous&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is my follow-up just to ask if the problem has been solved?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, can you accept the correct answer as a solution or share your solution to help other members find it faster?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you very much for your cooperation!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;BR /&gt;Yang&lt;BR /&gt;Community Support Team&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there is any post&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;helps&lt;/EM&gt;&lt;/STRONG&gt;, then please consider&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Accept it as the solution&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;&amp;nbsp;to help the other members find it more quickly.&lt;BR /&gt;If I misunderstand your needs or you still have problems on it, please feel free to let us know.&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Thanks a lot!&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 16 Oct 2024 07:30:31 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-10-16T07:30:31Z</dc:date>
    <item>
      <title>Structured Streaming for multiple queries are not picking the next batch after processing the first</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4233897#M4497</link>
      <description>&lt;P&gt;I m streaming multiple queries in a single spark job where it takes parquet files from source folder and then upsert its to delta table in lakehouse in the foreachbatch function .It worked fine and all streams are active and running without any error .But its not picking up the next batch even if new files get added to source.What could be the reason for this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;for row in control_df.collect():&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; table = os.path.basename(row.TargetFiles)&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; source_path = f"{self.data_lake_container}/{row.SourceFiles}"&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; schema= self.infer_schema_from_file(source_path,"parquet")&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; # print(schema)&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; df_read = spark.readStream \&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .schema(schema) \&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .format("parquet")\&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .option("recursiveFileLookup","true")\&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .load(source_path)&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; # Write streaming data to destination&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; query = df_read.writeStream.foreachBatch(self.streamwriter).trigger(processingTime="60&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;seconds").queryName(table).option("checkpointLocation",f"{self.data_lake_container}/{self.checkpoint_path}/{table}").start()&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for q in spark.streams.active:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; stream_count+=1&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; print(f"ID: {q.id},\nName: {q.name},\nStatus: {q.status},\nLast Progress: {q.lastProgress}\n\n")&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; print(stream_count)&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; self.logger.info(f"Created streams for {stream_count} tables in {self.target_lakehouse} lakehouse")&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; query.awaitTermination()&lt;/STRONG&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 08 Oct 2024 15:55:43 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4233897#M4497</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-08T15:55:43Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Streaming for multiple queries are not picking the next batch after processing the fi</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4234266#M4502</link>
      <description>&lt;P&gt;Hi @Anonymous&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have the following two suggestions:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your code I observe that the processingTime trigger is set to 60 seconds. If new files are added after the trigger interval, they may not be captured immediately. Please try increasing the time interval.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Make sure that new files added to the source directory have unique names and are not overwritten. If the file name is reused, Spark may not detect the change.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If none of this solves your problem, as an alternative, you can recreate a spark job that adds all files (including the new ones) to the source at once.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;BR /&gt;Yang&lt;BR /&gt;Community Support Team&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there is any post&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;helps&lt;/EM&gt;&lt;/STRONG&gt;, then please consider&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Accept it as the solution&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;&amp;nbsp;to help the other members find it more quickly.&lt;BR /&gt;If I misunderstand your needs or you still have problems on it, please feel free to let us know.&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Thanks a lot!&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2024 01:19:07 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4234266#M4502</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-09T01:19:07Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Streaming for multiple queries are not picking the next batch after processing the fi</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4244275#M4634</link>
      <description>&lt;P&gt;Hi @Anonymous&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is my follow-up just to ask if the problem has been solved?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, can you accept the correct answer as a solution or share your solution to help other members find it faster?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you very much for your cooperation!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;BR /&gt;Yang&lt;BR /&gt;Community Support Team&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there is any post&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;helps&lt;/EM&gt;&lt;/STRONG&gt;, then please consider&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Accept it as the solution&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;&amp;nbsp;to help the other members find it more quickly.&lt;BR /&gt;If I misunderstand your needs or you still have problems on it, please feel free to let us know.&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Thanks a lot!&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Oct 2024 07:30:31 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Structured-Streaming-for-multiple-queries-are-not-picking-the/m-p/4244275#M4634</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-16T07:30:31Z</dc:date>
    </item>
  </channel>
</rss>

