I’m running into the Livy pagination bug that returns:
InvalidHttpRequestToLivy: from cannot be less than 0 HTTP status code: 400.
Setup:
- Orchestrator notebook uses runMultiple() to start worker notebooks in parallel. The job covers approx. 1,000 tables, so the work is chunked so that each runMultiple() call gets a DAG for a smaller batch: the orchestrator calls runMultiple() once per chunk of 100 tables (I also tried chunks of 20 tables).
- Each chunk launches 20 worker notebooks in parallel. A similar post suggested increasing concurrency, which I tried up to 40.
- Worker (child) notebooks return audit and watermark data via mssparkutils.notebook.exit(json_data). If I turn off all output from the worker notebooks, the issue stops. (See the sketch after this list.)
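For concreteness, here is a minimal sketch of that setup. It assumes the Fabric Spark runtime's mssparkutils; the worker notebook name ("ProcessTable"), the table list, and the argument names are hypothetical placeholders, not the actual code:

from notebookutils import mssparkutils  # pre-imported in Fabric notebooks; shown for clarity

tables = [f"table_{i}" for i in range(1000)]  # placeholder for the real table list
CHUNK_SIZE = 100   # also tried 20
CONCURRENCY = 20   # also tried 40

def build_dag(chunk):
    # One DAG per chunk; each activity runs the worker notebook for one table.
    return {
        "activities": [
            {
                "name": f"load_{table}",
                "path": "ProcessTable",            # hypothetical worker notebook
                "timeoutPerCellInSeconds": 1800,
                "args": {"table_name": table},     # hypothetical parameter name
            }
            for table in chunk
        ],
        "concurrency": CONCURRENCY,
    }

for start in range(0, len(tables), CHUNK_SIZE):
    chunk = tables[start:start + CHUNK_SIZE]
    # Each worker ends with mssparkutils.notebook.exit(json_data); those exit
    # values are returned to the orchestrator in the runMultiple() result.
    results = mssparkutils.notebook.runMultiple(build_dag(chunk))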
Error:
After processing approx. 200 tables over 15-20 minutes, regardless of how parallelism or chunk size is set, the run fails with:
InvalidHttpRequestToLivy: from cannot be less than 0 HTTP status code: 400
Root Cause Analysis:
- The Livy session persists across multiple runMultiple() calls and appears to keep a single log buffer for the entire Spark session; that buffer has a fixed limit and does not appear to be trimmed.
- Each chunk's worker output accumulates in Livy's log buffer as it is passed back to the orchestrator notebook.
- Multiple runMultiple() calls appear to corrupt Livy's cursor state for log reading.
- The error occurs when Livy tries to read logs from an invalid cursor position.
Workarounds Attempted:
- Reduced worker stdout logging via spark.sparkContext.setLogLevel("ERROR") (see the sketch after this list)
- Added delays between chunks
- Attempted Livy session "resets" (ineffective)
- Increased the size of the Spark session with additional nodes and executors
- Decreased and increased batch size; decreased and increased the number of parallel workers (concurrency)
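The first two mitigations looked roughly like this (a sketch reusing the hypothetical build_dag helper and table list from the setup above; the 30-second delay is an arbitrary illustration):

# In each worker notebook, reduce stdout log noise to ERROR and above:
spark.sparkContext.setLogLevel("ERROR")

# In the orchestrator, pause between chunks:
import time

for start in range(0, len(tables), CHUNK_SIZE):
    chunk = tables[start:start + CHUNK_SIZE]
    mssparkutils.notebook.runMultiple(build_dag(chunk))
    time.sleep(30)  # arbitrary delay between chunks; did not prevent the error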
Question:
Is this a known issue? Are there proper ways to clear/reset Livy session state between runMultiple() calls, or alternative approaches for large-scale parallel notebook orchestration?
Impact: Blocking ETL processing of 750K+ tables across 500+ customers.
The solution provided by the product team was to place the line below before the runMultiple() call, to prevent the DAG orchestration from rendering its progress graphics. It seems to be working, and they said a full fix would be implemented soon.
spark.conf.set("spark.notebookutils.run.progressbar.enabled", "false")
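In context, the setting goes once near the top of the orchestrator, before the first runMultiple() call (a sketch reusing the hypothetical helpers from the setup above):

# Disable the runMultiple progress bar / DAG rendering for this session,
# so its output no longer accumulates in Livy's log buffer.
spark.conf.set("spark.notebookutils.run.progressbar.enabled", "false")

for start in range(0, len(tables), CHUNK_SIZE):
    chunk = tables[start:start + CHUNK_SIZE]
    results = mssparkutils.notebook.runMultiple(build_dag(chunk))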
Hi @tr5610,
In this scenario I suggest you raise a support ticket so the support team can assist you in addressing the issue you are facing. Please follow the link below on how to raise a support ticket:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Thanks,
Prashanth Are
MS Fabric community support
Hi @tr5610,
We would like to follow up to see if the solution provided by the community member resolved your issue. Please let us know if you need any further assistance.
@J-Mo & @vrsanaidu2025 , thanks for your prompt response.
Thanks,
Prashanth Are
MS Fabric community support
If our super user response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.
No, there has been no solution provided.
Instead of reusing the same session for every runMultiple() batch, try creating a new Livy session per DAG chunk. This avoids log-cursor conflicts and buffer overflow across chunks.
Do you have any specific details on how to do this? I believe the Livy session and the notebook session can't be separated. When I forced a restart of the Livy session, the notebook stopped.
When you say "create a new Livy session per DAG chunk," do you mean there is a PySpark or Fabric-native command I can run inside the same notebook to reset or reinitialize the Livy session before calling runMultiple() again?
Just to clarify, my current setup is entirely notebook-driven:
- One orchestration notebook.
- It uses nbutils.runMultiple() inside a for loop to submit batches (chunks).
- Each runMultiple() call executes a batch of worker notebooks in parallel.
There’s no pipeline orchestration around it, so the notebook session itself stays open while runMultiple() runs multiple times.