I'm using a Fabric notebook with PySpark, and I need to convert multiple .docx files saved in a Fabric Lakehouse file folder to PDF and save them in a new folder.
I've tried:
msgraph with a SharePoint connection. It worked until one day it didn't.
docx2pdf, which doesn't work in a Fabric notebook.
I'm currently attempting pypandoc, but I need to get a pdf-engine installed "in the cloud environment" (according to Copilot). How do I go about installing the pdf-engine? Copilot suggests using a !sudo command, but I don't have that type of access.
Thanks for any assistance.
Hi @GenaSea,
Thanks for reaching out to the Microsoft Fabric Community Forum.
You are right: there's no sudo apt-get available in Fabric notebooks, so you can't install a system-level PDF engine that way. Instead, you can try bringing your own engine via a "binary-in-a-wheel". Here's the Python-only workflow in a nutshell:
1. Install the wheels. (This lands both Pandoc and a bundled wkhtmltopdf on every node, no sudo needed.)
2. Fetch the Pandoc binary. (Grabs a self-contained Pandoc for Linux.)
3. Define your converter and wire it up to your Lakehouse.
That's it: no system-level installs, just pure Python in your Fabric notebook.
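Since Pandoc resolves `--pdf-engine=wkhtmltopdf` through `PATH` unless it is given a full path, it may be worth confirming that an engine is actually discoverable before converting anything. A minimal sketch using only the standard library; the helper name `find_pdf_engine` and the candidate list are illustrative assumptions, not part of pypandoc:

```python
import shutil


def find_pdf_engine(candidates=("wkhtmltopdf", "weasyprint", "xelatex", "pdflatex")):
    """Return (name, full_path) for the first candidate found on PATH, else None.

    The candidate names are PDF engines Pandoc can drive; the helper itself
    is hypothetical and only wraps shutil.which.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


engine = find_pdf_engine()
if engine is None:
    print("No PDF engine found on PATH; Pandoc's PDF output would likely fail.")
else:
    print(f"Found {engine[0]} at {engine[1]}")
```

If nothing is found, the install step did not expose the binary to the notebook session, and passing the engine's full path to `--pdf-engine` may be needed instead.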
    # Step 1: install the wheels
    %pip install pypandoc==1.15 wkhtmltopdf-pack

    # Step 2: fetch the Pandoc binary
    from pypandoc.pandoc_download import download_pandoc

    download_pandoc()

    # Step 3a: define the converter
    import pypandoc

    def docx_to_pdf(in_path, out_path):
        # Convert one local .docx file to PDF with the wkhtmltopdf engine
        pypandoc.convert_file(
            in_path, 'pdf',
            outputfile=out_path,
            extra_args=['--pdf-engine=wkhtmltopdf']
        )

    # Step 3b: wire it up to the Lakehouse
    import os
    from notebookutils import mssparkutils

    src = "abfss://…/docx_folder"
    dst = f"{src}/pdf_output"
    mssparkutils.fs.mkdirs(dst)

    for file in mssparkutils.fs.ls(src):
        if file.name.lower().endswith(".docx"):
            # splitext also handles upper-case extensions such as .DOCX
            stem = os.path.splitext(file.name)[0]
            local_doc = f"/tmp/{file.name}"
            local_pdf = f"/tmp/{stem}.pdf"
            # copy → convert → copy back (local paths are addressed with the file: scheme)
            mssparkutils.fs.cp(file.path, f"file:{local_doc}")
            docx_to_pdf(local_doc, local_pdf)
            mssparkutils.fs.cp(f"file:{local_pdf}", f"{dst}/{stem}.pdf")
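With a folder of many documents, any exception raised while converting one file would stop the whole loop. One possible way to keep going and report failures at the end is sketched below; `convert_all` and `fake_convert` are hypothetical names used only for illustration, and the real converter would be the copy, convert, copy-back steps from the loop above:

```python
def convert_all(paths, convert):
    """Call convert(path) for each path and return (succeeded, failed).

    failed holds (path, error message) pairs so the batch can be retried later.
    """
    succeeded, failed = [], []
    for path in paths:
        try:
            convert(path)
            succeeded.append(path)
        except Exception as exc:  # keep going; failures are reported at the end
            failed.append((path, str(exc)))
    return succeeded, failed


# Stand-in converter for demonstration only (an assumption, not real conversion)
def fake_convert(path):
    if "broken" in path:
        raise RuntimeError("conversion failed")


ok, bad = convert_all(["a.docx", "broken.docx", "c.docx"], fake_convert)
print("converted:", ok)
print("failed:", bad)
```

Collecting failures rather than raising keeps one corrupt .docx from blocking the rest of the folder.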
I would also like to take a moment to thank @nilendraFabric for actively participating in the community forum and for the solutions you've been sharing. Your contributions make a real difference.
If I have misunderstood your needs or you still have problems, please feel free to let us know.
Best Regards,
Hammad.
Community Support Team
Hi @GenaSea,
As we haven't heard back from you, we're following up on our previous message. I'd like to confirm whether you've resolved this issue or need further help.
If it is resolved, you are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you found a reply particularly helpful, you can also mark it as a solution.
If you're still looking for guidance, feel free to give us an update; we're here for you.
Best Regards,
Hammad.
Hi @GenaSea,
I just wanted to follow up on your thread. If the issue is resolved, it would be great if you could mark the helpful reply as solution so other community members facing similar issues can benefit too.
If not, don't hesitate to reach out; we're happy to keep working with you on this.
Best Regards,
Hammad.
Hi @GenaSea,
We noticed there hasn’t been any recent activity on this thread. If your issue is resolved, marking the correct reply as a solution would be a big help to other community members. If you still need support, just reply here and we’ll pick it up from where we left off.
Best Regards,
Hammad.
Hi @GenaSea
Try LibreOffice in headless mode. This approach works in cloud environments and preserves formatting without requiring Microsoft Office.
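A rough sketch of what that could look like from Python. It assumes a `soffice` binary is already present on the node, which a Fabric notebook may not provide and which runs into the same no-sudo constraint discussed earlier in the thread. `--headless`, `--convert-to` and `--outdir` are standard LibreOffice command-line options; the function names and the timeout value are illustrative:

```python
import shutil
import subprocess


def build_soffice_command(docx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one document to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", out_dir, docx_path]


def convert_with_libreoffice(docx_path, out_dir, soffice="soffice", timeout=120):
    """Run LibreOffice headless; raises if the binary is missing or the run fails."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    subprocess.run(
        build_soffice_command(docx_path, out_dir, soffice),
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # guard against a hung conversion
    )


print(build_soffice_command("/tmp/report.docx", "/tmp/pdf"))
```

LibreOffice writes the PDF into the output directory under the source file's base name, so the result can then be copied back to the Lakehouse folder.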