Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Score big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount

MarkLaf

Anonymously Query Public Google Drives (Fast, Refreshes in Service, No Cloud Project)

GoogleDriveContents( "<Drive ID>" )

MarkLaf_0-1756187645466.png

 

Intro

This function uses the internal takeout-pa.clients6.google.com API to start a zip job on the entire Drive, which creates a short-lived zip archive that is downloaded through the public and anonymous storage.googleapis.com/drive-bulk-export-anonymous endpoint. To access the API, the function extracts the Google Drive web app’s public API key from the HTML of the drive’s page at https://drive.google.com/drive/folders/<Drive ID>.

 

This method mirrors the "Download all" action through the browser and is generally faster than querying all Drive items and downloading each individually. All files within the specified drive, and within all its subfolders, are captured.

 

This is leveraging an internal API, so is not documented, not supported, and liable to break with no notice. The API key is pulled from brittle html. Use in production at your own risk.

 

Function

Here is the custom function to get all public Google Drive contents anonymously. The function is broken up into a number of subfunctions, which each have some light documentation on what they are doing.

 

I added a fair amount of metadata to the final orchestration function, focusing on how to invoke. I've included a screenshot of this metadata rendered in PBI desktop at the bottom of this article for easy reference.

 

// Source: https://community.fabric.microsoft.com/t5/Data-Factory-Community-Blog/Anonymously-Query-Public-Google-Drives-Fast-Refreshes-in-Service/ba-p/4807585

let
    // Use Drive Id to load webpage and extract root drive name {0} and anonymous drive api key {1}
    fxGDriveAnon_Meta = 
    (DriveId as text) as list =>
    [
        DriveWebPageHtml = Text.FromBinary(Binary.Buffer(Web.Contents( 
            "https://drive.google.com", [
                RelativePath = "drive/folders/" & DriveId
            ]
        ))),
        key = List.First( Html.Table(
            DriveWebPageHtml,
            {{
                "keyrow", "script[data-id=""_gd""]", 
                (_) as text => Text.BetweenDelimiters([TextContent], """yLTeS"":""", """")
            }},
            [RowSelector = "html body script"]
        )[keyrow] ),
        title = List.First( Html.Table(
            DriveWebPageHtml,
            {{
                "titlerow", "title", 
                (_) as text => Text.BeforeDelimiter( [TextContent], " - Google Drive", {0, Occurrence.Last} ) 
            }},
            [RowSelector = "html head"]
        )[titlerow] ),
        output = { title, key }
    ] [output],

    // Sends folder id and name to drive utility to start zip job, returns zip job id to check back
    fxGDriveAnon_ZipStart =
    (DriveId as text, DriveName as text, ApiKey as text) as text =>
    Json.Document( Web.Contents(
        "https://takeout-pa.clients6.google.com", [
            RelativePath = "v1/exports",
            Query = [key = ApiKey], 
            Content = Json.FromValue( [
                archivePrefix = DriveName,
                items = { [id = DriveId] }
            ] )
        ] 
    ) ) [exportJob] [id],

    // Uses zip job id to check in with zip utility until done, then returns finished zip metadata table
    fxGDriveAnon_ZipWait = 
    (ZipId as text, ApiKey as text, optional RetryAfterSeconds as number) as table =>
    [
        waitSecs = RetryAfterSeconds ?? 5,
        call = Json.Document( Web.Contents(
            "https://takeout-pa.clients6.google.com", [
                RelativePath = "v1/exports/" & ZipId, 
                Headers = [ PbiToken = Text.NewGuid() ],
                Query = [ key = ApiKey ]
            ]
        ) ),
        output = 
            if call[exportJob][status] = "QUEUED" 
            then Function.InvokeAfter( 
                () => @fxGDriveAnon_ZipWait(ZipId,ApiKey,waitSecs), 
                #duration(0,0,0,waitSecs) 
            ) 
            else Table.FromRecords( 
                call[exportJob][archives],  
                type table [fileName=text,storagePath=text,compressedSize=Int64.Type,sizeOfContents=Int64.Type]
            )
    ] [output],

    // Download and unzip binaries from storagePath column of zip metadata table
    fxGDriveAnon_UnzipDL =
    (StoragePaths as list) as table =>
    [
        BinaryLists = List.Transform(
            StoragePaths, 
            each [
                base = "https://storage.googleapis.com/drive-bulk-export-anonymous",
                rel = Text.AfterDelimiter(_, base&"/"), 
                zip = Web.Contents(base,[RelativePath=rel]), 
                unzip = fxUnzipWithCD(zip) 
            ] [unzip]
        ),
        CombineBinaries = Table.FromRecords( List.Combine( BinaryLists ), type table [FileName=text,Content=binary] )
    ] [CombineBinaries],

    // Unzips zip file using the file metadata from zip's central directory.
    // Note: this is needed, as gdrive does not populate the local file headers with info needed to extract binaries.
    // Source: https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-power-bi/m-p/895318/highlight/true#M30599
    fxUnzipWithCD = 
    (ZIPFileInput) as list => 
    [
        ZIPFile = Binary.Buffer( ZIPFileInput ),
        ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        EDOCfn = BinaryFormat.Record([
            ZipContent = BinaryFormat.Binary(Binary.Length(ZIPFile) - 22),
            Magic = uint,
            DiskNum = ushort,
            CDirectoryDiskId = ushort,
            CDirectoryRecordCountOnDisk = ushort,
            CDirectoryRecordCount = ushort,
            SizeOfCentralDirectory = uint,
            CentralDirectoryOffset = uint,
            CommendLength = ushort
        ]),
        EDOC = EDOCfn(ZIPFile),
        BeforeCentralDirectory = BinaryFormat.Binary(EDOC[CentralDirectoryOffset]),
        CentralDirectory = BinaryFormat.Length(
            BinaryFormat.Record([
                ZipContent = BeforeCentralDirectory,
                Items = BinaryFormat.List(
                    BinaryFormat.Record([
                        Magic = uint,
                        CurrentVersion = ushort,
                        MinVersion = ushort,
                        Flags = ushort,
                        CompressionMethod = ushort,
                        FileModificationTime = ushort,
                        FileModificationDate = ushort,
                        CRC32 = uint,
                        BinarySize = uint,
                        FileSize   = uint,
                        FileInfo = BinaryFormat.Choice(
                            BinaryFormat.Record([
                                Len = ushort,
                                FieldsLen = ushort,
                                FileCommentLen = ushort,
                                Disk = ushort,
                                InternalFileAttr = ushort,
                                ExternalAttr = uint,
                                PosOfFileHeader = uint
                            ]),
                            (fileInfo) => BinaryFormat.Record([
                                FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                                Fields = BinaryFormat.Binary(fileInfo[FieldsLen]),
                                FileComment = BinaryFormat.Text(fileInfo[FileCommentLen], TextEncoding.Ascii),
                                Disk = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                                InternalFileAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
                                ExternalAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[InternalFileAttr]),
                                PosOfFileHeader = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[PosOfFileHeader])
                            ])
                        )
                    ]), 
                    EDOC[CDirectoryRecordCount]
                )
            ]), 
            EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]
        ),  
        Contents = List.Transform(
            CentralDirectory(ZIPFile)[Items],
            (cdEntry) => 
            let
                ZipEntry = BinaryFormat.Record([
                    PreviousData = BinaryFormat.Binary(cdEntry[FileInfo][PosOfFileHeader]), 
                    Magic = uint,
                    ZipVersion = ushort,
                    ZipFlags = ushort,
                    CompressionMethod = ushort,
                    FileModificationTime = ushort,
                    FileModificationDate = ushort,
                    CRC32 = uint, 
                    BinarySize = uint,
                    FileSize   = uint,
                    FileName = BinaryFormat.Choice(
                        BinaryFormat.Record([
                            Len = ushort,
                            FieldsLen = ushort
                        ]),
                        (fileInfo) => BinaryFormat.Record([
                            FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
                            Fields = BinaryFormat.Binary(fileInfo[FieldsLen])
                        ]) 
                    ),
                    FileContent = BinaryFormat.Transform(
                        BinaryFormat.Binary(cdEntry[BinarySize]), 
                        each Binary.Decompress(_, Compression.Deflate)
                    )
                ])(ZIPFile)
            in
                [FileName=ZipEntry[FileName][FileName], Content=ZipEntry[FileContent]]
        )
    ] [Contents],

    // Take Google Drive ID and optional RetryAfter (seconds) to: 
    // 1) get drive key + drive name, 2) start zip job on drive, 
    // 3) wait until zip is ready for download, and 4) download and unzip
    fxGDriveAnon = 
    ( DriveId as text, optional RetryAfterSeconds as number ) as table =>
    [
        Drive = Record.FromList( 
            List.Buffer( { DriveId } & fxGDriveAnon_Meta(DriveId) & { RetryAfterSeconds } ),
            { "Id", "Title", "AnonKey", "RetryAfter" }
        ),

        ZipId = fxGDriveAnon_ZipStart( Drive[Id], Drive[Title], Drive[AnonKey] ),

        ZipPaths = fxGDriveAnon_ZipWait( ZipId, Drive[AnonKey], Drive[RetryAfter] ),

        Unzip = fxGDriveAnon_UnzipDL( ZipPaths[storagePath] )

    ] [Unzip],

    AddMetadata = Value.ReplaceType(
        fxGDriveAnon,
        type function (
            DriveId as (type text meta [
                Documentation.FieldCaption = "Google Drive ID",
                Documentation.SampleValues = {
                    "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms", 
                    "0FvJkL9sQdRwXyZtAbCdEfGhIjKlMnOpQrStUvWxYz12"
                }
            ]),
            optional RetryAfterSeconds as (type number meta [
                Documentation.FieldCaption = "Retry-After (seconds)",
                Documentation.SampleValues = { 5, 10, 15 }
            ])
        ) as ( type table [FileName=text,Content=binary] )
        meta [
            Documentation.Name = "GoogleDriveContents",
            Documentation.LongDescription = 
                "<p><b>DISCLAIMER</b></br>This function uses the INTERNAL takeout-pa.clients6.google.com API and is thus UNDOCUMENTED, UNSUPPORTED, and BRITTLE.</p><p><b>Description</b></br>Use a public Google Drive's ID to get a flat table of path/name.ext and binaries. Does this WITHOUT needing a) any authentication or b) a Google Cloud Project key. This is achieved by using the internal takeout-pa.clients6.google.com API to start a zip job, which creates a short-lived zip archive that is downloaded through the public and anonymous storage.googleapis.com/drive-bulk-export-anonymous endpoint. To access the API, the function extracts the Google Drive web app’s public API key from the HTML of the drive’s page at https://drive.google.com/drive/folders/<Drive ID>. This mirrors the ""Download all"" action in browsers and is generally faster than querying all Drive items and downloading each individually.</p><p><b>Parameters</b></br><i><b>Google Drive ID</b></i></br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The unique ID of the Google Drive to access. Find at the end of the URL when you open a drive in your browser. I.e., https://drive.google.com/drive/folders/<Google Drive ID></br><b><i>Retry-After (seconds)</i></b> OPTIONAL (default = 5)</br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The function continually checks whether the zip archive job has been completed, which may take seconds to minutes. This parameter specifies how many seconds to wait between each check.</br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;It is recommended that you not go below 5 seconds as this is the timing that Google Drive uses in the browser.</p><p><b>Source</b></br>https://community.fabric.microsoft.com/t5/Data-Factory-Community-Blog/Anonymously-Query-Public-Google-Drives-Fast-Refreshes-in-Service/ba-p/4807585</p>",
            Documentation.Examples = {[
                Description = "Returns table of binaries using default 5 second retry-after.</br><i>Note: Function name (GoogleDriveContents) depnds on what you name query you paste the function into.</i>",
                Code = "GoogleDriveContents(""1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"")",
                Result = "Table.FromRecords({#(lf)    [FileName = ""DriveTitle/data1.csv"", Content = binary],#(lf)    [FileName = ""DriveTitle/Subfolder/data2.csv"", Content = binary]#(lf)})"
            ],[
                Description = "Returns table of binaries using custom 30 second retry-after.</br><i>Note: Function name (GoogleDriveContents) depnds on what you name query you paste the function into.</i>",
                Code = "GoogleDriveContents(#(lf)    ""1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"",#(lf)    30#(lf))",
                Result = "Table.FromRecords({#(lf)    [FileName = ""DriveTitle/data1.csv"", Content = binary],#(lf)    [FileName = ""DriveTitle/Subfolder/data2.csv"", Content = binary]#(lf)})"
            ]}
        ]
    )

in
    AddMetadata

 

Data source configuration

This function leverages three data sources that have to be configured. These configurations should work in both Power BI Desktop and in the service (dataflows and semantic models). In short, all three should be set to Anonymous authentication and Public privacy level. Note that "Skip test connection" must be checked for https://storage.googleapis.com/drive-bulk-export-anonymous  and https://takeout-pa.clients6.google.com.

 

https://drive.google.com/

MarkLaf_0-1756183056126.png

 

https://takeout-pa.clients6.google.com/

MarkLaf_1-1756183089952.png

 

https://storage.googleapis.com/drive-bulk-export-anonymous

MarkLaf_2-1756183119994.png

 

Rendered Metadata

MarkLaf_0-1756247913971.png