Our team is currently working on a project that involves loading a substantial amount of data from an on-premise Oracle database to Fabric. During this process, we encountered performance issues related to the Oracle ODBC driver. To ensure that the Oracle ODBC driver can meet our required performance levels, we conducted a series of tests focused on loading data to a parquet file. Our goal was to investigate the driver's capabilities rather than developing a production-ready solution. After several attempts with various implementations and libraries, we successfully loaded the same dataset to a parquet file on the gateway server in approximately 120 seconds, using the ODBC2Parquet tool written in Rust. However, when the load is initiated from Fabric, the same process takes between 300 seconds (with optimization of the fetch buffer size) and significantly longer without optimization. This performance discrepancy is a concern, given the substantial amount of data we need to load from our on-premise Oracle database to Fabric. It is important to note that there is a difference between the array size and the fetch buffer size. The array size refers to the number of rows that can be returned in a single fetch operation, while the fetch buffer size determines the amount of memory allocated for storing the fetched rows. We believe that the array size, which can be set using the ODBC API function "SQLSetStmtAttr" with the attribute "SQL_ATTR_ROW_ARRAY_SIZE", plays a crucial role in the performance difference. To rule out any potential bottlenecks, we conducted tests on the connection between the gateway server and Fabric and found no issues. Furthermore, the performance difference was also observed when loading the same table from the Oracle database to a local parquet file on the gateway server, with and without the array size parameter set. This confirms that the performance issue is related to the Oracle ODBC driver and the array size parameter, rather than any network or connectivity constraints. Interestingly, we observed a similar performance improvement when working with the native Oracle driver. Although the method for setting the value differs, increasing the fetch buffer size resulted in a substantial performance boost. We kindly request that this information be forwarded to your development team for evaluation. It would be highly beneficial if a feature could be implemented in the data gateway to set this attribute, potentially leading to significant performance improvements in our data loading processes from on-premise Oracle to Fabric.
... View more