Increase the default arraysize #475
Thanks for the data point. What size did you end up using? What was the SQL Server default? And how does performance compare now? If anyone else has comments and data, please let us know.

The DB API suggests a default arraysize of 1, which is far too small. In cx_Oracle 4.4 @anthony-tuininga changed it to 50, and it was later raised to 100 in 5.3 to align with other APIs, e.g. in PHP and Node.js. A lot of people seem to be doing single-row selects, where a smaller arraysize is obviously better. Beyond that, every user seems to have a different use case. When we ran some artificial benchmarks to help choose a default prefetchrows value for 8.0, there were so many factors (network costs, row width, DB load, machine speeds, etc.) that it was difficult to generalize.
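For context, arraysize is a per-cursor setting in cx_Oracle, so the default mostly matters for code that never tunes it. A small illustration of per-workload tuning (connection details, table names, and values are placeholders, not recommendations):

```python
import cx_Oracle

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")

# Single-row lookup: the 100-row default over-allocates buffers slightly;
# a tiny arraysize avoids that.
cur = conn.cursor()
cur.arraysize = 1
cur.execute("SELECT first_name FROM employees WHERE employee_id = :id", id=100)
print(cur.fetchone())

# Bulk export: a larger arraysize reduces network round trips at the cost
# of more client-side buffer memory.
cur = conn.cursor()
cur.arraysize = 1000
cur.execute("SELECT * FROM employees")
rows = cur.fetchall()
```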
No problem, and thanks a lot for the prompt reply. I think I am going to use an arraysize of 10000.

Here are some more specifics on the data/querying. The query I used for comparison selects the first 100,000 rows of a table with 9 columns. (This table is essentially the same in the SQL Server and Oracle databases I am using.) Here are the results with different array sizes when I was targeting the Oracle database:

I looked at the pyodbc code and, from what I could tell, arraysize there does not play the same role in fetching. I also looked at the implementation of pymssql. So it seems to me like it is a peculiarity of cx_Oracle that arraysize has such a large effect on fetch performance.

Here is how I retrieved the data in cx_Oracle and in pymssql/pyodbc (sketched below).
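A minimal sketch of that kind of retrieval, with pyodbc standing in for the SQL Server side (connection details and table name are placeholders, not the original code):

```python
import cx_Oracle
import pyodbc

ORACLE_DSN = "dbhost/orclpdb1"           # placeholder
MSSQL_CONN = "DSN=mssql;UID=u;PWD=p"     # placeholder

# cx_Oracle: arraysize controls how many rows each network round trip fetches.
conn = cx_Oracle.connect(user="scott", password="tiger", dsn=ORACLE_DSN)
cur = conn.cursor()
cur.arraysize = 10000
cur.execute("SELECT * FROM big_table WHERE ROWNUM <= 100000")
rows = cur.fetchall()

# pyodbc: fetchall pulls rows through the ODBC driver, which does its own
# internal buffering; Cursor.arraysize only affects fetchmany.
conn2 = pyodbc.connect(MSSQL_CONN)
cur2 = conn2.cursor()
cur2.execute("SELECT TOP 100000 * FROM big_table")
rows2 = cur2.fetchall()
```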
https://pkg.go.dev/github.com/godror/godror?tab=doc
Thanks for the details and for showing interest in making improvements. This discussion is about the cost of fetching data across the network, so your timings could/should cover the fetch, not the execute phase. Of course, the prefetchrows value will play a role, and it is applied at execute time, so comparisons become murky. Also, by including the execute phase, all of the DB configuration, indexes, DB host disk speed, and buffering could play a role. Overall, although I appreciate that you saw better 'nearly out of the box' performance with one DB, a direct vendor-to-vendor comparison would need deeper analysis (including network config), and that is beyond the scope of cx_Oracle itself. (But feel free to tell us more, as it's always good to know.)
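One way to separate the two phases is to time execute() and the fetch independently (a rough sketch; connection and query are placeholders):

```python
import time
import cx_Oracle

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cur = conn.cursor()
cur.arraysize = 10000

t0 = time.perf_counter()
cur.execute("SELECT * FROM big_table")   # execute phase (prefetchrows applies here)
t1 = time.perf_counter()
rows = cur.fetchall()                    # fetch phase (arraysize applies here)
t2 = time.perf_counter()

print(f"execute: {t1 - t0:.3f}s, fetch: {t2 - t1:.3f}s")
```

Because prefetched rows arrive during execute(), a large prefetchrows shifts fetch cost into the first number, which is exactly why comparisons that include the execute phase get murky.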
That is for PL/SQL array sizing and not related to query fetch tuning. The godror internal query buffer sizes for fetcharraysize and prefetchrows are documented just above that section.

Overall, yes, a bigger arraysize (and prefetchrows) will help big transfers, but what is the common case for all users (who don't read the tuning doc), and what balances memory allocation costs and sizes (right through the Oracle stack and DB) against network costs?! Out of interest, did you check memory and network use with the higher arraysize? Did you tune your SDU sizes? See the links in https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html
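In cx_Oracle both knobs are set per cursor before execute; a sketch following the pattern in the tuning guide (values and connection details are illustrative):

```python
import cx_Oracle

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cur = conn.cursor()

# prefetchrows: rows fetched during execute() itself (cx_Oracle 8+).
# arraysize: rows fetched per subsequent round trip.
# SDU is a Net Services setting (connect descriptor / sqlnet.ora), not a Python API.
cur.prefetchrows = 1000
cur.arraysize = 1000
cur.execute("SELECT * FROM big_table")
for row in cur:
    pass  # process each row as it is delivered from the fetch buffers
```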
Fair point about the execution time. If the execution time is just the time to run execute(), then I can see how including it muddies the comparison. Thanks for pointing that out about the Go buffer sizes. I definitely agree that this is a hard problem to optimize for, and I don't think I have enough expertise to give a good answer here; I just wanted to provide a data point.

Also, out of personal curiosity, do you mind explaining what happens if you have an arraysize of 10k but execute a query that returns only 100 rows? Is there higher latency/memory usage?
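One rough, platform-dependent way to check the memory side of that question empirically, using only the standard library (table name and sizes are placeholders):

```python
import resource  # Unix-only
import cx_Oracle

def peak_rss():
    # ru_maxrss is KB on Linux and bytes on macOS; treat it as a relative number.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cur = conn.cursor()
cur.arraysize = 10000  # fetch buffers are sized for arraysize rows

before = peak_rss()
cur.execute("SELECT * FROM small_table")  # returns only ~100 rows
rows = cur.fetchall()
after = peak_rss()
print(f"rows: {len(rows)}, peak RSS delta: {after - before}")
```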
I got way more than 10x the performance. I set it to 10,000,000.

```python
import cx_Oracle

class Connection(cx_Oracle.Connection):
    def cursor(self):
        # Every cursor created from this connection gets a huge arraysize.
        cursor = super(Connection, self).cursor()
        cursor.arraysize = 10000000
        return cursor
```
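For completeness, a connection made from this subclass would be used like any other (placeholder credentials), though note that cx_Oracle sizes its fetch buffers from arraysize, so a value this large can consume a lot of memory for wide rows:

```python
conn = Connection(user="scott", password="tiger", dsn="dbhost/orclpdb1")
cur = conn.cursor()  # arraysize is already 10,000,000
cur.execute("SELECT * FROM big_table")
rows = cur.fetchall()
```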
I recently noticed a 10x performance discrepancy when transitioning from a SQL Server to an Oracle Database with almost the same data. I later found out that this discrepancy was due to low network throughput, and I was able to fix it by increasing the cx_Oracle arraysize.

Even though there is some great documentation on performance tuning, I think it would be best if users never encountered this performance issue in the first place. So it might be a good idea to increase the default arraysize above 100. I know this is a tradeoff between memory and throughput, and I am not really qualified to say what a good number would be. But I still wanted to bring this to your attention, since I think it could improve the library's user experience.