TST: Make test_sql.py parallelizable #60378

Open
@WillAyd

Description

@WillAyd
Member

Feature Type

  • Adding new functionality to pandas

    Changing existing functionality in pandas

    Removing existing functionality in pandas

Problem Description

test_sql.py currently must be run on a single thread because its tests reuse the same table names. This can cause a race condition when different parametrizations of a test run on different threads.
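
To illustrate the kind of clash described here, below is a minimal sketch that uses only the standard library's sqlite3 module rather than the pandas test fixtures. The names `write_frame` and `demo_collision`, and the temporary database file, are hypothetical stand-ins for two parametrizations that both create a table called test_frame in a shared database.

```python
import os
import sqlite3
import tempfile


def write_frame(db_path: str, table_name: str) -> None:
    """Hypothetical stand-in for a test body that creates a table."""
    conn = sqlite3.connect(db_path)
    try:
        # CREATE TABLE without IF NOT EXISTS fails when the name is taken.
        conn.execute(f'CREATE TABLE "{table_name}" (A REAL, B REAL)')
        conn.commit()
    finally:
        conn.close()


def demo_collision() -> bool:
    """Return True if the second writer hits a table-name collision."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "shared.db")
        write_frame(db_path, "test_frame")
        try:
            # Same database, same hard-coded table name as the first writer.
            write_frame(db_path, "test_frame")
        except sqlite3.OperationalError:
            return True
        return False
```

The two writes run one after the other in this sketch, so the failure is deterministic. With tests running in parallel, the same clash would depend on which test reaches the database first, which is what makes it a race.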

Feature Description

Add a uuid or something else to the table names in the test_sql.py module to disambiguate

Alternative Solutions

status quo

Additional Context

No response

Activity

added the Testing (pandas testing functions or related to the test suite) and IO SQL (to_sql, read_sql, read_sql_query) labels on Nov 20, 2024

UmbertoFasci commented on Nov 20, 2024

@UmbertoFasci
Contributor

take

changed the title from "TST: Make test_sql.py serializable" to "TST: Make test_sql.py parallelizable" on Nov 20, 2024

UmbertoFasci commented on Nov 21, 2024

@UmbertoFasci
Contributor

@WillAyd I am about halfway through the tests. Where a test needs it, I am generating a unique table name with a UUID suffix, keeping the original table name as the prefix to preserve context.

Before:

@pytest.mark.parametrize("conn", all_connectable)
def test_read_table_columns(conn, request, test_frame1):
    # test columns argument in read_table
    conn_name = conn
    if conn_name == "sqlite_buildin":
        request.applymarker(pytest.mark.xfail(reason="Not Implemented"))

    conn = request.getfixturevalue(conn)
    sql.to_sql(test_frame1, "test_frame", conn)

    cols = ["A", "B"]

    result = sql.read_sql_table("test_frame", conn, columns=cols)
    assert result.columns.tolist() == cols

After (made parallelizable):

@pytest.mark.parametrize("conn", all_connectable)
def test_read_table_columns(conn, request, test_frame1):
    # test columns argument in read_table
    conn_name = conn
    if conn_name == "sqlite_buildin":
        request.applymarker(pytest.mark.xfail(reason="Not Implemented"))

    conn = request.getfixturevalue(conn)
    # Unique table name per test run; needs `import uuid` at module level
    table_uuid = f"test_frame_{uuid.uuid4().hex}"
    sql.to_sql(test_frame1, table_uuid, conn)

    cols = ["A", "B"]

    result = sql.read_sql_table(table_uuid, conn, columns=cols)
    assert result.columns.tolist() == cols

Let me know if you would like this done in a different fashion.

WillAyd commented on Nov 21, 2024

@WillAyd
Member, Author

Seems reasonable. It is probably worth adding a helper function to the module so the same code is not repeated in each test, but what you have looks like it's headed in the right direction.
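
A helper along these lines might look like the sketch below. The name `create_unique_table_name` is hypothetical and is not taken from the pandas test module or from any PR; it only wraps the `uuid.uuid4().hex` suffix used in the snippet above.

```python
import uuid


def create_unique_table_name(prefix: str) -> str:
    """Return ``prefix`` followed by an underscore and a random hex suffix.

    The prefix keeps the table recognizable (for example "test_frame"),
    while the 32-character UUID4 hex string makes a clash between
    parallel tests extremely unlikely.
    """
    return f"{prefix}_{uuid.uuid4().hex}"
```

A test would then call `table_name = create_unique_table_name("test_frame")` once and pass `table_name` to both the write and the read. One thing to keep in mind is that the suffix adds 33 characters, so very long prefixes could run into identifier length limits on some databases.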

arashgodgiven commented on Dec 4, 2024

@arashgodgiven

Hi, noticed this issue is being worked on. Is there any way I can assist or take on parts of the work to contribute? I'm interested in helping out.

UmbertoFasci commented on Dec 4, 2024

@UmbertoFasci
Contributor

Hi @arashgodgiven, I am fairly close to wrapping this up. I am just handling the iris and types connectables now. Keep an eye out though, and thanks for asking!

narutonamikaze commented on Dec 18, 2024

@narutonamikaze

Hey! I think the issue still hasn't been resolved, and I would like to work on it immediately!

UmbertoFasci commented on Dec 18, 2024

@UmbertoFasci
Contributor

All that remains is handling the test cases that share database state through a couple of fixtures, which affects only a couple of connectable sets. I will submit a PR this coming Friday.
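
For tests that share database state, one possible pattern is to hand each test its own table name and drop the table afterwards. The sketch below is only an illustration under stated assumptions: it uses the standard library's sqlite3 module, and the names `temporary_table_name` and `demo` are hypothetical rather than part of the pandas test suite or the planned PR. In a pytest module the same yield-then-clean-up shape could live inside a fixture.

```python
import sqlite3
import uuid
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_table_name(conn: sqlite3.Connection, prefix: str) -> Iterator[str]:
    """Yield a unique table name and drop that table on exit."""
    name = f"{prefix}_{uuid.uuid4().hex}"
    try:
        yield name
    finally:
        # Clean up even if the test body raised, so no state leaks
        # into other tests that use the same database.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.commit()


def demo() -> list:
    """Create a uniquely named table, then list the tables left behind."""
    conn = sqlite3.connect(":memory:")
    with temporary_table_name(conn, "iris") as table:
        conn.execute(f'CREATE TABLE "{table}" (x INTEGER)')
        conn.execute(f'INSERT INTO "{table}" VALUES (1)')
    remaining = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    conn.close()
    return remaining
```

Calling `demo()` leaves no tables behind, since the only table created is dropped when the `with` block exits.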

UmbertoFasci commented on Dec 20, 2024

@UmbertoFasci
Contributor

@WillAyd I have gone ahead and submitted a PR for this issue; however, I believe there is an issue with the current reviewer.

dougnovellano commented on May 8, 2025

@dougnovellano

It looks like this PR was closed. Is this issue still being worked on, or being refined?

dShcherbakov1 commented on May 20, 2025

@dShcherbakov1

@dougnovellano I'm interested in taking a look at this issue.

Is there anyone working on it now? I would be happy to start investigating this, or to help collaborate if someone has already started and needs help tackling a specifically scoped problem.

Saberghanbarnejad commented on May 23, 2025

@Saberghanbarnejad

Hi, I’m interested in helping make test_sql.py parallelizable by generating unique table names. Is this issue still open for contributions, considering PR #60595? Any guidance on next steps? Thanks!

dShcherbakov1 commented on May 23, 2025

@dShcherbakov1

Hello @Saberghanbarnejad, I've set up a VM, and configured the necessary databases. This should provide a proper environment for running these integration tests. From here on out, it's just the Python code of, well, modifying the actual tests. Would you be interested in collaborating on this?

Saberghanbarnejad commented on May 26, 2025

@Saberghanbarnejad

Hi @dShcherbakov1, thanks for setting up the VM and databases! I’m excited to collaborate on modifying the test_sql.py tests. Could you share more details on the changes needed?
Also, any pointers to relevant docs or examples would be great.
Thanks!

dShcherbakov1 commented on May 26, 2025

@dShcherbakov1

Hello @Saberghanbarnejad, would you mind sending me an email at daniel.shcherbakov@gmail.com? Email would probably be a better place to plan out technical details like that.
I'm excited to collaborate on this!

linked a pull request that will close this issue on Jun 4, 2025

Metadata

Assignees

Labels

IO SQL (to_sql, read_sql, read_sql_query), Testing (pandas testing functions or related to the test suite)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Participants

    @WillAyd @mroeschke @arashgodgiven @UmbertoFasci @Saberghanbarnejad

      TST: Make test_sql.py parallelizable · Issue #60378 · pandas-dev/pandas