Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C#: Extract string interpolation alignment and format. #19089

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

michaelnebel
Copy link
Contributor

@michaelnebel michaelnebel commented Mar 21, 2025

This PR fixes an issue with missing extraction of alignment and format clauses from string interpolation expressions as described here.

Prior to this change, when we extracted a string interpolation expression $"Hello {name,align:format}. Welcome." (denoted E in the graphs below), it looked like this in the expressions relation

Loading
graph TD;
  id1["#quot;Hello #quot;"]
  id2["name"]
  id3["#quot;. Welcome.#quot;"]
  E--0-->id1
  E--1-->id2
  E--2-->id3

With this change we introduce a new expression for `{name:align:format}, which we call a string interpolation insert expression. The expression tree for extracting a string interpolation expression now looks like,

Loading
graph TD;
  id1["#quot;Hello #quot;"]
  id2["{name,align:format}"]
  id3["#quot;. Welcome.#quot;"]
  id4["name"]
  id5["align"]
  id6["format"]
  E--0-->id1
  E--1-->id2
  E--2-->id3
  id2--0-->id4
  id2--1-->id5
  id2--2-->id6

This change in the expression tree of string interpolation expressions makes it impossible/difficult to make a proper upgrade/downgrade script, as we introduce a new kind of expression and alter the parent/child relationship. That is, to make a proper upgrade script, we would need to introduce new expressions and alter- and add new entries in the expressions and expr_parent relations. We might be able to do it a bit better in the downgrade script, but I have chosen just to change the kind of the expressions to unknown.
Effectively, this change means that we break dataflow for string interpolation expressions, when the upgrade and downgrade scripts are used.
Another alternative could have been to make a tree like

Loading
graph TD;
  id1["#quot;Hello #quot;"]
  id2["name"]
  id3["{align:format}"]
  id4["#quot;. Welcome.#quot;"]
  E--0-->id1
  E--1-->id2
  E--2-->id3
  E--3-->id4

This would make it easier to make backwards compatible upgrade- and downgrade scripts, but I think the chosen design is better as it more closely resembles the structure of the code.

Other relevant implementation details

  • The new expression kind should just used the standard post order control flow.
  • Taint flow for string interpolation expressions are now split in two steps, as we need to propagate taint from name to {name,align:format} in the example.

DCA looks un-eventfull.

@github-actions github-actions bot added the C# label Mar 21, 2025
@michaelnebel michaelnebel force-pushed the csharp/improvestringinterpolation branch from 04621bd to acec97d Compare March 21, 2025 12:33
@michaelnebel michaelnebel force-pushed the csharp/improvestringinterpolation branch from a783354 to d9fb137 Compare March 24, 2025 11:00
@michaelnebel michaelnebel marked this pull request as ready for review March 24, 2025 12:33
@michaelnebel michaelnebel requested a review from a team as a code owner March 24, 2025 12:33
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work. Some small questions.

import csharp

query predicate inserts(InterpolatedStringExpr expr, Expr e) {
expr.getAnInsert() = e // and inSpecificSource(expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the trailing comment?

}

query predicate texts(InterpolatedStringExpr expr, StringLiteral literal) {
expr.getAText() = literal // and inSpecificSource(expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the trailing comment?

where
expressions(e, k, t) and
if k = 138 then kind = 106 else kind = k
select e, kind, t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this select the insert part of e instead of e?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered doing that as well, but it requires that we also make changes to the expr_parent relation as well (this is what I am vaguely hinting in the PR description).
Basically, I am worried that I break some DB invariant by changing the parent/child relationship as a part of the script.
Should we attempt to make the better downgrade script? It requires
(1) Removing the insert expressions from the expressions relation
(2) Adjust the expr_parent relation
(3) Consider if we need to do something with the new dangling align and format expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants