Skip to content

[libc++] Refactor basic_filebuf::overflow() #144793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 24, 2025

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Jun 18, 2025

Refactor the function to streamline the logic so it matches the specification in [filebuf.virtuals] more closely. In particular, avoid modifying the put area pointers when we loop around after a partial codecvt conversion.

Note that we're technically not up-to-spec in this implementation, since the Standard says that we shouldn't try more than once after a partial codecvt conversion. However, this refactoring attempts not to change any functionality.

Refactor the function to streamline the logic so it matches the specification
in [filebuf.virtuals] more closely. In particular, avoid modifying the put
area pointers when we loop around after a partial codecvt conversion.

Note that we're technically not up-to-spec in this implementation, since
the Standard says that we shouldn't try more than once after a partial
codecvt conversion. However, this refactoring attempts not to change any
functionality.
@ldionne ldionne requested a review from a team as a code owner June 18, 2025 21:08
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jun 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

Refactor the function to streamline the logic so it matches the specification in [filebuf.virtuals] more closely. In particular, avoid modifying the put area pointers when we loop around after a partial codecvt conversion.

Note that we're technically not up-to-spec in this implementation, since the Standard says that we shouldn't try more than once after a partial codecvt conversion. However, this refactoring attempts not to change any functionality.


Full diff: https://github.com/llvm/llvm-project/pull/144793.diff

1 Files Affected:

  • (modified) libcxx/include/fstream (+36-21)
diff --git a/libcxx/include/fstream b/libcxx/include/fstream
index 00aa00ff7e9cd..f9cbac2ab0578 100644
--- a/libcxx/include/fstream
+++ b/libcxx/include/fstream
@@ -835,35 +835,50 @@ typename basic_filebuf<_CharT, _Traits>::int_type basic_filebuf<_CharT, _Traits>
   }
   if (this->pptr() != this->pbase()) {
     if (__always_noconv_) {
-      size_t __nmemb = static_cast<size_t>(this->pptr() - this->pbase());
-      if (std::fwrite(this->pbase(), sizeof(char_type), __nmemb, __file_) != __nmemb)
+      size_t __n = static_cast<size_t>(this->pptr() - this->pbase());
+      if (std::fwrite(this->pbase(), sizeof(char_type), __n, __file_) != __n)
         return traits_type::eof();
     } else {
-      char* __extbe = __extbuf_;
-      codecvt_base::result __r;
+      if (!__cv_)
+        std::__throw_bad_cast();
+
+      // See [filebuf.virtuals]
+      char_type* __b = this->pbase();
+      char_type* __p = this->pptr();
+      const char_type* __end;
+      char* __extbuf_end = __extbuf_;
       do {
-        if (!__cv_)
-          std::__throw_bad_cast();
-
-        const char_type* __e;
-        __r = __cv_->out(__st_, this->pbase(), this->pptr(), __e, __extbuf_, __extbuf_ + __ebs_, __extbe);
-        if (__e == this->pbase())
+        codecvt_base::result __r = __cv_->out(__st_, __b, __p, __end, __extbuf_, __extbuf_ + __ebs_, __extbuf_end);
+        if (__end == __b)
           return traits_type::eof();
+
+        // No conversion needed: output characters directly to the file, done.
         if (__r == codecvt_base::noconv) {
-          size_t __nmemb = static_cast<size_t>(this->pptr() - this->pbase());
-          if (std::fwrite(this->pbase(), 1, __nmemb, __file_) != __nmemb)
+          size_t __n = static_cast<size_t>(__p - __b);
+          if (std::fwrite(__b, 1, __n, __file_) != __n)
             return traits_type::eof();
-        } else if (__r == codecvt_base::ok || __r == codecvt_base::partial) {
-          size_t __nmemb = static_cast<size_t>(__extbe - __extbuf_);
-          if (std::fwrite(__extbuf_, 1, __nmemb, __file_) != __nmemb)
+          break;
+
+          // Conversion successful: output the converted characters to the file, done.
+        } else if (__r == codecvt_base::ok) {
+          size_t __n = static_cast<size_t>(__extbuf_end - __extbuf_);
+          if (std::fwrite(__extbuf_, 1, __n, __file_) != __n)
+            return traits_type::eof();
+          break;
+
+          // Conversion partially successful: output converted characters to the file and repeat with the
+          // remaining characters.
+        } else if (__r == codecvt_base::partial) {
+          size_t __n = static_cast<size_t>(__extbuf_end - __extbuf_);
+          if (std::fwrite(__extbuf_, 1, __n, __file_) != __n)
             return traits_type::eof();
-          if (__r == codecvt_base::partial) {
-            this->setp(const_cast<char_type*>(__e), this->pptr());
-            this->__pbump(this->epptr() - this->pbase());
-          }
-        } else
+          __b = const_cast<char_type*>(__end);
+          continue;
+
+        } else {
           return traits_type::eof();
-      } while (__r == codecvt_base::partial);
+        }
+      } while (true);
     }
     this->setp(__pb_save, __epb_save);
   }

@@ -835,35 +835,50 @@ typename basic_filebuf<_CharT, _Traits>::int_type basic_filebuf<_CharT, _Traits>
}
if (this->pptr() != this->pbase()) {
if (__always_noconv_) {
size_t __nmemb = static_cast<size_t>(this->pptr() - this->pbase());
if (std::fwrite(this->pbase(), sizeof(char_type), __nmemb, __file_) != __nmemb)
size_t __n = static_cast<size_t>(this->pptr() - this->pbase());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name this __put_buffer_size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This isn't really the size of the put area though, as that would be epptr() - pbase(). This is really the number of characters inside the put area that we want to write to the underlying file stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't epptr() - pbase() be the capacity? Maybe the terminology is ambiguous here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess if you use vector terminology, then epptr() - pbase() would be the capacity, and pptr() - pbase() would be the size. But here __n also corresponds to the number of characters we want to commit, which is why I kinda like __n.

I am fine with renaming this post-commit if we find something better.

@ldionne ldionne merged commit 52b27c2 into llvm:main Jun 24, 2025
163 of 171 checks passed
@ldionne ldionne deleted the review/stream-refactor-overflow branch June 24, 2025 17:58
DrSergei pushed a commit to DrSergei/llvm-project that referenced this pull request Jun 24, 2025
Refactor the function to streamline the logic so it matches the
specification in [filebuf.virtuals] more closely. In particular, avoid
modifying the put area pointers when we loop around after a partial
codecvt conversion.

Note that we're technically not up-to-spec in this implementation, since
the Standard says that we shouldn't try more than once after a partial
codecvt conversion. However, this refactoring attempts not to change any
functionality.
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
Refactor the function to streamline the logic so it matches the
specification in [filebuf.virtuals] more closely. In particular, avoid
modifying the put area pointers when we loop around after a partial
codecvt conversion.

Note that we're technically not up-to-spec in this implementation, since
the Standard says that we shouldn't try more than once after a partial
codecvt conversion. However, this refactoring attempts not to change any
functionality.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Refactor the function to streamline the logic so it matches the
specification in [filebuf.virtuals] more closely. In particular, avoid
modifying the put area pointers when we loop around after a partial
codecvt conversion.

Note that we're technically not up-to-spec in this implementation, since
the Standard says that we shouldn't try more than once after a partial
codecvt conversion. However, this refactoring attempts not to change any
functionality.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants