Insights: apache/spark
Overview
- 0 Active issues
- 1 Merged pull request
- 72 Open pull requests
- 0 Closed issues
- 0 New issues
1 Pull request merged by 1 person
- [Core]Convert shuffleWriteTime from Nanoseconds to Milliseconds for Consistency with Other Metrics
  #50400 merged Mar 26, 2025
72 Pull requests opened by 52 people
- [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator
  #50123 opened Mar 1, 2025
- [MINOR][DOCS] Add `spark.taskMetrics.trackUpdatedBlockStatuses` description for configuration.md
  #50147 opened Mar 4, 2025
- Test sbt 1.10.10
  #50151 opened Mar 4, 2025
- [SPARK-51400] Replace ArrayContains nodes to InSet
  #50170 opened Mar 5, 2025
- [WIP][SPARK-51411][SS][DOCS] Add documentation for the transformWithState operator
  #50177 opened Mar 6, 2025
- [SPARK-48311][SQL] Fix nested pythonUDF in groupBy and aggregate in Binding Exception
  #50183 opened Mar 6, 2025
- [MINOR][DOCS] Remove u prefix for py str
  #50185 opened Mar 6, 2025
- [MINOR][DOCS] IP -> HOST
  #50186 opened Mar 6, 2025
- [SPARK-51338][INFRA] Add automated CI build for `connect-examples`
  #50187 opened Mar 6, 2025
- [MINOR][DOCS] Update foreachbatch docs
  #50188 opened Mar 6, 2025
- [WIP][SPARK-51395][SQL] Refine handling of default values in procedures
  #50197 opened Mar 6, 2025
- [SPARK-51430][PYTHON] Stop PySpark context logger from propagating logs to stdout
  #50198 opened Mar 7, 2025
- log executing SQL text
  #50202 opened Mar 7, 2025
- session extension comments enhancement
  #50204 opened Mar 7, 2025
- fix: handle compare_vals turn into str when parse IgnoreColumnType
  #50205 opened Mar 7, 2025
- [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true
  #50209 opened Mar 7, 2025
- [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0
  #50213 opened Mar 7, 2025
- [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files
  #50215 opened Mar 8, 2025
- Change host to ip
  #50216 opened Mar 8, 2025
- Compute checksum for shuffle
  #50230 opened Mar 10, 2025
- [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests
  #50231 opened Mar 10, 2025
- Delay `Join.metadataOutput` computation until `Join` is resolved
  #50240 opened Mar 11, 2025
- Update supported_api_gen.py: remove an invalid escape sequence "\_" using a raw string
  #50243 opened Mar 11, 2025
- [SPARK-51478][ML] Validate SQLTransformer statement by parsed plan
  #50244 opened Mar 11, 2025
- [SPARK-51479][SQL] Nullable in Row Level Operation Column is not correct
  #50246 opened Mar 12, 2025
- [SPARK-51398][SQL] SparkSQL supports sorting in the shuffle phase.
  #50248 opened Mar 12, 2025
- [SPARK-51441][SQL] Add DSv2 APIs for constraints
  #50253 opened Mar 12, 2025
- [WIP]: Spark 51272 51016 combined: For testing of HA Test
  #50263 opened Mar 13, 2025
- [SPARK-51414][SQL] Add the make_time() function
  #50269 opened Mar 13, 2025
- [SPARK-51270] Support nanosecond precision timestamp in Variant
  #50270 opened Mar 13, 2025
- [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE
  #50271 opened Mar 13, 2025
- [SPARK-51512][CORE] Filter out null MapStatus when cleaning up shuffle data with ExternalShuffleService
  #50277 opened Mar 14, 2025
- [SPARK-51513][SQL] Fix RewriteMergeIntoTable rule produces unresolved plan
  #50281 opened Mar 14, 2025
- [SPARK-50983][SQL]Part 1.a Add unresolved outer attributes for SubqueryExpression
  #50285 opened Mar 15, 2025
- [SPARK-51503][SQL] Support Variant type in XML scan
  #50300 opened Mar 17, 2025
- [SPARK-XXXXX][SQL] Add maxRecordsPerOutputBatch to limit the number of record of Arrow output batch
  #50301 opened Mar 18, 2025
- [SPARK-50838][SQL] Add Cross Join as legal in recursion of Recursive CTE
  #50308 opened Mar 18, 2025
- [SPARK-51566][PYTHON] Python UDF traceback improvement
  #50313 opened Mar 18, 2025
- [SPARK-51187][SQL][SS] Introduce the migration logic of config removal from SPARK-49699
  #50314 opened Mar 19, 2025
- [SPARK-51548][SQL] Provides configuration to decide whether to copy objects before shuffle.
  #50318 opened Mar 19, 2025
- [SPARK-51568][SQL] Introduce isSupportedExtract to prevent happening unexpected behavior
  #50333 opened Mar 20, 2025
- [WIP][SPARK-51423][SQL] Add the current_time() function for TIME datatype
  #50336 opened Mar 20, 2025
- [SPARK-51583] [SQL] Improve error message when to_timestamp function has arguments of wrong type
  #50347 opened Mar 21, 2025
- [SPARK-51585][SQL] Oracle dialect supports pushdown datetime functions
  #50353 opened Mar 22, 2025
- [SPARK-51062][PYTHON] Fix assertSchemaEqual to compare decimal precision and scale
  #50365 opened Mar 24, 2025
- [SPARK-51592][SQL] Avoid logging makes Driver OOM in AdaptiveSparkPlanExec
  #50368 opened Mar 24, 2025
- docs: update commet for `nullIntolerant`
  #50370 opened Mar 24, 2025
- [WIP][SPARK-51439] Support SQL UDF with DEFAULT argument
  #50373 opened Mar 24, 2025
- [MINOR][DOCS]: corrected spellings and typos
  #50376 opened Mar 24, 2025
- [SPARK-46860][CORE]Enable basic authentication when downloading jars and other files from secured repos
  #50377 opened Mar 24, 2025
- [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing
  #50391 opened Mar 25, 2025
- [SPARK-51615][SQL] Refactor ShowNamespaces to use RunnableCommand
  #50393 opened Mar 25, 2025
- [WIP] StateStore state machine
  #50394 opened Mar 26, 2025
- [Spark-50873][SQL] Prune column after RewriteSubquery rule for DSV2
  #50399 opened Mar 26, 2025
- [SPARK-51609][SQL] Optimize Recursive CTE execution for simple queries
  #50402 opened Mar 26, 2025
- [SPARK-51252][SS][TESTS] Make tests more consistent for StateStoreInstanceMetricSuite
  #50405 opened Mar 26, 2025
- [SPARK-51611][SQL] New iteration of single-pass Analyzer functionality
  #50406 opened Mar 26, 2025
- [SPARK-51439][SQL] Support SQL UDF with DEFAULT argument
  #50408 opened Mar 26, 2025
- [SPARK-51614] [SQL] Introduce ResolveUnresolvedHaving rule in the Analyzer
  #50409 opened Mar 26, 2025
- [SPARK-51616][SQL] Run CollationTypeCasts before ResolveAliases and ResolveAggregateFunctions
  #50410 opened Mar 26, 2025
- Report metrics for failed writes for V2 data sources
  #50413 opened Mar 26, 2025
- [SPARK-51637][PYTHON] Implement createColumnarReader for Python Data Source
  #50414 opened Mar 26, 2025
- [SPARK-51619][PYTHON] Support UDT input / output in Arrow-optimized Python UDF
  #50417 opened Mar 27, 2025
- [Core]Convert shuffleWriteTime from Nanoseconds to Milliseconds for Consistency with Other Metrics
  #50418 opened Mar 27, 2025
- [MINOR][FOLLOW-UP][SPARK-51612][SQL]Update Desc As JSON test to use withSQLConf
  #50419 opened Mar 27, 2025
- [WIP][INFRA] Add a job to guide spark shell
  #50423 opened Mar 27, 2025
- [SPARK-51629][UI] Add a download link on the ExecutionPage for svg/dot/txt format plans
  #50427 opened Mar 27, 2025
- [SPARK-51632][INFRA] Passing Command Line Arguments of lint-js to eslint
  #50430 opened Mar 27, 2025
- [SPARK-50657][DOCS][FOLLOW-UP] Fix the recommended pyarrow version
  #50432 opened Mar 27, 2025
49 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys
  #50033 commented on Mar 18, 2025 • 48 new comments
- [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance
  #50099 commented on Mar 27, 2025 • 18 new comments
- [SPARK-51298][SQL] Support variant in CSV scan
  #50052 commented on Mar 24, 2025 • 10 new comments
- [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types
  #49453 commented on Mar 2, 2025 • 8 new comments
- [SPARK-51334][CONNECT] Add java/scala version in analyze spark_version response
  #50102 commented on Mar 7, 2025 • 3 new comments
- [SPARK-37019][SQL] Add codegen support to array higher-order functions
  #34558 commented on Mar 19, 2025 • 3 new comments
- [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1
  #50057 commented on Mar 24, 2025 • 2 new comments
- [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
  #49928 commented on Mar 8, 2025 • 2 new comments
- [SPARK-51179][SQL] Refactor SupportsOrderingWithinGroup so that centralized check
  #49908 commented on Mar 14, 2025 • 2 new comments
- [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC
  #49528 commented on Mar 6, 2025 • 2 new comments
- [SPARK-50806][SQL] Support InputRDDCodegen interruption on task cancellation
  #49501 commented on Mar 19, 2025 • 2 new comments
- [SPARK-47573][K8S] Support custom driver log url
  #45728 commented on Mar 5, 2025 • 2 new comments
- [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction`
  #50073 commented on Feb 28, 2025 • 1 new comment
- [SPARK-51069][SQL] Add big-endian support to UnsafeRowUtils.validateStructuralIntegrityWithReasonImpl
  #49773 commented on Mar 6, 2025 • 1 new comment
- [SPARK-50992][SQL] OOMs and performance issues with AQE in large plans
  #49724 commented on Mar 27, 2025 • 1 new comment
- [SPARK-49485][CORE] Fix speculative task hang bug due to remaining executor lay on same host
  #47949 commented on Mar 12, 2025 • 1 new comment
- [SPARK-49881][SQL] Skipping DeduplicateRelations rule conditionally to improve analyzer perf
  #48354 commented on Feb 28, 2025 • 0 new comments
- [SPARK-51608] Log exception on Python runner termination
  #49890 commented on Mar 26, 2025 • 0 new comments
- [SPARK-48821][SQL] Support `Update` in `DataFrameWriterV2`
  #47233 commented on Mar 7, 2025 • 0 new comments
- [WIP][SPARK-51180][BUILD] Upgrade Arrow to 19.0.0
  #49909 commented on Mar 5, 2025 • 0 new comments
- [SPARK-51187][SQL][SS] Implement the graceful deprecation of incorrect config introduced in SPARK-49699
  #49983 commented on Mar 13, 2025 • 0 new comments
- [MINOR][SQL] Format the SqlBaseParser.g4
  #49987 commented on Mar 8, 2025 • 0 new comments
- [SPARK-51250][K8S] Add Support for K8s PriorityClass Configuration fo…
  #49998 commented on Mar 11, 2025 • 0 new comments
- [SPARK-51256][SQL] Increase parallelism if joining with small bucket table
  #50004 commented on Mar 3, 2025 • 0 new comments
- [SPARK-51264][SQL][JDBC] JDBC write keeps driver connection open unnecessarily
  #50014 commented on Mar 27, 2025 • 0 new comments
- [SPARK-51016][SQL] Stage.isIndeterminate gives wrong result in case the shuffle partitioner uses an inDeterministic attribute or expression.
  #50029 commented on Mar 20, 2025 • 0 new comments
- [SPARK-22876][YARN] Respect YARN AM failure validity interval
  #42570 commented on Mar 6, 2025 • 0 new comments
- [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming
  #42352 commented on Mar 27, 2025 • 0 new comments
- [SPARK-44639][SS][YARN] Use Java tmp dir for local RocksDB state storage on Yarn
  #42301 commented on Mar 6, 2025 • 0 new comments
- [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET
  #50097 commented on Mar 18, 2025 • 0 new comments
- [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table
  #41611 commented on Mar 5, 2025 • 0 new comments
- [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions
  #32987 commented on Mar 17, 2025 • 0 new comments
- [SPARK-49953][CORE] Require `errorClass` in `SparkException`
  #48440 commented on Mar 27, 2025 • 0 new comments
- [SPARK-50188][CONNECT][PYTHON] When the connect client starts, print the server's webUrl
  #48720 commented on Mar 12, 2025 • 0 new comments
- [SPARK-50417] Make number of FallbackStorage sub-directories configurable
  #48960 commented on Mar 20, 2025 • 0 new comments
- [SPARK-50464] Support unsigned integer for Arrow
  #49022 commented on Mar 12, 2025 • 0 new comments
- [CORE][SPARK-49788] Add optional TTL based cleaning for blocks to better support notebooks
  #49032 commented on Mar 27, 2025 • 0 new comments
- [SPARK-33152][SQL] Implement new optimized logic for constraint propagation rule
  #49117 commented on Mar 3, 2025 • 0 new comments
- [SPARK-45959][SQL] Improving performance when addition of 1 column at a time causes increase in the LogicalPlan tree depth
  #49124 commented on Feb 27, 2025 • 0 new comments
- [SPARK-47609][SQL] Making CacheLookup more optimal to minimize cache miss
  #49150 commented on Feb 28, 2025 • 0 new comments
- [WIP][SPARK-44662][SQL] Perf improvement in BroadcastHashJoin queries with stream side join key on non partition columns
  #49209 commented on Mar 26, 2025 • 0 new comments
- [SPARK-50639][SQL] Improve warning logging in CacheManager
  #49276 commented on Mar 6, 2025 • 0 new comments
- [SPARK-47193][SQL] Ensure SQL conf is propagated to executors when actions are called on RDD returned by `Dataset#rdd`
  #48325 commented on Mar 27, 2025 • 0 new comments
- [MINOR][INFRA] Do not upload docker build record
  #48012 commented on Mar 5, 2025 • 0 new comments
- [SPARK-50903][CONNECT] Cache logical plans after analysis
  #49584 commented on Mar 5, 2025 • 0 new comments
- [CORE][CONNECT] Add cogroup and cogroupSorted variants for additional KeyValueGroupedDatasets.
  #49754 commented on Mar 6, 2025 • 0 new comments
- [SPARK-43131][SQL][WIP] Add labels to identify UDFs
  #49775 commented on Mar 11, 2025 • 0 new comments
- [SPARK-51094][SQL][TESTS] Adjust some test parameters to work on s390x
  #49813 commented on Mar 6, 2025 • 0 new comments
- [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException
  #49870 commented on Mar 6, 2025 • 0 new comments