Committers and public technical mailing list,
A non-trivial number of new features and bug fixes have been added to sparklyr in the past 2 months. I think we should kick off the release of sparklyr 1.4 soon.
If there is any objection to cutting the release branch for sparklyr 1.4 on Monday next week, please let me know as soon as possible. We will vote on the release branch afterwards, as usual.
Some of the new sparklyr features in this release:
- Specialized implementations of many tidyr verbs for Spark data frames (namely pivot_wider, pivot_longer, nest, unnest, separate, unite, and fill, with possibly more coming soon)
- Support for more higher-order functions that were introduced in Spark 3.0
- A number of dplyr-related improvements: for example, all higher-order functions and the weighted sampling functionality are now accessible through the dplyr interface
- New `spark_connect` options to enable RAPIDS GPU acceleration
Some noteworthy bug fixes in this release:
- Fixed an interop issue with Livy+Spark 2.4 (2641)
- Made sure Spark app name setting takes effect in YARN cluster mode (2673)
- Avoided Spark data frame name collision from `spark_read_compat_param` (2653)