sparklyr 1.4 release


Yitao Li
 

sparklyr committers and public technical mailing list,

A non-trivial number of new features and bug fixes have been added to sparklyr in the past 2 months. I think we should kick off the release of sparklyr 1.4 soon.

If there is any objection to cutting the release branch for sparklyr 1.4 on Monday next week, please let me know as soon as possible. We will vote on the release branch afterwards, as usual.

Some of the new sparklyr features in this release:
- Weighted sampling with and without replacement on Spark data frames (https://blogs.rstudio.com/ai/posts/2020-07-29-parallelized-sampling)
- Specialized implementations of many tidyr verbs for Spark data frames (namely pivot_wider, pivot_longer, nest, unnest, separate, unite, and fill, with possibly more coming soon)
- Support for more higher-order functions that were introduced in Spark 3.0
- A number of dplyr-related improvements: for example, all higher-order functions and the weighted sampling functionality mentioned above are now accessible through the dplyr interface
- New `spark_connect` options to enable RAPIDS GPU acceleration

Some noteworthy bug fixes in this release:
- Fixed an interop issue with Livy+Spark 2.4 (2641)
- Made sure Spark app name setting takes effect in YARN cluster mode (2673)
- Avoided Spark data frame name collision from `spark_read_compat_param` (2653)

Best regards,
Yitao


Hossein Falaki
 

Sounds good to me and this is an impressive list of new features!


On Tue, Sep 8, 2020 at 5:00 AM Yitao Li <yitao@...> wrote:

--
Hossein


Javier Luraschi
 

+1 from me as well. Great job, Yitao!

On Tue, Sep 8, 2020 at 3:36 PM Hossein Falaki <hossein@...> wrote:


Samuel Victor Medeiros de Macedo - IFPE - Campus Recife
 

+1 from me. An impressive job :)

On Wed, Sep 9, 2020 at 1:42 PM Javier Luraschi <javier@...> wrote:
