
LF AI&Data Sparklyr Project License Scan and Findings July 2022

Jeff Shapiro <jshapiro@...>
 

Here are the results from the July 2022 license scan of the Sparklyr project. The scan was performed using the Linux Foundation Fossology server. Licenses and copyrights were examined.

The key findings (if any) and license summary can be found in the HTML report, the list of files is in the spreadsheet, and the SPDX file is listed below:

REPORTS:

lfai/sparklyr, code pulled 2022-07-02
- report: https://lfscanning.org/reports/lfai/sparklyr-2022-07-02-ca5947aa-f851-42aa-8d42-f69d62a7b306.html
- xlsx: https://lfscanning.org/reports/lfai/sparklyr-2022-07-02-ca5947aa-f851-42aa-8d42-f69d62a7b306.xlsx
- spdx: https://github.com/lfscanning/spdx-lfai/tree/master/sparklyr/2022-07/sparklyr-2022-07-02.spdx

NOTE: The key findings listed are not new; they were found during one of the PREVIOUS scans.

Please feel free to contact me with any questions about the scan results. Be sure to reply to me directly as I may not get an email sent directly to the distribution list.

Thanks, Jeff

Jeff Shapiro
408-910-7792
jshapiro@...


LF AI Sparklyr Project Scan and Findings

Jeff Shapiro <jshapiro@...>
 

Hi Team,

I'm taking over the license scanning responsibilities from Steve Winslow.

Here are the results from the most recent license scan of the Sparklyr project.  The scan was performed on the codebase at:  https://github.com/sparklyr/sparklyr  based on a snapshot from Jan 27, using the Linux Foundation Fossology server.  Licenses and copyrights were examined.

The key findings (if any) and license summary can be found in the HTML report, the list of files is in the spreadsheet, and the SPDX file is listed below:

REPORTS:

lfai/sparklyr, code pulled 2022-01-27
  - report: https://lfscanning.org/reports/lfai/sparklyr-2022-01-27-7b45cf8d-bea8-41fb-8a59-256d10a731d4.html
  - xlsx:   https://lfscanning.org/reports/lfai/sparklyr-2022-01-27-7b45cf8d-bea8-41fb-8a59-256d10a731d4.xlsx
  - spdx:   https://github.com/lfscanning/spdx-lfai/tree/master/sparklyr/2022-01/sparklyr-2022-01-27.spdx


NOTE:  Any key findings listed may be new, or they may be carried over from the last scan, which Steve did last September.

Please feel free to contact me with any questions about the scan results.  Be sure to reply to me directly as I may not get an email sent directly to the distribution list.

Thanks, Jeff

Jeff Shapiro
408-910-7792
jshapiro@...




Re: Yitao's departure from RStudio

Samuel Victor Medeiros de Macedo - IFPE - Campus Recife
 

Hi Yitao,

I wish you the best on your new journey. 

Samuel

On Tue, Jan 18, 2022 at 4:55 PM Jacqueline Z Cardoso <jzcardoso@...> wrote:

Yitao - Best of luck to you in your new role and thank you for your contributions. We hope to still have you as a part of the LF AI & Data Community.

Edgar - Welcome. We have noted you as the new technical lead for Sparklyr on the LF AI & Data side. You should expect to be added to the technical-projects@... mailing list, and you have also been added to the contact list for our technical projects.

Thank you,
Jacqueline 

Jacqueline Z Cardoso
Senior Program Manager
The Linux Foundation
+1 (415) 676-1127 (m)
Hours: Mon-Thurs, 6am-4pm PT

My working day may not be your working day. Please don't feel obliged to reply to this e-mail outside of your normal working hours.


On Thu, Jan 13, 2022 at 3:48 PM Hossein Falaki <hossein@...> wrote:
Best of luck Yitao!

On Thu, Jan 13, 2022 at 3:11 PM Yitao Li <yitao@...> wrote:
Dear sparklyr committers and friends,

Hope your 2022 is off to a great start!

I wanted to let you know I am leaving RStudio soon and will start the next chapter of my professional life at SafeGraph on Jan 18th this year. My colleague, Edgar Ruiz (https://github.com/edgararuiz) will be the new maintainer and point-of-contact for sparklyr and a number of sparklyr extensions after my departure. Edgar is an active contributor in the R open-source community, and I am quite excited to learn he will be leading the sparklyr project moving forward.

If you have any question or concern, please do not hesitate to reach out to me at yitaoli1990@..., or to my colleague Edgar at edgar@....

Warm regards,
Yitao


--
Hossein


Re: Yitao's departure from RStudio

Jacqueline Z Cardoso
 

Yitao - Best of luck to you in your new role and thank you for your contributions. We hope to still have you as a part of the LF AI & Data Community.

Edgar - Welcome. We have noted you as the new technical lead for Sparklyr on the LF AI & Data side. You should expect to be added to the technical-projects@... mailing list, and you have also been added to the contact list for our technical projects.

Thank you,
Jacqueline 

Jacqueline Z Cardoso
Senior Program Manager
The Linux Foundation
+1 (415) 676-1127 (m)
Hours: Mon-Thurs, 6am-4pm PT

My working day may not be your working day. Please don't feel obliged to reply to this e-mail outside of your normal working hours.


On Thu, Jan 13, 2022 at 3:48 PM Hossein Falaki <hossein@...> wrote:
Best of luck Yitao!

On Thu, Jan 13, 2022 at 3:11 PM Yitao Li <yitao@...> wrote:
Dear sparklyr committers and friends,

Hope your 2022 is off to a great start!

I wanted to let you know I am leaving RStudio soon and will start the next chapter of my professional life at SafeGraph on Jan 18th this year. My colleague, Edgar Ruiz (https://github.com/edgararuiz) will be the new maintainer and point-of-contact for sparklyr and a number of sparklyr extensions after my departure. Edgar is an active contributor in the R open-source community, and I am quite excited to learn he will be leading the sparklyr project moving forward.

If you have any question or concern, please do not hesitate to reach out to me at yitaoli1990@..., or to my colleague Edgar at edgar@....

Warm regards,
Yitao


--
Hossein


Re: Yitao's departure from RStudio

Hossein Falaki
 

Best of luck Yitao!


On Thu, Jan 13, 2022 at 3:11 PM Yitao Li <yitao@...> wrote:
Dear sparklyr committers and friends,

Hope your 2022 is off to a great start!

I wanted to let you know I am leaving RStudio soon and will start the next chapter of my professional life at SafeGraph on Jan 18th this year. My colleague, Edgar Ruiz (https://github.com/edgararuiz) will be the new maintainer and point-of-contact for sparklyr and a number of sparklyr extensions after my departure. Edgar is an active contributor in the R open-source community, and I am quite excited to learn he will be leading the sparklyr project moving forward.

If you have any question or concern, please do not hesitate to reach out to me at yitaoli1990@..., or to my colleague Edgar at edgar@....

Warm regards,
Yitao


--
Hossein


Yitao's departure from RStudio

Yitao Li
 

Dear sparklyr committers and friends,

Hope your 2022 is off to a great start!

I wanted to let you know I am leaving RStudio soon and will start the next chapter of my professional life at SafeGraph on Jan 18th this year. My colleague, Edgar Ruiz (https://github.com/edgararuiz) will be the new maintainer and point-of-contact for sparklyr and a number of sparklyr extensions after my departure. Edgar is an active contributor in the R open-source community, and I am quite excited to learn he will be leading the sparklyr project moving forward.

If you have any question or concern, please do not hesitate to reach out to me at yitaoli1990@..., or to my colleague Edgar at edgar@....

Warm regards,
Yitao


asking for logos of sparklyr contributors

Yitao Li
 

Dear sparklyr contributors,
    I hope this email finds you well!
    My name is Yitao. You may have noticed I became the maintainer of sparklyr last year (taking over from Javier).
    I will present sparklyr during the LFAI annual project review on Aug 26th, and would like to take this opportunity to acknowledge all individuals and organizations who have contributed to sparklyr in the past. Can you please send me the official name and logo (in Scalable Vector Graphics format, if possible) of your organization at your earliest convenience?
    Thanks in advance & keep up the great work!
    Cheers,
Yitao Li


sparklyr codebase license scan, Feb. 2021

Steve Winslow <swinslow@...>
 

Dear sparklyr technical-discuss list,

I am providing the results of a license scan of the sparklyr codebase, based on a snapshot of the repos as of February 2. The findings and license summary are available at:


That page lists any key findings/recommendations and the corresponding files in the repo, as well as a summary of all detected licenses. The more detailed catalogue of all files for each license can be obtained at:


Finally, SPDX documents are also available for the license scan results. These can be seen and obtained at https://github.com/lfscanning/spdx-lfai. The SPDX documents contain human-readable and machine-readable details about the license notices and copyright notices for each file that was scanned. Although the project does not need to do anything with these SPDX files, I am providing them in the hopes that they may be helpful for the FOSS license compliance processes for your companies and the broader community.

Please feel free to reach out to me directly with any questions. If so, please cc me (swinslow@...) on any responses, as I might not receive messages sent to the list address.

Best regards,
Steve

--
Steve Winslow
VP, Compliance and Legal
The Linux Foundation


Re: Javier's 2021 transition

Hossein Falaki
 

Hi Javier,

Best of luck in your new venture. I am sure it will be great. And I agree that the sparklyr project is doing very well and is growing in usage and adoption.

On Wed, Dec 2, 2020 at 3:32 PM Javier Luraschi <javier@...> wrote:
Hi there sparklyr committers and technical discussions mailing list,

As you might have seen on Twitter, I'm going to be leaving RStudio and distancing myself a bit from the R community starting next year. I'm planning to do my own startup for at least a year, so I want to have a chance to fully focus on that.

Therefore, I will have much less time to contribute to sparklyr. However, this project has grown beyond me and its original corporate sponsor, and has already found a new home in the Linux Foundation. It may well outlast the sponsoring companies and span beyond the time we spent affiliated with particular organizations. To that end, I updated my contact info with this PR.

Yitao has been doing a great job leading the project now, so please reach out to him if you need anything in particular from sparklyr. For everything else related to RStudio, please contact Sigrid who will fill my role.

My last day is December 31st so we have plenty of time to chat or sort anything you think I should be contributing to.

You can reach me at jluraschi@...

Best,
Javier


--
Hossein


Javier's 2021 transition

Javier Luraschi
 

Hi there sparklyr committers and technical discussions mailing list,

As you might have seen on Twitter, I'm going to be leaving RStudio and distancing myself a bit from the R community starting next year. I'm planning to do my own startup for at least a year, so I want to have a chance to fully focus on that.

Therefore, I will have much less time to contribute to sparklyr. However, this project has grown beyond me and its original corporate sponsor, and has already found a new home in the Linux Foundation. It may well outlast the sponsoring companies and span beyond the time we spent affiliated with particular organizations. To that end, I updated my contact info with this PR.

Yitao has been doing a great job leading the project now, so please reach out to him if you need anything in particular from sparklyr. For everything else related to RStudio, please contact Sigrid who will fill my role.

My last day is December 31st so we have plenty of time to chat or sort anything you think I should be contributing to.

You can reach me at jluraschi@...

Best,
Javier


Re: sparklyr 1.5 release

Samuel Victor Medeiros de Macedo - IFPE - Campus Recife
 

+1 from me too :)

On Mon, Nov 16, 2020 at 4:07 PM Javier Luraschi <javier@...> wrote:

+1 from me, great job, Yitao!

On Mon, Nov 16, 2020 at 7:44 AM Yitao Li <yitao@...> wrote:
sparklyr committers and public technical mailing list,

We are ready to kick off the sparklyr 1.5 release! As usual, please vote on the release branch at your earliest convenience and let me know if there is any objection to submitting sparklyr 1.5 to CRAN next Monday (Nov 23rd).

Unlike the previous 3 releases, this release will be more focused on improving existing sparklyr features rather than creating new ones. So far, some highlights from this release include the revamped non-arrow serialization routines, the addition of 4 new functions to the sdf_* family (2 of which were inspired by tidyr), plus many dplyr-related improvements and bug fixes:

- Non-arrow serialization routines of sparklyr (which were previously based on CSV file format) were replaced with new ones based on RDS format version 2, benefiting sparklyr in multiple ways:
   * The new serialization format, unlike CSV, enables binary data to be easily transported from R to Spark and allows long-standing serialization headaches such as 2031 and 2763 in sparklyr to be resolved in conceptually simple ways.
   * Raw columns within an R dataframe can now be efficiently imported to Spark as binary columns.
   * `copy_to()` becomes at least 30% faster with RDS serialization. It is still nowhere near as fast as arrow-based serialization, but the primary goal of rewriting non-arrow serialization routines was improving correctness rather than performance. Any increase in performance was considered a bonus rather than a criterion of success.
   * Spark-based parallel backend (aka 'doSpark') takes advantage of this new serialization format to execute `foreach` loops more efficiently.

- Equivalents of `tidyr::unnest_wider()` and `tidyr::unnest_longer()` for Spark dataframes were implemented as `sdf_unnest_wider()` and `sdf_unnest_longer()` in sparklyr.

- `sdf_partition_sizes()` was implemented to compute partition sizes of a Spark dataframe efficiently

- `sdf_expand_grid()` now provides roughly the Spark equivalent of `expand.grid()` functionalities

- Subsetting operator (`[`) for selecting a subset of columns of a Spark dataframe will be supported in sparklyr 1.5. This is mostly useful within the context of dplyr verbs operating on multiple columns.

- A number of dplyr-related improvements. The full list can be found here

You can also find a more detailed update about this release here.

Thanks!
Yitao


Re: sparklyr 1.5 release

Javier Luraschi
 

+1 from me, great job, Yitao!

On Mon, Nov 16, 2020 at 7:44 AM Yitao Li <yitao@...> wrote:
sparklyr committers and public technical mailing list,

We are ready to kick off the sparklyr 1.5 release! As usual, please vote on the release branch at your earliest convenience and let me know if there is any objection to submitting sparklyr 1.5 to CRAN next Monday (Nov 23rd).

Unlike the previous 3 releases, this release will be more focused on improving existing sparklyr features rather than creating new ones. So far, some highlights from this release include the revamped non-arrow serialization routines, the addition of 4 new functions to the sdf_* family (2 of which were inspired by tidyr), plus many dplyr-related improvements and bug fixes:

- Non-arrow serialization routines of sparklyr (which were previously based on CSV file format) were replaced with new ones based on RDS format version 2, benefiting sparklyr in multiple ways:
   * The new serialization format, unlike CSV, enables binary data to be easily transported from R to Spark and allows long-standing serialization headaches such as 2031 and 2763 in sparklyr to be resolved in conceptually simple ways.
   * Raw columns within an R dataframe can now be efficiently imported to Spark as binary columns.
   * `copy_to()` becomes at least 30% faster with RDS serialization. It is still nowhere near as fast as arrow-based serialization, but the primary goal of rewriting non-arrow serialization routines was improving correctness rather than performance. Any increase in performance was considered a bonus rather than a criterion of success.
   * Spark-based parallel backend (aka 'doSpark') takes advantage of this new serialization format to execute `foreach` loops more efficiently.

- Equivalents of `tidyr::unnest_wider()` and `tidyr::unnest_longer()` for Spark dataframes were implemented as `sdf_unnest_wider()` and `sdf_unnest_longer()` in sparklyr.

- `sdf_partition_sizes()` was implemented to compute partition sizes of a Spark dataframe efficiently

- `sdf_expand_grid()` now provides roughly the Spark equivalent of `expand.grid()` functionalities

- Subsetting operator (`[`) for selecting a subset of columns of a Spark dataframe will be supported in sparklyr 1.5. This is mostly useful within the context of dplyr verbs operating on multiple columns.

- A number of dplyr-related improvements. The full list can be found here

You can also find a more detailed update about this release here.

Thanks!
Yitao


sparklyr 1.5 release

Yitao Li
 

sparklyr committers and public technical mailing list,

We are ready to kick off the sparklyr 1.5 release! As usual, please vote on the release branch at your earliest convenience and let me know if there is any objection to submitting sparklyr 1.5 to CRAN next Monday (Nov 23rd).

Unlike the previous 3 releases, this release will be more focused on improving existing sparklyr features rather than creating new ones. So far, some highlights from this release include the revamped non-arrow serialization routines, the addition of 4 new functions to the sdf_* family (2 of which were inspired by tidyr), plus many dplyr-related improvements and bug fixes:

- Non-arrow serialization routines of sparklyr (which were previously based on CSV file format) were replaced with new ones based on RDS format version 2, benefiting sparklyr in multiple ways:
   * The new serialization format, unlike CSV, enables binary data to be easily transported from R to Spark and allows long-standing serialization headaches such as 2031 and 2763 in sparklyr to be resolved in conceptually simple ways.
   * Raw columns within an R dataframe can now be efficiently imported to Spark as binary columns.
   * `copy_to()` becomes at least 30% faster with RDS serialization. It is still nowhere near as fast as arrow-based serialization, but the primary goal of rewriting non-arrow serialization routines was improving correctness rather than performance. Any increase in performance was considered a bonus rather than a criterion of success.
   * Spark-based parallel backend (aka 'doSpark') takes advantage of this new serialization format to execute `foreach` loops more efficiently.

- Equivalents of `tidyr::unnest_wider()` and `tidyr::unnest_longer()` for Spark dataframes were implemented as `sdf_unnest_wider()` and `sdf_unnest_longer()` in sparklyr.

- `sdf_partition_sizes()` was implemented to compute partition sizes of a Spark dataframe efficiently

- `sdf_expand_grid()` now provides roughly the Spark equivalent of `expand.grid()` functionalities

- Subsetting operator (`[`) for selecting a subset of columns of a Spark dataframe will be supported in sparklyr 1.5. This is mostly useful within the context of dplyr verbs operating on multiple columns.

- A number of dplyr-related improvements. The full list can be found here

You can also find a more detailed update about this release here.
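
As a rough illustration of the serialization point in the list above, here is a small sketch in Python (not R, and not sparklyr's actual code) of why a text format such as CSV is awkward for raw bytes while a binary format round-trips them unchanged. Python's `pickle` stands in for RDS purely as an assumption for illustration, and the helper names `roundtrip_csv` and `roundtrip_binary` are hypothetical.

```python
import csv
import io
import pickle

# A record with a raw (binary) column. The bytes include NUL, newline,
# comma, and 0xFF, which is not valid UTF-8 on its own.
row = {"id": 1, "payload": bytes([0x00, 0x0A, 0x2C, 0xFF])}


def roundtrip_csv(record):
    """Write the record as CSV text and read it back."""
    buf = io.StringIO()
    csv.writer(buf).writerow([record["id"], record["payload"]])
    buf.seek(0)
    rid, payload = next(csv.reader(buf))
    return {"id": rid, "payload": payload}


def roundtrip_binary(record):
    """Serialize the record with a binary format and read it back."""
    return pickle.loads(pickle.dumps(record))


csv_result = roundtrip_csv(row)
bin_result = roundtrip_binary(row)

# CSV stringifies every field: the id comes back as text, and the payload
# comes back as the printed form of the bytes object, not the bytes.
print(type(csv_result["id"]).__name__, type(csv_result["payload"]).__name__)

# The binary format preserves both the types and the exact bytes.
print(bin_result == row)
```

A text format needs an extra encoding layer (for example base64) plus type metadata to carry such a column, which is the kind of complication a binary format avoids.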

Thanks!
Yitao


migration from 'master' to 'main'

Yitao Li
 

sparklyr committers and public technical mailing list,

I'm writing to inform you that the default branch of sparklyr has been migrated from 'master' to 'main'.

  • How does this affect you:
Very little. When creating a pull request through the GitHub web UI, GitHub will compare your branch with the upstream 'main' branch by default from now on (instead of comparing with the 'master' branch). The 'master' branch will eventually be deleted, assuming this migration goes well. In the meantime, I promise to keep my eyes peeled and always double-check that every pull request opened gets merged into the 'main' branch of sparklyr, rather than the 'master' branch.

  • Why now:
Reason 1: naming the default branch 'main' rather than 'master' is currently the default convention on GitHub, and in the near future it will be the norm rather than the exception. It's best we conform to this convention sooner rather than later.

Reason 2: there is no open pull request or ongoing release for sparklyr at the moment, so, performing this migration now would cause little to no disruption.

  • Further info:
In my opinion https://github.com/github/renaming did a pretty good job documenting all aspects of this renaming process. 

I believe it will be safe to delete the old 'master' branch of sparklyr later this year, once the automatic redirection from 'master' to 'main' is happening seamlessly everywhere on github.com (including, say, for use cases such as `remotes::install_github("sparklyr/sparklyr", ref = "master")` which I'm aware is the modus operandi for some advanced users).
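
For anyone with an existing local clone, the sketch below lists the git commands that GitHub's renaming guidance suggests for pointing a clone at a renamed default branch. It is a minimal Python wrapper that only prints the commands by default. The remote name `origin` and the helper names `rename_commands` and `update_clone` are assumptions for illustration, not part of sparklyr or of this announcement.

```python
import shlex
import subprocess


def rename_commands(old="master", new="main", remote="origin"):
    """Return the git commands that point a local clone at the renamed branch."""
    return [
        ["git", "branch", "-m", old, new],                # rename the local branch
        ["git", "fetch", remote],                         # pick up the new remote branch
        ["git", "branch", "-u", f"{remote}/{new}", new],  # track the renamed upstream
        ["git", "remote", "set-head", remote, "-a"],      # refresh the remote's default HEAD
    ]


def update_clone(repo_dir=".", dry_run=True):
    """Print each command, and run it inside repo_dir unless dry_run is set."""
    for cmd in rename_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo_dir, check=True)


update_clone()  # dry run: prints the four commands without touching any repository
```

Passing `dry_run=False` would execute the commands in `repo_dir`; that path assumes git is installed and that the clone still has a local 'master' branch.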


Please don't hesitate to ping me if you have any questions about this migration.

Best regards,
Yitao


Sparklyr codebase license scan, Sept. 2020

Steve Winslow <swinslow@...>
 

Dear Sparklyr technical-discuss list,

I am providing the results of a license scan of the Sparklyr codebase, based on a snapshot of the repos as of Sept. 1. The findings and license summary are available at:

https://lfscanning.org/reports/lfai/sparklyr-2020-09-01-a1f6fec2-20f0-4fd0-b290-43002f9883ca.html

That page lists any key findings/recommendations and the corresponding files in the repo, as well as a summary of all detected licenses. The more detailed catalogue of all files for each license can be obtained at:

https://lfscanning.org/reports/lfai/sparklyr-2020-09-01-a1f6fec2-20f0-4fd0-b290-43002f9883ca.xlsx

Finally, SPDX documents are also available for the license scan results. These can be seen and obtained at https://github.com/lfscanning/spdx-lfai. The SPDX documents contain human-readable and machine-readable details about the license notices and copyright notices for each file that was scanned. Although the project does not need to do anything with these SPDX files, I am providing them in the hopes that they may be helpful for the FOSS license compliance processes for your companies and the broader community.

Please feel free to reach out to me directly with any questions. If so, please cc me (swinslow@...) on any responses, as I might not receive messages sent to the list address.

Best regards,
Steve

--
Steve Winslow
Director of Strategic Programs
The Linux Foundation


Re: sparklyr 1.4 release

Samuel Victor Medeiros de Macedo - IFPE - Campus Recife
 

+1 from me. An impressive job :)

On Wed, Sep 9, 2020 at 1:42 PM Javier Luraschi <javier@...> wrote:

+1 from me as well. Great Job, Yitao!

On Tue, Sep 8, 2020 at 3:36 PM Hossein Falaki <hossein@...> wrote:
Sounds good to me and this is an impressive list of new features!

On Tue, Sep 8, 2020 at 5:00 AM Yitao Li <yitao@...> wrote:
sparklyr committers and public technical mailing list,

A non-trivial number of new features and bug fixes have been added to sparklyr in the past 2 months. I think we should kick off the release of sparklyr 1.4 soon.

If there is any objection to cutting the release branch for sparklyr 1.4 Monday next week, please let me know as soon as possible. We will vote on the release branch afterwards, as usual.

Some of the new sparklyr features in this release:
- Weighted sampling with and without replacement on Spark data frames (https://blogs.rstudio.com/ai/posts/2020-07-29-parallelized-sampling)
- Specialized implementations of many tidyr verbs for Spark data frames (namely, pivot_wider, pivot_longer, nest, unnest, separate, unite, fill, possibly more of them coming soon)
- Support for more higher-order functions that were introduced in Spark 3.0
- A number of dplyr-related improvements: for example, all higher-order functions, and also, the aforementioned weighted sampling functionalities, are now accessible using the dplyr interface
- New `spark_connect` options to enable RAPIDS GPU acceleration

Some noteworthy bug fixes in this release:
- Fixed an interop issue with Livy+Spark 2.4 (2641)
- Made sure Spark app name setting takes effect in YARN cluster mode (2673)
- Avoided Spark data frame name collision from `spark_read_compat_param` (2653)

Best regards,
Yitao


--
Hossein


Re: sparklyr 1.4 release

Javier Luraschi
 

+1 from me as well. Great Job, Yitao!

On Tue, Sep 8, 2020 at 3:36 PM Hossein Falaki <hossein@...> wrote:
Sounds good to me and this is an impressive list of new features!

On Tue, Sep 8, 2020 at 5:00 AM Yitao Li <yitao@...> wrote:
sparklyr committers and public technical mailing list,

A non-trivial number of new features and bug fixes have been added to sparklyr in the past 2 months. I think we should kick off the release of sparklyr 1.4 soon.

If there is any objection to cutting the release branch for sparklyr 1.4 Monday next week, please let me know as soon as possible. We will vote on the release branch afterwards, as usual.

Some of the new sparklyr features in this release:
- Weighted sampling with and without replacement on Spark data frames (https://blogs.rstudio.com/ai/posts/2020-07-29-parallelized-sampling)
- Specialized implementations of many tidyr verbs for Spark data frames (namely, pivot_wider, pivot_longer, nest, unnest, separate, unite, fill, possibly more of them coming soon)
- Support for more higher-order functions that were introduced in Spark 3.0
- A number of dplyr-related improvements: for example, all higher-order functions, and also, the aforementioned weighted sampling functionalities, are now accessible using the dplyr interface
- New `spark_connect` options to enable RAPIDS GPU acceleration

Some noteworthy bug fixes in this release:
- Fixed an interop issue with Livy+Spark 2.4 (2641)
- Made sure Spark app name setting takes effect in YARN cluster mode (2673)
- Avoided Spark data frame name collision from `spark_read_compat_param` (2653)

Best regards,
Yitao


--
Hossein


Re: sparklyr 1.4 release

Hossein Falaki
 

Sounds good to me and this is an impressive list of new features!


On Tue, Sep 8, 2020 at 5:00 AM Yitao Li <yitao@...> wrote:
sparklyr committers and public technical mailing list,

A non-trivial number of new features and bug fixes have been added to sparklyr in the past 2 months. I think we should kick off the release of sparklyr 1.4 soon.

If there is any objection to cutting the release branch for sparklyr 1.4 Monday next week, please let me know as soon as possible. We will vote on the release branch afterwards, as usual.

Some of the new sparklyr features in this release:
- Weighted sampling with and without replacement on Spark data frames (https://blogs.rstudio.com/ai/posts/2020-07-29-parallelized-sampling)
- Specialized implementations of many tidyr verbs for Spark data frames (namely, pivot_wider, pivot_longer, nest, unnest, separate, unite, fill, possibly more of them coming soon)
- Support for more higher-order functions that were introduced in Spark 3.0
- A number of dplyr-related improvements: for example, all higher-order functions, and also, the aforementioned weighted sampling functionalities, are now accessible using the dplyr interface
- New `spark_connect` options to enable RAPIDS GPU acceleration

Some noteworthy bug fixes in this release:
- Fixed an interop issue with Livy+Spark 2.4 (2641)
- Made sure Spark app name setting takes effect in YARN cluster mode (2673)
- Avoided Spark data frame name collision from `spark_read_compat_param` (2653)

Best regards,
Yitao


--
Hossein


sparklyr 1.4 release

Yitao Li
 

sparklyr committers and public technical mailing list,

A non-trivial number of new features and bug fixes have been added to sparklyr in the past 2 months. I think we should kick off the release of sparklyr 1.4 soon.

If there is any objection to cutting the release branch for sparklyr 1.4 Monday next week, please let me know as soon as possible. We will vote on the release branch afterwards, as usual.

Some of the new sparklyr features in this release:
- Weighted sampling with and without replacement on Spark data frames (https://blogs.rstudio.com/ai/posts/2020-07-29-parallelized-sampling)
- Specialized implementations of many tidyr verbs for Spark data frames (namely, pivot_wider, pivot_longer, nest, unnest, separate, unite, fill, possibly more of them coming soon)
- Support for more higher-order functions that were introduced in Spark 3.0
- A number of dplyr-related improvements: for example, all higher-order functions, and also, the aforementioned weighted sampling functionalities, are now accessible using the dplyr interface
- New `spark_connect` options to enable RAPIDS GPU acceleration

Some noteworthy bug fixes in this release:
- Fixed an interop issue with Livy+Spark 2.4 (2641)
- Made sure Spark app name setting takes effect in YARN cluster mode (2673)
- Avoided Spark data frame name collision from `spark_read_compat_param` (2653)
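
The weighted sampling feature listed above can be illustrated outside Spark. Below is a minimal Python sketch of one well-known technique for weighted sampling without replacement, the Efraimidis-Spirakis key method. This message does not say which algorithm sparklyr uses, so the technique and every name in the sketch are assumptions for illustration only.

```python
import heapq
import random


def weighted_sample_without_replacement(items, weights, k, seed=None):
    """Pick k distinct items, favouring larger weights.

    Each item with weight w > 0 draws u ~ Uniform(0, 1) and receives the
    key u ** (1 / w); the k largest keys form the sample. Items with a
    non-positive weight are never selected.
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    rng = random.Random(seed)
    keyed = [
        (rng.random() ** (1.0 / w), i)
        for i, w in enumerate(weights)
        if w > 0
    ]
    if k > len(keyed):
        raise ValueError("k exceeds the number of items with positive weight")
    top = heapq.nlargest(k, keyed)
    return [items[i] for _, i in top]


rows = ["a", "b", "c", "d"]
sample = weighted_sample_without_replacement(rows, [5.0, 1.0, 0.0, 3.0], k=2, seed=42)
print(sample)  # two distinct rows; "c" can never appear because its weight is 0
```

Because each key depends only on its own row, the keys can be computed independently on each partition of a distributed data set, with only the top-k selection needing a merge step.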

Best regards,
Yitao


sparklyr codebase license scan, June 2020

Steve Winslow <swinslow@...>
 

Dear sparklyr technical-discuss list,

I am providing the results of a license scan of the sparklyr codebase, based on a snapshot of the repos as of June 1. The findings and license summary are available at:


That page lists any key findings/recommendations and the corresponding files in the repo, as well as a summary of all detected licenses. The more detailed catalogue of all files for each license can be obtained at:


Finally, SPDX documents are also available for the license scan results. These can be seen and obtained at https://github.com/lfscanning/spdx-lfai. The SPDX documents contain human-readable and machine-readable details about the license notices and copyright notices for each file that was scanned. Although the project does not need to do anything with these SPDX files, I am providing them in the hopes that they may be helpful for the FOSS license compliance processes for your companies and the broader community.

Please feel free to reach out to me directly with any questions. If so, please cc me (swinslow@...) on any responses, as I might not receive messages sent to the list address.

Best regards,
Steve

--
Steve Winslow
Director of Strategic Programs
The Linux Foundation
