Re: [DISCUSS] Adding static code analysis into our review process


Chris Hupman <chris...@...>
 

After getting over 100 emails in the past 2 days on the janusgraph-python PR, I must admit I am much more sympathetic to Dmitry's complaint about email spam from in-line comments. The email spam is definitely an issue on very large pull requests, but seems to be very acceptable for PRs under 100 lines of code. Thanks for submitting the feature request Florian. 


On Friday, March 22, 2019 at 7:42:25 AM UTC-7, Florian Hockmann wrote:
As for the number of emails is it possible to configure Codacy to do a single review instead of a review comment per issue?

That sounds like a good idea for an improvement. I didn't find any option to configure how Codacy adds these inline comments, so I submitted that as a feature request to them. (I have no idea though what they do with these feature requests.)

And yes, the PR template is probably the best place as it already contains a hint to check if the build fails on Travis.

Am Donnerstag, 21. März 2019 21:08:38 UTC+1 schrieb Chris Hupman:
I'm definitely on the side of wanting as much automation for PR reviews as we can. As for the number of emails is it possible to configure Codacy to do a single review instead of a review comment per issue?

For documentation we could probably just put a line in the Pull Request instead of, or in addition to CONTRIBUTING.md template that references that if the submitters disagree with Codacy to please ask questions in the conversation tab. 

On Wednesday, March 20, 2019 at 1:40:42 AM UTC-7, Florian Hockmann wrote:
The link to the full report it adds in the checks area is IMO absolutely enough. Reviewers can just ask authors to address all codacy issues they agree with

But for that, the reviewer has to open the Codacy report in a separate windows and then correlate which issue affects which part of the contribution while reviewing that. To be honest, I wouldn't do that as a reviewer as it's too much manual work which is exactly we want to have less of thanks to automation. Having the comments directly in the diff view on GitHub shows them exactly in the right place while reviewing. If Codacy for example comments about an unused variable, then I won't have to comment on that again (or even look for it).

It has basically forced me to make those items public, which I don't mind really, but I equally don't understand what was the issue in the first place, and how can this help either readability or anything else.

You shouldn't change anything because a tool suggests to do so, if you don't agree with the suggestion. (The same basically applies to human reviewers, only that you can ask them for their reasoning.) If you don't agree with a Codacy rule in general, then please just answer to the comment or mention it here as we can then disable that rule if we agree that we don't want it. If Codacy is just wrong in a specific case, then we just ignore its suggestions.
But you should really never just blindly follow a suggestion from such a tool if you disagree with it.

Since Codacy seems to be causing some confusion with its suggestions, do you have a suggestion how we could make it clear that Codacy's comments are really just suggestions? Would it help if we added something about Codacy in CONTRIBUTING.md?

Am Samstag, 16. März 2019 13:49:55 UTC+1 schrieb Dmitry Kovalev:
Hi Florian,

On the examples of "stale" codacy-bot issues not being deleted/resolve automatically - please have a look at the PR I mentioned below. The ones which are manually resolved by me are fixed in the code but were left open by the bot. I've been thinking why it would close some but not others, and one theory I have is that my force-pushing the DCO sign-off could have interfered. It could be that not being able to find the sha it was reviewing before prevented it from detecting that it was fixed correctly.

For the examples of "snowballing" - upon a closer look, it seems that I misinterpreted it a little bit - apologies. It has reported exact same "issues" multiple times, including in the same file, for each occurrence, and the appearance of new comments on PR timeline could have interleaved with my commit fixing them, leading me to believe that it was posting the same "issue" more than once. Or possibly again the pushing of the same commit with different sha for DCO sign-off also played a role. So this point is a little bit inconclusive.

To be clear, I still think that while it could work ok for small commits and bugfixes with < 3 codacy "issues", in general posting a comment for each "issue" occurrence identified by this bot just clogs the PR discussion, and won't be helping actual reviewers either, because they will have trouble even locating their own comments in the mess of the bot messages.

The link to the full report it adds in the checks area is IMO absolutely enough. Reviewers can just ask authors to address all codacy issues they agree with, and then discuss those left. To avoid asking this manually every time, the PR "template" can be amended to include this proposition to the authors. In this context it could be good if a specific occurrence of an issue could be commented on, or "moved to comments" by a click - then the author can explain why they don't want to fix it, or reviewers can bring authors attention to it. But all of them pushed as comments by default is not such a good idea IMO.

On examples of usefulness of the issues - one example is its complaints about package-local visibility (which in my case was totally intentional to allow testing but not "external" usage). It has basically forced me to make those items public, which I don't mind really, but I equally don't understand what was the issue in the first place, and how can this help either readability or anything else.

But to be honest I am not sure if we want to start a discussion of these things here, because it has a potential to become very time consuming without real outcome. Over time the reviewers may reach their own consensus on which "issues" are useful and which are not, and hopefully it will be possible to tune the bot accordingly.

For my PR specifically, I fixed the "issues" which were trivial, even if I didn't see any point in them, and left the rest of them for now.  My hope is that when it is going to be actually reviewed by a human, the reviewers will let me know which of the "issues" they want to be fixed in order to approve.

Many thanks,
Dmitry
On Friday, 15 March 2019 14:29:22 UTC, Florian Hockmann wrote:
Thanks for your feedback on this, Dmitry!

I can see that some of the "stale" "issues"/comments are deleted by "JanusGraph" (another bot?), but some remain intact even though they are actually addressed, and it seems like it duplicates the same comments with every new commit if the committer "dared" to ignore them, so it has a potential to snowball to insane levels unless it is pacified completely, and each and every "issue" is resolved to its satisfaction.

Are you sure about that? Could you provide examples?
I'm only asking because I haven't seen this behaviour yet, but it may of course be possible that I just overlooked it since we just had a few PRs with Codacy enabled.

I hope that the mode where each and every "issue" it finds is posted as a comment can be switched off

Yes, we can turn that off. If others also find this annoying, then we will just turn this part off.
I think these inline comments have some value because they are easily visible to actual reviewers who don't have to repeat comments Codacy already made.

Regarding the usefulness of the issues, I think it would be good to have some concrete examples to discuss this. How about you just comment on issues Codacy complained about that you find useless? Then we can evaluate whether the rule doesn't make sense for us in general or whether it's just a FP because Codacy misses context but where the rule can still make sense in other cases.
In general, contributors don't have to address all issues Codacy complains about. Just because such a tool makes a comment doesn't mean that anything has to be changed. It's expected that a tool like this produces false positives. That is also why we don't use Codacy to determine whether a PR can be merged or not.

I unfortunately didn't have the time to really inspect your PR, but Codacy complained for example about a method simply for being too long and complex.
This method has over 100 lines an arguably some complexity. I really haven't looked at the method in detail, but I'm pretty sure that, as a reviewer, I would also ask whether this method could be split into multiple smaller methods as it's really hard to understand and therefore maintain such a long and complex method.
So, this is an example where Codacy can already comment on issues before a reviewer finds time to review the PR.

Since we're short on reviewers in general, I welcome every help we can get from automation for the review process. A reviewer shouldn't have to spend time on things that we can also just automate.
But on the other hand such a tool shouldn't be annoying for contributors, so it's good that we're having this discussion.

Just to add even more colour - apparently it sends an email for each "issue" comment

You are referring to the mails GitHub sends, right? These are simply sent for every comment and it should only happen if you either watch the whole repo or are subscribed to notifications of the pull request.
So, there isn't much we can do about these emails if we want to keep the inline comments (which is up for debate of course).

In general, it would be good to hear more opinions on this matter. Do people find the inline comments useful or would you prefer if we turned them off?

Am Freitag, 15. März 2019 02:16:53 UTC+1 schrieb Dmitry Kovalev:
Just to add even more colour - apparently it sends an email for each "issue" comment, so in addition to completely clogging the PR page it also has just sent 36(!) emails to me - I think this  puts it firmly in the SPAM territory :)

On Friday, 15 March 2019 00:51:57 UTC, Dmitry Kovalev wrote:
My recent PR https://github.com/JanusGraph/janusgraph/pull/1483 must be one of the first victims of this change, and I thought I should provide some feedback from the contributor's perspective while it is fresh :)

Without heading straight into discussion about the actual usefulness of 99.9% of the issues that this Codacy tool has found, I would like to check what people think about the way they are presented.

I think that the idea of posting each "issue" as a comment on PR leads to this bot completely hijacking any meaningful discussion(s) on the PR. And it is clearly unable to clean up after itself when the "issues" are addressed. 

I can see that some of the "stale" "issues"/comments are deleted by "JanusGraph" (another bot?), but some remain intact even though they are actually addressed, and it seems like it duplicates the same comments with every new commit if the committer "dared" to ignore them, so it has a potential to snowball to insane levels unless it is pacified completely, and each and every "issue" is resolved to its satisfaction.

I hope that the mode where each and every "issue" it finds is posted as a comment can be switched off, otherwise IMO this bot does much more harm than good, and is basically just a big nuisance. There is a link to full report from the "check" entry, and I think this is the only thing required on the PR page.

Coming back to the usefulness of the "issues" it reports - so far I have seen 3 broad categories:
1) "issues" which are absolutely useless in terms of code quality or readability, but at least are easy to "fix"
2) "issues" which could be helpful (such as method complexity) if they were tuned to more reasonable thresholds
3) "issues" which directly contradict the intention/design, and dismiss it as something to avoid, without providing any meaningful explanation or alternative

I am sure that it is also capable of detecting actual issues such as coding errors, deficiencies and potential runtime errors - but as currently configured, they are likely to be buried under the immense amounts of useless spam it generates

Any thoughts?

Many thanks,
Dmitry


On Friday, 1 March 2019 13:08:03 UTC, Florian Hockmann wrote:
We now have Codacy activated for all repos in the JanusGraph organization. The dashboard for the main repo can be found here:


Codacy automatically checks pull requests to verify that they don't introduce new issues. By default, it only adds that information as a check, just like how Travis reports the build result. I now also configured Codacy to add inline comments directly where a new issue is introduced in the PR and also a summary comment.
We can of course change that if we want less integration of Codacy in the PR process.

Codacy will likely complain about a lot of issues that we don't really care about, so false positives basically. I suggest therefore that we don't treat Codacy as a gateway keeper that prevents pull requests from being merged, especially in the beginning.
Instead, we should determine which checks don't make sense for us so we can disable them. So, if you spot any things Codacy complains about but it shouldn't in your opinion, then just write here and we can remove them.

Codacy will hopefully make the review process easier as reviewers don't have to look for issues that Codacy can already find and contributors get quicker feedback if they introduce new issues.

Committers / TSC members: If you want to be able to configure Codacy for JanusGraph, then please just send me an email and I will add you: f...@.... (I already invited some of you where I have your email addresses already.)

Am Mittwoch, 29. August 2018 15:50:47 UTC+2 schrieb Florian Hockmann:
JanusGraph currently uses Coverity as a static code analysis tool. While this already led to some findings, it can unfortunately not be used to analyse code changes from pull requests. I wondered whether we could improve our review process with a service that analyses changes in pull requests and directly comments on the PRs to report its findings. This hopefully makes reviews more efficient as reviewers don't have to comment on obvious style issues for example and it could find problems before they are added to production branches.

I already searched for a service that we could use. In my opinion, it should fulfil these requirements:
  1. Support for the languages we currently use / might add in the near future: Java, C#, Python, JavaScript / TypeScript.
  2. Integration into code reviews in GitHub.
  3. Free for open source projects with enough scans per day (some services limit these to 5 per day which could be not enough).
GitHub already lists services that can be used for code reviews here. Codacy seems to be the only service of those listed there that fulfils all three requirements. I wanted to try out Codacy and therefore already let it analyse forks of JanusGraph and the version of JanusGraph.Net that is currently in review:
So, what do others think about the idea to integrate a code analysis service into our review process in general? And are there any other suggestions for such a service besides Codacy?

Join janusgraph-dev@lists.lfaidata.foundation to automatically receive all group messages.