From 0 Doc Types to Full Type Declaration with Dynamic Analysis

This post was updated in August 2020 with fresh know-how.
What is new?

Switched the Rector configuration from YAML to PHP, the current standard.


I wrote "How we Completed Thousands of Missing @var Annotations in a Day". If you have at least some annotations, you can use Rector to do the dirty work.

To be honest, open-source is the top 1 % of code out there. Out in the wild of established legacy PHP companies, it's a miracle to see even a single string type declaration.

Are these projects lost? Do you have to quit them? And what if the annotations are lying?

I had a great trip with a friend of mine, Dave Liddament, after PHP Day 2019 in Verona. During Venice sightseeing in beautiful wild rain, we had a short coffee break to talk about Rector. Dave was amazed by what Rector can do for the developer with just a few lines of YAML config.

As a properly curious developer, he challenged me with a series of "Can it do...?" or "How can it...?" questions.


The one we spent almost an hour on was:

"How can Rector help with type declarations, if there are no docblocks and no type hints?"

<?php

class SomeClass
{
    public function run($value)
    {
        return $value;
    }
}

At first, I was stone cold and said: "That's far beyond static analysis. That's a job for a human, Rector can't help here".

I vividly recall I was sure this was impossible (← now this line is my motto for picking a project that is interesting enough, lol).


But Dave stuck with it, questioning:

 <?php

 class SomeClass
 {
-    public function run($value)
+    public function run(string $value): string
     {
         return $value;
     }
 }

Detect Every Argument Type

I started to see a very small candle at the end of the tunnel and said:

There is a similar technique for dead-code analysis: tombs. But the problem is, it's not written in PHP. And if it's not written in the language we use, we are not able to extend it or fix it. That's why PhpStorm plugins written in Java take so long to catch up with framework releases.

We wanted to have the code just once in the whole application. If possible, at the end, where it would collect all the method calls and their arguments. We tried to use register_shutdown_function and debug_backtrace for it. But after some time spent hacking on them, we gave up.

So it will have to be a good old static call at the start of each class method, something like:

 <?php

 class SomeClass
 {
     public function run($value)
     {
+        TypeCollector::collect($value, __METHOD__, 0);
         return $value;
     }
 }
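To make the probe call less abstract, here is a minimal sketch of what such a collector could look like. It is hypothetical and not the class that ships with Rector: the log file location and the "method;position;type" line format are my assumptions, and get_debug_type() needs PHP 8.0+.

```php
<?php

// Hypothetical sketch; not the collector class that ships with Rector.
final class TypeCollector
{
    /** Path of the log file (an assumption for this sketch) */
    public static string $logFile = '/tmp/type-probe.log';

    /**
     * @param mixed $value the argument value seen at runtime
     */
    public static function collect($value, string $method, int $position): void
    {
        // class name for objects, "string", "int", "null"... for everything else
        $type = get_debug_type($value);

        // one line per observation: method;position;type
        $line = $method . ';' . $position . ';' . $type . PHP_EOL;

        // FILE_APPEND keeps earlier observations, LOCK_EX avoids interleaved writes
        file_put_contents(self::$logFile, $line, FILE_APPEND | LOCK_EX);
    }
}

final class SomeClass
{
    public function run($value)
    {
        TypeCollector::collect($value, __METHOD__, 0);
        return $value;
    }
}

// tempnam() creates a fresh empty file, so the demo starts from a clean log
TypeCollector::$logFile = tempnam(sys_get_temp_dir(), 'probe');

(new SomeClass())->run('hello');
(new SomeClass())->run(5);

echo file_get_contents(TypeCollector::$logFile);
// SomeClass::run;0;string
// SomeClass::run;0;int
```

The third argument is the position of the argument in the method signature, so a method with more parameters would get one collect() call per parameter.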

What about Performance?

file_put_contents() takes ~10 ms for 10 000 writes, so writing to the filesystem might work.
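Don't take my number for granted. It depends on the disk, the filesystem and the PHP version. Here is a rough micro-benchmark sketch to measure it on your own machine; the file name and the line content are just placeholders.

```php
<?php

// Rough micro-benchmark sketch: 10 000 appended writes to one temp file.
$file = tempnam(sys_get_temp_dir(), 'probe');
$writes = 10000;

$start = hrtime(true); // monotonic clock, in nanoseconds

for ($i = 0; $i < $writes; ++$i) {
    file_put_contents($file, "SomeClass::run;0;string\n", FILE_APPEND);
}

// nanoseconds to milliseconds
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("%d writes took %.1f ms\n", $writes, $elapsedMs);
```

If the result is too slow for your traffic, buffering the lines in memory and writing them once per request is an option worth trying.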

Still, it's safer to use feature toggles or to direct a small fraction of traffic to a standalone server that runs the code with these static calls.
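A toggle like that can be very small. The sketch below is hypothetical: the ProbeToggle class, the TYPE_PROBE_ENABLED environment variable and the percentage are all names I made up for illustration.

```php
<?php

// Hypothetical guard; the env variable name and the sampling rate are assumptions.
final class ProbeToggle
{
    public static function isEnabled(int $samplePercent): bool
    {
        // feature toggle: probes stay off unless explicitly switched on
        if (getenv('TYPE_PROBE_ENABLED') !== '1') {
            return false;
        }

        // traffic fraction: roughly $samplePercent % of the calls return true
        return random_int(1, 100) <= $samplePercent;
    }
}

putenv('TYPE_PROBE_ENABLED=1');
var_dump(ProbeToggle::isEnabled(100)); // bool(true)
var_dump(ProbeToggle::isEnabled(0));   // bool(false)

putenv('TYPE_PROBE_ENABLED=0');
var_dump(ProbeToggle::isEnabled(100)); // bool(false)
```

In a real application, you would probably evaluate this once per request and reuse the result, so a request is either fully probed or not at all.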

How Long Should we Collect Data?

This needs to be tested in the wild. It depends on many factors. For a blogging platform, a week of data can be enough. For a payment system, a month would be better, maybe more to be sure.

Also, in the same way we first collect data behind a feature toggle or on a fraction of traffic, we can test the new types once they're added to the code.

The Simple Idea

To make the idea more solid, we looked for the edge cases:

The idea is pretty clear, right?

How can we bring it to all PHP developers in Need?

It all seemed like a nice exercise for our brains... but we were looking for a practical application that would help every PHP developer in the world.

To automate this process fully, we came up with 4 automated steps:

This was in May 2019, and it was just an idea. Now, 6 months later, I'm proud to say this 4-step process is possible. I merged the PR into Rector just a few minutes ago.

Step 1 - Add Type Collector

<?php

use Rector\DynamicTypeAnalysis\Rector\ClassMethod\DecorateMethodWithArgumentTypeProbeRector;
use Rector\Config\RectorConfig;

return function (RectorConfig $rectorConfig): void {
    // decorate class methods with the argument type probe
    $rectorConfig->rule(DecorateMethodWithArgumentTypeProbeRector::class);
};

vendor/bin/rector process src

Step 2 - Wait for it...

Step 3 - Complete Collected Types

<?php

use Rector\DynamicTypeAnalysis\Rector\ClassMethod\AddArgumentTypeWithProbeDataRector;
use Rector\Config\RectorConfig;

return function (RectorConfig $rectorConfig): void {
    // add argument types based on the collected probe data
    $rectorConfig->rule(AddArgumentTypeWithProbeDataRector::class);
};

vendor/bin/rector process src
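How could collected data turn into a single type declaration? Here is a minimal sketch of one possible rule of thumb. The resolveArgumentType() function is hypothetical and is not the logic inside AddArgumentTypeWithProbeDataRector. It assumes the types were recorded as strings like "string", "int" or "null", and it gives up on anything that would need a union type.

```php
<?php

/**
 * Hypothetical resolver, not the logic inside AddArgumentTypeWithProbeDataRector.
 *
 * @param string[] $observedTypes types seen for one argument, e.g. ['string', 'null']
 * @return string|null the type declaration to add, or null when it is unsafe to add one
 */
function resolveArgumentType(array $observedTypes): ?string
{
    $uniqueTypes = array_values(array_unique($observedTypes));

    // null next to one other type becomes a nullable type
    $isNullable = in_array('null', $uniqueTypes, true);
    $nonNullTypes = array_values(array_diff($uniqueTypes, ['null']));

    // no data, only null, or several different types: leave the argument untouched
    if (count($nonNullTypes) !== 1) {
        return null;
    }

    return ($isNullable ? '?' : '') . $nonNullTypes[0];
}

var_dump(resolveArgumentType(['string', 'string'])); // string(6) "string"
var_dump(resolveArgumentType(['string', 'null']));   // string(7) "?string"
var_dump(resolveArgumentType(['string', 'int']));    // NULL
```

Being strict here is on purpose. A wrong type declaration breaks production, a missing one only keeps the status quo.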

Step 4 - Remove Type Collector

<?php

use Rector\DynamicTypeAnalysis\Rector\StaticCall\RemoveArgumentTypeProbeRector;
use Rector\Config\RectorConfig;

return function (RectorConfig $rectorConfig): void {
    // remove the probe static calls again
    $rectorConfig->rule(RemoveArgumentTypeProbeRector::class);
};

vendor/bin/rector process src

It's not Perfect, But Done

It's by no means perfect. Support is still missing for arrays, nested arrays, type co/ntra/variance, return types, union types, etc. But it's ready to be tested, and the prototype works (at least that's what the unit tests say).

Now it's up to you. Fill your code base with types from the real data it already uses. No guessing, no hoping, just fully automated science.


Last but not least, thank you Dave for a great afternoon and sorry it took me so long to publish this.


Happy lazy coding!



