Last week, I kicked off the first post about tips and tricks with GPT. In the meantime, Marcel posted a great practical piece on GPT and solutions based on exception messages.
Today, we look into 2 pre-trained models that OpenAI provides - DaVinci and Codex - and how to talk to them to get what we need.
First, let's define the pre-trained models we will talk about: text-davinci-003 (DaVinci), a general-purpose text model, and code-davinci-002 (Codex), a model trained primarily on source code.
Which one is better for generating unit tests? Intuitively, the second one, right? Let's try them both out on a practical example.
First, we add the PHP SDK for the OpenAI API by Nuno Maduro to our new project:
composer require openai-php/client
Then we create a new file, generate-test.php, and initialize the OpenAI client there:
<?php
require __DIR__ . '/vendor/autoload.php';
$client = OpenAI::client('<YOUR_API_KEY>');
How simple is that? Oh, where do you get the API key? Right here.
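A side note: rather than hard-coding the key, we can read it from an environment variable. A minimal sketch - the OPENAI_API_KEY name is just a convention I picked for this example, not something the SDK requires:
// read the key from the environment instead of committing it to the repository
$client = OpenAI::client(getenv('OPENAI_API_KEY'));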
The last step is to send the prompt. To keep it simple, we provide only the desired model and our prompt:
$result = $client->completions()->create([
'model' => '<pretrained model>',
'prompt' => '<our prompt>',
]);
Then we render the response like this:
echo $result['choices'][0]['text'];
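Put together, the whole generate-test.php at this point is only a few lines (the model and prompt are placeholders we fill in next):
<?php
require __DIR__ . '/vendor/autoload.php';
// create the OpenAI client with your API key
$client = OpenAI::client('<YOUR_API_KEY>');
// ask the completions endpoint with the chosen model and prompt
$result = $client->completions()->create([
    'model' => '<pretrained model>',
    'prompt' => '<our prompt>',
]);
// print the generated text of the first choice
echo $result['choices'][0]['text'];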
Great! Now that we have the bare code, let's jump to the fun part - prompting ↓
The prompt is a question - it can be a short string like "What is the best way to learn Laravel 10", or, in our case, as long as the PHP code we want to test. To keep our code clear, we'll put the prompt contents into a prompt.txt file and load them:
$result = $client->completions()->create([
'model' => 'text-davinci-003',
'prompt' => file_get_contents(__DIR__ . '/prompt.txt'),
]);
So, what exactly do we want from GPT? To generate a unit test. It's better to ask for specific details: the testing framework ("PHPUnit"), the public method name we want to test ("someMagic"), that we want a data provider, and how many cases it must contain.
Imagine you're talking to a human. The more specific we are, the better the answer will be. But if we start a 5-minute monologue about how to write a test, the person will get bored and stop listening. It will be tough for them to see what is essential for us.
Use the same practical language as you would use in a real conversation. Be specific, but keep it to the point. Here is our prompt:
Generate PHPUnit test for a "someMagic()" method with data provider of
4 use cases for this code:
'''php
<?php
class SomeClass
{
public function someMagic(int $firstNumber, int $secondNumber)
{
if ($firstNumber > 10) {
return $firstNumber * $secondNumber;
}
return $firstNumber + $secondNumber;
}
}
'''
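By the way, if the class we want to test already lives in its own file, we don't have to paste its code into the prompt by hand. A tiny helper script can assemble prompt.txt for us; this is just a sketch, and the src/SomeClass.php path is an assumption for the example:
<?php
// build-prompt.php - a hypothetical helper, not required by the workflow above
$instruction = 'Generate PHPUnit test for a "someMagic()" method with data provider of 4 use cases for this code:';
// the path to the class under test is an assumption for this sketch
$code = file_get_contents(__DIR__ . '/src/SomeClass.php');
// glue the instruction and the code together, wrapped in the same ''' fences
file_put_contents(__DIR__ . '/prompt.txt', $instruction . "\n\n'''php\n" . $code . "'''\n");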
We place this content in prompt.txt and run our script:
php generate-test.php
In 2-5 seconds, we should get an answer:
<?php
use PHPUnit\Framework\TestCase;
Oh, what is this? It starts as a test case, but the rest is missing.
That's because the GPT answer is limited by default to 16 tokens.
We can increase the limit with the max_tokens parameter:
$result = $client->completions()->create([
'model' => 'text-davinci-003',
'prompt' => file_get_contents(__DIR__ . '/prompt.txt'),
'max_tokens' => 1000
]);
The response will be longer now. If we ever get a longer test case that won't fit, we'll increase max_tokens.
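To see how close a completion gets to that limit, the completions API also reports token usage. Assuming the client exposes it through array access the same way it exposes the choices, we can print it after the call:
// token usage reported by the API; the key names follow the raw API response,
// and reading them via array access is an assumption based on how we read choices above
echo 'Prompt tokens: ' . $result['usage']['prompt_tokens'] . PHP_EOL;
echo 'Completion tokens: ' . $result['usage']['completion_tokens'] . PHP_EOL;
echo 'Total tokens: ' . $result['usage']['total_tokens'] . PHP_EOL;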
php generate-test.php
And voilà, our generated test is here ↓
<?php
use PHPUnit\Framework\TestCase;
class SomeClassTest extends TestCase
{
/**
* @covers SomeClass::someMagic
* @dataProvider someDataProvider
*/
public function testSomeMagic($firstNumber, $secondNumber, $expected)
{
$someClass = new SomeClass();
$result = $someClass->someMagic($firstNumber, $secondNumber);
$this->assertEquals($expected, $result);
}
public function someDataProvider()
{
return [
[5, 5, 10],
[12, 5, 60],
[15, 10, 150],
[20, 30, 600],
];
}
}
At first sight, it looks like valid PHP code. We can use it as it is!
Often, the output needs cleaning up: removing comments, fixing text artifacts, and adjusting to best practices like using setUp(), using yield in the data provider, adding the correct namespace and strict types, removing the pointless @covers annotation, using PHP 8 attribute syntax, and other tedious steps that testgenai.com handles for you.
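To give a rough idea of where those adjustments lead, here is a hand-tweaked version of the generated test. The App namespaces and the PHPUnit 10 attribute syntax are assumptions of this sketch, not something the model produced:
<?php

declare(strict_types=1);

namespace App\Tests;

// assuming SomeClass lives under the App namespace
use App\SomeClass;
use Iterator;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

final class SomeClassTest extends TestCase
{
    private SomeClass $someClass;

    protected function setUp(): void
    {
        $this->someClass = new SomeClass();
    }

    #[DataProvider('provideData')]
    public function testSomeMagic(int $firstNumber, int $secondNumber, int $expected): void
    {
        $result = $this->someClass->someMagic($firstNumber, $secondNumber);
        $this->assertSame($expected, $result);
    }

    public static function provideData(): Iterator
    {
        // the same 4 cases the model generated above
        yield [5, 5, 10];
        yield [12, 5, 60];
        yield [15, 10, 150];
        yield [20, 30, 600];
    }
}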
Let's try the other model now ↓
We already have the code ready, so we only change the model:
$result = $client->completions()->create([
'model' => 'code-davinci-002',
'prompt' => file_get_contents(__DIR__ . '/prompt.txt'),
'max_tokens' => 1000
]);
And give it a go:
php generate-test.php
Voilà, the response:
Data provider:
| firstNumber | secondNumber | returned value | message |
| ----------- | :----------- | -------------- | ---------------- |
| 5 | 3 | 3 | less than 10 |
| 5 | 15 | 35 | more than 10 |
| 20 | -5 | -100 | first less than 0 |
| -20 | 150 | 30 | second less than 0 |%
Oh, what is this mess? Does that look like a PHPUnit test? I don't think so!
What is going on? The prompt is the same, we explained everything, but it seems the code-davinci-002 model has no clue about what we want. It generates a table (in my case) or some other mess until the 1000 tokens are used up.
I'm very grateful to Marcel for sharing the following trick with me.
DaVinci is rather generic and can usually figure out what we need. Yet as the code gets more complex, its results become less stable.
Codex, on the contrary, needs more context - context is everything, right? We must be even more specific and show an example of what we want.
In practice, we'll hardcode 2 new snippets into the prompt:
Generate PHPUnit test for a "someMagic()" method with data provider
of 4 use cases for this code:
'''
<?php
class SomeClass
{
public function combine($first, $second)
{
return $first + $second * 10;
}
}
'''
Output:
'''
<?php
use PHPUnit\Framework\TestCase;
final class SomeTest extends TestCase
{
/**
* @dataProvider provideData()
*/
public function test(int $first, int $second, int $expectedResult)
{
$someClass = new SomeClass();
$result = $someClass->combine($first, $second);
$this->assertSame($expectedResult, $result);
}
}
'''
Now generate a test for this code:
'''
<?php
class SomeClass
{
public function someMagic(int $firstNumber, int $secondNumber)
{
if ($firstNumber > 10) {
return $firstNumber * $secondNumber;
}
return $firstNumber + $secondNumber;
}
}
'''
Result:
'''
Imagine we're teaching another human being, and they need an example. What does "generate a unit test" mean exactly? Well, precisely this!
Now, I'll let you run the script for yourself to enjoy the surprise of the result:
php generate-test.php
Oh, in my case, it generates the test but keeps looping with some markdown and mess until it reaches the 1000-token limit.
Another tip from Marcel is to use the stop parameter. The generation stops as soon as this sequence appears in the output, so we'll get only the first generated test as a result ↓
$result = $client->completions()->create([
'model' => 'code-davinci-002',
'prompt' => file_get_contents(__DIR__ . '/prompt.txt'),
'max_tokens' => 1000,
'stop' => "'''"
]);
Now re-run and see for yourself:
php generate-test.php
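To skip copy-pasting from the terminal, the script could also write the result straight into a test file. A small sketch - the tests/ directory and the file name are assumptions here:
// hypothetical addition to generate-test.php: store the generated test on disk
$generatedTest = trim($result['choices'][0]['text']);

// both the directory and the file name are assumptions for this sketch
if (! is_dir(__DIR__ . '/tests')) {
    mkdir(__DIR__ . '/tests');
}

file_put_contents(__DIR__ . '/tests/SomeClassTest.php', $generatedTest . PHP_EOL);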
Play around, discover, and share your experience.
Which one do you prefer? I liked the first one at first, but with more complex code, Codex seems more stable. Try testgenai.com to generate tests faster on the fly.
Happy coding!
Do you learn from my content or use open-source packages like Rector every day?
Consider supporting it on GitHub Sponsors.
I'd really appreciate it!