by AlbertoGP on 1/4/2023, 7:19:12 AM
Yes, a few days ago there was a story on HN that went mostly ignored: “G-3PO: A Protocol Droid for Ghidra” https://medium.com/tenable-techblog/g-3po-a-protocol-droid-f...
https://news.ycombinator.com/item?id=34161642
That’s a script for the reverse-engineering tool Ghidra that sends a function's decompiled code to GPT-3 and gets back a plain English explanation of what that code does.
The article is quite detailed and describes both its capabilities and its limitations. That G-3PO script is open source, MIT license: https://github.com/tenable/ghidra_tools/tree/main/g3po
There was also another HN story about what at first sight looks like an alternative implementation of the same idea: “GptHidra – Ghidra plugin that asks OpenAI Chat GPT to explain functions”
https://news.ycombinator.com/item?id=34165291
This one is more recent and has no detailed write-up like the one mentioned above. The script is smaller and seems to have fewer features.
I suggest checking out both of them.
by shagie on 1/4/2023, 4:21:59 AM
So, I grabbed a one-liner (I didn't want to use too many tokens) from the IOCCC and tossed it into https://beta.openai.com/playground/p/default-explain-code?mo...
This is the 1987 Korn entry, which won "Best One Liner".
https://www.ioccc.org/years.html
main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}
"""
Here's what the above C code is doing: (GPT output follows)
1. The first line is a comment.
2. The second line is the main function.
3. The third line is a printf statement.
4. The fourth line is the end of the main function.
The printf statement is the most interesting part.
The first argument to printf is a string.
The string is made up of two parts.
The first part is "\021%six\012\0".
The second part is "(unix)["have"]+"fun"-0x60".
The first part of the string, "\021%six\012\0", is just a normal string.
The second part of the string, "(unix)["have"]+"fun"-0x60", is a little more complicated.
The second part starts with "(unix)".
This is an array.
The array is called "unix".
The rest of the second part, "["have"]+"fun"-0x60", is an expression.
The expression starts with "["have"]".
This is the "have" element of the unix array.
The rest of the expression is "+"fun"-0x60".
This is the string "fun" with 0x60 subtracted from it.
Putting it all together, the string is made up of the following:
1. The string "\021%six\012\0".
2. The "have" element of the unix array.
3. The string "fun" with 0x60 subtracted from it.
The result is the string "have fun".
The 0x21 in the first part of the string is a ! character.
The 0x0a in the first part of the string is a newline character.
Putting it all together, the string is:
"!x
"
The ! character tells printf to print the string in reverse.
The x character tells printf to print the string in hexadecimal.
The newline character tells printf to start a new line.
The result is:
$ ./a.out
0x6e696168
The 0x6e696168 is "niah" in hexadecimal.
This is "have" in reverse.
The result is:
$ ./a.out
0x6e696168
have
Even this takes a lot of output tokens.
Any reasons to believe this wouldn't work? Or similar work I can find?