Parsing PHP code
Published: 05/2008, last modified: 03/2009
During the development of a code documentor (phpSimpleDoc), I had to solve the following problem:
From a set of files containing PHP code, build the data structures describing: classes, interfaces, functions and constants; class fields, class methods and class constants; function parameters. The data structures must store all the information about the elements (doc comment, file, line...) and the relations between them (extends, implements, overrides, etc.). This can be decomposed into two steps:
- identify the elements and their characteristics;
- build the data structures expressing the links.
This page deals with the first step.
Reflection API
I first tried the Reflection API; its appeal is that there is nothing to write! The Reflection API isolates all the PHP elements and provides a method to retrieve their comments (getDocComment()).
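For comparison, here is a minimal sketch of the Reflection approach. SampleClass is a hypothetical class standing in for the code to analyze. The class has to be declared, that is, loaded by the interpreter, before Reflection can inspect it. This is the drawback explained in the next paragraph.

```php
<?php
// Minimal sketch of the Reflection approach. SampleClass is a hypothetical
// class standing in for the analyzed code: Reflection only sees classes
// that have already been declared, i.e. loaded by the interpreter.

/**
 * A sample class.
 */
class SampleClass
{
    /**
     * A sample method.
     */
    public function sampleMethod($param1, $param2 = 'default')
    {
    }
}

$class = new ReflectionClass('SampleClass');
echo $class->getName(), ' (line ', $class->getStartLine(), ")\n";
echo $class->getDocComment(), "\n";

foreach ($class->getMethods() as $method) {
    echo $method->getName(), ' (line ', $method->getStartLine(), ")\n";
    echo $method->getDocComment(), "\n";
    foreach ($method->getParameters() as $param) {
        echo '  $', $param->getName();
        // Default values are only available for optional parameters
        if ($param->isDefaultValueAvailable()) {
            echo ' = ', var_export($param->getDefaultValue(), true);
        }
        echo "\n";
    }
}
```

getDocComment() returns the comment exactly as written, including the `/**` and `*/` delimiters.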
I found several (minor) problems with the Reflection API, described on the page Notes on PHP Reflection API. They can be worked around with regular expressions. However, the Reflection API is not a suitable basis for a code documentor, because the code to analyze must be loaded and interpreted by the PHP interpreter, and in some cases this triggers fatal errors. One way to handle this would have been to use the Runkit_Sandbox class and try to deal with the fatal errors. But Runkit_Sandbox is only available through a PECL extension, and I wanted to rely only on the standard PHP distribution.
PEAR's PHP_Parser
So I looked for a PHP parser and tried PHP_Parser, version 0.2.1. I ran into problems, listed on the page Using Pear PHP Parser. Using PHP_Parser means:
- writing code to patch it;
- writing code to adapt its structures to the needs of phpSimpleDoc.
It also has the drawback of loading a heavy file, Core.php (412 KB).
So I didn't keep it, but it led me to the token array and the PHP function token_get_all(), which converts a string containing PHP code into an array of tokens, the elementary pieces of PHP code. See below, and the page Working with PHP tokens.
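To give an idea of what token_get_all() produces, here is a minimal sketch that dumps the tokens of a tiny, made-up snippet. Each token is either a single-character string or an array of the form (token id, token text, line number). The line number is the piece of information that matters for a documentor.

```php
<?php
// Minimal sketch: dump the tokens of a small snippet.
// Each token is either a single-character string (e.g. '(' or '{')
// or an array [token id, token text, line number].
$code = <<<'CODE'
<?php
/** Doc comment */
function f1($param1 = 12) {}
CODE;

foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // token_name() converts the numeric id to a name such as T_FUNCTION
        printf("%-22s line %d  %s\n", token_name($token[0]), $token[2], var_export($token[1], true));
    } else {
        printf("%-22s          %s\n", '(single char)', var_export($token, true));
    }
}
```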
Regular expressions
I also thought of parsing the code with regular expressions, but it looks difficult. Consider for example this PHP code:
/**
A comment containing valid php code :
function f1(){}
define('CONSTANT1', 12);
*/
function f1($param1 = 'function f1(){}'){}
Comments and parameter default values can contain valid element declarations, and I didn't have the courage to try to handle that with regexes.
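With tokens, the same code is easy to handle. The comment becomes a single T_DOC_COMMENT token and the default value a single T_CONSTANT_ENCAPSED_STRING token, so the declarations they contain never appear as T_FUNCTION tokens. Below is a minimal sketch that lists the declared function names with their line numbers. findFunctionNames() is a hypothetical helper, and the code assumes a current PHP version.

```php
<?php
// Minimal sketch: find real function declarations with tokens.
// Text inside comments and string literals is a single token, so the
// fake declarations it contains never show up as T_FUNCTION.
$code = <<<'CODE'
<?php
/**
A comment containing valid php code :
function f1(){}
define('CONSTANT1', 12);
*/
function f1($param1 = 'function f1(){}'){}
CODE;

function findFunctionNames(string $code): array
{
    $tokens = token_get_all($code);
    $names = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }
        // The first T_STRING after 'function' is the function name
        for ($j = $i + 1; $j < count($tokens); $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $names[] = [$tokens[$j][1], $tokens[$j][2]]; // [name, line]
                break;
            }
            if ($tokens[$j] === '(') {
                break; // anonymous function: no name
            }
        }
    }
    return $names;
}

print_r(findFunctionNames($code)); // a single entry: ['f1', 7]
```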
Regular expressions pose another problem: they don't give line numbers (as far as I know), and I absolutely wanted this information for the code documentor. I thought of ways to handle that (using the PREG_OFFSET_CAPTURE flag; transforming the files by adding line numbers at the beginning of lines...), but none of them seemed appealing.
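For the record, the PREG_OFFSET_CAPTURE workaround mentioned above could look like the minimal sketch below. preg_match_all() reports the byte offset of each match, and counting the newlines before that offset gives the line number. The function name regexMatchesWithLines() and the naive pattern are illustrative only. The pattern would still be fooled by the comment and the default value shown earlier.

```php
<?php
// Sketch of the PREG_OFFSET_CAPTURE workaround: turn each match's byte
// offset into a 1-based line number by counting the "\n" before it.
function regexMatchesWithLines(string $pattern, string $code): array
{
    $result = [];
    if (preg_match_all($pattern, $code, $matches, PREG_OFFSET_CAPTURE)) {
        // With PREG_OFFSET_CAPTURE, each match is [matched text, byte offset]
        foreach ($matches[0] as $match) {
            list($text, $offset) = $match;
            $line = 1 + substr_count(substr($code, 0, $offset), "\n");
            $result[] = [$text, $line];
        }
    }
    return $result;
}

$code = "<?php\n\nfunction f1() {}\nfunction f2() {}\n";
print_r(regexMatchesWithLines('/function\s+\w+/', $code));
// [['function f1', 3], ['function f2', 4]]
```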
Writing the parser
For the limited needs of a code documentor (no need to really parse the code, just to retrieve the elements and their characteristics), the code turned out to be simple to write. The token array is used to identify the code elements, and regular expressions are used to parse the declarations; no compiler theory is needed for that. The job is done by 3 classes, which are part of phpSimpleDoc's code: ldeMainLoader, ppTokenArray and ppParsePhpCode.
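As an illustration of the "regular expressions to parse the declarations" part, here is a hypothetical helper that splits a class or interface declaration string such as 'abstract class Class1 extends Class2' into its parts. parseClassDeclaration() is an illustrative name, not a phpSimpleDoc class or method. The sketch assumes PHP 7.1 or later and ignores namespaces, which did not exist in PHP 5.2.

```php
<?php
// Hypothetical helper (not phpSimpleDoc's code): split a class or
// interface declaration string into its parts with one regex.
// Namespaced names (containing '\') are not handled.
function parseClassDeclaration(string $decl): ?array
{
    $re = '/^\s*(?:(abstract|final)\s+)?(class|interface)\s+(\w+)'
        . '(?:\s+extends\s+([\w\s,]+?))?'
        . '(?:\s+implements\s+([\w\s,]+?))?\s*$/i';
    if (!preg_match($re, $decl, $m)) {
        return null;
    }
    // "I1, I2" -> ['I1', 'I2']; a missing group gives an empty list
    $toList = function ($s) {
        return $s === '' ? [] : preg_split('/\s*,\s*/', trim($s));
    };
    return [
        'modifier'   => $m[1],
        'type'       => strtolower($m[2]),
        'name'       => $m[3],
        'extends'    => $toList($m[4] ?? ''),
        'implements' => $toList($m[5] ?? ''),
    ];
}

print_r(parseClassDeclaration('abstract class Class1 extends Class2'));
print_r(parseClassDeclaration('class A extends B implements I1, I2'));
```

The lazy `+?` quantifiers let the optional "implements" part be tried before the "extends" list swallows the rest of the string.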
ppTokenArray:
Array(
['string'] => 'abstract class Class1 extends Class2'
['comment'] => 'a comment'
['commentLine'] => 2
['lastIndex'] => 12
)
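A structure like the one above can be built with a sketch such as the following hypothetical extractDeclaration() function. The function name is an assumption, and so is the meaning given here to 'lastIndex': the index of the '{' or ';' ending the declaration. Neither is necessarily what ppTokenArray does. The function walks back from the keyword token over modifiers and whitespace, takes the preceding comment if there is one, and concatenates the tokens up to the end of the declaration. It assumes a current PHP version.

```php
<?php
// Hypothetical sketch, not phpSimpleDoc's actual ppTokenArray code.
// Starting from the index of a T_CLASS, T_INTERFACE or T_FUNCTION token,
// collect the declaration text, the comment preceding it and the index
// of the token that ends the declaration ('{' or ';').
function extractDeclaration(array $tokens, int $keywordIndex): array
{
    $modifiers = [T_ABSTRACT, T_FINAL, T_PUBLIC, T_PROTECTED, T_PRIVATE, T_STATIC];
    $id = function ($t) { return is_array($t) ? $t[0] : $t; };

    // 1. Walk back over modifiers and whitespace to the start of the declaration
    $start = $keywordIndex;
    for ($i = $keywordIndex - 1; $i >= 0; $i--) {
        if (in_array($id($tokens[$i]), $modifiers, true)) {
            $start = $i;
        } elseif ($id($tokens[$i]) !== T_WHITESPACE) {
            break;
        }
    }

    // 2. The token just before is the comment, if any
    //    (both T_COMMENT and T_DOC_COMMENT are accepted)
    $comment = '';
    $commentLine = null;
    if ($i >= 0 && in_array($id($tokens[$i]), [T_COMMENT, T_DOC_COMMENT], true)) {
        // Remove the comment delimiters, then the leading '*' of each line
        $text = preg_replace('~^(/\*+|//|#)|\*+/$~', '', $tokens[$i][1]);
        $text = preg_replace('/^[ \t]*\*+ ?/m', '', $text);
        $comment = trim($text);
        $commentLine = $tokens[$i][2];
    }

    // 3. Concatenate the tokens up to the '{' or ';' ending the declaration
    $string = '';
    for ($j = $start; $j < count($tokens); $j++) {
        if ($tokens[$j] === '{' || $tokens[$j] === ';') {
            break;
        }
        $string .= is_array($tokens[$j]) ? $tokens[$j][1] : $tokens[$j];
    }

    return [
        'string'      => trim(preg_replace('/\s+/', ' ', $string)),
        'comment'     => $comment,
        'commentLine' => $commentLine,
        'lastIndex'   => $j,
    ];
}

$tokens = token_get_all("<?php\n/** a comment */\nabstract class Class1 extends Class2 {\n}\n");
foreach ($tokens as $k => $token) {
    if (is_array($token) && $token[0] === T_CLASS) {
        print_r(extractDeclaration($tokens, $k));
    }
}
```

On this input, the result contains 'string' => 'abstract class Class1 extends Class2', 'comment' => 'a comment' and 'commentLine' => 2. Because the sketch only looks at the token right before the declaration, a plain comment in that position is taken as the doc comment.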
Note: token_get_all() has a small quirk (observed in PHP 5.2.3). If a comment doesn't start with exactly 2 asterisks (e.g. /*** a comment */), it is returned as a T_COMMENT instead of a T_DOC_COMMENT. So this method treats both token types as comments. As a consequence, if an element declaration has no doc comment but is preceded by a normal comment, the normal comment will be taken as its doc comment.
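Here is a quick way to check this behaviour on your own PHP version. The sketch tokenizes three comment styles placed before a function and prints the token type of each. The PHP lexer only produces T_DOC_COMMENT when the comment starts with exactly "/**" followed by whitespace.

```php
<?php
// Print the token type produced for different comment styles.
$samples = ['/** a doc comment */', '/*** a comment */', '/* a comment */'];

foreach ($samples as $comment) {
    $tokens = token_get_all("<?php\n" . $comment . "\nfunction f1() {}\n");
    // $tokens[0] is the T_OPEN_TAG token, $tokens[1] is the comment
    printf("%-22s %s\n", $comment, token_name($tokens[1][0]));
}
// Expected: T_DOC_COMMENT for the first sample, T_COMMENT for the other two
```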
Downloads
If you want to use this parsing code in your own program, you can retrieve the classes through Subversion:
svn co https://phpsimpledoc.svn.sourceforge.net/svnroot/phpsimpledoc/trunk
You can also browse the svn repository and download the code from there. The useful classes are:
Forum
By osisus
- 8 August 2008
Great! I’m building a hook script to compute statistics on PHPUnit test cases at commit time, and I needed a PHP code parser... But I ran into a difficulty: it is not possible to download your file, I get a 404 error. Could you fix your URL?
By osisus
- 8 August 2008
Finally, I found a workaround.
I downloaded your phpSimpleDoc from SourceForge and took the four files you mentioned from the archive...
By tig12
- 23 August 2008
Yes, you’re right, the best thing to do is to retrieve the files from SourceForge. I have updated the download section of the page.