Program Guesses Your Regular Expression

We aren’t sure how we feel about [pemistahl’s] grex program. On the one hand, we applaud a program that can take some input samples and produce a regular expression. On the other hand, it might be just as hard to gather example data that produces the correct regular expression. Still, it is an interesting piece of code.


Even the author warns against using this as an excuse to skip learning regular expressions, since you’ll still need to check the program’s output. The result is certain to match your test cases, but there is no assurance that it rejects inputs you didn’t expect. Bad regular expressions have been the source of some deeply buried bugs.

The code is written in Rust and builds an automaton from the test cases, making assumptions about which character classes the characters it sees belong to. You can control how classes are chosen to some degree using command-line options. It is also possible to use the code as a library from another program.
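To get a feel for the kind of generalization involved, here is a minimal Python sketch that collapses single-character samples into a character class, using a range whenever three or more characters are consecutive. This is not grex's actual algorithm (which works on an automaton and handles multi-character strings); the function name `char_class` and the three-character threshold are our own assumptions for illustration.

```python
def char_class(chars):
    """Collapse single characters into an anchored regex character class.

    Hypothetical helper, not part of grex. Assumes plain alphanumeric
    input, so no escaping of characters like ']' or '^' is attempted.
    """
    codes = sorted({ord(c) for c in chars})
    parts = []
    i = 0
    while i < len(codes):
        # Extend j to the end of the current run of consecutive code points.
        j = i
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        run = codes[i:j + 1]
        if len(run) >= 3:
            # Three or more consecutive characters become a range like a-c.
            parts.append(f"{chr(run[0])}-{chr(run[-1])}")
        else:
            # Shorter runs are listed character by character.
            parts.extend(chr(c) for c in run)
        i = j + 1
    return "^[" + "".join(parts) + "]$"


print(char_class(["a", "b", "c"]))
print(char_class(["a", "c", "d", "e", "f"]))
```

Even a toy like this shows why the output needs review: the resulting class depends entirely on which samples you happened to supply.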


Here are a few examples of grex at work:


$ grex a b c
^[a-c]$
$ grex a c d e f
^[ac-f]$
$ grex a b x de
^(?:de|[abx])$
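Since you are supposed to check the output anyway, a few lines of Python can confirm that each expression above matches the samples it was built from. This is only a sketch using Python's standard `re` module; the helper name `all_match` is ours, and Python's regex dialect may differ in small ways from the one you ultimately target.

```python
import re

# Each generated expression paired with the samples that produced it.
cases = {
    r"^[a-c]$": ["a", "b", "c"],
    r"^[ac-f]$": ["a", "c", "d", "e", "f"],
    r"^(?:de|[abx])$": ["a", "b", "x", "de"],
}


def all_match(pattern, samples):
    """Return True if every sample matches the pattern."""
    compiled = re.compile(pattern)
    return all(compiled.search(s) for s in samples)


results = {pattern: all_match(pattern, samples) for pattern, samples in cases.items()}
for pattern, ok in results.items():
    print(pattern, "matches all samples" if ok else "MISSES a sample")
```

Passing this check only tells you the expression accepts what you fed in. It says nothing about what else it accepts.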

We wondered if it would help if you could provide counterexamples, too. For instance, old-fashioned US area codes could only have a 1 or 0 as the middle digit. So examples like 713 and 212 could benefit from being paired with counterexamples such as 173 or 777.
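Until a tool accepts counterexamples directly, you can at least test a candidate expression against them yourself. The sketch below compares two hand-written patterns, neither of which is grex output: a hypothetical over-general `\d{3}` and a tighter `\d[01]\d` that encodes the middle-digit rule. The function name `check` is our own.

```python
import re

examples = ["713", "212"]          # strings the pattern should accept
counterexamples = ["173", "777"]   # strings the pattern should reject


def check(pattern, accept, reject):
    """Return (missed, leaked): wanted strings that fail, unwanted strings that pass."""
    rx = re.compile(pattern)
    missed = [s for s in accept if not rx.fullmatch(s)]
    leaked = [s for s in reject if rx.fullmatch(s)]
    return missed, leaked


loose = r"\d{3}"      # hypothetical over-general pattern: any three digits
tight = r"\d[01]\d"   # middle digit restricted to 0 or 1

print("loose:", check(loose, examples, counterexamples))
print("tight:", check(tight, examples, counterexamples))
```

Both patterns accept the two examples, so positive samples alone cannot tell them apart. Only the counterexamples expose the loose one.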


If you want to create your own regular expressions, it isn’t that hard. If you want to practice, crosswords are fun.
