LPT: ChatGPT is incredible for generating and evaluating regex

BenLloydPearson@programming.dev to Programming@programming.dev – 242 points

I have to use a ton of regex in my new job (plz save me), and I use ChatGPT for all of it. My job would be 10x harder if it wasn't for ChatGPT. It provides extremely detailed examples and warns you of situations where the regex may not perform as expected. Seriously, try it out.

Just make sure to test the regex instead of blindly slapping it in assuming it works 🙂

What if I say "it's probably okay just this one time" before I do it every time?

Ah, I've tested this method, shit breaks a lot. Still my go-to.

Can we just have another LLM check the work for us? Like an LLM-GAN?

I'm not sure if it's still the case, but asking it to review what it just wrote for errors has led to significant quality improvements previously.

The new Code Interpreter plugin that went live this week for Plus users can actually execute Python code in a sandboxed environment. This allows you to add "Write and execute tests for the regex" to the end of your prompt.
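For anyone wondering what "write and execute tests for the regex" could look like in practice, here is a minimal sketch in Python. The date pattern, the sample strings, and the helper name `run_regex_tests` are all made up for illustration; swap in whatever regex you were actually given.

```python
import re

# Hypothetical pattern under test: dates shaped like 2023-07-14 (illustrative only).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Strings the pattern should and should not fully match.
SHOULD_MATCH = ["2023-07-14", "1999-12-31"]
SHOULD_NOT_MATCH = ["2023-7-14", "14-07-2023", "2023-07-14T10:00", ""]


def run_regex_tests(pattern, should_match, should_not_match):
    """Return a list of failure messages; an empty list means every case passed."""
    failures = []
    for text in should_match:
        if pattern.fullmatch(text) is None:
            failures.append(f"expected match: {text!r}")
    for text in should_not_match:
        if pattern.fullmatch(text) is not None:
            failures.append(f"unexpected match: {text!r}")
    return failures


failures = run_regex_tests(DATE_RE, SHOULD_MATCH, SHOULD_NOT_MATCH)
print(failures or "all cases passed")
```

Note that this pattern only checks the shape of the string, so something like "2023-99-99" would still pass. That is exactly the kind of gap a negative test case is meant to surface.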

Regex101 is a sandbox env specifically for Regex

Not just for writing, and testing samples. It will also explain the parts of the regex.

However, it won't generate examples that will pass the regex, which may be the biggest benefit of ChatGPT.

This is where I go to validate the work of ChatGPT. The debugging capabilities in that site are wonderful.

I’ve tried it and found it wanting at regex and Excel formulas, but I’m glad to hear it’s working for you! Are you using 4? I haven’t tried that one and I hear it’s better.

I typically try 3.5 first and switch to 4 if the results aren't great. 3.5 typically handles basic use cases quite well, for example, writing regex that detects jira ticket naming nomenclature. For more complex things, I go to 4.
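As a rough idea of the "basic use case" above, a Jira ticket matcher might look something like the sketch below. It assumes keys shaped like `PROJ-123` (an uppercase project key, a hyphen, a number); Jira instances can be configured differently, and `find_ticket_keys` is just an illustrative name.

```python
import re

# Assumed key shape: an uppercase letter followed by one or more uppercase
# letters or digits, a hyphen, then a number without leading zeros
# (e.g. "PROJ-123"). Adjust to match your own project's conventions.
JIRA_KEY_RE = re.compile(r"\b[A-Z][A-Z0-9]+-[1-9][0-9]*\b")


def find_ticket_keys(text):
    """Return every substring of text that looks like a Jira ticket key."""
    return JIRA_KEY_RE.findall(text)


print(find_ticket_keys("Fixes PROJ-123 and OPS2-7, see also proj-9 and X-1."))
# → ['PROJ-123', 'OPS2-7']
```

The lowercase `proj-9` and the single-letter `X-1` are skipped on purpose here; whether that is the right call depends on how your instance names projects.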

It sometimes gets things wrong, but I've also found that just saying "that didn't work" gets it to reevaluate for more complex situations

So I was trying to write a regex for use with my ChatGPT discord bot. I wanted to trim off any final partial sentence at the end. I went around and around with it for a couple of hours because look ahead and look behind are just not something I do often enough.

It kept writing more and more complicated regex that didn’t work. The final solution, while not exactly perfect (it won’t keep a quote at the end of a sentence, and honorifics like Mr. and Dr. throw it), wasn’t nearly as complicated as ChatGPT was making it. It still never did give me anything working; I just fucked around on regex101 until I got it right. As usual, but having wasted 90 minutes or so.
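For reference, one simple way to approach the "trim the trailing partial sentence" problem is sketched below. This is not the regex from the comment above (that one was never posted), just an assumed approach using a single lookahead, and it shares the same weaknesses around closing quotes and honorifics.

```python
import re

# Greedy match up to the last ".", "!" or "?" that is followed by whitespace
# or the end of the text. DOTALL lets "." span newlines.
LAST_FULL_SENTENCE_RE = re.compile(r"^(.*[.!?])(?=\s|$)", re.DOTALL)


def trim_partial_sentence(text):
    """Drop any trailing fragment after the last sentence-ending mark."""
    match = LAST_FULL_SENTENCE_RE.match(text)
    # With no sentence-ending punctuation at all, return the text unchanged.
    return match.group(1) if match else text


print(trim_partial_sentence("Hello there. How are you? I am doi"))
# → Hello there. How are you?
print(trim_partial_sentence("I spoke to Dr. Smi"))
# → I spoke to Dr.
```

The second call shows the honorific problem: "Dr." looks like the end of a sentence to the pattern.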

I've found that you need to be very careful when asking it to modify things it produced directly without making significant changes to the regex it provides. Once I get to the 3rd or 4th iteration of asking it to modify previous responses, I've found that the likelihood of it hallucinating increases dramatically. The best solution I've found to this is to put your entire request in a single prompt that walks it through all requirements step-by-step.

Also curious. If I had some AI help with regex that would be awesome. But I felt as you said it wouldn’t work great without 4. Which I don’t have.

If you think regex is the hard part of programming, then you're in for a bad time.

I often need to deal with half a dozen different programming languages in any day/week and the context switching can be difficult at times. When you've spent all day switching between JavaScript, Python, and YAML and suddenly need to draft some Regex, tools like ChatGPT can help immensely at reducing the mental burden of switching gears.

The syntax of regular regexes is the same across languages though. It's just the regex library which is different, but so is every other library between languages.

If the project is less than a thousand lines of code in a language with a garbage collector, it probably is. Most other problems don't require learning a DSL to handle them, and most other DSLs aren't nearly as terse.

Thanks for this post, I use regex often and did not know GPT would be good at this.

That's the problem. It will confidently give you a correct-sounding answer.

Whether it is actually true is a different topic. So don't just blindly trust it. Verify, or at least sanity check it.

This this this!!! I know this is a post from the place that shall not be named, but it just showcases the issues with ChatGPT (this is from when GPT4 was just released)

My biggest problem with it has been that it doesn't necessarily understand that some things are impossible - for example, variable-length lookbehinds.

That depends on the regex flavor. Some of them have full support for variable-length lookbehinds, for example JavaScript and the third-party regex module for Python.
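To make the flavor difference concrete, here is a small check against Python's built-in `re` module, which only accepts fixed-width lookbehinds. The patterns and the helper name `compiles_in_stdlib_re` are illustrative. The third-party `regex` module mentioned above is not exercised here, since it needs a separate install.

```python
import re


def compiles_in_stdlib_re(pattern):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width lookbehind: accepted by the built-in re module.
print(compiles_in_stdlib_re(r"(?<=USD )\d+"))    # → True
# Variable-width lookbehind (\s+ can be any length): rejected.
print(compiles_in_stdlib_re(r"(?<=USD\s+)\d+"))  # → False
```

So a pattern that ChatGPT hands you may be perfectly valid in one engine and a compile error in another, which is worth stating in the prompt.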

A variable length lookbehind is the same as the opposite of a variable length lookahead.

Wait, you guys don’t use AI to make regex?

I use regex101.com

Up to now, that was usually faster than trying to get ChatGPT to generate something worthwhile. However, if you define some test cases first, the combination of both will even get the sales guy there eventually.

Ugh god it’s been a shit day with sales, let’s not bring them up. The turds.

I have yet to see a regex that is so complicated that I would need some help. I expect programmers to know how to use regexes but it seems that it's not the case. And when it becomes too big, you always can write verbose regexes with comments, it's even easier. If someone could show me something too difficult for a human being (excluding the regex to validate emails), I'm interested.
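For anyone who hasn't seen the "verbose regexes with comments" style, here is a minimal sketch using Python's `re.VERBOSE` flag. The version-string pattern is only an illustration, not something from this thread.

```python
import re

# Illustrative pattern: a version-like string such as "1.4.12-beta".
# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a comment.
VERSION_RE = re.compile(
    r"""
    (?P<major>\d+)          # major version
    \.
    (?P<minor>\d+)          # minor version
    \.
    (?P<patch>\d+)          # patch version
    (?:-(?P<tag>[a-z]+))?   # optional lowercase tag after a hyphen
    """,
    re.VERBOSE,
)

match = VERSION_RE.fullmatch("1.4.12-beta")
print(match.groupdict())
# → {'major': '1', 'minor': '4', 'patch': '12', 'tag': 'beta'}
```

One thing to watch in this mode: a literal space or `#` in the pattern has to be escaped or put in a character class, since both are otherwise swallowed.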

Regex isn't difficult, just annoying to ensure it is bug-free. If ChatGPT can help, then I don't know why you wouldn't be in favor of it

It's not that I'm incapable of evaluating regex, but rather the mental burden of evaluating complex regex statements and determining their purpose can be time-consuming. Why take 20 minutes to understand some regex when ChatGPT can do it in 20 seconds?

A coworker once defined regex as a write-only language and he definitely had a point. I love regex but it can be time consuming figuring out exactly what a complex regex expression is doing.

It's often developers who never took a finite automata class who I've seen struggle with regular expressions.

It's kind of like writing code in C while not understanding how memory management works

Huh. That class looked hard as hell, I didn't take it, and now I'm 2 years out of school still googling regex every time I need it.

Maybe I should do some reading 😅

It was mandatory. I'm glad I took it, but I'm glad it's over 😂😂😂

Just look up how finite automata work. You don't need to understand Turing machines or Turing completeness.
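If it helps, here is a tiny hand-built automaton as a sketch of the idea. It is meant to accept the same strings as the regex `ab*c`; the state numbering and the names `TRANSITIONS` and `dfa_accepts` are just choices for this example.

```python
import re

# Transition table for a small DFA intended to be equivalent to the regex ab*c.
# States: 0 = start, 1 = seen "a" followed by any number of "b", 2 = accepted.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}


def dfa_accepts(text):
    """Run the DFA over text; a missing transition means rejection."""
    state = 0
    for char in text:
        state = TRANSITIONS.get((state, char))
        if state is None:
            return False
    return state in ACCEPTING


# Cross-check the hand-built automaton against the regex engine.
for sample in ["ac", "abbbc", "abcb", "bc", ""]:
    assert dfa_accepts(sample) == (re.fullmatch(r"ab*c", sample) is not None)
```

Each regex operator maps onto the table: the `b*` is the loop from state 1 back to itself, and concatenation is just moving from one state to the next.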

You can also ask it to write VBA code for Excel, or Jira queries.

Still a bit new to Jira, what are Jira queries?

Typically called JQL, it's a simple query language to find info. For example, there's a simple query to find epics with a particular affects version and/or fix version, or return epics that are missing information in a particular field.

The default or basic Jira can't do some things though. Like I haven't been able to get the total number of story points from issues within an epic. I think you need a 3rd party plugin for that.

That's nice to know, hopefully I can bring that up during our sprint planning sessions when necessary.

I tried it and naaah it's not that great. Keeps giving a rule for sample text too, despite really making it clear that I want a more general one.