How reliable is AI like ChatGPT in giving you code that you request?
ChatGPT is a language model, it's not intended for code and you're using it "off label" at your own risk. It can produce working code, which is impressive in itself, but in order to know if it's decent code you still need to be competent with that language. I had someone run a few prompts for me a while back, it ignored central parts of the query, and its output was basically like a very junior developer - fair enough, but not great or even that good.
Potentially useful, but if you expect it to be more than one part of the "process", you might be setting yourself up for trouble.
Edit: just like it's not a coder, it's not a search engine or knowledge base, either. It just knows language and what seems like it ought to follow a given phrase. Be very aware of this difference, because sometimes it spits out 100% falsehoods with the same level of confidence and authority as the true stuff.
I think it's important for people to also truly understand that generative machine learning models like ChatGPT also only "know" what they've seen before. There's no interpretation or synthesis. It merely regurgitates what it's seen, with some sampling from a probability distribution.
This means that if you're asking for something niche, and it's only seen what you're prompting it for once (or, really, the same text repeated across multiple websites), there's a very good chance that it will just recreate that artifact wholesale.
Which means you need to be cognizant of what the license for that material is before you use it in a product!
I have built several programs with ChatGPT 4 by now. From very basic Python scripts to Python webscrapers and C# in combination with Unity3D.
In the beginning it was much better than it is currently. At the moment context is severely hampered no matter the limit and you'll be bashing your head against circular arguments and it straight up ignoring stuff you just posted two messages ago.
Trying to troubleshoot code it wrote a few days ago will be a slog and like dragging yourself over nails at times. Here's what I have found to help and make life better:
1. Be very, very, very precise in your instructions, and keep them saved so you can reuse them later (see point 4).
2. From the very start, plan to build your project out of small functions that interact (good policy anyway); this makes troubleshooting and changing those functions much easier and keeps you from running into message limits.
3. If it fails to work the way you need it to, you might have to scrap your entire code and start over with ChatGPT. Again, this is why point 2 is so important: scrapping one function is much less painful than scrapping an entire tool.
4. Start new chats when you feel the quality degrading. Sometimes it helps, and since the context is garbage at the moment anyway, it doesn't matter much.
5. Post the code it is supposed to fix every single time. Otherwise it will inevitably refer to other code, hallucinated code, etc. Again, this is why point 2 is important.
god I hate those circular arguments, it's like you're arguing with a toddler
I agree with the other comments that ChatGPT isn't really that good for programming, it hallucinates often and you end up working too hard just to try and figure out what it got wrong. However, I have found a good AI engine, phind.com, that has started to replace my google searches. It's just a wrapper for ChatGPT, but it cites its sources so you can verify or dig deeper, provides search engine results in a sidebar and has upvote/downvote options to help it improve. So it feels like a personal google "agent" that runs off and googles something for you and comes back with a concise report.
Personally, I just can't work with a system that lies to me all the time, even if only a little.
I tried ChatGPT, the Bing bot, and phind.com a few times, and every time I got answers that looked real and correct but were slightly (and a few times completely) wrong.
Every time, I have to reread the documentation, check links, and investigate whether there's a reason the LLM answered that way; maybe I'm wrong this time and the LLM found something I didn't...
I agree that phind.com gets the best results, but every small inaccuracy here and there irks me and makes me question both myself and the answer as a whole.
Update: for general questions, LLMs are very, very good, like when you're trying to investigate some new field, technology, or tooling suite and want to get an overview of a topic you're interested in.
I quite like GitHub Copilot and use it a lot, but I find ChatGPT not all that useful.
For actual coding, it feels like describing what I want it to do is more complicated than doing it myself.
I can see some uses as a search engine, but I've had a lot of bad luck where it suggested code that was plain wrong or not working and often did not even compile, so most of the time, I'd rather look on GitHub, Stackoverflow or sites like that.
I use chatgpt a lot when coding. It's pretty good and the code is typically usable. But sometimes it messes up hard and it can take a while to realize that. Net benefit though and I'm sure the technology will improve over time.
great information!
I've used ChatGPT to answer questions relating to Python. Notably, I asked it how to use QtNetwork to send and receive requests with authentication, as the application I was using did not include the non-standard modules I was more accustomed to, like requests, but did have PyQt. Not only did it give me working code snippets, it explained them in a way that I was able to understand. No, it's not perfect. But man, it's better than hunting Google for that one StackOverflow post.
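For the curious, the pattern was roughly this (a minimal sketch assuming PyQt5, with a placeholder URL and credentials, not the exact snippet it gave me):

```python
from PyQt5.QtCore import QCoreApplication, QUrl
from PyQt5.QtNetwork import QNetworkAccessManager, QNetworkRequest

app = QCoreApplication([])
manager = QNetworkAccessManager()

# Supply credentials when the server asks for them (e.g. HTTP Basic auth).
def on_auth_required(reply, authenticator):
    authenticator.setUser("user")
    authenticator.setPassword("secret")

# Print the response body once the request finishes, then quit the event loop.
def on_finished(reply):
    print(bytes(reply.readAll()).decode())
    app.quit()

manager.authenticationRequired.connect(on_auth_required)
manager.finished.connect(on_finished)
manager.get(QNetworkRequest(QUrl("https://example.com/api/data")))
app.exec_()
```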
I have heard it trips up on certain less-used programming languages like Swift though, so depending on your use case YMMV. I've also not used Codex but a friend of mine has. Apparently it really liked to mention this one specific GitHub profile.
For shits and giggles I asked ChatGPT a while back to represent a Pokemon with a Python class, and it gave me working code. Google Bard would trip up and not use the class when I told it to.
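Something along these lines (a trivial sketch; the names and numbers are made up, not its exact output):

```python
class Pokemon:
    def __init__(self, name, types, level=5, hp=20):
        self.name = name
        self.types = types
        self.level = level
        self.hp = hp

    def attack(self, other, damage):
        # Reduce the target's HP, never going below zero.
        other.hp = max(0, other.hp - damage)
        return f"{self.name} hit {other.name} for {damage} damage!"

pikachu = Pokemon("Pikachu", ["Electric"])
bulbasaur = Pokemon("Bulbasaur", ["Grass", "Poison"])
print(pikachu.attack(bulbasaur, 6))
```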
I've used it a bit to get the framework or boilerplate I need. It's not a one-click solution!
What I will do is ask it to generate code for a purpose and then iterate over the suggested code, adding and/or modifying specific areas until I have something usable... Depending on the complexity of the code/feature this can take quite a number of iterations. You need to understand the code it gives you!
Once I feel satisfied I will copy the code to the editor for tweaking and adjusting to my environment.
I will admit that I've been surprised sometimes by the suggestions I've been given. Sometimes in a good way, sometimes bad.
Remember that whatever you feed it will feed the LLM, so don't give it anything specific to yourself or your organisation.
It seems like a small thing you said on the side, but it is really important that you actually understand and can explain the code it gives you that you're copying into your project. Otherwise you're taking in an unknown, unmaintained and unexplained dependency, and that can lead to problems once that dependency fails.
Agreed that I should have been clearer...
Both my statement that it's not a "one-click solution" and the aside about understanding the code were meant to emphasize that very point.
It is a conversational tool that can generate decent code if properly prompted, but for the most part it lacks enough context. For it to be really useful, it would have to be trained on the entire project I'm working with, not just a single file or function.
What I miss is the ability to "chat with my project". I.e. have the whole project in the trained context, and then reason about architectural changes, pros and cons, have suggestions for refactoring, help with complex renaming schemes and moving code, etc.
It would be super interesting to be able to give instructions like:
Organise my files by dependency and the logic they implement.
Or something like, create web components from common input elements in my html pages.
Where is the user auth code implemented?
Things like that.
This would really be extremely helpful, absolutely agree. A mode with less of a view on the details of the code and more with the architecture of it. I wonder if an extension like Codeium could be extended so that only the method signatures and comments and such could be sent in as context so it can reason more generally about your project...
It's good for basic Python scripts and stuff, but not so good for complex programs.
Like any other tool, it's as good as you use it. If you can explain complex problems in smaller bites with clear objectives it helps a lot.
Yes, but no.
You can be as precise as you want; if ChatGPT didn't have enough training material, you won't get a good answer even if you bend over backwards.
I can't complain 95% of the time; still, the prompt is not to blame all the time.
ChatGPT 4 is pretty decent at checking code for any mistakes, and it can generate pretty good code if you can describe what you want very well. But sometimes it does give you code with a slight mistake or two, so what I normally do is give it the code in a new chat and get it to check it itself
so what I normally do is give it the code in a new chat and get it to check it itself
Big brain moment
While it's pretty decent at coding, it's often (in my experience) either giving you an overcomplicated way, an outdated way or just completely wrong code.
What I do really like is that when you ask for a snippet you can easily ask for variations, like: "make it a bit shorter", "add comments" or "don't/do use library X". This lets you quickly get a few variations that allow you to come up with something of your own.
On a slightly different note, I've been impressed by GitHub Copilot on this subject. While still often wrong, with small things it almost feels like it's reading my mind while programming.
It's certainly helpful when you want to try out something new.
For example I recently wanted to make a Firefox AddOn, which is something I hadn't done before.
So I asked OpenAI how to do it and it talked me through it step by step.
Basically it allowed me to google less, because I could just ask ChatGPT, as it was faster.
Some info is outdated or wrong, so you still have to know what you are doing and still have to use Google.
Also, I wanted it to help me get some data from the DOM, but that was a rather difficult job for OpenAI. It never "understood" what I wanted and just gave me code that didn't do what it was supposed to do, and even after I explained the problem with the code and told it what I wanted, it wouldn't understand and just gave me more bad code that changed nothing about the problem.
So it's important to understand that this is just one more source of information/help you have as a dev. It is not a standalone solution that can do your work for you; it merely helps you, the same way googling, Stack Overflow, or reading the documentation does.
Used chat gpt 3.x to assist in Godot and gdscript logic for our game... Nearly always wrong but often gets me thinking in a different direction than the one I was stuck in. So certainly value there.
I use it occasionally to write specific functions in Python, for example a compound interest calculator. I don't trust it to write a full app yet.
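For example, the kind of function I mean (a minimal sketch, not ChatGPT's exact output):

```python
def compound_interest(principal, annual_rate, years, compounds_per_year=12):
    """Return the final balance after compound interest is applied."""
    return principal * (1 + annual_rate / compounds_per_year) ** (compounds_per_year * years)

# 1000 at 5% for 10 years, compounded monthly -> roughly 1647.01
print(round(compound_interest(1000, 0.05, 10), 2))
```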
I am not good with coding (I know the basics), and I use Copilot or ChatGPT to generate simple scripts to build GUIs or automate things, and it usually works on the first or second try.
For basic things like syntax that I can't remember it's actually pretty good, way faster than Google IMO.
When I ask for something a little more complicated it can go two ways:
Actually doing a good job and generating something that I can use (I often have to polish that code, but still, it's better than expected)
Doing something I didn't ask for, so when I point out that's not what I meant and explain further, it enters a loop where it goes back to the same two solutions.
"You didn't like solution A? Here, there's solution B. Solution B is terrible? Here, there's A again."
Overall it's a great tool though.
Not at all. It often gives bad answers, or workarounds rather than working code. It's not useful to me if I have to fix its code when I can do it more efficiently and quickly on my own.
With some hand-holding, it's quite decent at reading and documenting functions, which is what I use it for since I'm too lazy to document them myself. :P
It's been pretty hit or miss for creating new code from a prompt, but it's been really good, in my experience, if I give it some code I know is sloppy and ask it to refactor it, or if I want to slightly change the functionality of some given code.
Software engineer with decades of experience here - ChatGPT can give you mostly-working code for solved problems, but with occasionally subtle and weird bugs. It's very confident and will happily hallucinate. It will not help you with debugging or integrating, which is the majority of coding. It's a pattern matching engine, nothing more.
Outside of producing one simple WebPack configuration, I haven't had good experiences using ChatGPT. It often causes me more trouble than it helps. I've tried to use it multiple times to write some BASH script, and every time it gives me code that looks nice but is just broken. It's not syntactically incorrect; it's more like functionally incorrect.
For example, it told me that you could pass arrays as function arguments, which you can't do. Or, it gave me a script that was using variables within a URL string that would be passed into CURL, which won't work since the URL won't be encoded properly.
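To illustrate the encoding pitfall, here's the same idea sketched in Python rather than Bash (example.com is just a placeholder):

```python
# Interpolating a raw value into a URL versus encoding it first.
from urllib.parse import quote

search = "cats & dogs"  # contains characters that break a raw query string

naive_url = f"https://example.com/search?q={search}"
safe_url = f"https://example.com/search?q={quote(search)}"

print(naive_url)  # ...?q=cats & dogs  -> the '&' splits the query parameter
print(safe_url)   # ...?q=cats%20%26%20dogs
```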
When I do use it, I spend more time trying to fix the code that it gives me. Which, I guess, does have the benefit that I get to learn something afterward (I didn't know about either of the examples above until ChatGPT gave me the bad code).
The thing that convinced me that AI won't take over the programming side of software engineering was when I asked ChatGPT to help me out with some date-time bugs. It just kept making up native JavaScript API functions, couldn't understand how to parse UTC to figure out a date-time's timezone, among other issues. The day that AI is able to solve software issues around date-times or currencies is the day that we'll all be out of a job.
Edit:
I guess you could summarize using ChatGPT as peer-programming with an overly confident CS grad.
peer-programming with an overly confident CS grad
I love this, and agree. I've always said that for all tasks, it's like you're working with an ADHD eager-to-please intern.
It's ok for basic stuff. Like giving me "Hello world"/toy-app type code for something I'm unfamiliar with and want an example of. Or creating the basic template to build on top of. Which can be very helpful and time-saving.
But for anything more than that it fails a lot, and needs to be walked through step by step, iterating on its code.
I haven't used it, however from what I've heard it's more of a less toxic Stackoverflow that can hallucinate things, rather than some magic that writes you code based on some words. It still suffers from the usual AI weakness of the AI itself not understanding context, only knowing it.
I find ChatGPT to be less useful for code and more useful for generating boilerplate more in the 'configuration' realm. Ansible playbooks or tasks, Nginx configs, Dockerfiles, docker-compose files, etc. Well-bounded things with an abundance of clear documentation.
I generate a lot of first-draft Dockerfiles and docker-compose files through ChatGPT now with a short description of what I want. It's always worth reviewing it because sometimes it just invents things that look like a Dockerfile, but it can save a lot of the boring boilerplate writing of volumes and networks and depends_ons and obvious env vars you need to override.
I do use Codeium in my VS Code instance, though. It's like a free, more ethical GitHub Copilot, and I've been really, really happy with it. Not so much to make a whole program, but I use it a lot more as a kind of super-autocomplete.
I'll go into a class, find a method that needs a change, and just type a comment like the following, and it will basically spit out the authentication logic that I do a quick review on.
// check the request authentication header against the user service to verify we're allowed to do this
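Purely as a hypothetical illustration (Python here, and the request / user_service names are made up), the kind of completion that comment might trigger looks something like:

```python
# Hypothetical sketch of a comment-driven completion; request and user_service
# are invented stand-ins, not real APIs from my codebase.
def handle_request(request, user_service):
    # check the request authentication header against the user service
    token = request.headers.get("Authorization", "")
    if not token.startswith("Bearer "):
        raise PermissionError("missing bearer token")
    user = user_service.verify_token(token[len("Bearer "):])
    if user is None:
        raise PermissionError("invalid or expired token")
    return user
```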
It's also an amazing "static" debugger - I can highlight particularly convoluted segments of math or recursion or iteration and ask it to explain it. Then I can ask follow-up questions like "Is there any scenario in which totalFound remains at 0" and it will tell me yes or no and why it thinks that, which is really nice. I tend to save it for instances where I'm reasonably certain that it was all correct, but I wanted to check it instead. Now instead of breaking out the paper and pen and reasoning it out, I can ask it for a second opinion, and if it has no doubts, my paranoid mind is put at ease a bit.
I've been unimpressed with the ability of any of these "AI" systems to spit out larger volumes of good code. They're more like ADHD, eager-to-please little interns. They'll spit out the first answer that comes to their mind even if it's wrong, and they fall for all kinds of well-known development pitfalls.
I tried using it for research on what I would consider a novel problem (trying to map SELinux categories onto per-file transparent encryption), but it would hallucinate quite a bit.
For script writing (i.e. make a Python script that takes in some data and outputs it in a different format), it was pretty good; it took a little back and forth, but for the most part it produced usable code with good practices built in.
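Something like this minimal sketch, for example (CSV in, JSON out; the paths are placeholders, not its actual output):

```python
import csv
import json
import sys

def csv_to_json(csv_path, json_path):
    """Read a CSV file and write its rows out as a JSON array of objects."""
    with open(csv_path, newline="") as src:
        rows = list(csv.DictReader(src))
    with open(json_path, "w") as dst:
        json.dump(rows, dst, indent=2)

if __name__ == "__main__":
    csv_to_json(sys.argv[1], sys.argv[2])
```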
I've used it to develop the framework of my code or to do some boilerplate stuff in some instances, but I personally wouldn't want to use it for much else; without understanding what your code does, it could take much longer to fix or expand upon.
I wanna ask it to write me a better AI and bring on the Singularity!
The biggest issue here is that people aren't differentiating between models. gpt-4 is probably 20-30 IQ points higher than gpt-3.5-turbo. Also, your question could be interpreted to include LLMs in general. Most LLMs are absolutely horrible at programming. OpenAI's models can actually do it, given some limited, specific task. Again, gpt-4 is much better at programming.
Also OpenAI just released new models. They now have one with 16k token context which is four times larger than before. So it can understand more instructions or read more code.
For something specific like writing basic SQL queries or even embedded Chart.js charts to fulfill a user request for a simple report on a table, gpt-4 can be very effective, and gpt-3.5 can often do the job. The trick is that sometimes you have to be very insistent about certain gaps or outdated information in its knowledge, or about what you want to do. And you always need to make sure you also feed it the necessary context.
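For instance, the sort of one-off report query I mean, sketched here with sqlite3 and a made-up orders table:

```python
import sqlite3

# Hypothetical 'orders' table, just to show the kind of report query gpt-4 handles well.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL, created DATE);
    INSERT INTO orders (customer, total, created) VALUES
        ('alice', 30.0,  '2023-05-01'),
        ('bob',   12.5,  '2023-05-02'),
        ('alice',  7.25, '2023-06-01');
""")

# Revenue per customer, highest first.
for row in conn.execute(
    "SELECT customer, COUNT(*) AS orders, ROUND(SUM(total), 2) AS revenue "
    "FROM orders GROUP BY customer ORDER BY revenue DESC"
):
    print(row)
```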
For something a bit complex but still relatively limited in scope, gpt-4 can often handle it when gpt-3.5 screws it up.
What those models are good at doing now, especially with the version just released, is translating natural language requests into something like API calls. If there is not a lot of other stuff to figure out, they can be extremely useful for that. You can get more involved programs by combining multiple focused requests, but it's quite hard to do that in a fully automated way today. The new function calling should help a lot, though.
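A minimal sketch of that function-calling flow with the openai Python library as of June 2023 (the get_weather function is just a made-up example; assumes OPENAI_API_KEY is set):

```python
import json
import openai

# Describe a function the model is allowed to "call" (it only returns the
# name and arguments; your own code decides whether to actually run it).
functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    functions=functions,
    function_call="auto",
)

message = response.choices[0].message
if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```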
The thing is, wait 3-6 months and this could be totally out of date if someone releases a more powerful model or some of these "AGI" systems built on top of GPT get more effective.
This has some nice examples of how well large language models do with some fairly basic programming requests
https://youtu.be/m5rsybr6ZIY