Could we make C arrays memory safe? Probably not, but let's try anyway

petsoi@discuss.tchncs.de to Linux@lemmy.ml – 55 points –
nibblestew.blogspot.com

I am always baffled that C never got a native string type. Strings are used in what feels like 99.99999% of the applications ever written. Having proper strings that don't require fiddling with byte pointers would likely prevent more than 50% of the security issues out there.

Without system or external libraries, C is more like an easier-to-read assembly, without much on top of it. There are no strings as we understand them in assembly, only pointers to a sequential lump of RAM where a NUL character ('\0') marks the end of the string. That's why C is so great as a language for libraries at the level where strings are only for debugging and a waste of computing time anyway.
But for some reason, instead of writing a library in C and then linking to it from some high-level language that handles the string-heavy operations, people often use the same hammer for everything and end up with overflowing buffers, or trying to make exceptions in the kernel for D-Bus.
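To make the "pointer plus terminator" point concrete, here is a minimal C sketch. `my_strlen` is a hypothetical re-implementation of the standard `strlen`, and `copy_bounded` is a hypothetical wrapper around the standard `snprintf` showing one way to avoid the overflowing buffers mentioned above; neither is meant as a definitive API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* A C "string" is only a pointer to its first byte; the length is not
 * stored anywhere. Finding it means walking to the NUL ('\0') terminator,
 * which is what strlen() does. my_strlen is a hypothetical re-implementation. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')   /* O(n) on every call; nothing remembers the length */
        p++;
    return (size_t)(p - s);
}

/* The destination's size is not part of the pointer either, so
 * strcpy(dst, src) writes my_strlen(src) + 1 bytes however small dst is.
 * copy_bounded (hypothetical name) takes the size explicitly and relies on
 * snprintf, which never writes more than dst_size bytes including the NUL
 * and returns the length it would have needed. Returns false on truncation. */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return false;                      /* no room even for the NUL */
    int n = snprintf(dst, dst_size, "%s", src);
    return n >= 0 && (size_t)n < dst_size; /* true only if all of src fit */
}
```

If the terminator is ever missing, `my_strlen` keeps reading past the end of the buffer; that single convention is behind a large share of C string bugs.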

I know that C is meant to be only a small step up from assembler. But it was still considered fine to introduce arrays (which are also not a thing in asm, where you only operate on pointers). So why not also add a data type for THE ONE THING that is used almost everywhere?

There are thousands of useful things where one could argue about whether they would make sense in the language or not (and in the case of C I would be totally fine with saying "no" to all of them). But strings, IMO, are not just some fringe feature that is used here and there. They are mission critical across the board and far too important to be left to libraries.

> arrays (which are also not a thing in asm where you only operate on pointers)

I'm afraid that's wrong. Arrays are definitely an asm thing. An array is just a pointer to the first of a run of consecutively stored objects. You add n * sizeof(stored_type) to the pointer and you get the object at index n (counting from 0).
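A minimal C sketch of that address arithmetic, assuming 32-bit `int32_t` elements. `nth_by_bytes` is a hypothetical helper that does by hand the scaling that C's `p[n]` (defined as `*(p + n)`) does automatically.

```c
#include <stddef.h>
#include <stdint.h>

/* An array is consecutive objects in memory. Element n lives at
 * base + n * sizeof(element) bytes. nth_by_bytes (hypothetical) computes
 * that byte offset explicitly, the way assembly code would. */
int32_t nth_by_bytes(const int32_t *base, size_t n)
{
    const unsigned char *bytes = (const unsigned char *)base;
    const int32_t *elem = (const int32_t *)(bytes + n * sizeof *base);
    return *elem;   /* no bounds check: n past the end is undefined behavior */
}
```

Note that nothing in the array records how many elements it has, which is exactly why making C arrays memory safe is hard.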

> They are mission critical

Do you have an example? I know that many products give up control over exactly what is executed, because that's cheaper in money and developer time, and lean on raw CPU power instead. So instead of securely comparing a string once and then using an enum (int) in the rest of the code, they use string comparisons all the time. But that's a design problem, not a technical one.
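A minimal sketch of the "compare the string once, then use an enum" pattern, assuming a made-up set of text commands. `enum command`, `parse_command` and `is_running_after` are hypothetical names used only for illustration.

```c
#include <string.h>

/* Hypothetical command set; only the pattern matters here. */
enum command { CMD_UNKNOWN, CMD_START, CMD_STOP, CMD_STATUS };

/* The untrusted string is compared exactly once, at the boundary. */
enum command parse_command(const char *s)
{
    static const struct { const char *name; enum command cmd; } table[] = {
        { "start",  CMD_START  },
        { "stop",   CMD_STOP   },
        { "status", CMD_STATUS },
    };
    if (s == NULL)
        return CMD_UNKNOWN;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(s, table[i].name) == 0)
            return table[i].cmd;
    return CMD_UNKNOWN;                    /* anything else is rejected */
}

/* Later code works with the enum only: cheap integer comparisons. */
int is_running_after(enum command c, int was_running)
{
    switch (c) {
    case CMD_START: return 1;
    case CMD_STOP:  return 0;
    default:        return was_running;   /* STATUS / UNKNOWN change nothing */
    }
}
```

After `parse_command`, the raw text never travels further into the program, so the rest of the code cannot misuse it.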

Basically every program that deals with some form of user input will come across strings. Be it to print something to the screen, write something to a file, read something from a file, or read something from the user interface (even if it's stdin). Even most non-user-facing tools (daemons, drivers, etc.) have to deal with strings often enough, even if "just" for something like writing log or debug entries.

For me it's hard to come up with any application where I don't need strings sooner or later. Typically sooner rather than later.

But that is high-level stuff. You shouldn't rely on strings or user input down in the mission-critical part of the program.

Do you separate that? I mean, if the idea is to use C only away from user interaction, then maybe. But is that a realistic scenario? If I write my whole application/library in C, user interaction is part of the application nonetheless. Maybe it's not what you consider "mission critical" from a program-reliability standpoint, but it's still mission critical from a user-experience standpoint, because the whole application is worthless if it cannot be used.

> If I write my whole application/library in C, user interaction is part of the application nonetheless

That's my point. A human-facing interface needs a lot of code that does not really do much; it only has to be there to cover all the edge cases of mixed parameters, cancel buttons, someone trying to click "Next" without filling in an important text box... And writing all of this in C (I mean the actual end-user program interface, not a general GUI library like GTK) only makes it harder to debug and maintain. You usually gain nothing from manual memory management there. If an operation is taking too long, maybe it's time to move it into the backend library. But if you're optimizing that operation, you've already moved away from comparing strings inside it; string comparison is the first thing to go when a loop takes too long. And once we're talking about more than one program that should behave consistently, and whose behavior might need to change in the future, C only slows you down.
Do you really need to reference the "Cancel" button via a pointer when checking whether the user should be allowed to go back?

Write a general backend library in C for your important stuff and optimizations, so you can easily load it from other languages. Then use something higher-level for the interpreter/GUI, where sanitizing user input is 5 lines using different parts of the language's own library (I mean things like re or zip in Python; these are not external, they are part of Python's standard library), instead of 50 lines of juggling pointers, which in C you would be doing even if the input were all ints.
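For a sense of the C side of that comparison, here is a sketch of carefully parsing a single int from a string with the standard `strtol`. `parse_int` is a hypothetical helper, and its acceptance rules (optional leading whitespace and sign, nothing trailing, value must fit in `int`) are an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* strtol reports overflow only through errno, signals "no digits" only via
 * the end pointer, and accepts trailing junk unless you check for it.
 * parse_int (hypothetical) rejects empty input, trailing characters and
 * values outside the range of int. */
bool parse_int(const char *s, int *out)
{
    if (s == NULL)
        return false;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)                 /* no digits at all */
        return false;
    if (*end != '\0')             /* trailing garbage such as "12abc" */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;             /* out of range for long or int */
    *out = (int)v;
    return true;
}
```

A line read with `fgets` still ends in `'\n'`, so this helper rejects it until the newline is stripped; that is the kind of detail that adds up to the "50 lines" in C.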

You don't care about stack height or jumping back to the previous frame after returning from a procedure (the assembler-level way of looking at the code); that's what C does for you.
So why care about pointers and structs when resizing a GUI? Let some higher-level language manage that for you.

You can just write your own implementation; it's not too complex. Or just use a string library.
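A sketch of what "your own implementation" might minimally look like: a hypothetical `str_t` that carries its length and capacity and grows on append instead of overflowing. It is an illustration under those assumptions, not a complete or standard API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string. data is kept NUL-terminated so it
 * can still be passed to ordinary C APIs. */
typedef struct {
    char  *data;
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated, including room for the terminator */
} str_t;

bool str_init(str_t *s)
{
    s->data = malloc(1);
    if (s->data == NULL)
        return false;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = 1;
    return true;
}

/* Append n bytes from src, growing the buffer as needed.
 * src must not point into s->data (realloc may move it). */
bool str_append(str_t *s, const char *src, size_t n)
{
    if (n > SIZE_MAX - s->len - 1)
        return false;                          /* length would overflow */
    size_t need = s->len + n + 1;
    if (need > s->cap) {
        size_t new_cap = s->cap * 2 > need ? s->cap * 2 : need;
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return false;                      /* old buffer is still valid */
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return true;
}

void str_free(str_t *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Every project that writes its own `str_t` ends up with a slightly different one, which is the compatibility problem raised in the replies.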

I can also write my own programming language. That wasn't the point.

This is part of the problem. Instead of having solid primitives, you have to implement them yourself or pull in a library, and in both cases you have to hope they're compatible with other libraries (or you have to convert manually all the time).

How many people who write their own string implementation do you think do so perfectly? I'd guess at most 50%. This means that basic operations in a good number of apps will have unknown bugs. Fixing bugs in application logic is one thing, but having to debug low-level type implementations is not something the average developer should do.