Linking parts of the codebase such that changing one forces reviewing the other?

Programming@programming.dev – 35 points – 4 months ago

Suppose we have a large to-do task manager app with many features. Say we have an entity, which is the task, and it has certain fields like: title, description, deadline, sub-tasks, dependencies, etc. This entity is used in many parts of our codebase.

Suppose we decided to modify this entity, either by modifying, removing, or adding a field. We may have to change most if not all of the code that deals with this entity. How can we do this in a way that protects us from errors and makes maintenance easy?

Bear in mind, this is just an example. The entity may be something more low-key, such as a logged user event in analytics, or a backend API endpoint being used in the frontend, etc.

Potential Solutions

Searching

One way people do this already is by just searching the entity across the codebase. This is not scalable, and not always accurate. You may get a lot of false positives, and some parts of the code may use the entity without using it by name directly.

Importing

Defining the entity in one central place, and importing it everywhere it is used. This will create an error if a deleted field remains in use, but it will not help us when, say, adding a new field and making sure it is used properly everywhere the entity is being used.

So what can be done to solve this? Plus points if the approach is compatible with functional programming.

Automated Tests and CICD

Tests can discover these types of issues with high accuracy and precision. The downside is... Well tests have to be written. This requires developers to be proactive, and writing and maintaining tests is non-trivial and needs expensive developer time. It is also quite easy and common to write bad tests that give false positives.

Wouldn't static type checking solve most of these issues?

I think you are right. I did not consider this. Will try that next!

What language are you writing that you didn't even think of this?

Typescript, but that's not the issue. You probably have to leverage types in a specific way to get all the protections I am talking about. For example, I want it such that if a new field is added to a type, every user of the type must explicitly either use it or explicitly declare that it won't. From my experience with type systems, you typically aren't required to explicitly declare that you won't use a field in a dictionary / record type.
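The "use it or explicitly declare that you won't" idea can be approximated in TypeScript with a mapped type over `keyof`. The following is only a minimal sketch under assumptions: `Task`, `FieldUsage`, `summaryViewUsage`, and `usedFields` are hypothetical names made up for illustration, not part of any library.

```typescript
// Hypothetical central entity; the field names are illustrative.
interface Task {
  title: string;
  description: string;
  deadline: Date | null;
}

// Every field of T must be listed, each marked "used" or "ignored".
type FieldUsage<T> = { [K in keyof T]-?: "used" | "ignored" };

// Sign-off for one hypothetical consumer, a summary view.
const summaryViewUsage: FieldUsage<Task> = {
  title: "used",
  deadline: "used",
  description: "ignored", // explicit: the summary never shows descriptions
};

// Lists the fields a consumer claims to use (handy for docs or audits).
function usedFields(usage: FieldUsage<Task>): (keyof Task)[] {
  return (Object.keys(usage) as (keyof Task)[]).filter(
    (key) => usage[key] === "used",
  );
}
```

If a `priority` field is later added to `Task`, `summaryViewUsage` stops compiling until someone adds a `priority` entry, which is the forced sign-off. The compiler does not verify that a field marked "used" is actually read, so this records intent rather than proving it.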

Ok, TIL there's a thing called Required, but otherwise, one way to do this is to rename the other part/field/key(s), so that old code reveals itself in much the same way as using a deleted field (because it does, actually)

Another way is to explicitly have separate types for records with and without the feature. (If one is a strict subset, you can have a downgrade/slice method on the more capable class.)

Lastly, I would say that you need static typing, testing, or both. People from static-land get vertigo without types, and types do give a good night's sleep, but they're no substitute for testing. Testing can be a substitute for static typing in combination with coverage requirements, but at that point you're doing so much more work that the static typing straitjacket seems pretty chill.
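A rough sketch of the separate-types idea, assuming hypothetical `BasicTask` and `PrioritizedTask` types. Because TypeScript is structurally typed, a record with an extra field would otherwise be silently accepted wherever the smaller type is expected, so this sketch adds a `kind` discriminant (an assumption of the example) to keep the two apart.

```typescript
// Hypothetical types; the `kind` tag keeps them from being interchangeable.
interface BasicTask {
  kind: "basic";
  title: string;
}

interface PrioritizedTask {
  kind: "prioritized";
  title: string;
  priority: "low" | "normal" | "high";
}

// Explicit downgrade: the only way to hand a prioritized task to old code.
function toBasicTask(task: PrioritizedTask): BasicTask {
  return { kind: "basic", title: task.title };
}

// Old code that predates priorities and only understands BasicTask.
function renderTitle(task: BasicTask): string {
  return task.title.toUpperCase();
}

const urgent: PrioritizedTask = {
  kind: "prioritized",
  title: "Pay rent",
  priority: "high",
};

// renderTitle(urgent) would not compile; the downgrade must be spelled out.
const rendered = renderTitle(toBasicTask(urgent));
```

Each call to `toBasicTask` is then a searchable, reviewable spot where someone decided the new field does not matter.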

A simple but hackish solution is to version your types. New field? Foo becomes Foo2! Now nothing builds and you're sure you'll have to go over every usage of the type.

Add a second commit to revert to Foo, and there you go. Of course you'd need two reviews but the second one is trivial

every user of the type must explicitly either use it or explicitly declare that it won’t

What? How does someone declare that they won't use a type? What does that even mean?

Do you have an example use case that you're trying to solve? What additional type are you adding that would break existing users usage? If that's the case, maybe use an entirely different type, or change the class name or something

I gave an example use case in the main post, but I'll summarize it again here:

Suppose we have a to-do task manager. A task is an important entity that will be used in many parts of our codebase.

Suppose we add a new field to this task entity. For example, let's say we now added a priority field in our task that previously didn't exist, so users can define if a task is high priority.

The problem: this task entity is being used in many parts of our codebase. How do we make sure that every one of those parts that needs to use the new field does use it? How do we make sure we don't miss any?

I hope this makes sense. If it doesn't, feel free to ask any questions.

Have you considered the Required generic?

https://www.typescriptlang.org/docs/handbook/utility-types.html#requiredtype
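For reference, a small sketch of what `Required<Type>` does. `TaskDraft`, `CompleteTask`, and `finalize` are hypothetical names used only for this illustration.

```typescript
// Hypothetical draft shape where the new field is still optional.
interface TaskDraft {
  title: string;
  priority?: "low" | "normal" | "high";
}

// Required<T> strips the `?` from every property of T.
type CompleteTask = Required<TaskDraft>;

// Producers of CompleteTask must now supply a priority explicitly.
function finalize(draft: TaskDraft): CompleteTask {
  return { title: draft.title, priority: draft.priority ?? "normal" };
}

const done = finalize({ title: "Water plants" });
```

Note that this constrains code that builds a `CompleteTask`. Code that merely receives one and never reads `priority` still compiles, so `Required` on its own may not give the full sign-off behaviour asked for above.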

Thanks for the tip! I think that is indeed what I need. Thank you :)

Oh are you talking about creating the object? Yeah I think you might get better answers in a TS thread, because that question and the response here makes no sense in most statically typed languages.

I am still confused about what OP is looking for. Even in typescript, if a new field is added and not used in other places, compilation will fail. Unless OP explicitly marks the field as optional.

There’s also the possibility that the codebase is littered with the “any” keyword (I’m not saying OP is doing it but I’ve definitely seen plenty of codebases that do this). If someone says they’re using typescript but their code is full of “any” keywords, they’re not using typescript. They’re just using plain JavaScript at that point.

if a new field is added and not used in other places, compilation will fail

That's if the field is added but never used. If it is used in some parts of the codebase, but not used / handled in others, compilation will pass. I would prefer that it doesn't. I would prefer that if such a change were to occur, that every part of the codebase that uses that entity is explicitly addressed by the change made.
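A minimal illustration of the distinction being drawn here, with hypothetical names (`Task`, `summarize`, `createDefaultTask`). Suppose `priority` was just added as a required field:

```typescript
// Hypothetical entity after the change; `priority` is the new field.
interface Task {
  title: string;
  priority: "low" | "normal" | "high";
}

// A consumer written before `priority` existed. It still compiles,
// because reading only some fields of an object is always allowed.
function summarize(task: Task): string {
  return task.title;
}

// A producer, by contrast, is forced to change: omitting `priority`
// from this object literal would be a compile error.
function createDefaultTask(title: string): Task {
  return { title, priority: "normal" };
}

const summary = summarize(createDefaultTask("Buy milk"));
```

So the type checker flags every place that creates a `Task`, but stays silent about places that consume one and ignore the new field.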

Again, if there's anything you don't understand, feel free to ask me directly. I do not get notifications when you reply to a comment that isn't mine.

It still doesn’t make sense. Obviously your whole explanation hinges very heavily on what exactly you mean when you say “not used/handled”. Depending on your specific use case this could mean anything. As with any code-related question, there’s only so much that people who haven’t seen your code can do to help. I think the easiest way to avoid this confusion is to just show some code so we are all on the same page about what the issue is.

On thinking about this a bit more, I feel like you may be expecting the system to handle situations where your business requirement needs the new field to be used now, but used to work without this field before. Based on the example you provided, I am imagining something like a getTasksForUser functionality which previously might have just been filtering on userId, but if the business now says that this functionality should return tasks sorted by priority, you expect the system to somehow know the business requirement and force the developer to use this new priority field?

If that’s what you’re hoping for, the problem is harder to solve, although not impossible. Assuming the example above, you could maybe just inject the priority field at the data access layer. Another way would be to make the modified entity private and expose a facade with helper functions. When code that previously relied on the entity inevitably breaks, you replace those usages with usage-specific functions exposed from the facade. Since the entity is now accessible only from the facade, you can easily update all usages within the facade, and no one can miss passing the priority field, because all functions in the facade are known to use the new field.
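The facade idea could look roughly like the sketch below. Everything here is an assumption made for illustration: `TaskRecord`, `TaskFacade`, `getTaskTitlesForUser`, and the priority ranking are hypothetical. In a real codebase `TaskRecord` would stay unexported so that other modules cannot touch it directly.

```typescript
type Priority = "low" | "normal" | "high";

// The entity itself; in a real module this would not be exported.
interface TaskRecord {
  userId: string;
  title: string;
  priority: Priority;
}

const PRIORITY_RANK: Record<Priority, number> = { high: 0, normal: 1, low: 2 };

// Callers only see usage-specific functions, never the raw records.
class TaskFacade {
  private readonly records: TaskRecord[];

  constructor(records: TaskRecord[]) {
    this.records = records.slice(); // defensive copy
  }

  // The priority rule lives here, so no caller can forget to apply it.
  getTaskTitlesForUser(userId: string): string[] {
    return this.records
      .filter((r) => r.userId === userId)
      .sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority])
      .map((r) => r.title);
  }
}

const facade = new TaskFacade([
  { userId: "u1", title: "Tidy desk", priority: "low" },
  { userId: "u2", title: "Someone else's task", priority: "high" },
  { userId: "u1", title: "File taxes", priority: "high" },
]);

const titles = facade.getTaskTitlesForUser("u1");
```

When the entity changes, the review surface shrinks to the facade's own methods instead of every call site in the codebase.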

If there's anything that doesn't make sense in my question, feel free to ask any questions or clarifications on any part of it.

Yeah, in most statically-typed languages this is simply the default behavior unless you specifically declare a field as optional.

If you update your tests to reflect proper usage of the new field then you can catch potential errors.

Automated tests definitely work, but the downside is that they require the developer to be proactive, and the effort put into writing tests is non-trivial (and it's easy and common for developers to write bad tests that give false positives).

Hmm I think you're looking for a technical solution to a non-technical problem.

Sometimes it's possible, I think

Depends on what you consider technical. I don't see this as much different than how type systems prevent type errors.

Take your example of adding a field to an entity. Just because you've made that code change doesn't mean other code should be using it. Who should be using it and how is determined by the business rules.

Also your interest in ensuring it is "properly" used is impossible to enforce. What's considered proper even for existing code can change over time.

doesn't mean other code should be using it.

Yes you're right. Sorry it wasn't clear from what I said before, but that's what I am saying too. The point is, if such a change is made, it should explicitly address every code that uses that entity who just added a new field. When I say "address", I mean that the user must at least be forced to "sign off" and explicitly saying a part of the code does not need to be changed due to this change. One possibility is explicitly declaring that a field is not used.

I hope this makes it clearer.

But no matter what you do, you're asking for something that will need to be manually done. Your tests should be done, and they should be reviewed. It will solve the problem you have and many more.

Just like type systems prevent you from type errors that you may otherwise write unit tests for, I don't see it unviable to have something that protects from the errors I mention.

In fact I think my solution might be in particular use of the type system, which I am experimenting with right now.

Having unit and automated integration tests backed by both requirements and high code coverage. As a lead I can verify not only that you made the change to support the requirements through these unit tests, but also get a really quick verification that other functionality has not changed because of your large-scale change. It helps a lot for significant refactoring too.

Simple answer, unit tests.

I addressed this in some of my other replies, but I do not believe unit tests are a good solution here. It's way too common for developers to write tests that give false positives, and it's very common for organizations to have low or insufficient coverage due to the higher cost associated with testing.

Tests are good to have as backup though.

An adequate test coverage should help you with these kinds of errors. Your tests should at least somehow fail if you make something incompatible. Also using the tools of your IDE will help you with refactoring.

Testing definitely works, but the downside is it requires the developer to be proactive, and the effort put in writing tests is non-trivial (and it's easy and common for developers to write bad tests that give false positives).

That's why test coverage exists and needs to be a mandated item.

I have absolutely no patience for developers unwilling to make good code. I don't give a shit if it takes a while, bad code means vulnerabilities means another fucking data breach. If you as a developer don't want to do what it takes to make good code, then quit and find a new fucking career.

Test coverage alone is meaningless; you need to think about input coverage as well, and that's where you can spend an almost infinite amount of time. At some point you also have to ship stuff.

You get it!

Fully agreed things need to get shipped but that's why I'm a fan of test driven development. You'll always have your tests written with your feature.

Then again even if someone does it after as long as you write a test every time you write a feature you'll eventually have the code base covered.

Input coverage is new to me, mind linking me some info so I can learn? (Yes google exists but if someone has the low down on a good source I'd prefer that)

By input coverage I just mean that you test with different inputs. It doesn't matter if you have 100% code coverage, if you only tested with the number "1", and the code crashes if you give it a negative number.

If you can prove that your code can't crash (e.g. using types), it's a lot more valuable than spending time thinking about potentially problematic inputs and writing individual tests for them (there are tools that help with this, but they are not perfect).
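A small sketch of both points, using made-up functions (`formatHours`, `formatHoursSafe`, `throwsOn`). It relies on the fact that `Number.prototype.toFixed` throws a `RangeError` when given a negative digit count.

```typescript
// A single test with decimals = 1 executes every line of this function,
// so line coverage reports 100%, yet a negative `decimals` still throws.
function formatHours(hours: number, decimals: number): string {
  return `${hours.toFixed(decimals)} h`;
}

// Narrowing the parameter type rules the bad input out at compile time,
// so no test for negative decimals is needed.
type Decimals = 0 | 1 | 2;

function formatHoursSafe(hours: number, decimals: Decimals): string {
  return `${hours.toFixed(decimals)} h`;
}

// Tiny helper for demonstrating the crash without a test framework.
function throwsOn(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (_err) {
    return true;
  }
}

const happyPath = formatHours(1.5, 1);                 // the only input a naive test tries
const crashes = throwsOn(() => formatHours(1.5, -1));  // same code path, different input
```

`formatHoursSafe(1.5, -1)` is rejected by the compiler, which is the "prove it with types" option. The type only covers `decimals`; it says nothing about other aspects of the input.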

Ahhh gotcha gotcha. I was doing this by default in my python testing, glad I was doing things right

Alright grandpa time to take your meds

Wrong.

Try "Security focused DevOps Engineer" and try making better tests.

There is a whole field, that looks a bit like religion to me, about how to test right.

I can tell you from experience that testing is a tool that can give confidence. There are a few new tools that can help. Mutation testing is one I know that can find bad tests.

Integration tests can help find the most egregious errors that make your application crash.

Not every getter needs a test but using unit tests while developing a feature can even save time because you don't have to start the app and get to the point where the change happens and test by hand.

A review can find some errors, but human brains are not compilers. It is easy to miss errors, and the more you add to a review, the more easily they get lost. Reviews mostly help make sure that the code is in line with the team's style and that more than one person knows about the changes.

You can't find all mistakes all the time. That's why it is very important to have a strategy to avert the worst and revert errors. If you develop a web app: backups, rolling deployments, revert procedures. And make sure everyone knows how, and tries it at least once. These procedures can fail. Refine them through failure.

That is my experience from working in the field for a while. No tests is bad. Too many tests is a hassle. There will always be errors. Be prepared.

"What's a technique so woodworkers can make sure their furniture fits together on the first try?"

"Measuring and marking out the plan before making cuts."

"Hmm. No, that sounds tedious and difficult, and requires the woodworker to be proactive. No thank you."

Interesting analogy, but it's probably better to address my point directly instead of arguing about woodworking

It's very clear that you want a magic solution that does what you want without any upfront effort. Please let us all know if you find one.

Nothing is without effort. I want something with high confidence. Most organizations fail at testing in one way or another (riddled with false positives, flaky tests, or outright low coverage). Tests are good to have, but they are not enough for what I want.

magic solution

If you think type systems are magic, then sure :)

please let us know if you find one

It looks like I can leverage certain type systems to do this. I might need to work with it more before concluding.

A factory pattern helps. By making a dedicated class that handles the creation and distribution of Task entities, that's at least one point of failure that's then centralised.
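A minimal sketch of such a factory, assuming hypothetical `Task` and `TaskFactory` definitions; the id counter and the validation rule are invented for the example.

```typescript
type Priority = "low" | "normal" | "high";

interface Task {
  readonly id: number;
  readonly title: string;
  readonly priority: Priority;
}

// The single place where Task objects are created. Adding a field to
// Task means updating this class, not every call site that needs a task.
class TaskFactory {
  private nextId = 1;

  create(title: string, priority: Priority = "normal"): Task {
    const trimmed = title.trim();
    if (trimmed.length === 0) {
      throw new Error("Task title must not be empty");
    }
    const task: Task = { id: this.nextId, title: trimmed, priority };
    this.nextId += 1;
    return task;
  }
}

const factory = new TaskFactory();
const first = factory.create("  Book dentist  ");
const second = factory.create("Renew passport", "high");
```

This centralises creation only. Code that reads tasks elsewhere is unaffected, so it addresses the producer side of the problem rather than the consumer side.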

Big companies do this all the time. Giant monorepos with good testing and reliability systems manage it. As an example: https://abseil.io/resources/swe-book/html/ch22.html

Most languages have an IDE which will manage the import of that object and when you rename incorrectly, it'll flag it up. If you're calling an incorrect function or variable, it'll flag it etc. Many will have refactoring tools so when you rename something through this, it'll rename all instances of that.

This is related to what I discussed in the "Searching" section. Entity fields may not be necessarily imported, so they would not be caught in this. Say you're using that field's name in a SQL query, HTTP or GraphQL request / query. This may also not be caught by IDE.

This also would not cover the case where a field is modified without necessarily changing its name, or a new field is added and now the code using that entity is not using the field.

Usually when you change your database structure, you would change the object that this is mapped into. If you were to change one without the other, that would be a monumental developer oversight. Adding a field without using it in many frameworks wouldn't necessarily break it, so it wouldn't be a bad change per se.

Any change you make to persistence should reflect as a bare minimum, the object data gets mapped into. This would likely be part of the same branch, and you probably shouldn't merge it until it's complete.

You're looking for tooling to protect you from human errors, and nothing is going to do that. It's like asking, how can I stop myself from choking when eating. You just know to chew. If this isn't obvious, it's a good lesson in development. Make one change at a time and make it right. Don't rush off to presentation changes or logic changes until your persistence changes are complete. When you get into habits like this, it becomes steady, methodical and structured. Rushing is the best way to make mistakes and spend more time fixing them. Less haste, more speed.

For example, if I add a new field. I'd write the SQL, run it, populate a value, get that value and test it. Then I'd move on to the object mapping. I'd load it into the code, and get a bare minimum debug out to see it was loaded out, etc. etc.. Small tweak, test and confirm, small change, test and confirm. Every step is validated so when it doesn't work, you know why, you don't guess.

It depends on the language, since you mentioned you don't want to do manual testing -

Start with a mono-repo, as in, 1 repo where you add every other repo as a git submodule

Then, every time something changes you run that repo through the build server, and validate that it at least compiles.

If it compiles, you can go a step further, build something that detects changes, for example by parsing the syntax tree of everything changed, then check the syntax tree of the entire project which other methods / objects might be affected. In dotnet you'd do this with a Roslyn Analyzer

you mentioned you don't want to do manual testing

Just to clarify: I didn't mean that tests shouldn't be written. I just don't see testing as a sufficient solution to this problem.

If it's a microservice architecture using something like openapi and code generators could be a solution. Then the proper classes / types are created during the build step.

It does not prevent the fields from being unused, or service B using an older version before being rebuilt.

The approach would be similar to a library, but it works across different languages while changing the definition in only one place.