How Python Compares Floats and Ints: When Equals Isn’t Really Equal

abhi9u@lemmy.world to Technology@lemmy.world – 132 points –
How Python Compares Floats and Ints: Why It Can Give Surprising Results
blog.codingconfessions.com

TL;DR:

In Python, the following returns False:

9007199254740993 == 9007199254740993.0

The floating point number 9007199254740993.0 is internally represented in memory as 9007199254740992.0 (due to how floating point works).

Python has special logic for comparing int with floats. Here it will try to compare the int 9007199254740993 with the float 9007199254740992.0. Python sees that the integer parts are different, so it will stop there and return False.
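A quick way to see both effects in plain Python (struct and Fraction are used here only to inspect the values exactly; they're not part of Python's comparison logic):

```python
import struct
from fractions import Fraction

# 2**53 + 1 = 9007199254740993 is the first integer a 64-bit float cannot store.
print(float(9007199254740993))                 # 9007199254740992.0 (rounded down)
print(9007199254740993 == 9007199254740993.0)  # False

# Both float literals produce the same bit pattern, confirming the rounding:
assert struct.pack('<d', 9007199254740993.0) == struct.pack('<d', 9007199254740992.0)

# Fraction converts each operand to its exact value, so it reaches the same verdict:
print(Fraction(9007199254740993) == Fraction(9007199254740993.0))  # False
```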

Comparing floats for equality is generally a bad idea anyways.

Floats should really only be used for approximate math. You need something like Java's BigDecimal (or BigInteger for whole numbers) to do decimal math with exact precision.

Looks like this is the equivalent for Python:

https://docs.python.org/3/library/decimal.html
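A minimal sketch of that module in action — note that Decimal should be constructed from strings, since Decimal(0.1) would inherit the binary rounding error:

```python
from decimal import Decimal

a = Decimal('0.1')
b = Decimal('0.2')
print(a + b == Decimal('0.3'))   # True: decimal arithmetic is exact here
print(0.1 + 0.2 == 0.3)          # False with binary floats
```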

Comparing is fine, but it should be fuzzy. Less than and greater than are fine, so you should basically only be checking for a value within a range, not a specific value.
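A sketch of that range-based idea; the tolerance value here is an arbitrary choice, not something prescribed:

```python
def roughly_equal(a: float, b: float, tol: float = 1e-9) -> bool:
    """True if a falls within [b - tol, b + tol]; tol is an arbitrary choice."""
    return b - tol <= a <= b + tol

print(roughly_equal(0.1 + 0.2, 0.3))  # True
print(0.1 + 0.2 == 0.3)               # False
```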

I assume this is because that number is so large that it loses precision, in which case this is more of a quirk of floating point than a quirk of Python.

Disclaimer: Have not read the article yet.

It’s both. As you said, it’s because of loss of floating point precision, but it’s also due to some quirks in how Python compares an int with a float. These two together cause this strange behavior.

If i'm comparing ints with floats, it is my fault in the first place

I guess it's something like: if close enough, set to true.

Now I'll read the article and discover it's like 100x more complex.

Edit : It is indeed at least 100x more complex.

It's not only more complex, it also doesn't work like you described at all.

Did nobody read the manual?

IEEE 754 double precision: The 53-bit significand precision gives from 15 to 17 significant decimal digits precision.

I'm not sure where the 17 comes from. It's 15.

The "15 to 17" part is worded somewhat confusingly, but it's not wrong.

The number of bits contained in a double is equivalent to ~15.95 decimal digits. If you want to store exactly a decimal number with a fixed number of significant digits, floor(15.95) = 15 digits is the most you can hope for. However, if you want to store exactly a double by writing it out as a decimal number, you need 17 digits.
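A round-trip demonstration of the 17-digit claim (math.nextafter needs Python 3.9+):

```python
import math

# The double immediately after 1.0 differs from it by one ulp (~2.2e-16).
x = math.nextafter(1.0, 2.0)   # 1.0000000000000002

# 17 significant digits round-trip back to the same double...
assert float(format(x, '.17g')) == x
# ...but 16 digits collapse it into plain 1.0:
assert float(format(x, '.16g')) == 1.0
```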

Do we have a JS-type situation here?

Probably more like the old precision problem. It exists in C/C++ too, and it's just how floats and ints work.

I don't think comparisons should be doing type conversion. If I compare a float to an int, I want it to say false cos the types are different.

I don't think that's how most programmers expect it to work at all.

However most people would also expect 0.1+0.2==0.3 to return true, so what do I know.

Floating point is something most of us ignore until it bites us in the ass. And then we never trust it again.

I have to admit: If you (semi-)regularly use floating point comparisons in programming, I don't know why you would ever expect 0.1 + 0.2 == 0.3 to return true. It's common practice to check abs(a - b) < tol, where tol is some small number, to the point that common unit-testing libraries have built-in methods like assertEqual(a, b, tol) specifically for checking whether floats are "equal".
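In Python's stdlib, the equivalents of that pattern are math.isclose and unittest's assertAlmostEqual (the method name differs slightly from the assertEqual(a, b, tol) form mentioned above):

```python
import math
import unittest

class TestFloatSum(unittest.TestCase):
    def test_point_three(self):
        s = 0.1 + 0.2            # actually 0.30000000000000004
        self.assertFalse(s == 0.3)
        # relative-tolerance check from the stdlib:
        self.assertTrue(math.isclose(s, 0.3, rel_tol=1e-9))
        # unittest's built-in fuzzy comparison (7 decimal places by default):
        self.assertAlmostEqual(s, 0.3)

unittest.main(argv=['example'], exit=False)
```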

Yeah, a lot of editors throw warnings for using the equals operator with floats by default, as far as I know it's considered bad practice to do it that way.

The issue is a lot of people use floating point numbers, but don't even know it.

How many programmers right now are using JS, the most popular language in the world? How many of them do you think understand floating point numbers and their theoretical levels of accuracy? How many of them are unknowingly using floating points to store currency values?

How many of them could accurately predict the result of the following?

  • 2.99+1.52==4.51
  • 2.99+1.53==4.52
  • 2.99+1.54==4.53

Now imagine that as code to make sure you've paid the right amount in an online store. I guarantee you there is code out there right now that won't let you finish a sale if the total of the basket adds up a certain way.
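For the curious: with IEEE 754 doubles (what CPython uses on common hardware), those three checks don't even behave uniformly — the rounding errors happen to cancel in two of them but not the third. Decimal sidesteps the whole problem:

```python
from decimal import Decimal

print(2.99 + 1.52 == 4.51)   # True  (the rounding errors happen to cancel)
print(2.99 + 1.53 == 4.52)   # False (off by one ulp)
print(2.99 + 1.54 == 4.53)   # True

# For money, work in exact decimals (or integer cents) instead:
print(Decimal('2.99') + Decimal('1.53') == Decimal('4.52'))   # True
```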

Then most people shouldn't be writing code, I don't know what else to tell you. This is probably one of the first things you learn about FP arithmetic, and any decent compiler/linter should warn you about it.

Way too late for that. Every language I know makes some kind of auto conversion for numeric comparisons... and sometimes for strings as well.

I know of Rust, which is pedantic enough to not allow comparing integers to floats directly.

In certain situations, it even disallows making assumptions about equality and ordering between floats.

I still can't properly wrap my head around Rust's borrowing. My ray tracer implementation from that blog on ray tracing was slow as shiiiit.

Not sure what blog post you're talking about, but there's really only three things you can be doing wrong:

  • Tons of cloning.
  • Running your application from a debug build rather than release build.
  • All the usual things one can be doing wrong in any programming language. Like, I imagine real-world raytracing is done on the GPU and uses highly optimized algorithms. I doubt a blog post would dive into those depths. And well, any kind of graphics programming is extremely slow, if you don't issue the exact right incantations that the GPU manufacturer optimized for.

In certain situations, it even disallows making assumptions about equality and ordering between floats.

I'm guessing it does this when you define both floats in the same location with constant values.

The correct way to compare floats whose values you don't know ahead of time is to compare the absolute of their delta against a threshold.

i.e.

abs(a - b) <= 0.00001

The idea being that you can't really compare floats due to how they work. By subtracting them, you can make sure the values are "close enough" and avoid issues with precision not actually mattering all that much past the given threshold. If you define 2 constant values, though, the compiler can probably simplify it for you and just say "Yeah, these two should be the same value at runtime".
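One caveat worth illustrating: a fixed epsilon like the one above works near 1.0 but breaks down at large magnitudes, where a single ulp is already bigger than the threshold. Scaling the tolerance to the operands (a relative tolerance) handles both regimes; the helper below is a sketch of that idea:

```python
eps = 0.00001
a, b = 1e16, 1e16 + 2.0        # adjacent doubles at this magnitude, 2.0 apart
print(abs(a - b) <= eps)       # False, though they differ by only ~2e-16 relatively

def close(a: float, b: float, rel: float = 1e-9) -> bool:
    """Compare with a tolerance proportional to the larger operand."""
    return abs(a - b) <= rel * max(abs(a), abs(b))

print(close(1e16, 1e16 + 2.0))  # True
```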

Unfortunately, that's not what it's about.

With Rust's language design¹, one could easily forbid ever comparing two floats via the normal operators (not just for literals), but they decided to allow that nonetheless. I'm guessing, because it would just be too annoying for generic implementations, if you'd always need special treatment for floats.

Rather, what I was referring to, is that they've split partial ordering and partial equality from their total variants. And integers are marked that they can do total equality and total ordering, whereas floats are not.

Honestly, it doesn't come up a lot, but I know for example, if you want to use something as a key in a HashMap, it needs to have total equality.
And in a BTreeMap (basically a binary tree, i.e. keys are inserted in a sorted manner), total ordering is required for the keys.

¹) Basically, comparison is typed and each type needs to opt into comparison with any type it wants to be compared to. So, if you define a new data type, then by default you cannot ask it whether it's equal to or less/more than anything else.

Rust has a warning (has it been promoted to error? I think it was supposed to be) about comparing floats. Nothing to do with same being const. You basically don't have an equality operator for them

That makes sense, but then you'd just have people converting the int to a float manually and run into the exact same issues.

They wouldn't be running into an issue, but creating one, that's different

Meh. Imo anyone comparing an integer to a float and not expecting one of them to be implicitly cast to the other's type will create that issue for themselves when doing the same thing with an explicit cast.

What I meant is, the former can be a genuine mistake, the latter is a conscious (probably uneducated) decision

But how far should that be taken? Should 8 == 8 return false because one is an unsigned int and the other is signed? Or 0.0 == 0.0 where one is a float and the other a double? You can make a case for those due to a strong type system, but it would go against most people's idea of what equality is.

If the bits aren't the same, then I don't want it to tell me they are the same. And Python just has one implementation each for int and float.

I like Python cos everything's an object. I don't want different types of objects to evaluate as the same; they are fundamentally different objects. Is that not what you would expect?

Even in python you can have control of what types of numbers are used under the hood with numpy arrays (and chances are if you are using floats in any quantity you want to be using numpy). I would be very surprised if array([1,2,3], dtype=uint8) == array([1,2,3], dtype=int16) gave [False, False, False]. In general I think == for numbers should give mathematical equivalence, with the understanding that comparing floats is highly likely to give false negatives unless you are extremely careful with what you are comparing.
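A quick check of that claim (assuming numpy is installed); numpy promotes both sides to a common type before comparing, so same values in different widths compare equal element-wise:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.uint8)
b = np.array([1, 2, 3], dtype=np.int16)
print(a == b)          # [ True  True  True]

# Float comparisons carry the usual caveats, so numpy ships a tolerance helper too:
print(np.isclose(0.1 + 0.2, 0.3))   # True
```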

Numpy's more or less a math wrapper for C, isn't it?

More Fortran than C, but it's the same for any language doing those sorts of array mathematics: they will be calling compiled versions of BLAS and LAPACK. Numpy builds up low-level, highly optimised compiled functions into a coherent Python ecosystem. A numpy array is a C array with some metadata, sure, but a Python list is also just a C array of pointers to PyObjects.
