Thanks for your offer.
Whether to post it depends on which rounding operation(s) you are using; in other words, post it if it is one of rounding modes 1 to 4 in the table below.
But I will probably implement these rounding modes only after I have finished the standard math operations (+ - * /), so it may take a while before I use it.
(That should not discourage you from posting it; it only means that it will take a while before I add it to the complete float code. I am thankful for any help.)
The reason:
True IEEE 754 defines a bijective mapping between number intervals and bit codings, not between numbers and bit codings.
This rounding is used only to ensure that the mapping of the number intervals to strings and vice versa is unambiguous.
This means: if you take a number (a) given as a string, map it to its bit coding, and then convert that back to a number (b) given as a string, then a and b lie in the same interval.
This may seem like a useless definition, but if you do some math with those numbers, it ensures that the result is always the same for all numbers within the interval defined this way.
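To make the interval idea concrete, here is a small sketch in Python (not batch, and not part of the float code). The helper names `to_bits` and `from_bits` are made up for this example. It also assumes that going through Python's 64-bit `float()` first and then packing to 32 bits gives the same coding as a direct conversion, which can differ in rare edge cases.

```python
import struct

def to_bits(text):
    """Map a decimal string to its 32-bit IEEE 754 coding (as an integer)."""
    # float() parses to a 64-bit double; struct then rounds it to 32 bits.
    return struct.unpack(">I", struct.pack(">f", float(text)))[0]

def from_bits(bits):
    """Map a 32-bit coding back to the number it stands for."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Different strings from the same interval share one bit coding ...
print("%08X" % to_bits("0.1"))          # 3DCCCCCD
print("%08X" % to_bits("0.100000003"))  # 3DCCCCCD
# ... while a string from the neighbouring interval gets the next coding.
print("%08X" % to_bits("0.10000001"))   # 3DCCCCCE

# Round trip: string a -> bits -> string b, and b maps to the same bits as a.
b = repr(from_bits(to_bits("0.1")))
print(to_bits(b) == to_bits("0.1"))     # True
```

Since all later math works on the bit coding only, every string from the same interval leads to the same result.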
So rounding modes other than the following ones, which are defined by IEEE 754, should not be implemented:
:: Rounding to integers using the IEEE 754 rules
:: No  Rounding mode                          +2.5  +1.5  -1.5  -2.5
::  0  to nearest, ties to even (default)     +2.0  +2.0  -2.0  -2.0
::  1  to nearest, ties away from zero        +3.0  +2.0  -2.0  -3.0
::  2  toward 0 (truncation)                  +2.0  +1.0  -1.0  -2.0
::  3  toward +inf (rounding up, or ceiling)  +3.0  +2.0  -1.0  -2.0
::  4  toward -inf (rounding down, or floor)  +2.0  +1.0  -2.0  -3.0
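The table can be reproduced with a few lines of Python, which may help when checking an implementation against it. This is only a sketch: it uses the rounding constants of Python's `decimal` module in place of a real IEEE 754 implementation, and the names `MODES` and `round_to_integer` are made up for this example.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)

# Mode numbers as in the table above, mapped to the decimal module's constants.
MODES = {
    0: ROUND_HALF_EVEN,  # to nearest, ties to even (default)
    1: ROUND_HALF_UP,    # to nearest, ties away from zero
    2: ROUND_DOWN,       # toward 0 (truncation)
    3: ROUND_CEILING,    # toward +inf
    4: ROUND_FLOOR,      # toward -inf
}

def round_to_integer(text, mode):
    """Round the decimal string 'text' to an integer using the given mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))

# Print one row per rounding mode for the four sample values of the table.
for mode in sorted(MODES):
    row = [round_to_integer(x, mode) for x in ("2.5", "1.5", "-1.5", "-2.5")]
    print(mode, row)
```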
Currently I am using the default rounding mode (no. 0), even in those cases where other rounding modes should be used.
Strictly speaking this is wrong and leads to wrong results when doing math with it, but this rounding mode is used in roughly 75% of all cases where rounding is performed.
So the error should not be that big.
In addition, there are some extra rounding rules for the conversion from the 32-bit coding to a floating-point string.
I call it 'rounding to nice numbers', as this is what it results in.
Currently this is implemented in a buggy way, but again the error should be very low.
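For comparison only: one common way to get 'nice' output is to print the shortest decimal string that still converts back to the same 32-bit value. The Python sketch below shows that idea. It is an assumption that this matches the rules meant above, it is not the batch implementation, and the names `to_binary32` and `nice_string` are made up for this example.

```python
import struct

def to_binary32(value):
    """Round a Python float (64 bit) to the nearest 32-bit float value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

def nice_string(value):
    """Shortest decimal string that converts back to the same 32-bit value."""
    target = to_binary32(value)
    # 9 significant digits are always enough to identify a 32-bit float.
    for digits in range(1, 10):
        text = "%.*g" % (digits, target)
        if to_binary32(float(text)) == target:
            return text
    return repr(target)  # only reached for special values such as NaN

print(nice_string(0.1))         # 0.1 (instead of 0.10000000149011612)
print(nice_string(16777216.0))  # 16777216
```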
penpen