Inspired by GitHub PR #221
Dictionary iteration order is not guaranteed in Python < 3.6, so we need to sort by key to get consistent results. The LogHandler output also differs on older Python versions. Also, don't stop running the Python tests after the first error.
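As a minimal sketch of the sort-by-key idea (the helper name `stable_repr` and the `key=value` output format are hypothetical, not taken from the actual change), rendering a dict through `sorted()` gives the same string regardless of how the interpreter orders the keys:

```python
def stable_repr(mapping):
    """Render a dict as 'key=value' pairs with keys in sorted order.

    Hypothetical helper for illustration only. Iterating over
    sorted(mapping) instead of the dict itself makes the output
    independent of the interpreter's dict ordering.
    """
    return ", ".join("{}={!r}".format(key, mapping[key]) for key in sorted(mapping))


# Both literals describe the same mapping; the rendered text is identical.
first = stable_repr({"b": 2, "a": 1})
second = stable_repr({"a": 1, "b": 2})
print(first)
print(first == second)
```

Comparing test output against a string built this way avoids failures that depend only on the Python version.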
It is less error-prone to use functions with a return value that indicates when truncation occurred.
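The general pattern can be illustrated as follows. This is only a sketch in Python under assumed names (`copy_truncated` and its `limit` parameter are hypothetical and do not come from the change itself): the function returns the possibly shortened value together with a flag, so the caller cannot miss a truncation silently.

```python
def copy_truncated(text, limit):
    """Return (result, truncated) where result has at most `limit` characters.

    Hypothetical helper for illustration only. The boolean tells the
    caller whether any input was dropped.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    truncated = len(text) > limit
    return text[:limit], truncated


result, truncated = copy_truncated("hello world", 5)
if truncated:
    # The caller decides how to react: log, raise, or retry with a larger limit.
    print("input was truncated to {!r}".format(result))
```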