Dictionary iteration order is not guaranteed in Python < 3.6, so we need to sort by key to get consistent results. The LogHandler output also differs on older Python versions. Also, don't stop running the Python tests after the first error.
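A minimal sketch of the sort-by-key idea, assuming the output is rendered from a plain dict. `format_dict` is a hypothetical helper for illustration, not code from this change. Iterating over `sorted(d.items())` gives the same order on every interpreter version and every run, whatever order the dict itself uses internally.

```python
import json


def format_dict(d):
    """Render a dict deterministically (hypothetical helper).

    Before Python 3.6, dict iteration order was not guaranteed, so
    iterating d.items() directly could produce a different order across
    interpreter versions or runs. Sorting by key removes that dependency.
    """
    return ", ".join("%s=%r" % (k, v) for k, v in sorted(d.items()))


config = {"verbose": True, "name": "demo", "level": 3}
print(format_dict(config))  # level=3, name='demo', verbose=True

# The standard library offers the same guarantee for JSON output:
print(json.dumps(config, sort_keys=True))  # {"level": 3, "name": "demo", "verbose": true}
```

Sorting by key also makes expected-output comparisons in tests independent of insertion order.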
- plugin->show_version is no longer marked NULL.
- If verbose, it also displays which Python class was loaded from which file.