I'm sure TD is open to revising FMs, but you need to provide or cite quantifiable data. Just saying "X turns too well..." doesn't help much unless you've actually flown the type.

German tests of Russian aircraft can be prone to bias: The pilot might not have used the correct settings, the plane may have been a war-weary example with reduced performance, or some top brass fudged things for propaganda, etc. Russian tests of their own aircraft are also biased for similar reasons, but in the other direction. A "reasonable compromise" between all sources might be necessary to make the best FM.
TD have no affiliation with any one country (they are an international volunteer group). Besides, I think Il-2 has a bigger market in the US anyway, so I doubt there's any pandering to the Russians.
Also, on the topic of engine reliability: you need to apply this to all aircraft in some form or another. Yaks weren't the only planes with problems. Every engine has to have the potential to suddenly fail (but some more than others).
As for levers, you're never going to see those in Il-2. It's just too much work to apply the same standard to all planes. That's why CloD was released.
Finally, it's important to consider skill and tactics. Now, I'm sure you're all great fliers, but on the Eastern Front, the Russians generally lacked pilot training, discipline, and skill, and didn't use the best tactics, at least at the start. Online, if the team is balanced numerically, I find that on average the skill levels are quite similar. However, there are no tactics employed, and everyone is gunning it out, lone-wolf style. This type of environment is better for Russian aircraft. By using historical situations and tactics, the picture changes.