Performance drop in parallel application
Performance drop in parallel application
Hi,
I am trying to run one application through wine. The application supports parallelization and number of cpus (n_cpus) being used can be passed to the application. What I have noticed with n_cpus=4 is that application run through wine takes 2x more time than on windows. I have also noticed that the performance drop happens when parallelisation starts, and there I even see that 4 processes extra spawned are mostly in sleep mode (Linux top command), but also that cpu time (~100-300%), so I don't know is this reporting issue? I have started using wine recently, so my question is: What could cause such a drop? What is the best way to debug this? I also read that wineserver could be bottleneck and I tried enabling WINESYNC 1, but it didn't help...
I hope someone can help me with this.
I am trying to run one application through wine. The application supports parallelization and number of cpus (n_cpus) being used can be passed to the application. What I have noticed with n_cpus=4 is that application run through wine takes 2x more time than on windows. I have also noticed that the performance drop happens when parallelisation starts, and there I even see that 4 processes extra spawned are mostly in sleep mode (Linux top command), but also that cpu time (~100-300%), so I don't know is this reporting issue? I have started using wine recently, so my question is: What could cause such a drop? What is the best way to debug this? I also read that wineserver could be bottleneck and I tried enabling WINESYNC 1, but it didn't help...
I hope someone can help me with this.
Re: Performance drop in parallel application
@MarkoM
Did you use Wine Staging, as regular Wine does not have the esync patchset available.
You can troubleshoot whether esync is enabled with:
If wineserver is a bottleneck - then you would expect to see high CPU time for this process.
You can run wineserver as a foreground process to troubleshoot it, see:
There is also fsync, which can potentially be more efficient than esync, and uses a custom Linux kernel.
Bob
Did you use Wine Staging, as regular Wine does not have the esync patchset available.
You can troubleshoot whether esync is enabled with:
Code: Select all
export WINEDEBUG=+esync
You can run wineserver as a foreground process to troubleshoot it, see:
Code: Select all
wineserver --help
Bob
Re: Performance drop in parallel application
I used regular wine5.0. Thanks I will check this.
Re: Performance drop in parallel application
It seems that wineserver is not a problem, I don't see huge cpu and sync didn't help. But what I see (and think it is strange, as never developed windows parallel app), that 4 processes that are being spawnd by parent are constantly with new pid (spawn, run, end and then go over again). Does this extra creation iso having one process handling message queue, can be cause of this performance drop? Can it be handled in wine?
Re: Performance drop in parallel application
@MarkoM,
So, if I understand you, the parallel processes are being re-spawned rapidly.
I presume this behaviour can be replicated, when running your application, under native Windows?
You might want to check out the perf profiling tool - to checkout what component is causing the Wine bottleneck.
Bob
So, if I understand you, the parallel processes are being re-spawned rapidly.
I presume this behaviour can be replicated, when running your application, under native Windows?
You might want to check out the perf profiling tool - to checkout what component is causing the Wine bottleneck.
Que?MarkoM wrote:Does this extra creation iso having one process handling message queue, can be cause of this performance drop?
Bob
Re: Performance drop in parallel application
I was thinking than normally process is spawned once and then does handling of tasks (some sort of task/message/job queue where it handles one task/message/job after another). But here it is constantly recreated (new task/message/job requires creation of new process), so maybe that is a reason of performance drop wrt to windows machine?
And yes, I can also see this in windows enviroment, processes are re-spawned.
Re: Performance drop in parallel application
I did profiling with perf, and it seems that the most dominant time consuming parts are
Among those, the first two and take the most.
Is there possibility to solve this in wine?
Code: Select all
[kernel.kallsyms] [k] native_queued_spin_lock_slowpath
[kernel.kallsyms] [k] __entry_trampoline_start
[kernel.kallsyms] [k] syscall_return_via_sysret
ntdll.dll.so [.] RtlLeaveCriticalSection
ntdll.dll.so [.] RtlpWaitForCriticalSection
[kernel.kallsyms] [k] try_to_wake_up
ntdll.dll.so [.] RtlEnterCriticalSection
Code: Select all
native_queued_spin_lock_slowpath
Code: Select all
__entry_trampoline_start
Is there possibility to solve this in wine?
-
- Level 2
- Posts: 15
- Joined: Wed Oct 15, 2014 8:19 pm
Re: Performance drop in parallel application
I'm not sure that's nearly enough information to derive anything useful from. Actual time percentages would be helpful; call stacks even more so. In particular what's calling RtlEnterCriticalSection()?
Note also that symbols marked with [k] are in the kernel and not Wine. That doesn't necessarily mean it's not Wine's fault, but just knowing that there's a lot of time spent in the kernel doesn't really help us understand why.
Does using a different "n_cpus" value than 1 or 4 yield interesting results?
Note also that symbols marked with [k] are in the kernel and not Wine. That doesn't necessarily mean it's not Wine's fault, but just knowing that there's a lot of time spent in the kernel doesn't really help us understand why.
Does using a different "n_cpus" value than 1 or 4 yield interesting results?
Re: Performance drop in parallel application
n_cpus=1 is even slower (no parallelisation, so it is logical). However, what I have noticed sometimes, that in wine, some of the extra (n_cpus) processes can also hang, when n_cpus=1, this issue is gone. RtlEnterCriticalSection() is called by those n_cpus processes that are being regularly re-spawned.ObsequiousNewt wrote: ↑Thu Mar 26, 2020 9:32 am I'm not sure that's nearly enough information to derive anything useful from. Actual time percentages would be helpful; call stacks even more so. In particular what's calling RtlEnterCriticalSection()?
Note also that symbols marked with [k] are in the kernel and not Wine. That doesn't necessarily mean it's not Wine's fault, but just knowing that there's a lot of time spent in the kernel doesn't really help us understand why.
Does using a different "n_cpus" value than 1 or 4 yield interesting results?
Re: Performance drop in parallel application
Collecting stack trace in perf (I am not expert there, maybe doing something wrong) have discovered these function calls (maybe I missed sthg in previous perf analysis). What can be concluded from here?
Code: Select all
# Children Self Command Shared Object Symbol
# ........ ........ ............... ....................... ...........................................................
#
36.08% 1.48% application_subprocess libc-2.27.so [.] syscall
|
--34.61%--syscall
|
|--26.58%--entry_SYSCALL_64_after_hwframe
| |
| --25.49%--do_syscall_64
| |
| --24.62%--sys_futex
| |
| --23.12%--do_futex
| |
| |--14.64%--futex_wake
| | |
| | |--10.93%--wake_up_q
| | | |
| | | --10.73%--try_to_wake_up
| | | |
| | | |--5.28%--__task_rq_lock
| | | | |
| | | | --5.14%--_raw_spin_lock
| | | | |
| | | | --5.08%--native_queued_spin_lock_slowpath
| | | |
| | | --2.13%--ttwu_do_activate