Sciencemadness Discussion Board
Subject: Speed of C programs, relative to Java
woelen
Super Administrator
*********




Posts: 8012
Registered: 20-8-2005
Location: Netherlands
Member Is Offline

Mood: interested

[*] posted on 14-12-2023 at 05:27


For solving polynomial equations, the so-called companion matrix is used. This is indeed a sparse matrix, but the methods for determining eigenvalues destroy this sparsity already after the first iterative step. There are optimized eigenvalue methods for tridiagonal matrices, but the companion matrix has a very different structure.
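
To make this concrete: the companion matrix of a monic degree-n polynomial has ones on the subdiagonal and the negated coefficients in the last column, so only 2n-1 of its n^2 entries are nonzero. A minimal sketch in C (the function name and row-major layout are just for illustration):

Code:
#include <string.h>

/* Companion matrix of the monic polynomial
   x^n + a[n-1]*x^(n-1) + ... + a[1]*x + a[0].
   Its eigenvalues are exactly the roots of the polynomial.
   M is an n x n matrix, stored in row-major order. */
void companion_matrix(int n, const double a[], double M[])
{
    memset(M, 0, (size_t)n * (size_t)n * sizeof(double));
    for (int i = 1; i < n; i++)
        M[i * n + (i - 1)] = 1.0;     /* ones on the subdiagonal */
    for (int i = 0; i < n; i++)
        M[i * n + (n - 1)] = -a[i];   /* minus the coefficients in the last column */
}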

And conceptually, if there were a matrix method which preserves the sparsity structure of the companion matrix during its iterative computations, then what would be the difference with a method which just works on the polynomial coefficients? Such a matrix method, with O(n) storage instead of O(n^2) storage, would simply be another O(n) polynomial solving method. E.g. the AE (Aberth-Ehrlich) method gives all roots simultaneously, just like most eigenvalue methods for matrices, and this removes the need for polynomial deflation, which has its own numerical stability issues.

@clearly_not_atara: The fact that the computations are very fast and the runtime may be dominated by "non-arithmetic" logic is an interesting observation. Indeed, modern JVMs use inlining, combined with so-called escape analysis, to strongly optimize the use of small objects which only have limited scope. These techniques can dramatically increase the performance of Java applications. Such things unfortunately are not easily achieved with statically compiled C programs.




The art of wondering makes life worth living...
Want to wonder? Look at https://woelen.homescience.net
teodor
National Hazard
****




Posts: 876
Registered: 28-6-2019
Location: Heerenveen
Member Is Offline


[*] posted on 24-12-2023 at 07:36


I did some tests and I see that FMA instructions (calculating a*b - c*d by Kahan's algorithm) are not used unless you specify -march, like this: gcc -O3 -march=skylake ...
Instead, the software implementation of fma() is used (a call to a library subroutine).
This can definitely slow execution down a lot.
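
For reference, Kahan's algorithm for a*b - c*d is the following (a generic sketch, not the solver's actual code); the two fma() calls are what gcc only maps to hardware FMA instructions when -march permits:

Code:
#include <math.h>

/* Kahan's algorithm for a*b - c*d.
   The first fma() recovers the exact rounding error of the product c*d,
   so the result stays accurate even under heavy cancellation. */
double diff_of_products(double a, double b, double c, double d)
{
    double w = c * d;
    double e = fma(c, d, -w);   /* rounding error of c*d: e = c*d - w */
    double f = fma(a, b, -w);   /* f = a*b - w, with a single rounding */
    return f - e;               /* (a*b - w) - (c*d - w) = a*b - c*d */
}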
woelen
Super Administrator
*********




Posts: 8012
Registered: 20-8-2005
Location: Netherlands
Member Is Offline

Mood: interested

[*] posted on 27-12-2023 at 11:01


It does make some difference.

In my quartic solver code it hardly matters, but that code uses FMA instructions only sparingly. The quintic solver benefits more from it.

Solving 100M quintics on a single core goes from 100 seconds to 85 seconds with the -march=skylake option added.
I already used -ffast-math besides the -O3 option; adding that option had already made a difference of approximately 10% in the C code.
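
To show where FMA enters this kind of code: a large part of the time goes into polynomial evaluation in Horner style, where every step is one multiply-add (a sketch, not my actual solver code):

Code:
#include <math.h>

/* Horner evaluation of c[n]*x^n + ... + c[1]*x + c[0].
   Each fma() is a fused multiply-add with a single rounding;
   with -march=skylake gcc emits one vfmadd instruction per step. */
double horner(int n, const double c[], double x)
{
    double r = c[n];
    for (int i = n - 1; i >= 0; i--)
        r = fma(r, x, c[i]);
    return r;
}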




The art of wondering makes life worth living...
Want to wonder? Look at https://woelen.homescience.net
teodor
National Hazard
****




Posts: 876
Registered: 28-6-2019
Location: Heerenveen
Member Is Offline


[*] posted on 28-12-2023 at 00:42


So, what is the time difference for Java vs C for different types of equations now?
woelen
Super Administrator
*********




Posts: 8012
Registered: 20-8-2005
Location: Netherlands
Member Is Offline

Mood: interested

[*] posted on 29-12-2023 at 02:12


For quartics (generating 100M equations, solving them, and analysing the results):
- Java: 35 seconds
- C (with -O3 -ffast-math -march=skylake): 40 seconds

For quintics, same number of equations:
- Java: 89 seconds
- C (with -O3 -ffast-math -march=skylake): 85 seconds

As you can see, things have improved a lot for the C code. For quintics, it is even slightly faster.

However, I'll stick to Java as my main language. What I really like about the Java environment is that this good performance is obtained completely automatically, and the same compiled code works on all platforms I have. If a piece of hardware has FMA instructions, then these are used; otherwise a software implementation is used. I do not have to worry about the right compiler options, or about distributing the right version of the software for the right hardware. Most likely, binaries compiled with -march=skylake will not run on an older CPU which does not have FMA instructions.
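
Doing that dispatch by hand in C looks roughly like this (just a sketch: __builtin_cpu_supports is a GCC/Clang builtin, and the two solver variants are hypothetical stubs):

Code:
#include <stdio.h>

/* Hypothetical solver variants: one built with FMA, one generic. */
static void solve_quintics_fma(void)     { puts("using hardware FMA"); }
static void solve_quintics_generic(void) { puts("using software FMA"); }

int main(void)
{
    __builtin_cpu_init();                  /* initialize CPU feature detection */
    if (__builtin_cpu_supports("fma"))
        solve_quintics_fma();              /* CPU has FMA instructions */
    else
        solve_quintics_generic();          /* portable fallback */
    return 0;
}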




The art of wondering makes life worth living...
Want to wonder? Look at https://woelen.homescience.net
teodor
National Hazard
****




Posts: 876
Registered: 28-6-2019
Location: Heerenveen
Member Is Offline


[*] posted on 29-12-2023 at 03:13


Indeed, I am impressed by the performance of the modern JIT. But this achievement was only possible because Java has been used intensively by large businesses for many years. This is not the case for new hardware or custom libraries. Also, many scientific applications which require intensive calculation are written in C++, and using JVM libraries from C++ is not an option. But using a C library is possible from virtually any language, so I think the effort to make it fast is absolutely necessary.
I will look at the quartics; thank you for presenting the current status.
It would also be very interesting to compare the performance on the new Apple M2/M3 architecture.
As you know, I am interested in DSP applications. Do you have any example of how your solver could be used in such an application: audio signal filtering, processing, etc.?

Did you compare your solver with what MATLAB has? Maybe we can make an external library for MATLAB in C++ together, if you can decide on the license conditions.


[Edited on 29-12-2023 by teodor]
Rainwater
National Hazard
****




Posts: 919
Registered: 22-12-2021
Member Is Offline

Mood: indisposition to activity

[*] posted on 29-12-2023 at 16:17


Are the data sets being processed identical for both platforms?



"You can't do that" - challenge accepted
woelen
Super Administrator
*********




Posts: 8012
Registered: 20-8-2005
Location: Netherlands
Member Is Offline

Mood: interested

[*] posted on 31-12-2023 at 11:19


@Rainwater: The data sets are not identical, but comparable. Actually, even between two runs on the same platform, the data set is not the same. For each polynomial, a few pseudo-random numbers are generated, uniformly distributed between 0 and 1, and from these the coefficients are derived. The random generator is seeded with the current system time in nanoseconds. Because of the large number of polynomials tested (100M), the average properties of the data sets are so close to each other over different runs, and also between the C and Java versions, that a meaningful comparison of performance is possible.
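
In C, the generation scheme is essentially the following (a sketch: the actual generator and the way the coefficients are derived differ in detail):

Code:
#include <stdlib.h>
#include <time.h>

/* Seed the generator with the current system time in nanoseconds. */
static void seed_from_nanotime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    srand((unsigned)(ts.tv_sec ^ ts.tv_nsec));
}

/* Uniform pseudo-random number in [0, 1). */
static double uniform01(void)
{
    return rand() / (RAND_MAX + 1.0);
}

/* Fill the n+1 coefficients of a random degree-n polynomial. */
static void random_polynomial(int n, double c[])
{
    for (int i = 0; i <= n; i++)
        c[i] = uniform01();
}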

@teodor: I compared my polynomial solvers with Matlab's roots function. My generic polynomial solver performs better than roots, especially for somewhat larger degrees. This is because roots uses a matrix eigenvalue method for finding the roots. It is fine with me if the C code is used for making a Matlab library. At the moment I only have the Java code on my website, but I'll put the C code on it next week.




The art of wondering makes life worth living...
Want to wonder? Look at https://woelen.homescience.net
Random
International Hazard
*****




Posts: 1120
Registered: 7-5-2010
Location: In ur closet
Member Is Offline

Mood: Energetic

[*] posted on 22-10-2024 at 04:16


I will now write this post in what we would call Coherent English.

I have been dealing with programming since I was very young. My first serious programming started with C++.

Assembly Language is only one. That means that if one application performs faster than another which does the same task, then the slower application contains data which is not necessary for performing that task.

It is not known what is incorporated into code. This again touches on the 1960s. I mean, I would definitely date this to the 20th century.

.

I used dot as a break in the text.

The original people who dealt with programming languages are what I classify as Psychiatry 1967.

.

Assembly Language is only one. That is why I posted that last post about the smallest application that does the task.

.

The question is whether assembly language itself is optimized enough, and I will say that there is a probability, if I can call it that, that assembly language itself could be made more, how shall I call it, efficient.

The last statement touches on the hardware-software relation.

.

I have documents, which I would date to July 2020, concerning the pathway from source code into binary. It does not go from source code directly into binary, but incorporates the source code - assembly - machine code - binary pathway.

.

The fastest application would be the most optimized one.

.

It also depends: what if the specific compiler is adding unnecessary data?

I have been dealing with something we would call Digital Forensics.

.

You never know what is in the code unless you review it.

This again puts me back in the period 2007-2010, when I was in contact with various WarezGroups.

.

I would say that I grew up, in terms of thinking, when I started dealing with this, what they call, low-level language of assembly.

Again, assembly is assembly.

If all source code eventually ends up as assembly, then we should review what is making it slower.

Edit:

I was thinking, in July 2024, about how 64 kB is not that small, in terms of what that would be, ... Yes, I remember that Google provided an incorrect answer for how much 64 kB is in bits, because I was thinking about how large that format would be if we multiplied one number by another to get 64 kB.

.

Was it 64 kB I was researching?

[Edited on 22-10-2024 by Random]