Hi all,
I bought this new laptop for machine-learning applications that I wrote myself.
I need to accelerate single-precision (FP32) matrix multiplication, for which I have used cublasSgemm for many years.
My hope was that the new laptop's GPU, an NVIDIA GeForce RTX 5090 mobile, would be much faster than the 6-year-old NVIDIA GeForce RTX 2080 mobile in my current Octane 18" laptop. But my benchmarks show the opposite: the old GPU is just as fast, or even faster!
Code:
mich@recoil:~/Downloads$ git clone https://github.com/hma02/cublasgemm-benchmark
mich@recoil:~/Downloads$ cd cublasgemm-benchmark/
mich@recoil:~/Downloads/cublasgemm-benchmark$ nano run.sh <-- uncomment line 4
mich@recoil:~/Downloads/cublasgemm-benchmark$ ./run.sh
nvcc gemm.cu -lcublas --std=c++11 -arch=sm_60 -o gemm
INFO: Running test for all 1 GPU deivce(s) on host recoil
==================
INFO: testing GPU0
==================
timestamp, index, name, pcie.link.gen.current, pcie.link.gen.max, pstate, clocks.current.graphics [MHz], clocks.max.graphics [MHz]
2025/05/25 10:00:42.125, 0, NVIDIA GeForce RTX 5090 Laptop GPU, 1, 5, P8, 22 MHz, 3090 MHz
2025/05/25 10:00:47.229, 0, NVIDIA GeForce RTX 5090 Laptop GPU, 5, 5, P0, 2152 MHz, 3090 MHz
2025/05/25 10:00:52.231, 0, NVIDIA GeForce RTX 5090 Laptop GPU, 5, 5, P0, 2152 MHz, 3090 MHz
2025/05/25 10:00:57.233, 0, NVIDIA GeForce RTX 5090 Laptop GPU, 5, 5, P2, 1957 MHz, 3090 MHz
cublasSgemm test result:
running with min_m_k_n: 2 max_m_k_n: 16384 repeats: 2
allocating device variables
float32: size 2 average: 0.0114416 s
float32: size 4 average: 2.1184e-05 s
float32: size 8 average: 7.792e-06 s
float32: size 16 average: 6.56e-06 s
float32: size 32 average: 0.00238846 s
float32: size 64 average: 1.6352e-05 s
float32: size 128 average: 1.4144e-05 s
float32: size 256 average: 1.8224e-05 s
float32: size 512 average: 4.6032e-05 s
float32: size 1024 average: 0.000232144 s
float32: size 2048 average: 0.00174157 s
float32: size 4096 average: 0.0146068 s
float32: size 8192 average: 0.138247 s
float32: size 16384 average: 1.09813 s
Here is the same benchmark on my 6-year-old laptop:
Code:
mich@i9:~/Downloads$ git clone https://github.com/hma02/cublasgemm-benchmark
mich@i9:~/Downloads$ cd cublasgemm-benchmark/
mich@i9:~/Downloads/cublasgemm-benchmark$ nano run.sh <-- uncomment line 4
mich@i9:~/Downloads/cublasgemm-benchmark$ ./run.sh
nvcc gemm.cu -lcublas --std=c++11 -arch=sm_60 -o gemm
INFO: Running test for all 1 GPU deivce(s) on host i9
==================
INFO: testing GPU0
==================
timestamp, index, name, pcie.link.gen.current, pcie.link.gen.max, pstate, clocks.current.graphics [MHz], clocks.max.graphics [MHz]
2025/05/25 10:01:29.915, 0, NVIDIA GeForce RTX 2080, 1, 3, P8, 300 MHz, 2100 MHz
2025/05/25 10:01:34.919, 0, NVIDIA GeForce RTX 2080, 3, 3, P2, 1380 MHz, 2100 MHz
2025/05/25 10:01:39.920, 0, NVIDIA GeForce RTX 2080, 3, 3, P2, 1380 MHz, 2100 MHz
cublasSgemm test result:
running with min_m_k_n: 2 max_m_k_n: 16384 repeats: 2
allocating device variables
float32: size 2 average: 2.7088e-05 s
float32: size 4 average: 6.72e-06 s
float32: size 8 average: 4.992e-06 s
float32: size 16 average: 9.568e-06 s
float32: size 32 average: 1.0832e-05 s
float32: size 64 average: 7.072e-06 s
float32: size 128 average: 9.808e-06 s
float32: size 256 average: 1.3472e-05 s
float32: size 512 average: 5.0576e-05 s
float32: size 1024 average: 0.000238624 s
float32: size 2048 average: 0.00165302 s
float32: size 4096 average: 0.0131607 s
float32: size 8192 average: 0.12615 s
float32: size 16384 average: 1.02281 s
My own benchmark code and neural-net training show the same result: both GPUs (RTX 5090 mobile, RTX 2080 mobile) are equally fast.
How can this be?
According to the specs, the RTX 5090 mobile has a peak of about 31 TFLOPS FP32, while the RTX 2080 mobile has only about 9 TFLOPS.
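To make the gap concrete, the size-16384 timings above can be converted into achieved FP32 throughput (an m = n = k SGEMM performs roughly 2·n³ floating-point operations). Both GPUs land around 8 TFLOPS, far below the RTX 5090 mobile's quoted peak:

```python
# Convert the largest-size cublasSgemm timings above into achieved FP32 TFLOPS.
# An m = n = k SGEMM performs about 2 * n**3 floating-point operations.
n = 16384
flops = 2 * n**3

timings = {
    "RTX 5090 mobile": 1.09813,  # seconds, from the first benchmark run
    "RTX 2080 mobile": 1.02281,  # seconds, from the second benchmark run
}

for name, t in timings.items():
    print(f"{name}: {flops / t / 1e12:.1f} TFLOPS")
# RTX 5090 mobile: 8.0 TFLOPS
# RTX 2080 mobile: 8.6 TFLOPS
```

So the RTX 2080 mobile is running near its ~9 TFLOPS spec, while the RTX 5090 mobile is sustaining only about a quarter of its ~31 TFLOPS peak.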
Does anybody here have an explanation, or has anyone seen the same results?
Best Regards,
Michael