Posts

SPO600 - Project - Stage Three

In this last stage of my SPO600 project, since I don't have results suitable for upstreaming, I am going to wrap up the project and do a thorough technical analysis of my results. First of all, I am going to summarize what I did. (If you want to go over the details, you can see my previous posts.) I picked a software package called SSDUP, which is a traffic-aware SSD burst buffer for HPC systems. I noticed that it uses three different MurmurHash3 hash functions: the first two are optimized for x86 platforms and the third is optimized for x64 platforms. I also noticed that it is compiled with 'gcc -std=gnu99'. To make these three hash functions easier to handle, I split them into three files and tested them separately on AArch64 and x86_64 systems. Since the professor said my results in stage two were hard to read, I am going to show them again in table format. First hash function (MurmurHash3_x86_32), the execution time for -O3
Recent posts

SPO600 - Project - Stage Two

In the second stage of my SPO600 project, I need to implement my optimizations to the function in the open source software that I have chosen, which is SSDUP. I also need to prove that the optimized code produces the same results as the original code, and I will compare the performance of the optimized function on AArch64 and non-AArch64 platforms. As I mentioned in stage one, SSDUP uses three different Murmur3 hash functions, which are optimized for x86 and x64 platforms. The first function is for 32-bit machines and produces a 32-bit output. The second function is also for 32-bit machines but produces a 128-bit output. The third function is for 64-bit machines and produces a 128-bit output. Each hash function will produce a different hash value. Here is the source code of my file:
-----------------------------------------------------------------
#include "murmur3.h"
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

SPO600 - Project - Stage One

Our final project is split into three stages, and this is the first stage of my SPO600 course project. In this stage, we are given the task of finding an open source software package that includes a CPU-intensive function or method that compiles to machine code. After choosing the package, I have to benchmark the performance of that function on an AArch64 system. Once the benchmarking is done, I have to identify a strategy for optimizing the function for better performance on an AArch64 system, because that strategy will be used in the second stage of the project. With so much software available, I would say picking the software is the hardest job in the project, which is the major reason it took me so long to get this post going. But after a lot of research, I picked a software package called SSDUP, which is a traffic-aware SSD burst buffer for HPC systems. You can find the source code over here: https://github.com/CGC

Lab 6A

This lab is separated into two parts, and I'll blog my work in separate posts. In the first part, we got source code from Professor Chris that is similar to our Lab 5: it scales the volume of sound, but it includes inline assembler. The first thing I'll do is add a timer to the code to measure the execution time. After building and running the program, here is the output:
-------------------------------------------------------------------------
[qichang@aarchie spo600_20181_inline_assembler_lab]$ ./vol_simd
Generating sample data.
Scaling samples.
Summing samples.
Result: -462
Time: 0.024963 seconds.
-------------------------------------------------------------------------
Then I adjusted the number of samples to 5000000 in vol.h:
-------------------------------------------------------------------------
[qichang@aarchie spo600_20181_inline_assembler_lab]$ cat vol_simd.c
// vol_simd.c :: volume scaling in C using AArch64 SIMD
// Chris Tyler 2017.11.29-2018

Lab 5

In this lab, we are going to use different approaches to scale the volume of sound and look at each algorithm’s effect on system performance. Here is some basic knowledge of digital sound: digital sound is usually represented by signed 16-bit integer signal samples, taken at a rate of around 44.1 or 48 thousand samples per second, with one stream of samples for the left and right stereo channels. To change the volume of sound, we have to scale each sample by a volume factor in the range 0.00 to 1.00 (silence to full volume). Here is the source code I got from the professor: (vol1.h)
-------------------------------------------------
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include "vol.h"

// Function to scale a sound sample using a volume_factor
// in the range of 0.00 to 1.00.
static inline int16_t scale_sample(int16_t sample, float volume_factor) {
    return (int16_t) (volume_factor * (float) sample);
}

int main() {
    // Al

Lab 4

This lab explores single instruction/multiple data (SIMD) vectorization and the auto-vectorization capabilities of the GCC compiler. For people who are not familiar with vectorization, this article will help: Automatic vectorization. In this lab, we are going to write a short program that:
-Creates two 1000-element integer arrays
-Fills them with random numbers in the range -1000 to +1000
-Sums those two arrays element-by-element into a third array
-Sums the third array
-Prints out the result
Here is the source code I wrote:
------------------------------------------------------
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main(){
    int sum = 0;  /* initialized so the accumulation starts from zero */
    int arr1[1000];
    int arr2[1000];
    int arr3[1000];

    srand(time(NULL));

    for(int i=0; i<1000; i++){
        arr1[i] = rand() % 2001 - 1000;
        arr2[i] = rand() % 2001 - 1000;
    }

    for(int i=0; i<1000; i++){
        arr3[i] = arr1[i] + arr2[i];
    }

    for(int i=0; i<1000; i++){
        su

Lab 3

In this lab, we are going to use assembly language to complete three parts. 1. As we are getting familiar with assembly language, we will create a loop in assembly that prints "Hello World!" 10 times. This part is quite easy; here is the source code for x86_64 assembler:
------------------------------------------------------
.text
.globl    _start

start = 0                       /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                        /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    mov     $start,%r15         /* loop index */
    mov     %r15,%r10

loop:
    /* ... body of the loop ... do something useful here ... */
    movq    $len,%rdx           /* message length */
    movq    $msg,%rsi           /* message address */
    movq    $1,%rdi             /* file descriptor 1 (stdout) */
    movq    $1,%rax             /* syscall number 1 (sys_write) */
    syscall

    inc     %r15                /* increment index */
    cmp     $max,%r15           /* see if we