HPC-gSpan: An FPGA-based parallel system for frequent subgraph mining.
(Sep 2012 - Oct 2013)
Graph mining is an important research area within the domain of data mining, and one of its most challenging tasks is frequent subgraph mining (FSM). This work presents, to the best of our knowledge, the first FPGA-based implementation of gSpan, the most efficient and best-known algorithm for the FSM problem. The proposed system, named High Performance Computing-gSpan (HPC-gSpan), achieves a manyfold speedup over the official software implementation in the gboost library running on a high-end CPU, across various real-world datasets.
This work was developed as the diploma thesis for my undergraduate degree. Its results were published at the 24th International Conference on Field Programmable Logic and Applications.
Real Time Fractal Flame Rendering.
(Dec 2012 - June 2013)
Iterated functions are a broad class of mathematical maps that show interesting properties when applied repeatedly to some initial data. The Mandelbrot and Julia sets are both based on iterated functions. However, large classes of iterated functions remain largely unexplored and, because of their computational cost, they are well suited to hardware acceleration. In this work we developed a real-time fractal flame rendering algorithm, which was designed and fully implemented on an ALTERA DE2-115 FPGA board with HDMI output. The implemented system produces 200,000,000 points per second, versus 1,280,000 for highly optimized software written in C, a speedup of ~156x. To our knowledge this is the first FPGA-based flame rendering implementation, and the results are promising for further study.
Our design won first place in the "Most Impressive use of an FPGA" category of the ALTERA Innovate Europe Contest 2012-2013, and it was presented with a demo and a poster at the 23rd International Conference on Field Programmable Logic and Applications.
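As a rough software illustration of the kind of iteration that fractal flame rendering builds on, the sketch below runs a "chaos game": a point is repeatedly sent through a randomly chosen function, each visited position is counted in a histogram, and the counts are log-scaled for display. It is not the FPGA design described above. The function set `AFFINES`, the `sinusoidal` warp, the 64x64 grid, and the names `render` and `tone_map` are all assumptions chosen only to keep the example small.

```python
import math
import random

# Hypothetical function set (an assumption for illustration): each entry is an
# affine map (a, b, c, d, e, f) sending (x, y) to (a*x + b*y + c, d*x + e*y + f).
AFFINES = [
    (0.5, 0.0, 0.0, 0.0, 0.5, 0.0),
    (0.5, 0.0, 0.5, 0.0, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.0, 0.5, 0.5),
]


def sinusoidal(x, y):
    """A nonlinear warp applied after the affine map (assumed choice)."""
    return math.sin(x), math.sin(y)


def render(num_points, size=64, seed=0, warmup=20):
    """Iterate the function set and count how often each grid cell is hit."""
    rng = random.Random(seed)  # seeded so repeated runs give the same image
    hist = [[0] * size for _ in range(size)]
    x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    for i in range(num_points + warmup):
        a, b, c, d, e, f = rng.choice(AFFINES)
        x, y = sinusoidal(a * x + b * y + c, d * x + e * y + f)
        if i < warmup:
            continue  # discard early points, before the orbit has settled
        col, row = int(x * size), int(y * size)
        if 0 <= col < size and 0 <= row < size:
            hist[row][col] += 1
    return hist


def tone_map(hist):
    """Log-scale the hit counts to brightness values in [0, 1]."""
    peak = max(max(row) for row in hist)
    if peak == 0:
        return [[0.0] * len(row) for row in hist]
    scale = math.log1p(peak)
    return [[math.log1p(v) / scale for v in row] for row in hist]


if __name__ == "__main__":
    image = tone_map(render(100_000))
    lit = sum(1 for row in image for v in row if v > 0.0)
    print(f"{lit} of {64 * 64} cells were hit")
```

Each loop iteration depends only on the previous point and one random choice, and the histogram update is a single increment. That simple, repetitive structure is the sort of workload that tends to map well onto pipelined hardware.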
Low Overhead & Energy Efficient Storage Path for Next Generation Computer Systems.
(Sep 2014 - Sep 2018)
The constant growth of data is pushing storage systems towards ever-higher I/O bandwidth and ever-lower latency. In recent years, the Non-Volatile Memory Express (NVMe) standard has enabled SSDs to deliver high I/O rates by allowing storage to be connected to the processing chip directly via the fastest available interconnect (i.e. PCIe). Although SSDs have become ubiquitous in data centres, reducing the latency gap with main memory is still a first-order challenge. Additionally, the adoption of FPGAs in data centres is creating opportunities to accelerate various applications and/or OS operations. While FPGAs in data centres have mostly been connected via PCIe to x86 servers, heterogeneous Systems on Chip (SoCs) are now also available, with multi-core processors and FPGAs integrated on the same die and connected by an on-chip interconnect.
This work analyses the sources of performance overhead in existing state-of-the-art storage devices and proposes a novel low-overhead, energy-efficient storage path called FastPath, which operates transparently to the processing cores. Experimental results showed that FastPath can achieve up to 82% lower latency, up to 12x higher performance, and up to 10x better energy efficiency for a standard microbenchmark on an Arm-FPGA Zynq 7000 SoC. Further experiments were conducted on a state-of-the-art SoC, the Zynq UltraScale+ MPSoC, using a real application, the Redis in-memory database, which received requests from the Yahoo! Cloud Serving Benchmark (YCSB). This evaluation showed that FastPath achieved up to 60% lower latency and 15% higher throughput than the baseline storage path in the Linux kernel.