Sunday, May 27, 2007

"compile-in" method vs. "sampling" method

There are two approaches to profiling: compile-in and sampling.
The compile-in method records how long each component or function takes to execute; the one that takes the longest is considered the bottleneck.
The sampling method periodically checks which component is currently active; the one that appears most frequently is considered the bottleneck.

The compile-in method requires instrumenting the source code, and the overhead of counting and logging may affect performance. One way to implement it is to instrument the source code directly, for instance by using macros to redefine the calls and then recompiling. We are considering aspect-oriented programming (AOP) techniques to help with the instrumentation, so that the instrumented code can be kept separate from the original code.

Sampling involves interrupting the process and grabbing a stack trace. Sampling may have less impact on performance as long as the sampling frequency is not too high. Implementing sampling is more complicated, and we may rely on some system profiling facilities.

After a discussion with Paul, we decided to first implement a compile-in method for profiling file I/O, i.e. pread() and pwrite() in c_filesys.c. Our goal is to find out where in the code a thread is "waiting". Note that when we count wall-clock time as the execution time, the time spent in context switches is also included. We expect the impact of context switches among processes to be negligible when the waiting duration is significant.

Saturday, May 26, 2007

Coding Environment Setup

1. How to build PBXT with MySQL
2. How to use SysBench
3. PBXT code download
4. SysBench code download
5. MySQL 5.1 code download

Introduction to the "PAT" project

This project is one of the Google Summer of Code 2007 projects for MySQL.

I am a graduate student at the University of Toronto, and my mentor for this project is Paul McCullagh.

The goal of this project is to provide a performance analysis tool for PrimeBase XT (PBXT), a MySQL storage engine. We aim to help developers locate the bottlenecks of the system. In contrast to traditional profiling tools, we focus on capturing the impact of resource contention by measuring the time spent waiting for critical resources, such as I/O, memory and locks. We also try to provide context information to help developers identify the critical path.