Brian Silverman | 72890c2 | 2015-09-19 14:37:37 -0400

Bench Template Library

****************************************
Introduction :

The aim of this project is to compare the performance
of available numerical libraries. The code is designed
to be as generic and modular as possible, so adding new
numerical libraries or new numerical tests should
require minimal effort.


*****************************************

Installation :

BTL uses cmake / ctest:

1 - create a build directory:

  $ mkdir build
  $ cd build

2 - configure:

  $ ccmake ..

3 - run the bench using ctest:

  $ ctest -V

You can run the benchmarks only on libraries matching a given regular expression:
  $ ctest -V -R <regexp>
For instance:
  $ ctest -V -R eigen2

You can also select a given set of actions by defining the environment variable BTL_CONFIG this way:
  $ BTL_CONFIG="-a action1{:action2}*" ctest -V
An example:
  $ BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2

Finally, if bench results already exist (the bench*.dat files), the new results are merged with them by keeping the best result for each matrix size. If you want to overwrite the previous results instead, simply add the "--overwrite" option:
  $ BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2

4 - analyze the results: different data files (.dat) are produced in each library directory (libs/*).
If gnuplot is available, choose a directory name in the data directory to store the results and type:
  $ cd data
  $ mkdir my_directory
  $ cp ../libs/*/*.dat my_directory
Build the data utilities in this (data) directory:
  $ make
Then you can look at the raw data:
  $ go_mean my_directory
or smooth the data first:
  $ smooth_all.sh my_directory
  $ go_mean my_directory_smooth

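The merge rule mentioned in step 3 (keep the best result for each matrix size) can be illustrated with a short C++ sketch. This is not BTL's actual code: the name merge_best_sketch and the map from size to mflops are assumptions made for the example, and "best" is assumed to mean the highest mflops value.

```cpp
#include <map>

// Hypothetical container for one bench*.dat file: matrix size -> measured mflops.
typedef std::map<int, double> Results_Sketch;

// Merge fresh results into previous ones, keeping the higher mflops per size.
// Assumption: "best" means the largest mflops value.
Results_Sketch merge_best_sketch(const Results_Sketch& previous,
                                 const Results_Sketch& fresh) {
    Results_Sketch merged = previous;
    for (Results_Sketch::const_iterator it = fresh.begin(); it != fresh.end(); ++it) {
        Results_Sketch::iterator found = merged.find(it->first);
        if (found == merged.end() || it->second > found->second) {
            merged[it->first] = it->second;  // new size, or a better result
        }
    }
    return merged;
}
```

With the "--overwrite" option, the equivalent of this sketch would simply return the fresh results unchanged.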

*************************************************

Files and directories :

generic_bench : all the bench sources common to all libraries

actions : sources for the different action wrappers (axpy, matrix-matrix product) to be tested.

libs/* : bench sources specific to each tested library.

machine_dep : directory used to store machine-specific Makefile.in

data : directory used to store gnuplot scripts and data analysis utilities

**************************************************

Principles : the code modularity is achieved by defining two concepts :

****** Action concept : This is a class defining which kind
of test must be performed (e.g. a matrix_vector_product).
An Action should define the following methods :

*** Ctor using the size of the problem (matrix or vector size) as an argument
    Action action(size);
*** initialize : this method initializes the calculation (e.g. initializes the matrix and vector arguments)
    action.initialize();
*** calculate : this method actually launches the calculation to be benchmarked
    action.calculate();
*** nb_op_base() : this method returns the number of operations performed by the calculate method (allowing the mflops evaluation)
*** name() : this method returns the name of the action (std::string)

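As an illustration of the Action concept, here is a minimal sketch of a class providing the five members listed above. It is not taken from the BTL sources: the class name Axpy_Action_Sketch, the fixed coefficient and the accessor y() are assumptions, and a real BTL action would be templated on an interface rather than hard-wired to std::vector<double>.

```cpp
#include <string>
#include <vector>

// Hypothetical action computing y += coef * x on plain std::vector<double>.
class Axpy_Action_Sketch {
public:
    // Ctor using the size of the problem as an argument.
    explicit Axpy_Action_Sketch(int size)
        : size_(size), coef_(2.0), x_(size), y_(size) {}

    // Fill the operands before the timed region.
    void initialize() {
        for (int i = 0; i < size_; ++i) {
            x_[i] = static_cast<double>(i);
            y_[i] = 1.0;
        }
    }

    // The calculation that would be benchmarked.
    void calculate() {
        for (int i = 0; i < size_; ++i) y_[i] += coef_ * x_[i];
    }

    // One multiplication and one addition per element.
    double nb_op_base() const { return 2.0 * size_; }

    std::string name() const { return "axpy_sketch"; }

    // Accessor added only so the result can be inspected in this example.
    const std::vector<double>& y() const { return y_; }

private:
    int size_;
    double coef_;
    std::vector<double> x_;
    std::vector<double> y_;
};
```

Keeping the setup in initialize() and the work in calculate() means only the operation of interest needs to be timed.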
****** Interface concept : This is a class or namespace defining how to use a given library and
its specific containers (matrix and vector). Up to now an interface should define the following types :

*** real_type : kind of float to be used (float or double)
*** stl_vector : must correspond to std::vector<real_type>
*** stl_matrix : must correspond to std::vector<stl_vector>
*** gene_vector : the vector type for this interface --> e.g. (real_type *) for the C_interface
*** gene_matrix : the matrix type for this interface --> e.g. (gene_vector *) for the C_interface

+ the following common methods

*** free_matrix(gene_matrix & A, int N) : deallocation of an N sized gene_matrix A
*** free_vector(gene_vector & B) : deallocation of an N sized gene_vector B
*** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) : copy the content of an stl_matrix A_stl into a gene_matrix A.
    The allocation of A is done in this function.
*** vector_from_stl(gene_vector & B, stl_vector & B_stl) : copy the content of an stl_vector B_stl into a gene_vector B.
    The allocation of B is done in this function.
*** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) : copy the content of a gene_matrix A into an stl_matrix A_stl.
    The size of A_stl must correspond to the size of A.
*** vector_to_stl(gene_vector & A, stl_vector & A_stl) : copy the content of a gene_vector A into an stl_vector A_stl.
    The size of A_stl must correspond to the size of A.
*** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source into cible (the destination).
    Both source and cible must be sized NxN.
*** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source into cible (the destination).
    Both source and cible must be sized N.

and the following methods, one for each action to be benchmarked :

*** matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N)
*** matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N)
*** ata_product(const gene_matrix & A, gene_matrix & X, int N)
*** aat_product(const gene_matrix & A, gene_matrix & X, int N)
*** axpy(real coef, const gene_vector & X, gene_vector & Y, int N)

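To make the Interface concept more concrete, the following sketch shows what a very small interface could look like when the "library" is simply std::vector. It is an illustration only: the name Stl_Interface_Sketch is hypothetical, only a subset of the methods listed above is shown, and the row-major A[i][j] layout is an assumption rather than something this README specifies.

```cpp
#include <vector>

// Hypothetical interface whose containers are plain std::vector objects.
template <class real>
struct Stl_Interface_Sketch {
    typedef real real_type;
    typedef std::vector<real> stl_vector;
    typedef std::vector<stl_vector> stl_matrix;
    typedef stl_vector gene_vector;   // vector type for this interface
    typedef stl_matrix gene_matrix;   // matrix type for this interface

    static void free_matrix(gene_matrix& A, int /*N*/) { A.clear(); }
    static void free_vector(gene_vector& B) { B.clear(); }

    // Copy A_stl into A; the allocation of A happens in the assignment.
    static void matrix_from_stl(gene_matrix& A, stl_matrix& A_stl) { A = A_stl; }

    // Copy A back into A_stl.
    static void matrix_to_stl(gene_matrix& A, stl_matrix& A_stl) { A_stl = A; }

    // X = A * B for an NxN matrix A (assumed row-major: A[i][j]).
    static void matrix_vector_product(const gene_matrix& A, const gene_vector& B,
                                      gene_vector& X, int N) {
        for (int i = 0; i < N; ++i) {
            real sum = 0;
            for (int j = 0; j < N; ++j) sum += A[i][j] * B[j];
            X[i] = sum;
        }
    }

    // Y += coef * X
    static void axpy(real coef, const gene_vector& X, gene_vector& Y, int N) {
        for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
    }
};
```

An action templated on such an interface would only call these static methods and use the typedefs, so it never needs to know which library sits underneath.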
The bench algorithm (generic_bench/bench.hh) is templated with an action, itself templated with
an interface. A typical main.cpp source stored in a given library directory libs/A_LIB
looks like :

  bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

This function produces an XY data file containing the measured mflops as a function of the size, for 50
sizes between 10 and 1000.

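The idea behind such a bench call can be sketched as a loop over problem sizes that times calculate() and converts nb_op_base() into mflops. This is a simplified illustration, not the code in generic_bench/bench.hh: the names bench_sketch and Sum_Action_Sketch are hypothetical, the sizes are assumed to be linearly spaced, each size is timed only once with std::chrono, and results are returned instead of being written to a file.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tiny action used only to exercise the sketch below.
struct Sum_Action_Sketch {
    explicit Sum_Action_Sketch(int size) : data(size), total(0.0) {}
    void initialize() { for (std::size_t i = 0; i < data.size(); ++i) data[i] = 1.0; }
    void calculate() { total = 0.0; for (std::size_t i = 0; i < data.size(); ++i) total += data[i]; }
    double nb_op_base() const { return static_cast<double>(data.size()); }
    std::vector<double> data;
    double total;
};

// Returns (size, mflops) pairs for nb_point sizes from size_min to size_max.
// Assumptions: linear spacing and a single timed run per size.
template <class Action>
std::vector<std::pair<int, double> > bench_sketch(int size_min, int size_max, int nb_point) {
    std::vector<std::pair<int, double> > results;
    for (int i = 0; i < nb_point; ++i) {
        int size = size_min;
        if (nb_point > 1) {
            size += static_cast<int>(
                static_cast<long long>(size_max - size_min) * i / (nb_point - 1));
        }
        Action action(size);
        action.initialize();
        std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        action.calculate();
        std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
        double seconds = std::chrono::duration<double>(stop - start).count();
        // mflops = operations / (seconds * 10^6); guard against a zero reading.
        double mflops = seconds > 0.0 ? action.nb_op_base() / (seconds * 1e6) : 0.0;
        results.push_back(std::make_pair(size, mflops));
    }
    return results;
}
```

A single timed run is noisy for small sizes, which is one reason a real benchmark would repeat the calculation and let a dedicated timing object decide how measurements are taken.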
This algorithm can be adapted by providing a given Perf_Analyzer object, which determines how the time
measurements must be done. For example, the X86_Perf_Analyzer uses the asm rdtsc instruction and provides
a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer,
so

  bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

is equivalent to

  bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

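One way to obtain this kind of "default analyzer" behaviour in C++ is to provide two overloads, one taking the analyzer as a template template parameter and one forwarding to it with a default. The sketch below shows the pattern under stated assumptions: Chrono_Analyzer_Sketch, Counting_Action_Sketch and timed_run_sketch are hypothetical names, the analyzer exposes a made-up seconds_for() method, and none of this is claimed to match the real Perf_Analyzer classes.

```cpp
#include <chrono>

// Hypothetical timing policy: times one call to action.calculate().
template <class Action>
struct Chrono_Analyzer_Sketch {
    double seconds_for(Action& action) {
        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        action.calculate();
        std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }
};

// Hypothetical action that just counts up to its size.
struct Counting_Action_Sketch {
    explicit Counting_Action_Sketch(int size) : size_(size), count_(0) {}
    void initialize() { count_ = 0; }
    void calculate() { for (int i = 0; i < size_; ++i) ++count_; }
    int size_;
    volatile int count_;  // volatile so the loop is not optimized away
};

// Overload 1: the caller chooses the analyzer explicitly.
template <template <class> class Analyzer, class Action>
double timed_run_sketch(int size) {
    Action action(size);
    action.initialize();
    Analyzer<Action> analyzer;
    return analyzer.seconds_for(action);
}

// Overload 2: no analyzer given, so forward to the default one.
template <class Action>
double timed_run_sketch(int size) {
    return timed_run_sketch<Chrono_Analyzer_Sketch, Action>(size);
}
```

With these two overloads, timed_run_sketch<Counting_Action_Sketch>(100) and timed_run_sketch<Chrono_Analyzer_Sketch, Counting_Action_Sketch>(100) select the same timing policy, so swapping the analyzer only requires changing the forwarding overload.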
If your system supports it, we suggest using a mixed implementation (X86_Perf_Analyzer+Portable_Perf_Analyzer):
replace
  bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
with
  bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
in generic_bench/bench.hh.