[/============================================================================
  Boost.odeint

  Copyright 2013 Karsten Ahnert
  Copyright 2013 Pascal Germroth
  Copyright 2013 Mario Mulansky

  Use, modification and distribution is subject to the Boost Software License,
  Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt)
=============================================================================/]


[section Parallel computation with OpenMP and MPI]

Parallelization is a key feature of modern numerical libraries, since
multi-core machines, and even laptops, are now ubiquitous.
odeint currently supports parallelization with OpenMP and MPI, as described in
the following sections.
However, it should be clear from the beginning that the difficulty of
efficiently distributing ODE integration across many cores or machines lies in
parallelizing the system function, which remains the user's responsibility.
Simply using a parallel odeint backend without parallelizing the system function
will yield almost no performance gain.

[section OpenMP]

[import ../examples/openmp/phase_chain.cpp]

odeint's OpenMP support is implemented as an external backend that needs to be
included manually. Depending on the compiler, additional flags may be
required, e.g. [^-fopenmp] for GCC.
[phase_chain_openmp_header]

In the simplest OpenMP parallelization approach we use a standard `vector`
as the state type:
[phase_chain_vector_state]

We initialize the state with some random data:
[phase_chain_init]
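
For orientation, such an initialization might look roughly like the following
sketch, which fills the phases with uniformly distributed random values. The
chain length and the use of the standard `<random>` facilities are purely
illustrative and differ in detail from the imported example.
```
#include <vector>
#include <random>

typedef std::vector< double > state_type;

int main()
{
    const size_t N = 1024;                 // illustrative chain length
    state_type x( N );

    // fill the state with random phases between 0 and 2*pi
    std::mt19937 rng( 1234 );
    std::uniform_real_distribution< double > dist( 0.0 , 2.0 * 3.14159265358979 );
    for( size_t i = 0 ; i < N ; ++i )
        x[i] = dist( rng );
}
```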

Now we have to configure the stepper to use the OpenMP backend.
This is done by explicitly providing the `openmp_range_algebra` as a template
parameter to the stepper.
This algebra requires the state type to be a model of Random Access Range and
accesses it from multiple threads.
[phase_chain_stepper]
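
A minimal sketch of such a stepper definition, assuming the `state_type` from
above and a classical fourth-order Runge-Kutta scheme, could read as follows;
only the explicit `openmp_range_algebra` parameter is the essential part.
```
#include <vector>
#include <boost/numeric/odeint.hpp>
#include <boost/numeric/odeint/external/openmp/openmp.hpp>

using namespace boost::numeric::odeint;

typedef std::vector< double > state_type;

// explicitly select the OpenMP range algebra as the stepper's algebra
runge_kutta4< state_type , double , state_type , double ,
              openmp_range_algebra > stepper;
```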

In addition to providing the stepper with OpenMP parallelization, we also need
a parallelized system function to exploit the available cores.
Here this is shown for a simple one-dimensional chain of phase oscillators with
nearest-neighbour coupling:
[phase_chain_rhs]
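
The imported snippet contains the actual implementation; as a rough sketch, an
OpenMP-parallelized right-hand side for such a chain might look like the
following, where the functor name and the `coupling` member are illustrative.
```
#include <cmath>
#include <vector>

typedef std::vector< double > state_type;

struct phase_chain
{
    double coupling;

    void operator()( const state_type &x , state_type &dxdt , double /*t*/ ) const
    {
        const size_t N = x.size();
        // distribute the bulk of the chain across the available threads
        #pragma omp parallel for schedule(runtime)
        for( size_t i = 1 ; i < N - 1 ; ++i )
            dxdt[i] = coupling * ( std::sin( x[i+1] - x[i] )
                                 + std::sin( x[i-1] - x[i] ) );
        // one-sided coupling at the two ends of the chain
        dxdt[0]   = coupling * std::sin( x[1]   - x[0] );
        dxdt[N-1] = coupling * std::sin( x[N-2] - x[N-1] );
    }
};
```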

[note In the OpenMP backends the system function will always be called
sequentially from the thread used to start the integration.]

Finally, we perform the integration using one of odeint's integrate functions.
As you can see, the parallelization is completely hidden in the stepper and the
system function.
OpenMP takes care of distributing the work among the threads and joins them
automatically.
[phase_chain_integrate]
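
Put together, the integration call itself can be as simple as the following
sketch, assuming the `stepper`, the `phase_chain` functor, and the state `x`
from the sketches above are in scope; the coupling constant, step size, and
number of steps are made-up values.
```
phase_chain chain = { 1.0 };               // illustrative coupling constant

// advance the chain over 1000 steps of size dt = 0.01, starting at t = 0
integrate_n_steps( stepper , chain , x , 0.0 , 0.01 , 1000 );
```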

After integrating, the data can be accessed immediately and processed further.
Note that you can select the OpenMP scheduling by calling `omp_set_schedule`
at the beginning of your program:
[phase_chain_scheduling]
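
For example, requesting a static schedule could look like this; the chunk size
0 leaves the choice to the runtime, and `omp_set_schedule` is part of the
OpenMP runtime API declared in `<omp.h>`.
```
#include <omp.h>

int main()
{
    // use a static schedule for all loops marked schedule(runtime);
    // a chunk size of 0 lets the runtime pick a default
    omp_set_schedule( omp_sched_static , 0 );

    // ... set up state, stepper and system function, then integrate ...
}
```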

See [github_link examples/openmp/phase_chain.cpp
openmp/phase_chain.cpp] for the complete example.

[heading Split state]

[import ../examples/openmp/phase_chain_omp_state.cpp]

For advanced cases odeint offers another approach to using OpenMP that allows
more precise control of the parallelization.
For example, for odd-sized data where OpenMP's thread boundaries don't match
cache lines and hurt performance, it might be advisable to copy the data from
the contiguous `vector<T>` into separate, individually aligned vectors.
For this, odeint provides the `openmp_state<T>` type, essentially an alias for
`vector<vector<T>>`.

Here, the initialization is done with a `vector<double>`, but then we use
odeint's `split` function to fill an `openmp_state`.
The splitting is done such that the sizes of the individual regions differ by
at most 1, making the computation as uniform as possible.
[phase_chain_state_init]
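
A rough sketch of this step, assuming a random initialization as before and one
chunk per OpenMP thread, might look like the following; the chain length is
again an illustrative value.
```
#include <vector>
#include <omp.h>
#include <boost/numeric/odeint.hpp>
#include <boost/numeric/odeint/external/openmp/openmp.hpp>

using namespace boost::numeric::odeint;

int main()
{
    std::vector< double > x( 1024 , 0.0 );      // contiguous input data
    // ... fill x with initial phases as before ...

    // one chunk per thread; split copies nearly equal-sized regions into it
    openmp_state< double > x_split( omp_get_max_threads() );
    split( x , x_split );
}
```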

Of course, the system function has to be adapted to the `openmp_state`.
Note that each sub-region of the state is computed in a single task, but at the
borders read access to the neighbouring regions is required.
[phase_chain_state_rhs]
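
The imported snippet shows the actual implementation; the following is only a
simplified sketch of the idea, with an outer OpenMP loop over the chunks and
explicit reads across chunk borders. The functor name and the `coupling`
member are illustrative.
```
#include <cmath>
#include <vector>
#include <boost/numeric/odeint/external/openmp/openmp.hpp>

using boost::numeric::odeint::openmp_state;

struct phase_chain_split
{
    double coupling;

    void operator()( const openmp_state< double > &x ,
                     openmp_state< double > &dxdt , double /*t*/ ) const
    {
        const size_t chunks = x.size();
        // each chunk is handled by one thread
        #pragma omp parallel for schedule(runtime)
        for( size_t c = 0 ; c < chunks ; ++c )
        {
            const std::vector< double > &xc = x[c];
            std::vector< double > &dc = dxdt[c];
            const size_t n = xc.size();
            for( size_t i = 0 ; i < n ; ++i )
            {
                double sum = 0.0;
                // left neighbour: inside this chunk, in the previous chunk,
                // or absent at the start of the chain
                if( i > 0 )
                    sum += std::sin( xc[i-1] - xc[i] );
                else if( c > 0 )
                    sum += std::sin( x[c-1].back() - xc[i] );
                // right neighbour: inside this chunk, in the next chunk,
                // or absent at the end of the chain
                if( i + 1 < n )
                    sum += std::sin( xc[i+1] - xc[i] );
                else if( c + 1 < chunks )
                    sum += std::sin( x[c+1].front() - xc[i] );
                dc[i] = coupling * sum;
            }
        }
    }
};
```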

Using the `openmp_state<T>` state type automatically selects `openmp_algebra`,
which executes odeint's internal computations in parallel regions.
Hence, no manual configuration of the stepper is necessary.
At the end of the integration, we use `unsplit` to concatenate the sub-regions
back into a single vector.
[phase_chain_state_integrate]
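
Continuing the sketches from above, with the automatic algebra selection the
integration and the final gathering step could look like this; the stepper
type, system functor, and step parameters are illustrative.
```
// no algebra needs to be specified: openmp_algebra is picked automatically
// for openmp_state< double >
runge_kutta4< openmp_state< double > > stepper;
phase_chain_split chain = { 1.0 };

integrate_n_steps( stepper , chain , x_split , 0.0 , 0.01 , 1000 );

// gather the chunks back into the contiguous vector x
unsplit( x_split , x );
```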

[note You don't actually need to use `openmp_state<T>` for advanced use cases:
`openmp_algebra` is simply an alias for `openmp_nested_algebra<range_algebra>`,
which accepts any model of Random Access Range as the outer, parallel state
type and applies the given algebra to its elements.]

See [github_link examples/openmp/phase_chain_omp_state.cpp
openmp/phase_chain_omp_state.cpp] for the complete example.

[endsect]

[section MPI]

[import ../examples/mpi/phase_chain.cpp]

To extend the parallel computation across multiple machines we can use MPI.

The system function implementation is similar to the OpenMP variant with split
data. The main difference is the execution model: OpenMP uses a spawn/join
model in which everything not explicitly parallelized runs only in the main
thread, whereas in MPI every node enters `main()` independently, diverges based
on its rank, and synchronizes through message passing and explicit barriers.

odeint's MPI support is also implemented as an external backend.
Depending on the MPI implementation, the code might need to be compiled with a
wrapper such as [^mpic++].
[phase_chain_mpi_header]

Instead of reading another thread's data, we asynchronously send and receive
the relevant data from neighbouring nodes, performing some computation in the
interim to hide the latency.
[phase_chain_mpi_rhs]
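
The imported snippet shows the real system function; the following fragment
only sketches the communication pattern with __boost_mpi, exchanging the
border phases with the neighbouring ranks while the interior can be computed
in between. All names are illustrative.
```
#include <vector>
#include <boost/mpi.hpp>

// exchange border values with the neighbouring ranks; other work can be
// done between posting the requests and waiting for them
void exchange_borders( boost::mpi::communicator &world ,
                       const std::vector< double > &chunk ,
                       double &left , double &right )
{
    std::vector< boost::mpi::request > reqs;
    if( world.rank() > 0 ) {
        reqs.push_back( world.isend( world.rank() - 1 , 0 , chunk.front() ) );
        reqs.push_back( world.irecv( world.rank() - 1 , 0 , left ) );
    }
    if( world.rank() + 1 < world.size() ) {
        reqs.push_back( world.isend( world.rank() + 1 , 0 , chunk.back() ) );
        reqs.push_back( world.irecv( world.rank() + 1 , 0 , right ) );
    }

    // ... compute the interior of the chunk here to hide the latency ...

    boost::mpi::wait_all( reqs.begin() , reqs.end() );
    // now left/right hold the neighbouring border values, where they exist
}
```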

Analogous to `openmp_state<T>` we use `mpi_state< InnerState<T> >`, which
automatically selects `mpi_nested_algebra` and the appropriate MPI-oblivious
inner algebra (since our inner state is a `vector`, the inner algebra will be
`range_algebra`, as in the OpenMP example).
[phase_chain_state]

In the main program we construct a `communicator`, which tells us the `size` of
the cluster and the current node's `rank` within it.
We generate the input data on the master node only, avoiding unnecessary work on
the other nodes.
Instead of simply copying chunks, `split` acts as an MPI collective function
here and sends/receives regions from the master to each slave.
The input argument is ignored on the slaves, but the master node also receives
a region in its output and participates in the computation.
[phase_chain_mpi_init]
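
A condensed sketch of this setup, assuming __boost_mpi and odeint's MPI backend
header are available, might look like the following; the chain length and the
data generation are placeholders.
```
#include <vector>
#include <boost/mpi.hpp>
#include <boost/numeric/odeint.hpp>
#include <boost/numeric/odeint/external/mpi/mpi.hpp>

using namespace boost::numeric::odeint;

int main( int argc , char **argv )
{
    boost::mpi::environment env( argc , argv );
    boost::mpi::communicator world;

    const size_t N = 1024;                     // illustrative chain length
    std::vector< double > x;
    if( world.rank() == 0 ) {
        x.resize( N );
        // ... fill x with initial phases on the master only ...
    }

    // collective call: scatters nearly equal-sized regions to all nodes
    mpi_state< std::vector< double > > x_split( world );
    split( x , x_split );

    // ... integrate and gather the result, see below ...
}
```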

Now that `x_split` contains (only) the local chunk on each node, we start the
integration.

To print the result on the master node, we send the processed data back using
`unsplit`.
[phase_chain_mpi_integrate]
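
Continuing the sketch, the integration and the gathering step might read as
follows; `sys` stands for an MPI-aware system function like the one sketched
above, and the stepper type and step parameters are again illustrative.
```
// mpi_nested_algebra is selected automatically for mpi_state
runge_kutta4< mpi_state< std::vector< double > > > stepper;

integrate_n_steps( stepper , sys , x_split , 0.0 , 0.01 , 1000 );

// collective call: gathers the chunks back into x on the master
unsplit( x_split , x );

if( world.rank() == 0 ) {
    // ... print or post-process x here ...
}
```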

[note `mpi_nested_algebra::for_each`[~N] doesn't use any MPI constructs; it
simply calls the inner algebra on the local chunk. The system function is not
guarded by any barriers either, so if you don't place any manually (for example
in parameter studies where the elements are completely independent), the nodes
may diverge and return from this call at different times.]

See [github_link examples/mpi/phase_chain.cpp
mpi/phase_chain.cpp] for the complete example.

[endsect]

[section Concepts]

[section MPI State]
As used by `mpi_nested_algebra`.
[heading Notation]
[variablelist
[[`InnerState`] [The inner state type]]
[[`State`] [The MPI-state type]]
[[`state`] [Object of type `State`]]
[[`world`] [Object of type `boost::mpi::communicator`]]
]
[heading Valid Expressions]
[table
  [[Name] [Expression] [Type] [Semantics]]
  [[Construct a state with a communicator]
    [`State(world)`] [`State`] [Constructs the State.]]
  [[Construct a state with the default communicator]
    [`State()`] [`State`] [Constructs the State.]]
  [[Get the current node's inner state]
    [`state()`] [`InnerState`] [Returns a (const) reference.]]
  [[Get the communicator]
    [`state.world`] [`boost::mpi::communicator`] [See __boost_mpi.]]
]
[heading Models]
* `mpi_state<InnerState>`

[endsect]

[section OpenMP Split State]
As used by `openmp_nested_algebra`, essentially a Random Access Container with
`ValueType = InnerState`.
[heading Notation]
[variablelist
[[`InnerState`] [The inner state type]]
[[`State`] [The split state type]]
[[`state`] [Object of type `State`]]
]
[heading Valid Expressions]
[table
  [[Name] [Expression] [Type] [Semantics]]
  [[Construct a state for `n` chunks]
    [`State(n)`] [`State`] [Constructs underlying `vector`.]]
  [[Get a chunk]
    [`state[i]`] [`InnerState`] [Accesses underlying `vector`.]]
  [[Get the number of chunks]
    [`state.size()`] [`size_type`] [Returns size of underlying `vector`.]]
]
[heading Models]
* `openmp_state<ValueType>` with `InnerState = vector<ValueType>`

[endsect]

[section Splitter]
[heading Notation]
[variablelist
[[`Container1`] [The contiguous-data container type]]
[[`x`] [Object of type `Container1`]]
[[`Container2`] [The chunked-data container type]]
[[`y`] [Object of type `Container2`]]
]
[heading Valid Expressions]
[table
  [[Name] [Expression] [Type] [Semantics]]
  [[Copy chunks of input to output elements]
    [`split(x, y)`] [`void`]
    [Calls `split_impl<Container1, Container2>::split(x, y)`, splits `x` into
     `y.size()` chunks.]]
  [[Join chunks of input elements to output]
    [`unsplit(y, x)`] [`void`]
    [Calls `unsplit_impl<Container2, Container1>::unsplit(y, x)`, assumes `x`
     is of the correct size ['__sigma `y[i].size()`], does not resize `x`.]]
]
[heading Models]
* defined for `Container1` = __boost_range and `Container2` = `openmp_state`
* defined for `Container1` = __boost_range and `Container2` = `mpi_state`

To implement splitters for containers incompatible with __boost_range,
specialize the `split_impl` and `unsplit_impl` types:
```
template< class Container1, class Container2 , class Enabler = void >
struct split_impl {
    static void split( const Container1 &from , Container2 &to );
};

template< class Container2, class Container1 , class Enabler = void >
struct unsplit_impl {
    static void unsplit( const Container2 &from , Container1 &to );
};
```
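
For illustration, a specialization for a hypothetical chunked container
`my_chunks` might look like the sketch below. The container itself is made up,
and the specializations are placed in `boost::numeric::odeint`, which is where
the built-in splitters live; check the actual headers for your odeint version.
```
#include <algorithm>
#include <vector>

// hypothetical user-defined chunked container; data must already be sized
// to the desired number of chunks before calling split
struct my_chunks {
    std::vector< std::vector< double > > data;
    size_t size() const { return data.size(); }
};

namespace boost { namespace numeric { namespace odeint {

// copy a contiguous vector into nearly equal-sized chunks
template<>
struct split_impl< std::vector< double > , my_chunks >
{
    static void split( const std::vector< double > &from , my_chunks &to )
    {
        const size_t chunks = to.size();
        size_t begin = 0;
        for( size_t i = 0 ; i < chunks ; ++i ) {
            // distribute the remainder over the first chunks
            const size_t len = from.size() / chunks
                             + ( i < from.size() % chunks ? 1 : 0 );
            to.data[i].assign( from.begin() + begin , from.begin() + begin + len );
            begin += len;
        }
    }
};

// concatenate the chunks back into the pre-sized vector
template<>
struct unsplit_impl< my_chunks , std::vector< double > >
{
    static void unsplit( const my_chunks &from , std::vector< double > &to )
    {
        size_t begin = 0;
        for( size_t i = 0 ; i < from.size() ; ++i ) {
            std::copy( from.data[i].begin() , from.data[i].end() ,
                       to.begin() + begin );
            begin += from.data[i].size();
        }
    }
};

} } } // namespace boost::numeric::odeint
```
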
[endsect]

[endsect]

[endsect]