The
PAMELA
Project
Introduction
The Pamela project
aims at the development of a modeling methodology that
enables a rapid development of
low complexity, parameterized performance models
(aka symbolic performance models) of parallel programs running
on shared-memory as well as distributed-memory (vector) machines.
The methodology is based on the use of the formalism Pamela,
which is an acronym for PerformAnce ModEling LAnguage.
Pamela is a process-oriented performance simulation language.
After modeling the (parallel) program and machine in terms
of a Pamela model, the source is compiled
to a symbolic performance model.
Symbolic performance models offer analytic and diagnostic
insight in the complex interplay between parallel program and machine.
Moreover, symbolic performance modeling offers the potential
of reduction of model evaluation cost. As parallel
programs, machines and code/data mappings typically feature
a large degree of regularity, the cost models are
also regular, which offers the opportunity of a huge
reduction of evaluation cost through symbolic simplification.
However, the derivation of symbolic performance models is
a labor-intensive and error-prone process. This is where
Pamela comes in. As Pamela is intuitively close to the
program or machine, modeling is relatively straightforward.
As the subsequent derivation of the symbolic model is
mechanized (through the Pamela compiler) symbolic
performance modeling effort is significantly reduced.
Example
Here's a simple modeling example.
Consider P workstations sharing one remote file server.
The average time a workstation spends computing locally is t_l.
The average workload the workstation requires of the file server is t_s.
In Pamela the system would be modeled by the following equations:
resource parameter fcfs(i) % predefined FCFS array
resource fs = fcfs(0,1) % fs is an FCFS resource
numeric parameter P % # workstations
numeric parameter N % each WS loops N times
numeric parameter t_l % local compute time
numeric parameter t_s % service time
process my_model = par (p = 1, P)
seq (i = 1, N) {
delay(t_l) ; use(fs,t_s)
}
Compiling the above source using the Pamela compiler produces
a new Pamela model:
numeric parameter P
numeric parameter N
numeric parameter t_l
numeric parameter t_s
numeric T_my_model =
max(((N * t_l) + (N * t_s)),max((t_s * (N * (P * [ 1 ])))))
numeric phi_my_model =
((N * t_l) + (N * t_s))
numeric delta_my_model =
(t_s * (N * (P * [ 1 ])))
numeric omega_my_model =
max((t_s * (N * (P * [ 1 ]))))
In this model T_my_model represents the symbolic performance model
of the system. The other equations are generated for specific
performance-diagnostic purposes (see the Pamela papers).
As t_s, N, and P are scalars the T_my_model expression
reduces to
numeric T_my_model = N * max((t_l + t_s),(P * t_s))
This last transformation was done by hand. Soon however, also this
optimization will be part of the Pamela compiler's
symbolic expression simplification repertoire.
Why is the above symbolic performance model also expressed in Pamela?
Typically, the above model is evaluated for various parameter settings.
One could use a standard mathematical tool, of course.
On the other hand, one could now also use the Pamela compiler itself,
using the evaluation engine within the Pamela compiler
that reduces (evaluates) numeric expressions.
For instance, suppose one would want to study the
execution time as a function of P for t_l = 10, t_s = 0.1,
and N = 1,000,000. Then simply remove the parameter
modifiers from the corresponding numeric equations and
enter the numeric values, as in
numeric parameter P
numeric N = 1000000
numeric t_l = 10
numeric t_s = 0.1
numeric T_my_model =
N * max((t_l + t_s),(P * t_s))
numeric phi_my_model =
((N * t_l) + (N * t_s))
numeric delta_my_model =
(t_s * (N * P * [ 1 ]))
numeric omega_my_model =
(t_s * (N * P))
Recompilation will trigger a number of additional
simplifications, resulting in the following
result.
numeric parameter P
numeric N = 1000000
numeric t_l = 10
numeric t_s = 0.1
numeric T_my_model =
(1000000 * max(1.010000e+01,(1.000000e-01 * P)))
numeric phi_my_model =
10100000
numeric delta_my_model =
(100000 * (P * [ 1 ]))
numeric omega_my_model =
(100000 * P)
This model evaluates even faster as 3 parameters (and a number
of expressions) have been numerically reduced. In larger
models (e.g., 50 parameters) this form of "parameter refinement"
may yield considerable savings in evaluation cost.
As can be seen by the above example,
the Pamela compiler supports symbolic modeling in a much broader sense
than just translating process models to numeric models.
It is an algebraic translation, simplification, and evaluation tool
that applies to process algebras as well as ordinary algebras
("numeric" in terms of Pamela). All the model manipulations can be
performed by Pamela is needed. An important part of the tool
is the numeric and symbolic simplification engine.
This (prototype) engine is still being extended.
Simple and Automatic
The example shows how Pamela can serve as a convenient
modeling tool. As the language is, in fact, a (restricted)
performance simulation language, modeling in terms of Pamela
is much more intuitive than directly deriving the symbolic result.
Once, the model is coded, deriving the symbolic execution model
is merely a matter of compiling.
Also note that, in contrast to some other analytic approaches,
such as those using Petri nets, queuing networks, or process algebras,
our model has a solution complexity of O(1)!
On top of this, the choice of the Pamela modeling language
also allows for automatic performance modeling
of parallel programs (or algorithms) that are expressed
in a high-level parallel programming model, such as
the data parallel programming model, where synchronization
structure is limited to series-parallel (SP) structure.
Apart from some program annotation, required as a result of
the inherent undecidability problems in static program analysis,
Pamela model generation from data parallel programs can be
entirely mechanized. In practice this means that one
can compile a data parallel program into
a symbolic performance model and have it generate, e.g.,
a speedup plot in a matter of seconds!
See our latest technical report on the results of
such an automatic cost estimator in the papers section.
Trade-off
How come deriving symbolic performance models turns out
to be so easy? Well, there's a catch, of course. Giving model
development cost the highest priority, the Pamela approach
trades accuracy for solution cost.
This has two consequences. First, in order to enable
automatic symbolic analysis, the synchronization
structure of Pamela models must be series-parallel (SP)
structured, introducing modeling restrictions that
may affect modeling accuracy. Second, the symbolic
predictions are essentially a lower bound on
the actual execution time. Although theory and practice tells us
this bound is limited to tens of percents (typically less),
the prediction error is still much more than what one gets
from simulating the Pamela model.
Utility
Despite these prediction errors, Pamela proves to be a helpful
tool at the early development stages of high-performance
algorithm and architecture development.
We've performed many manual modeling case studies
as well as recent modeling studies using a Pamela
generator, which is an engine built within a compiler
that translates data parallel programs for distributed-memory
machines. The results of these experiments show that
the average prediction error ranges around 10% to 20%.
The accuracy is more than sufficient to provide a
good assessment of program speedup potential and
also allows a correct selection between alternative
coding or data partitioning strategies.
Especially for the data parallel program performance
modeling experiments the results are compelling.
With a minimum of program annotation effort,
symbolic performance models are compiled in a matter
of seconds, while for regular kernels such as
ADI, matrix multiplication, Gaussian elimination,
and parallel sorting (PSRS), the evaluation cost of the
symbolic models ranges in the milliseconds!
(again, see our latest technical report on the results of
the automatic cost estimator in the papers section.)
Simulation or Analytic
As Pamela is a performance simulation language, simulation is
also a way of obtaining performance predictions from
a Pamela model.
The following figure shows the
entire modeling and analysis road map offered by Pamela.
The upper road applies to symbolic performance modeling.
The lower road applies to simulation.
For each route there is a (prototype) compiler available
(we hope to merge them into one some day).
In the simulation case, the Pamela compiler generates a
(C) simulator where each parallel Pamela process is mapped
onto a thread.
While prediction accuracy will generally
be much better compared to the symbolic model evaluation,
prediction cost is huge compared to the
analytic model. Let's take a look at the above file server
example with P = 1,000 workstations executing applications
that generate N = 1,000,000 service calls.
On a 350 MHz Pentium II simulation takes in the order of
100,000 seconds (we estimated this number based on
extrapolation). Evaluating the symbolic model, in contrast,
only takes about 100 microseconds!
Still, simulation can be of use when (1) more accuracy
is required, and (2) when functional simulation
is required. The latter feature is possible as
Pamela allows C code to be inlined (simulation compiler only).
The following
example shows how a simulation model
is derived of a sparse DSCAL vector routine running on a simple bus-based
multiprocessor. The example also discusses the migration from
simulation model to simulation performance model (the latter can be
compiled to a symbolic performance model).
While simulation is supported, the Pamela methodology
is primarily focused at enabling symbolic performance modeling.
Publications
NOTE:
The following material is presented to ensure timely dissemination
of scholarly and technical work. Copyright and all rights therein
are retained by authors or by other copyright holders.
All persons copying this information are expected to adhere to the
terms and constraints invoked by each author's copyright.
In most cases, these works may not be reposted without the explicit permission
of the copyright holder.
There is a journal paper that summarizes the Pamela methodology:
-
A.J.C. van Gemund, "Symbolic Performance Modeling of Parallel Systems",
in IEEE Transactions on Parallel and Distributed Systems,
Feb, 2003.
(ps download)
The following technical report presents
an application of Pamela
within an automatic cost estimator for data parallel
programs.
This work has recently been performed within
an ESPRIT II compiler research project called
JOSES.
The report provides an impression of
what Pamela is currently capable of.
-
A.J.C. van Gemund, Automatic Cost Estimation of
Data Parallel Programs, Technical Report 1-68340-44(2001)09,
Faculty of Information Technology and Systems,
Delft University of Technology, Oct 2001, 79 pp.
(ps download)
Here are also a few conference papers that focus on various aspects
of the Pamela methodology:
-
A.J.C. van Gemund, "Performance Prediction of Parallel
Processing Systems: The Pamela Methodology", in
Proc. ACM International Conf. on Supercomputing, Tokio, ACM,
July 1993, pp. 318-327.
(ps download)
(Introduces the methodology)
-
A.J.C. van Gemund, "Compiling Performance Models from
Parallel Programs", in
Proc. ACM International Conf. on Supercomputing, Manchester, ACM,
July 1994, pp. 303-312.
(ps download)
(Discusses the idea of automatic performance prediction)
-
A.J.C. van Gemund, "Compile-time Performance Prediction of
Parallel Systems", in
Proc. Tools'95, Sept 1995, LNCS 977, Springer, pp. 299-313.
(ps download)
(Discusses the accuracy of symbolic prediction)
There is also a PhD thesis on the Pamela methodology:
-
A.J.C. van Gemund, Performance Modeling of Parallel Systems,
PhD thesis, Department of Information Technology and Systems,
Delft University of Technology, The Netherlands, April 1996, 156 pp.
(ps download)
The following technical reports describe the Pamela tools.
- A specific report on the Pamela symbolic model compiler
is not available. Please consult the papers, and the
JOSES
cost estimator deliverable reports.
-
M. Nijweide, The Pamela Compiler, Technical Report
1-68340-44(1996)08, Faculty of Information Technology and Systems,
Delft University of Technology,
The Netherlands, August 1996.
(ps download)
(Pamela simulation compiler)
-
A.J.C. van Gemund, The Pamela Run-Time Library Version 1.0,
Technical Report 1-68340-44(1994)03,
Faculty of Information Technology and Systems,
Delft University of Technology,
The Netherlands, April 1994.
(ps download)
(Run-time system used by simulation compiler)
Software
WARNING:
This section contains links to software that may be
covered by copyright.
Retrieving, copying, or distributing these files may violate
copyright protection laws.
We recommend that the user abides international law in accessing these
links.
As in many university research projects, our tools are developed
to prove the concept, rather than to serve a commercial purpose.
Hence, the tools are essentially prototypes.
Of course, we like to give you the opportunity to experiment with them.
Needless to say, however, the tools are provided as is
without any warranty; without even the implied warranty of
merchantibility or fitness for a particular purpose,
and are intended for academic, non-commercial use.
-
The prototype Pamela symbolic compiler
can be downloaded
here.
This 2.0 version generates symbolic models whereas the older 1.0
version generates simulators. Despite the version number note
that the tool is still a prototype. Read the README file for
further details.
-
The (older) prototype Pamela simulation compiler
can be downloaded
here.
This 1.0 version translates Pamela models to simulators instead of
symbolic time models.
-
The simulation compiler 1.0 translates Pamela models to C,
that is linked with the Pamela
Run-Time System Library, a simple, stand-alone, discrete-event simulation
C library based on light-weight threads
that can be
downloaded
(many students used the Pamela RTL instead of the Pamela simulation
compiler as the RTL has an extremely simple API and is in plain C).
Other Links
Pamela has been used or is being used at various Dutch universities,
the TNO national research organization (modeling distributed,
embedded systems), Philips research (architecture research),
and the Indian Institute of Science (architecture research),
The technology is used in research projects such as
JOSES (EC Esprit, embedded systems), and
Automap (Netherlands, high-performance computing).
A stochastic extension of the methodology is
currently being developed by PhD student
Hasyim Gautama.
His work will allow us to derive symbolic execution time
distribution models. For instance, we
predict the mean, variance, skewness, and kurtosis of
the execution time of a sorting application as a symbolic
function of the data set length at ultra-low cost.
Being able to (symbolically) predict the effects of
data dependent program execution on performance,
is essential in time-critical (embedded) programs.
Also check out this
link for some of the research work in which Pamela is being used.
The Pamela tools are also used in performance modeling courses at our
department (in4078 MSc and DOI courses).
Arjan van
Gemund <A.J.C.vanGemund@its.tudelft.nl>
Last modified: Oct 2001