binary distribution issues
Clint,
I have been playing around with ATLAS and have been thinking
about how I would create a binary installation package for it. I
have a fair amount of experience and expertise building binary
installation packages for HP machines running HP-UX, but I don't
see an easy way to convert ATLAS to a binary package. (Many
of the arguments hold for other environments, such as Linux, where
vendors might want to create binary installation packages for
their distributions.)
The key issues and relevant facts, as I see them, are:
- ATLAS really needs to know a fair amount about the
system it will eventually run on, such as the cache size
- ATLAS actually needs to do the experiments on the
target machine, not the build machine
- A single binary distribution may be installed on a wide
variety of hardware instances (different cache sizes,
CPU versions, ...)
- Much of the latency in the tuning process is due to the
compilation time
- ATLAS appears to do a full grid search for optimization
- HP-UX does not include an ANSI-C compiler by default
(it is an add-on product)
I am thinking about the following solution:
- As part of the packaging process, the system would
pre-build binary versions of all parameter combinations
so no compilation is necessary on the target machine
- During the installation process, ATLAS would run a
program that does the timing using the pre-compiled
routines to find the optimal configuration. It would
then build libatlas.a and optionally liblapack.a.
- I have been thinking that perhaps ATLAS might further
speed up installation by using an optimization algorithm
to locate the best configuration, rather than a full grid
search.
It is not that I am against building systems from source, since
I do that for nearly all my own software. Rather, it is that most
people have neither the time nor the skill to build everything from
scratch, and yet it would still be useful if they could realize the
full capabilities of the software they have on their machine.
An example of how this might be useful: Travis Oliphant has
been assembling a number of RPM packages for various
number-crunching packages, such as LAPACK. He is building
a MATLAB-like environment in Python using these packages.
(http://numpy.sourceforge.net/) It would be very nice if users
who simply want the high-level environment were able to get
the benefits of ATLAS's fast, tuned, array operations without
having to rebuild it themselves.
Does this sound interesting to you? Do you know of anyone
else working on this problem? Would you be amenable to
integrating patches to enable this sort of capability in ATLAS?
Am I missing something that makes this impossible
or undesirable?
Cheers,
Carl Staelin
PS Some quick information on my background: I am a co-author
of the lmbench micro-benchmark suite, and I am the author of
mkpkg which helps automatically generate SD-UX binary
installation packages. I also worked very closely with the
Liverpool Porting and Archive Centre to help them build a
library of binary installation packages for HP-UX. My homepage
is http://www.hpl.hp.com/personal/Carl_Staelin and the
lmbench home page is http://www.bitmover.com/lmbench.
PPS I have also been thinking that it might be possible to create
lmbench-like micro-benchmarks that measure more aspects of
CPU performance and use the results to predict the performance
of each candidate configuration. These predictions could then
seed the search algorithm with likely configurations. It looks
like you already have some benchmarks, but I don't know what
additional ones would be needed to give a rough prediction of
performance for each grid point.