binary installation issues (cont'd)
Clint,
I apologize. I looked a bit more carefully at the search code,
and it looks like you already minimize search time by using a
binary search instead of a full grid search.
Given that a significant fraction of the time spent tuning ATLAS
is spent compiling, and given that your binary search already
visits only a small part of the search space, I think that
building binaries for every point of the full grid would likely
be prohibitively expensive. Also, I think the resulting
installation package would be huge, since the aggregate size of
all .o files for the full search space would be overwhelming.
If the package were distributed by CD-ROM, size wouldn't
matter so much, and since the build only happens once,
the huge build time would not be a showstopper. However,
it feels a bit like a kludge to me...
So, do you have any suggestions as to whether it might be
possible to extend/enhance/modify ATLAS to enable
distributors to build/redistribute binary packages of ATLAS
which could tune themselves with minimal delay and
recompilation on target machines?
Cheers,
Carl
PS Some buglets in 3.1.4D:
- config.c line 2131: There is a newline in the string,
  which causes HP's compiler to complain/barf.
- Instead of using "-Aa -D_INCLUDE_POSIX_SOURCE"
  or simply "-Aa" for HP flags, you should probably use
  "-Ae" instead.
- Add HP-UX specific code to discover the number of CPUs.
- HP-PA machines generally only have an L1 cache;
  they don't have an L2/L3 cache. However, the L1
  cache is usually between 256KB and 2MB for many
  machines. tune/sysinfo/L1CacheSize.c assumes
  that L1 caches are at most 256KB. I think a better
  algorithm might be to use binary search where the
  two end points are very small (say 1K) and very
  large (say 2x MaxL2CacheSize). The binary search
  would be on the log2() size of the prospective cache
  sizes.
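
As a rough sketch of the log2 binary search suggested in the last
item (this is not the attached L1CacheSize.c, just an illustration):
the names find_cache_size, fits_probe and simulated_probe are
hypothetical, and the sketch assumes the probe is monotone, i.e. once
a size stops fitting, every larger size also fails. A real probe
would time memory accesses; the simulated one below only compares
against a pretend cache size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical probe signature: nonzero if a working set of `bytes`
 * appears to fit in the cache. A real probe would time accesses. */
typedef int (*fits_probe)(size_t bytes, void *ctx);

/* Binary search on the exponent: returns the largest power-of-two
 * size in [2^lo_log2, 2^hi_log2] that the probe reports as fitting,
 * or 0 if even the smallest size does not fit. */
size_t find_cache_size(unsigned lo_log2, unsigned hi_log2,
                       fits_probe probe, void *ctx)
{
    assert(lo_log2 <= hi_log2);
    assert(hi_log2 < sizeof(size_t) * CHAR_BIT);

    if (!probe((size_t)1 << lo_log2, ctx))
        return 0;

    /* Invariant: 2^lo_log2 fits; no size above 2^hi_log2 is a candidate. */
    while (lo_log2 < hi_log2) {
        /* Round the midpoint up so the loop always makes progress. */
        unsigned mid = lo_log2 + (hi_log2 - lo_log2 + 1) / 2;
        if (probe((size_t)1 << mid, ctx))
            lo_log2 = mid;
        else
            hi_log2 = mid - 1;
    }
    return (size_t)1 << lo_log2;
}

/* Stand-in probe for demonstration only: pretends the cache holds
 * exactly *ctx bytes. */
int simulated_probe(size_t bytes, void *ctx)
{
    return bytes <= *(const size_t *)ctx;
}
```

With a simulated 1MB cache and end points of 2^10 (1K) and 2^22,
find_cache_size(10, 22, simulated_probe, &c) returns 1048576 after
five probes, rather than the thirteen a linear scan over the
exponents could need.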
I have attached a copy of the updated config.c incorporating
these bug fixes... I have also attached a copy of the updated
L1CacheSize.c which is perhaps more reliable. You should
call the new program with a value that is roughly 2x
MaxL2CacheSize.
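
For the HP-UX CPU-count item in the list above, one possible shape
is sketched below. This is not the code in the attached config.c.
It assumes HP-UX's mpctl() call with MPC_GETNUMSPUS and the __hpux
predefined macro; on other systems it falls back to
sysconf(_SC_NPROCESSORS_ONLN) where that is defined. The function
name count_cpus is hypothetical.

```c
#include <unistd.h>
#ifdef __hpux
#include <sys/param.h>
#include <sys/mpctl.h>
#endif

/* Hypothetical helper: returns the number of CPUs, or 1 if the
 * count cannot be determined. */
int count_cpus(void)
{
    long n = -1;
#if defined(__hpux)
    /* Assumption: MPC_GETNUMSPUS reports the number of enabled processors. */
    n = mpctl(MPC_GETNUMSPUS, 0, 0);
#elif defined(_SC_NPROCESSORS_ONLN)
    /* Common extension on many Unix-like systems. */
    n = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    return (n > 0) ? (int)n : 1;
}
```

Falling back to 1 keeps the configure step usable on machines where
neither query is available.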
<<config.c>>
<<L1CacheSize.c>>