[MBDyn-users] reference frames

Pierangelo Masarati masarati at aero.polimi.it
Wed Jul 7 16:29:13 CEST 2010


Rix, Patrick wrote:
>>> I feel it would be more advantageous to have multiple but
>>> smaller files, in a binary (NetCDF) format (using float values
>>> the data size is more than 3 times smaller than the text
>>> format).
> 
>> I'd stick with double, for two reasons:
>> - we save casting effort, and directly pass pointers to the actual
>>   storage arrays to NetCDF
>> - the NetCDF format could eventually become a restart database at
>>   any (saved) simulation time with no loss of accuracy (a long-term
>>   project, far in the future)
> 
> I did not mean to change MBDyn's NetCDF output. Using DOUBLE and
> writing to multi-variable *.nc output files would be perfectly OK
> at run time (especially if you say that casting to FLOAT would
> significantly slow down the simulation).

I'm not saying that; I'm just saying that what we're sending out is 
stored in arrays of doubles, and NetCDF's primitives can output such 
arrays directly.  Writing floats is not free, as you first need to 
cast the arrays and then call NetCDF's primitives.  The cost may be 
tolerable.

> Also ASCII output is fine -
> though of course a binary format would be preferable, as we would use
> less disk space (by a factor of roughly 1.5 when using DOUBLE) for
> the result files. What I meant was doing the splitting of a multi-var
> file into several single-var files, and the double-to-float conversion,
> as post-processing. Then everybody can safely decide what should be
> his/her preferred data type and structure for storing the results.

What might not be "tolerable" is making the storage format customizable: 
I think it should be either float or double, and I personally prefer double.
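
As an aside, the split/convert post-processing you describe could be
sketched in Python with the netCDF4 module roughly as follows - just a
sketch, nothing that ships with MBDyn; the file name "output.nc" and
the variable layout are made up:

# Sketch: split a multi-variable NetCDF result file into one file per
# variable, optionally downcasting double to float on the way.
from netCDF4 import Dataset
import numpy as np

def split_netcdf(path, to_float=True):
    src = Dataset(path, "r")
    for name, var in src.variables.items():
        dst = Dataset(name.replace(".", "_") + ".nc", "w")
        # re-create only the dimensions this variable uses
        for dim in var.dimensions:
            dst.createDimension(dim, len(src.dimensions[dim]))
        dtype = "f4" if to_float and var.dtype == np.float64 else var.dtype
        out = dst.createVariable(name, dtype, var.dimensions)
        out[:] = var[:]   # netCDF4 casts double to float on assignment
        dst.close()
    src.close()

split_netcdf("output.nc")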

> I
> personally feel that having many but small files is more flexible and
> faster during subsequent post-proc steps (e.g. viewing data from
> large multi-var files definitely takes longer - compared to small
> single-var files - as for each data set / time series the whole file
> always has to be read before the data are complete).

This may be true for "single shot" tools, but it shouldn't be for math 
environments, where you basically open a handle to the database and it 
remains open (and data possibly cached) for repeated access (I note this 
may depend on how data is handled internally by NetCDF and by the 
specific client, so a general rule cannot be inferred).
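
For example (just a sketch in Python with the netCDF4 module; the file
and variable names are made up), the handle stays open and each access
reads only the slice it asks for:

from netCDF4 import Dataset

ds = Dataset("output.nc", "r")            # opened once, kept open
var = ds.variables["node.struct.1.X"]     # no data read yet
first_steps = var[:100]                   # reads only the first 100 records
last_step = var[-1]                       # reads only the last record
ds.close()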

> I agree that
> for restarting a model surely a very precise data type is required -
> but for everyday data handling, for most users, FLOATs might be
> precise enough (with a factor of ~3 less disk space compared to
> text). For our results it's even sufficient to go down to (2-byte)
> SHORT INT compression, with a saving of a factor of ~5 w.r.t. the
> text format - then we get 65000 discrete levels to represent the data
> range of a time series, which means that, due to the loss of precision,
> we have to accept an uncertainty/error of 1/65000 = 1.5e-5 = 0.0015%.
> Compared to the uncertainties of many of the model parameters, on the
> order of a few %, and considering the gain of using 5x less disk space,
> this seems to be a good trade-off for our needs. The only thing that is
> required is a bunch of tools providing these data manipulation features
> - and fortunately for the NetCDF format there is quite a large palette
> of tools available.
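
Just to make those numbers concrete: a 2-byte packing of that kind can
be sketched in Python with numpy as below, following the scale_factor /
add_offset convention that many NetCDF clients apply transparently (the
names and values are examples only, not an MBDyn feature).  The
quantization step is (max - min)/65535, i.e. roughly the 1.5e-5 of the
data range you quote; the worst-case round-trip error is half of that.

import numpy as np

def quantize_int16(data):
    # map the data range onto the 65536 values of a signed 16-bit int
    lo, hi = float(data.min()), float(data.max())
    scale = (hi - lo) / 65535.0 if hi > lo else 1.0
    offset = lo + 32768.0 * scale       # lo -> -32768, hi -> 32767
    packed = np.round((data - offset) / scale).astype(np.int16)
    return packed, scale, offset        # store scale/offset as attributes

def unpack(packed, scale, offset):
    return packed.astype(np.float64) * scale + offset

x = np.linspace(-1.0, 1.0, 1000)
p, s, o = quantize_int16(x)
print(np.abs(unpack(p, s, o) - x).max())   # half a step: ~1.5e-5 for -1..1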
> 
> 
>>> I'm experimenting with that. Having many but small units is also
>>> a good basis for easily bringing the post-processing onto a network
>>> cluster like CONDOR, where you can have many machines working in
>>> parallel on the same task, with each node machine working on small
>>> units.
> 
>> If you have multiple cores, a single DB should work fine as well,
>> as NetCDF is multiple-read, single-write thread safe.
> 
> How many cores do machines have nowadays?  4, and if one can afford
> it, maybe 8. In a CONDOR cluster you can easily have up to several
> hundred machines - all the PCs of a company could join the cluster
> and you would use the power of those that are idle (e.g. whose user
> is in a meeting). So at night, when people have left the office, the
> full power is available. And all of this without buying extra machines
> or extra software, as CONDOR is free. We have had good experiences with
> that. Now, as the data files are transferred to the node machines,
> smaller files are better suited for network transfer and take less
> time to be processed on a node machine.

Makes sense.  Cheers, p.

