[MBDyn-users] Refactoring of MBDyn's NetCDF interface

Rix, Patrick Patrick.Rix at repower.de
Wed May 18 09:53:28 CEST 2011


@Pierangelo:
Hmm... if your experience has been that good, it seems that file I/O performance
depends more on the operating system and on the specific plotting utility than on absolute
file size.
Your observations and the performance tests Jens made during the abs2rel
discussion suggest that file I/O on Windows is significantly slower than on Linux.
For now I think I'll drop the idea of a split database and see how far we get
with a single-file database. If at some point our wind turbine models (we work on Windows)
run into post-processing performance problems due to file size, the database
can still be split in a post-processing step. The idea was just to have the
possibility of skipping one post-processing step.

@Marco:
I think my Flex remark was confusing. Our Flex5 standard format does not use NetCDF.
It is a format of its own that uses 16-bit integers for data compression (with some loss of precision)
and is generated in a post-processing step (Flex5 uses floats for its raw time series output).
I agree with you that e.g. a 75 MB (16-bit integer Flex5) data file is not that big (the files could become much
bigger), but one notices a significant increase in data access time (e.g. during plotting), and
I expect that beyond some size this will be the case for NetCDF files, too.
In general the NetCDF format has no explicit size limit from version 4 on. For version 3.x I read something
about a 4 GB limit for large files when using the so-called 'classic' NetCDF format.
So for normal use cases file size should not be a problem for the nc data format, but I found
that some third-party open source tools seem to have problems with large files (>2 GB).
(BTW: with the NetCDF format the same sort of 16-bit integer data packing can be done by using
'short' as a variable's data type in conjunction with the variable attributes 'scale_factor' and 'add_offset',
with  scale_factor = (Max - Min)/65535  and  add_offset = 0.5*(Min + Max);
then:  unpacked_data = (packed_data * scale_factor) + add_offset .
I found an example for Matlab illustrating this: <http://jisao.washington.edu/data/matlab_netcdf.html>)
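
As a rough illustration of the packing formulas above, here is a minimal sketch in plain Python (no NetCDF library involved). The function names packing_params, pack and unpack and the sample data are hypothetical, chosen only for this example. One assumption beyond the formulas: Max maps to a packed value of +32767.5, which can round to 32768 and fall outside the signed 16-bit range, so the sketch clamps the packed values.

```python
# Sketch of 16-bit packing with scale_factor / add_offset.
# Names and sample data are hypothetical; only the two formulas come from the text.

INT16_MIN, INT16_MAX = -32768, 32767


def packing_params(values):
    """Return (scale_factor, add_offset) for a sequence of floats."""
    lo, hi = min(values), max(values)
    scale_factor = (hi - lo) / 65535 or 1.0  # fall back to 1.0 for constant data
    add_offset = 0.5 * (lo + hi)
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Quantize floats to integers that fit a signed 16-bit 'short'."""
    packed = []
    for v in values:
        q = round((v - add_offset) / scale_factor)
        # Clamp: the extreme values sit at +/-32767.5 before rounding.
        packed.append(max(INT16_MIN, min(INT16_MAX, q)))
    return packed


def unpack(packed, scale_factor, add_offset):
    """unpacked_data = packed_data * scale_factor + add_offset"""
    return [p * scale_factor + add_offset for p in packed]


data = [-12.5, 0.0, 3.25, 47.0, 100.0]
scale_factor, add_offset = packing_params(data)
packed = pack(data, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)
max_error = max(abs(a - b) for a, b in zip(data, restored))
print(packed, max_error)
```

The round-trip error stays on the order of half a scale_factor step, which is the loss of precision mentioned above for the 16-bit Flex5 format.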




-----Original Message-----
From: masarati at aero.polimi.it [mailto:masarati at aero.polimi.it]
Sent: Tuesday, 17 May 2011 20:11
To: Rix, Patrick
Cc: mbdyn-users at mbdyn.org
Subject: Re: AW: [MBDyn-users] Refactoring of MBDyn's NetCDF interface

> Pierangelo,
>
> In general I prefer keeping all data in a single file, too.
> Up to a size of some tens of MB this seems to be a good idea, but
> for larger models and/or longer simulation times the database would become
> extremely large, so large that post-processing and visualization might
> slow down significantly or could even fail. If one runs into such problems
> it would be good to be able to reduce file sizes by simply splitting the
> database into several output files.
> For our Flex5 wind turbine simulations we have had good experience
> writing all data into one file, up to a size of 50-75 MB per database.
> Beyond that, working with such large files becomes torture...
> and we already use a compressed 16-bit integer format as standard output,
> so with floats or doubles for the nc files a size of >100 MB will be reached
> quite fast:
> e.g. a wind turbine model with 4*beam3 for the tower and 3 blades made
> of 5*beam3 gives 53 nodes. Simulating 600 s at a 50 ms output interval
> results in 12000 time steps. Using floats for the results gives
> 12000[steps] * 4[byte] * 53[nodes] * 18[columns] = 43.7 MB just for the
> .mov data (for offshore wind turbines the simulations would be even
> longer due to the low frequency of the waves).
> With the other files (.act, .ine, .jnt, etc.) this amount would roughly
> double (or triple with .aer output, too).
> As you can see, a critical size would already be reached with a standard
> (wind turbine) simulation, without an extraordinary model.
> Each time a single time series is accessed (e.g. for plotting), a
> (plot) utility would have to read 100-150 MB from the database just to get
> some 48 kB of data of interest.
> Being able to choose between a single and a split database
> depending on the size of the problem/output would offer the most
> flexibility, wouldn't it?
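
The size estimate in the quoted message can be checked with a few lines of arithmetic. The sketch below uses only the numbers given there; the variable names are made up for illustration, and the 43.7 MB figure comes out when a megabyte is taken as 2^20 bytes.

```python
# Reproduces the .mov size estimate from the quoted message.
# All numbers are taken from the text; variable names are illustrative only.
sim_time_s = 600          # simulated time
output_interval_s = 0.05  # 50 ms output interval
bytes_per_float = 4       # single-precision float
nodes = 53
columns = 18              # columns per node in the .mov output

steps = round(sim_time_s / output_interval_s)
mov_bytes = steps * bytes_per_float * nodes * columns
mov_mib = mov_bytes / 2**20              # size in units of 2^20 bytes
series_bytes = steps * bytes_per_float   # one time series of one column

print(steps, mov_bytes, round(mov_mib, 1), series_bytes)
```

The last value, 48000 bytes, is the "some 48 kB of data of interest" for a single time series.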

I have to object that my experience is rather different.  For example,
I've just generated a database of about 0.5 GB (the corresponding textual
output is more than twice that size), containing the full output
currently available for 200 nodes and 100 beams running for 10000 time
steps, using double-precision data; plotting time series on my laptop
using octave is instantaneous.

octave:7> nc = netcdf('y.nc', 'r');
octave:8> plot(nc{'time'}(:), nc{'node.struct.100.X'}(:,3))

I believe the mileage may vary significantly depending on the hardware,
OS and tool you use.  However, I don't see the advantage in having to
open more than one file to access cross-dependent data; think of plotting
an internal force as a function of the location of a node, or anything
like that.

Cheers, p.




