forked from Unidata/netcdf-c
-
Notifications
You must be signed in to change notification settings - Fork 0
/
guide.dox
2875 lines (2332 loc) · 130 KB
/
guide.dox
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
/*! \page user_guide The NetCDF User's Guide
<H2>The sections of the netCDF User's Guide</H2>
- \subpage netcdf_introduction
- \subpage file_structure_and_performance
- \subpage data_type
- \subpage netcdf_data_set_components
- \subpage netcdf_perf_chunking
- \subpage netcdf_utilities_guide
- \subpage dap_support
- \subpage BestPractices
- \subpage users_guide_appendices
<H2>The Purpose of NetCDF</H2>
The purpose of the Network Common Data Form (netCDF) interface is to
allow you to create, access, and share array-oriented data in a form
that is self-describing and portable. "Self-describing" means that a
dataset includes information defining the data it contains. "Portable"
means that the data in a dataset is represented in a form that can be
accessed by computers with different ways of storing integers,
characters, and floating-point numbers. Using the netCDF interface for
creating new datasets makes the data portable. Using the netCDF
interface in software for data access, management, analysis, and
display can make the software more generally useful.
The netCDF software includes C, Fortran 77, Fortran 90, and C++
interfaces for accessing netCDF data. These libraries are available
for many common computing platforms.
The community of netCDF users has contributed ports of the software to
additional platforms and interfaces for other programming languages as
well. Source code for netCDF software libraries is freely available to
encourage the sharing of both array-oriented data and the software
that makes the data useful.
This User's Guide presents the netCDF data model. It explains how the
netCDF data model uses dimensions, variables, and attributes to store
data.
Reference documentation for UNIX systems, in the form of UNIX 'man'
pages for the C and FORTRAN interfaces is also available at the netCDF
web site (http://www.unidata.ucar.edu/netcdf), and with the netCDF
distribution.
The latest version of this document, and the language specific guides,
can be found at the netCDF web site (http://www.unidata.ucar.edu/netcdf/docs) along with extensive additional information about netCDF, including pointers to other software that works with netCDF data.
Separate documentation of the Java netCDF library can be found at
http://www.unidata.ucar.edu/software/netcdf-java/.
\page netcdf_introduction An Introduction to NetCDF
\tableofcontents
\section netcdf_interface The netCDF Interface
The Network Common Data Form, or netCDF, is an interface to a library
of data access functions for storing and retrieving data in the form
of arrays. An array is an n-dimensional (where n is 0, 1, 2, ...)
rectangular structure containing items which all have the same data
type (e.g., 8-bit character, 32-bit integer). A scalar (simple single
value) is a 0-dimensional array.
NetCDF is an abstraction that supports a view of data as a collection
of self-describing, portable objects that can be accessed through a
simple interface. Array values may be accessed directly, without
knowing details of how the data are stored. Auxiliary information
about the data, such as what units are used, may be stored with the
data. Generic utilities and application programs can access netCDF
datasets and transform, combine, analyze, or display specified fields
of the data. The development of such applications has led to improved
accessibility of data and improved re-usability of software for
array-oriented data management, analysis, and display.
The netCDF software implements an abstract data type, which means that
all operations to access and manipulate data in a netCDF dataset must
use only the set of functions provided by the interface. The
representation of the data is hidden from applications that use the
interface, so that how the data are stored could be changed without
affecting existing programs. The physical representation of netCDF
data is designed to be independent of the computer on which the data
were written.
Unidata supports the netCDF interfaces for C (see <a href="http://www.unidata.ucar.edu/netcdf/docs/" >NetCDF-C User's Guide</a>), FORTRAN 77 (see <a href="http://www.unidata.ucar.edu/netcdf/documentation/historic/netcdf-f77/index.html#Top" >NetCDF Fortran 77 Interface Guide</a>), FORTRAN 90 (see <a href="http://www.unidata.ucar.edu/netcdf/documentation/historic/netcdf-f90/index.html" >NetCDF Fortran 90 Interface Guide</a>), and C++ (see <a href="http://www.unidata.ucar.edu/netcdf/documentation/historic/cxx4/index.html" >NetCDF C++ Interface Guide</a>).
The netCDF library is supported for various UNIX operating systems. A
MS Windows port is also available. The software is also ported and
tested on a few other operating systems, with assistance from users
with access to these systems, before each major release. Unidata's
netCDF software is freely available <a
href="ftp://ftp.unidata.ucar.edu/pub/netcdf">via FTP</a> to encourage
its widespread use.
For detailed installation instructions, see \ref getting_and_building_netcdf.
\section netcdf_format The netCDF File Format
Until version 3.6.0, all versions of netCDF employed only one binary
data format, now referred to as netCDF classic format. NetCDF classic
is the default format for all versions of netCDF.
In version 3.6.0 a new binary format was introduced, 64-bit offset
format. Nearly identical to netCDF classic format, it uses 64-bit
offsets (hence the name), and allows users to create far larger
datasets.
In version 4.0.0 a third binary format was introduced: the HDF5
format. Starting with this version, the netCDF library can use HDF5
files as its base format. (Only HDF5 files created with netCDF-4 can
be understood by netCDF-4).
By default, netCDF uses the classic format. To use the 64-bit offset
or netCDF-4/HDF5 format, set the appropriate constant when creating
the file.
To achieve network-transparency (machine-independence), netCDF classic
and 64-bit offset formats are implemented in terms of an external
representation much like XDR (eXternal Data Representation, see
http://www.ietf.org/rfc/rfc1832.txt), a standard for describing and
encoding data. This representation provides encoding of data into
machine-independent sequences of bits. It has been implemented on a
wide variety of computers, by assuming only that eight-bit bytes can
be encoded and decoded in a consistent way. The IEEE 754
floating-point standard is used for floating-point data
representation.
Descriptions of the overall structure of netCDF classic and 64-bit
offset files are provided later in this manual. See Structure.
The details of the classic and 64-bit offset formats are described in
an appendix. See File Format. However, users are discouraged from
using the format specification to develop independent low-level
software for reading and writing netCDF files, because this could lead
to compatibility problems if the format is ever modified.
\subsection select_format How to Select the Format
With three different base formats, care must be taken in creating data
files to choose the correct base format.
The format of a netCDF file is determined at create time.
When opening an existing netCDF file the netCDF library will
transparently detect its format and adjust accordingly. However,
netCDF library versions earlier than 3.6.0 cannot read 64-bit offset
format files, and library versions before 4.0 can't read netCDF-4/HDF5
files. NetCDF classic format files (even if created by version 3.6.0
or later) remain compatible with older versions of the netCDF library.
Users are encouraged to use netCDF classic format to distribute data,
for maximum portability.
To select 64-bit offset or netCDF-4 format files, C programmers should
use flag NC_64BIT_OFFSET or NC_NETCDF4 in function nc_create. See
nc_create.
In Fortran, use flag nf_64bit_offset or nf_format_netcdf4 in function
NF_CREATE. See NF_CREATE.
It is also possible to change the default creation format, to convert
a large body of code without changing every create call. C programmers
see nc_set_default_format. Fortran programs see NF_SET_DEFAULT_FORMAT.
\subsection classic_format NetCDF Classic Format
The original netCDF format is identified using four bytes in the file
header. All files in this format have “CDF\001” at the beginning of
the file. In this documentation this format is referred to as “netCDF
classic format.”
NetCDF classic format is identical to the format used by every
previous version of netCDF. It has maximum portability, and is still
the default netCDF format.
For some users, the various 2 GiB format limitations of the classic
format become a problem. (see Classic Limitations). 1.4.2 NetCDF
64-bit Offset Format
For these users, 64-bit offset format is a natural choice. It greatly
eases the size restrictions of netCDF classic files (see 64 bit Offset
Limitations).
Files with the 64-bit offsets are identified with a “CDF\002” at the
beginning of the file. In this documentation this format is called
“64-bit offset format.”
Since 64-bit offset format was introduced in version 3.6.0, earlier
versions of the netCDF library can't read 64-bit offset files.
\subsection netcdf_4_format NetCDF-4 Format
In version 4.0, netCDF included another new underlying format: HDF5.
NetCDF-4 format files offer new features such as groups, compound
types, variable length arrays, new unsigned integer types, parallel
I/O access, etc. None of these new features can be used with classic
or 64-bit offset files.
NetCDF-4 files can't be created at all, unless the netCDF configure
script is run with –enable-netcdf-4. This also requires version 1.8.0
of HDF5.
For the netCDF-4.0 release, netCDF-4 features are only available from
the C and Fortran interfaces. We plan to bring netCDF-4 features to
the CXX API in a future release of netCDF.
NetCDF-4 files can't be read by any version of the netCDF library
previous to 4.0. (But they can be read by HDF5, version 1.8.0 or
better).
For more discussion of format issues see The NetCDF Tutorial.
\section architecture NetCDF Library Architecture
\image html netcdf_architecture.png "NetCDF Architecture"
\image latex netcdf_architecture.png "NetCDF Architecture"
\image rtf netcdf_architecture.png "NetCDF Architecture"
The netCDF C-based libraries depend on a core C library and some externally developed libraries.
- NetCDF-Java is an independent implementation, not shown here
- C-based 3rd-party netCDF APIs for other languages include Python, Ruby, Perl, Fortran-2003, MATLAB, IDL, and R
- Libraries that don't support netCDF-4 include Perl and old C++
- 3rd party libraries are optional (HDF5, HDF4, zlib, szlib, pnetcdf, libcurl), depending on what features are needed and how netCDF is configured
- "Apps" in the above means applications, not mobile apps!
\section performance What about Performance?
One of the goals of netCDF is to support efficient access to small
subsets of large datasets. To support this goal, netCDF uses direct
access rather than sequential access. This can be much more efficient
when the order in which data is read is different from the order in
which it was written, or when it must be read in different orders for
different applications.
The amount of overhead for a portable external representation depends
on many factors, including the data type, the type of computer, the
granularity of data access, and how well the implementation has been
tuned to the computer on which it is run. This overhead is typically
small in comparison to the overall resources used by an
application. In any case, the overhead of the external representation
layer is usually a reasonable price to pay for portable data access.
Although efficiency of data access has been an important concern in
designing and implementing netCDF, it is still possible to use the
netCDF interface to access data in inefficient ways: for example, by
requesting a slice of data that requires a single value from each
record. Advice on how to use the interface efficiently is provided in
Structure.
The use of HDF5 as a data format adds significant overhead in metadata
operations, less so in data access operations. We continue to study
the challenge of implementing netCDF-4/HDF5 format without
compromising performance.
\section creating_self Creating Self-Describing Data conforming to Conventions
The mere use of netCDF is not sufficient to make data
"self-describing" and meaningful to both humans and machines. The
names of variables and dimensions should be meaningful and conform to
any relevant conventions. Dimensions should have corresponding
coordinate variables (See \ref coordinate_variables) where sensible.
Attributes play a vital role in providing ancillary information. It is
important to use all the relevant standard attributes using the
relevant conventions. For a description of reserved attributes (used
by the netCDF library) and attribute conventions for generic
application software, see Attribute Conventions.
A number of groups have defined their own additional conventions and
styles for netCDF data. Descriptions of these conventions, as well as
examples incorporating them can be accessed from the netCDF
Conventions site, http://www.unidata.ucar.edu/netcdf/conventions.html.
These conventions should be used where suitable. Additional
conventions are often needed for local use. These should be
contributed to the above netCDF conventions site if likely to interest
other users in similar areas.
\section limitations Limitations of netCDF
The netCDF classic data model is widely applicable to data that can be
organized into a collection of named array variables with named
attributes, but there are some important limitations to the model and
its implementation in software. Some of these limitations have been
removed or relaxed in netCDF-4 files, but still apply to netCDF
classic and netCDF 64-bit offset files.
Currently, netCDF classic and 64-bit offset formats offer a limited
number of external numeric data types: 8-, 16-, 32-bit integers, or
32- or 64-bit floating-point numbers. (The netCDF-4 format adds 64-bit
integer types and unsigned integer types.)
With the netCDF-4/HDF5 format, new unsigned integers (of various
sizes), 64-bit integers, and the string type allow improved expression
of meaning in scientific data. The new VLEN (variable length) and
COMPOUND types allow users to organize data in new ways.
With the classic netCDF file format, there are constraints that limit
how a dataset is structured to store more than 2 GiBytes (a GiByte is
2^30 or 1,073,741,824 bytes, as compared to a Gbyte, which is
1,000,000,000 bytes.) of data in a single netCDF dataset. (see Classic
Limitations). This limitation is a result of 32-bit offsets used for
storing relative offsets within a classic netCDF format file. Since
one of the goals of netCDF is portable data, and some file systems
still can't deal with files larger than 2 GiB, it is best to keep
files that must be portable below this limit. Nevertheless, it is
possible to create and access netCDF files larger than 2 GiB on
platforms that provide support for such files (see Large File
Support).
The new 64-bit offset format allows large files, and makes it easy to
create to create fixed variables of about 4 GiB, and record variables
of about 4 GiB per record. (see 64 bit Offset Limitations). However,
old netCDF applications will not be able to read the 64-bit offset
files until they are upgraded to at least version 3.6.0 of netCDF
(i.e. the version in which 64-bit offset format was introduced).
With the netCDF-4/HDF5 format, size limitations are further relaxed,
and files can be as large as the underlying file system
supports. NetCDF-4/HDF5 files are unreadable to the netCDF library
before version 4.0.
Another limitation of the classic (and 64-bit offset) model is that
only one unlimited (changeable) dimension is permitted for each netCDF
data set. Multiple variables can share an unlimited dimension, but
then they must all grow together. Hence the classic netCDF model does
not permit variables with several unlimited dimensions or the use of
multiple unlimited dimensions in different variables within the same
dataset. Variables that have non-rectangular shapes (for example,
ragged arrays) cannot be represented conveniently.
In netCDF-4/HDF5 files, multiple unlimited dimensions are fully
supported. Any variable can be defined with any combination of limited
and unlimited dimensions.
The extent to which data can be completely self-describing is limited:
there is always some assumed context without which sharing and
archiving data would be impractical. NetCDF permits storing meaningful
names for variables, dimensions, and attributes; units of measure in a
form that can be used in computations; text strings for attribute
values that apply to an entire data set; and simple kinds of
coordinate system information. But for more complex kinds of metadata
(for example, the information necessary to provide accurate
georeferencing of data on unusual grids or from satellite images), it
is often necessary to develop conventions.
Specific additions to the netCDF data model might make some of these
conventions unnecessary or allow some forms of metadata to be
represented in a uniform and compact way. For example, adding explicit
georeferencing to the netCDF data model would simplify elaborate
georeferencing conventions at the cost of complicating the model. The
problem is finding an appropriate trade-off between the richness of
the model and its generality (i.e., its ability to encompass many
kinds of data). A data model tailored to capture the shared context
among researchers within one discipline may not be appropriate for
sharing or combining data from multiple disciplines.
The classic netCDF data model (which is used for classic-format and
64-bit offset format data) does not support nested data structures
such as trees, nested arrays, or other recursive structures. Through
use of indirection and conventions it is possible to represent some
kinds of nested structures, but the result may fall short of the
netCDF goal of self-describing data.
In netCDF-4/HDF5 format files, the introduction of the compound type
allows the creation of complex data types, involving any combination
of types. The VLEN type allows efficient storage of ragged arrays, and
the introduction of hierarchical groups allows users new ways to
organize data.
Finally, using the netCDF-3 programming interfaces, concurrent access
to a netCDF dataset is limited. One writer and multiple readers may
access data in a single dataset simultaneously, but there is no
support for multiple concurrent writers.
NetCDF-4 supports parallel read/write access to netCDF-4/HDF5 files,
using the underlying HDF5 library and parallel read/write access to
classic and 64-bit offset files using the parallel-netcdf library.
For more information about HDF5, see the HDF5 web site:
http://hdfgroup.org/HDF5/.
For more information about parallel-netcdf, see their web site:
http://www.mcs.anl.gov/parallel-netcdf.
\page netcdf_data_set_components The Components of a NetCDF Data Set
\tableofcontents
\section data_model The Data Model
A netCDF dataset contains dimensions, variables, and attributes, which
all have both a name and an ID number by which they are
identified. These components can be used together to capture the
meaning of data and relations among data fields in an array-oriented
dataset. The netCDF library allows simultaneous access to multiple
netCDF datasets which are identified by dataset ID numbers, in
addition to ordinary file names.
\subsection Enhanced Data Model in NetCDF-4/HDF5 Files
Files created with the netCDF-4 format have access to an enhanced data
model, which includes named groups. Groups, like directories in a Unix
file system, are hierarchically organized, to arbitrary depth. They
can be used to organize large numbers of variables.
\image html nc4-model.png "Enhanced NetCDF Data Model"
\image latex nc4-model.png "Enhanced NetCDF Data Model"
\image rtf nc4-model.png "Enhanced NetCDF Data Model"
Each group acts as an entire netCDF dataset in the classic model. That
is, each group may have attributes, dimensions, and variables, as well
as other groups.
The default group is the root group, which allows the classic netCDF
data model to fit neatly into the new model.
Dimensions are scoped such that they can be seen in all descendant
groups. That is, dimensions can be shared between variables in
different groups, if they are defined in a parent group.
In netCDF-4 files, the user may also define a type. For example a
compound type may hold information from an array of C structures, or a
variable length type allows the user to read and write arrays of
variable length values.
Variables, groups, and types share a namespace. Within the same group,
variables, groups, and types must have unique names. (That is, a type
and variable may not have the same name within the same group, and
similarly for sub-groups of that group.)
Groups and user-defined types are only available in files created in
the netCDF-4/HDF5 format. They are not available for classic or 64-bit
offset format files.
\section dimensions Dimensions
A dimension may be used to represent a real physical dimension, for
example, time, latitude, longitude, or height. A dimension might also
be used to index other quantities, for example station or
model-run-number.
A netCDF dimension has both a name and a length.
A dimension length is an arbitrary positive integer, except that one
dimension in a classic or 64-bit offset netCDF dataset can have the
length UNLIMITED. In a netCDF-4 dataset, any number of unlimited
dimensions can be used.
Such a dimension is called the unlimited dimension or the record
dimension. A variable with an unlimited dimension can grow to any
length along that dimension. The unlimited dimension index is like a
record number in conventional record-oriented files.
A netCDF classic or 64-bit offset dataset can have at most one
unlimited dimension, but need not have any. If a variable has an
unlimited dimension, that dimension must be the most significant
(slowest changing) one. Thus any unlimited dimension must be the first
dimension in a CDL shape and the first dimension in corresponding C
array declarations.
A netCDF-4 dataset may have multiple unlimited dimensions, and there
are no restrictions on their order in the list of a variables
dimensions.
To grow variables along an unlimited dimension, write the data using
any of the netCDF data writing functions, and specify the index of the
unlimited dimension to the desired record number. The netCDF library
will write however many records are needed (using the fill value,
unless that feature is turned off, to fill in any intervening
records).
CDL dimension declarations may appear on one or more lines following
the CDL keyword dimensions. Multiple dimension declarations on the
same line may be separated by commas. Each declaration is of the form
name = length. Use the “/” character to include group information
(netCDF-4 output only).
There are four dimensions in the above example: lat, lon, level, and
time (see \ref data_model). The first three are assigned fixed
lengths; time is assigned the length UNLIMITED, which means it is the
unlimited dimension.
The basic unit of named data in a netCDF dataset is a variable. When a
variable is defined, its shape is specified as a list of
dimensions. These dimensions must already exist. The number of
dimensions is called the rank (a.k.a. dimensionality). A scalar
variable has rank 0, a vector has rank 1 and a matrix has rank 2.
It is possible (since version 3.1 of netCDF) to use the same dimension
more than once in specifying a variable shape. For example,
correlation(instrument, instrument) could be a matrix giving
correlations between measurements using different instruments. But
data whose dimensions correspond to those of physical space/time
should have a shape comprising different dimensions, even if some of
these have the same length.
\section variables Variables
Variables are used to store the bulk of the data in a netCDF
dataset. A variable represents an array of values of the same type. A
scalar value is treated as a 0-dimensional array. A variable has a
name, a data type, and a shape described by its list of dimensions
specified when the variable is created. A variable may also have
associated attributes, which may be added, deleted or changed after
the variable is created.
A variable external data type is one of a small set of netCDF
types. In classic and 64-bit offset files, only the original six types
are available (byte, character, short, int, float, and
double). Variables in netCDF-4 files may also use unsigned short,
unsigned int, 64-bit int, unsigned 64-bit int, or string. Or the user
may define a type, as an opaque blob of bytes, as an array of variable
length arrays, or as a compound type, which acts like a C struct. (See
\ref data_type).
In the CDL notation, classic and 64-bit offset type can be used. They
are given the simpler names byte, char, short, int, float, and
double. The name real may be used as a synonym for float in the CDL
notation. The name long is a deprecated synonym for int. For the exact
meaning of each of the types see External Types. The ncgen utility
supports new primitive types with names ubyte, ushort, uint, int64,
uint64, and string.
CDL variable declarations appear after the variable keyword in a CDL
unit. They have the form
\code
type variable_name ( dim_name_1, dim_name_2, ... );
\endcode
for variables with dimensions, or
\code
type variable_name;
\endcode
for scalar variables.
In the above CDL example there are six variables. As discussed below,
four of these are coordinate variables (See coordinate_variables). The
remaining variables (sometimes called primary variables), temp and rh,
contain what is usually thought of as the data. Each of these
variables has the unlimited dimension time as its first dimension, so
they are called record variables. A variable that is not a record
variable has a fixed length (number of data values) given by the
product of its dimension lengths. The length of a record variable is
also the product of its dimension lengths, but in this case the
product is variable because it involves the length of the unlimited
dimension, which can vary. The length of the unlimited dimension is
the number of records.
\section coordinate_variables Coordinate Variables
It is legal for a variable to have the same name as a dimension. Such
variables have no special meaning to the netCDF library. However there
is a convention that such variables should be treated in a special way
by software using this library.
A variable with the same name as a dimension is called a
<em>coordinate variable</em>. It typically defines a physical coordinate corresponding to
that dimension. The above CDL example includes the coordinate
variables lat, lon, level and time, defined as follows:
\code
int lat(lat), lon(lon), level(level);
short time(time);
...
data:
level = 1000, 850, 700, 500;
lat = 20, 30, 40, 50, 60;
lon = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
time = 12;
\endcode
These define the latitudes, longitudes, barometric pressures and times
corresponding to positions along these dimensions. Thus there is data
at altitudes corresponding to 1000, 850, 700 and 500 millibars; and at
latitudes 20, 30, 40, 50 and 60 degrees north. Note that each
coordinate variable is a vector and has a shape consisting of just the
dimension with the same name.
A position along a dimension can be specified using an index. This is
an integer with a minimum value of 0 for C programs, 1 in Fortran
programs. Thus the 700 millibar level would have an index value of 2
in the example above in a C program, and 3 in a Fortran program.
If a dimension has a corresponding coordinate variable, then this
provides an alternative, and often more convenient, means of
specifying position along it. Current application packages that make
use of coordinate variables commonly assume they are numeric vectors
and strictly monotonic (all values are different and either increasing
or decreasing).
\section attributes Attributes
NetCDF attributes are used to store data about the data (ancillary
data or metadata), similar in many ways to the information stored in
data dictionaries and schema in conventional database systems. Most
attributes provide information about a specific variable. These are
identified by the name (or ID) of that variable, together with the
name of the attribute.
Some attributes provide information about the dataset as a whole and
are called global attributes. These are identified by the attribute
name together with a blank variable name (in CDL) or a special null
"global variable" ID (in C or Fortran).
In netCDF-4 file, attributes can also be added at the group level.
An attribute has an associated variable (the null "global variable"
for a global or group-level attribute), a name, a data type, a length,
and a value. The current version treats all attributes as vectors;
scalar values are treated as single-element vectors.
Conventional attribute names should be used where applicable. New
names should be as meaningful as possible.
The external type of an attribute is specified when it is created. The
types permitted for attributes are the same as the netCDF external
data types for variables. Attributes with the same name for different
variables should sometimes be of different types. For example, the
attribute valid_max specifying the maximum valid data value for a
variable of type int should be of type int, whereas the attribute
valid_max for a variable of type double should instead be of type
double.
Attributes are more dynamic than variables or dimensions; they can be
deleted and have their type, length, and values changed after they are
created, whereas the netCDF interface provides no way to delete a
variable or to change its type or shape.
The CDL notation for defining an attribute is
\code
variable_name:attribute_name = list_of_values;
\endcode
for a variable attribute, or
\code
:attribute_name = list_of_values;
\endcode
for a global attribute.
For the netCDF classic model, the type and length of each attribute
are not explicitly declared in CDL; they are derived from the values
assigned to the attribute. All values of an attribute must be of the
same type. The notation used for constant values of the various netCDF
types is discussed later (see CDL Constants).
The extended CDL syntax for the enhanced data model supported by
netCDF-4 allows optional type specifications, including user-defined
types, for attributes of user-defined types. See ncdump output or the
reference documentation for ncgen for details of the extended CDL
systax.
In the netCDF example (see \ref data_model), units is an attribute for
the variable lat that has a 13-character array value
'degrees_north'. And valid_range is an attribute for the variable rh
that has length 2 and values '0.0' and '1.0'.
One global attribute, called “source”, is defined for the example
netCDF dataset. This is a character array intended for documenting the
data. Actual netCDF datasets might have more global attributes to
document the origin, history, conventions, and other characteristics
of the dataset as a whole.
Most generic applications that process netCDF datasets assume standard
attribute conventions and it is strongly recommended that these be
followed unless there are good reasons for not doing so. For
information about units, long_name, valid_min, valid_max, valid_range,
scale_factor, add_offset, _FillValue, and other conventional
attributes, see Attribute Conventions.
Attributes may be added to a netCDF dataset long after it is first
defined, so you don't have to anticipate all potentially useful
attributes. However adding new attributes to an existing classic or
64-bit offset format dataset can incur the same expense as copying the
dataset. For a more extensive discussion see Structure.
\section differences_atts_vars Differences between Attributes and Variables
In contrast to variables, which are intended for bulk data, attributes
are intended for ancillary data, or information about the data. The
total amount of ancillary data associated with a netCDF object, and
stored in its attributes, is typically small enough to be
memory-resident. However variables are often too large to entirely fit
in memory and must be split into sections for processing.
Another difference between attributes and variables is that variables
may be multidimensional. Attributes are all either scalars
(single-valued) or vectors (a single, fixed dimension).
Variables are created with a name, type, and shape before they are
assigned data values, so a variable may exist with no values. The
value of an attribute is specified when it is created, unless it is a
zero-length attribute.
A variable may have attributes, but an attribute cannot have
attributes. Attributes assigned to variables may have the same units
as the variable (for example, valid_range) or have no units (for
example, scale_factor). If you want to store data that requires units
different from those of the associated variable, it is better to use a
variable than an attribute. More generally, if data require ancillary
data to describe them, are multidimensional, require any of the
defined netCDF dimensions to index their values, or require a
significant amount of storage, that data should be represented using
variables rather than attributes.
\section object_name NetCDF Names
\subsection Permitted Characters in NetCDF Names
The names of dimensions, variables and attributes (and, in netCDF-4
files, groups, user-defined types, compound member names, and
enumeration symbols) consist of arbitrary sequences of alphanumeric
characters, underscore '_', period '.', plus '+', hyphen '-', or at
sign '@', but beginning with an alphanumeric character or
underscore. However names commencing with underscore are reserved for
system use.
Beginning with versions 3.6.3 and 4.0, names may also include UTF-8
encoded Unicode characters as well as other special characters, except
for the character '/', which may not appear in a name.
Names that have trailing space characters are also not permitted.
Case is significant in netCDF names.
\subsection Name Length
A zero-length name is not allowed.
Names longer than ::NC_MAX_NAME will not be accepted any netCDF define
function. An error of ::NC_EMAXNAME will be returned.
All netCDF inquiry functions will return names of maximum size
::NC_MAX_NAME for netCDF files. Since this does not include the
terminating NULL, space should be reserved for NC_MAX_NAME + 1
characters.
\subsection NetCDF Conventions
Some widely used conventions restrict names to only alphanumeric
characters or underscores.
\section archival Is NetCDF a Good Archive Format?
NetCDF classic or 64-bit offset formats can be used as a
general-purpose archive format for storing arrays. Compression of data
is possible with netCDF (e.g., using arrays of eight-bit or 16-bit
integers to encode low-resolution floating-point numbers instead of
arrays of 32-bit numbers), or the resulting data file may be
compressed before storage (but must be uncompressed before it is
read). Hence, using these netCDF formats may require more space than
special-purpose archive formats that exploit knowledge of particular
characteristics of specific datasets.
With netCDF-4/HDF5 format, the zlib library can provide compression on
a per-variable basis. That is, some variables may be compressed,
others not. In this case the compression and decompression of data
happen transparently to the user, and the data may be stored, read,
and written compressed.
\section background Background and Evolution of the NetCDF Interface
The development of the netCDF interface began with a modest goal
related to Unidata's needs: to provide a common interface between
Unidata applications and real-time meteorological data. Since Unidata
software was intended to run on multiple hardware platforms with
access from both C and FORTRAN, achieving Unidata's goals had the
potential for providing a package that was useful in a broader
context. By making the package widely available and collaborating with
other organizations with similar needs, we hoped to improve the then
current situation in which software for scientific data access was
only rarely reused by others in the same discipline and almost never
reused between disciplines (Fulker, 1988).
Important concepts employed in the netCDF software originated in a
paper (Treinish and Gough, 1987) that described data-access software
developed at the NASA Goddard National Space Science Data Center
(NSSDC). The interface provided by this software was called the Common
Data Format (CDF). The NASA CDF was originally developed as a
platform-specific FORTRAN library to support an abstraction for
storing arrays.
The NASA CDF package had been used for many different kinds of data in
an extensive collection of applications. It had the virtues of
simplicity (only 13 subroutines), independence from storage format,
generality, ability to support logical user views of data, and support
for generic applications.
Unidata held a workshop on CDF in Boulder in August 1987. We proposed
exploring the possibility of collaborating with NASA to extend the CDF
FORTRAN interface, to define a C interface, and to permit the access
of data aggregates with a single call, while maintaining compatibility
with the existing NASA interface.
Independently, Dave Raymond at the New Mexico Institute of Mining and
Technology had developed a package of C software for UNIX that
supported sequential access to self-describing array-oriented data and
a "pipes and filters" (or "data flow") approach to processing,
analyzing, and displaying the data. This package also used the "Common
Data Format" name, later changed to C-Based Analysis and Display
System (CANDIS). Unidata learned of Raymond's work (Raymond, 1988),
and incorporated some of his ideas, such as the use of named
dimensions and variables with differing shapes in a single data
object, into the Unidata netCDF interface.
In early 1988, Glenn Davis of Unidata developed a prototype netCDF
package in C that was layered on XDR. This prototype proved that a
single-file, XDR-based implementation of the CDF interface could be
achieved at acceptable cost and that the resulting programs could be
implemented on both UNIX and VMS systems. However, it also
demonstrated that providing a small, portable, and NASA CDF-compatible
FORTRAN interface with the desired generality was not
practical. NASA's CDF and Unidata's netCDF have since evolved
separately, but recent CDF versions share many characteristics with
netCDF.
In early 1988, Joe Fahle of SeaSpace, Inc. (a commercial software
development firm in San Diego, California), a participant in the 1987
Unidata CDF workshop, independently developed a CDF package in C that
extended the NASA CDF interface in several important ways (Fahle,
1989). Like Raymond's package, the SeaSpace CDF software permitted
variables with unrelated shapes to be included in the same data object
and permitted a general form of access to multidimensional
arrays. Fahle's implementation was used at SeaSpace as the
intermediate form of storage for a variety of steps in their
image-processing system. This interface and format have subsequently
evolved into the Terascan data format.
After studying Fahle's interface, we concluded that it solved many of
the problems we had identified in trying to stretch the NASA interface
to our purposes. In August 1988, we convened a small workshop to agree
on a Unidata netCDF interface, and to resolve remaining open
issues. Attending were Joe Fahle of SeaSpace, Michael Gough of Apple
(an author of the NASA CDF software), Angel Li of the University of
Miami (who had implemented our prototype netCDF software on VMS and
was a potential user), and Unidata systems development
staff. Consensus was reached at the workshop after some further
simplifications were discovered. A document incorporating the results
of the workshop into a proposed Unidata netCDF interface specification
was distributed widely for comments before Glenn Davis and Russ Rew
implemented the first version of the software. Comparison with other
data-access interfaces and experience using netCDF are discussed in
Rew and Davis (1990a), Rew and Davis (1990b), Jenter and Signell
(1992), and Brown, Folk, Goucher, and Rew (1993).
In October 1991, we announced version 2.0 of the netCDF software
distribution. Slight modifications to the C interface (declaring
dimension lengths to be long rather than int) improved the usability
of netCDF on inexpensive platforms such as MS-DOS computers, without
requiring recompilation on other platforms. This change to the
interface required no changes to the associated file format.
Release of netCDF version 2.3 in June 1993 preserved the same file
format but added single call access to records, optimizations for
accessing cross-sections involving non-contiguous data, subsampling
along specified dimensions (using 'strides'), accessing non-contiguous
data (using 'mapped array sections'), improvements to the ncdump and
ncgen utilities, and an experimental C++ interface.
In version 2.4, released in February 1996, support was added for new
platforms and for the C++ interface, significant optimizations were
implemented for supercomputer architectures, and the file format was
formally specified in an appendix to the User's Guide.
FAN (File Array Notation), software providing a high-level interface
to netCDF data, was made available in May 1996. The capabilities of
the FAN utilities include extracting and manipulating array data from
netCDF datasets, printing selected data from netCDF arrays, copying
ASCII data into netCDF arrays, and performing various operations (sum,
mean, max, min, product, and others) on netCDF arrays.
In 1996 and 1997, Joe Sirott implemented and made available the first
implementation of a read-only netCDF interface for Java, Bill Noon
made a Python module available for netCDF, and Konrad Hinsen
contributed another netCDF interface for Python.
In May 1997, Version 3.3 of netCDF was released. This included a new
type-safe interface for C and Fortran, as well as many other
improvements. A month later, Charlie Zender released version 1.0 of
the NCO (netCDF Operators) package, providing command-line utilities
for general purpose operations on netCDF data.
Version 3.4 of Unidata's netCDF software, released in March 1998,
included initial large file support, performance enhancements, and
improved Cray platform support. Later in 1998, Dan Schmitt provided a
Tcl/Tk interface, and Glenn Davis provided version 1.0 of netCDF for
Java.
In May 1999, Glenn Davis, who was instrumental in creating and
developing netCDF, died in a small plane crash during a
thunderstorm. The memory of Glenn's passions and intellect continue to
inspire those of us who worked with him.
In February 2000, an experimental Fortran 90 interface developed by
Robert Pincus was released.
John Caron released netCDF for Java, version 2.0 in February
2001. This version incorporated a new high-performance package for
multidimensional arrays, simplified the interface, and included
OpenDAP (known previously as DODS) remote access, as well as remote
netCDF access via HTTP contributed by Don Denbo.
In March 2001, netCDF 3.5.0 was released. This release fully
integrated the new Fortran 90 interface, enhanced portability,
improved the C++ interface, and added a few new tuning functions.
Also in 2001, Takeshi Horinouchi and colleagues made a netCDF
interface for Ruby available, as did David Pierce for the R language
for statistical computing and graphics. Charles Denham released
WetCDF, an independent implementation of the netCDF interface for
Matlab, as well as updates to the popular netCDF Toolbox for Matlab.
In 2002, Unidata and collaborators developed NcML, an XML
representation for netCDF data useful for cataloging data holdings,
aggregation of data from multiple datasets, augmenting metadata in
existing datasets, and support for alternative views of data. The Java
interface currently provides access to netCDF data through NcML.
Additional developments in 2002 included translation of C and Fortran
User Guides into Japanese by Masato Shiotani and colleagues, creation
of a “Best Practices” guide for writing netCDF files, and provision of
an Ada-95 interface by Alexandru Corlan.
In July 2003 a group of researchers at Northwestern University and
Argonne National Laboratory (Jianwei Li, Wei-keng Liao, Alok
Choudhary, Robert Ross, Rajeev Thakur, William Gropp, and Rob Latham)
contributed a new parallel interface for writing and reading netCDF
data, tailored for use on high performance platforms with parallel
I/O. The implementation built on the MPI-IO interface, providing
portability to many platforms.
In October 2003, Greg Sjaardema contributed support for an alternative
format with 64-bit offsets, to provide more complete support for very
large files. These changes, with slight modifications at Unidata, were
incorporated into version 3.6.0, released in December, 2004.
In 2004, thanks to a NASA grant, Unidata and NCSA began a
collaboration to increase the interoperability of netCDF and HDF5, and
bring some advanced HDF5 features to netCDF users.
In February, 2006, release 3.6.1 fixed some minor bugs.
In March, 2007, release 3.6.2 introduced an improved build system that
used automake and libtool, and an upgrade to the most recent autoconf
release, to support shared libraries and the netcdf-4 builds. This
release also introduced the NetCDF Tutorial and example programs.
The first beta release of netCDF-4.0 was celebrated with a giant party
at Unidata in April, 2007. Over 2000 people danced 'til dawn at the
NCAR Mesa Lab, listening to the Flaming Lips and the Denver Gilbert &
Sullivan repertory company. Brittany Spears performed the
world-premire of her smash hit "Format me baby, one more time."
In June, 2008, netCDF-4.0 was released. Version 3.6.3, the same code
but with netcdf-4 features turned off, was released at the same
time. The 4.0 release uses HDF5 1.8.1 as the data storage layer for
netcdf, and introduces many new features including groups and
user-defined types. The 3.6.3/4.0 releases also introduced handling of
UTF8-encoded Unicode names.
NetCDF-4.1.1 was released in April, 2010, provided built-in client
support for the DAP protocol for accessing data from remote OPeNDAP
servers, full support for the enhanced netCDF-4 data model in the
ncgen utility, a new nccopy utility for copying and conversion among
netCDF format variants, ability to read some HDF4/HDF5 data archives
through the netCDF C or Fortran interfaces, support for parallel I/O
on netCDF classic and 64-bit offset files using the parallel-netcdf
(formerly pnetcdf) library from Argonne/Northwestern, a new nc-config
utility to help compile and link programs that use netCDF, inclusion
of the UDUNITS library for hadling “units” attributes, and inclusion
of libcf to assist in creating data compliant with the Climate and
Forecast (CF) metadata conventions.
In September, 2010, the Netcdf-Java/CDM (Common Data Model) version
4.2 library was declared stable and made available to users. This
100%-Java implementation provided a read-write interface to netCDF-3
classic and 64-bit offset data, as well as a read-only interface to
netCDF-4 enhanced model data and many other formats of scientific data
through a common (CDM) interface. More recent releases support
writing netCDF-4 data. The NetCDF-Java library also
implements NcML, which allows you to add metadata to CDM datasets. A ToolsUI
application is also included that provides a graphical user interface
to capabilities similar to the C-based ncdump and ncgen utilities, as
well as CF-compliance checking and many other features.
\section remote_client The Remote Data Access Client