Scientists need efficient mechanisms for storing the large datasets that are created and used by their applications. The Network Common Data Form (netCDF) [10, 9] is one such mechanism used by a number of applications: it serves as a portable, efficient file format and programming interface, provides a common data access method for storage of structured datasets, and is popular in numerous scientific application domains. However, the original interface does not provide an efficient mechanism for parallel access. Because there is no support for concurrently writing to a netCDF file, parallel applications writing netCDF files must serialize access. This serialization is usually performed by passing all data to a single process that then writes all data to the netCDF file.
Our work addresses this limitation with a parallel interface to netCDF files implemented on top of MPI-IO. Building on MPI-IO allows us to take advantage of the optimizations available in implementations such as ROMIO [17], which we would otherwise need to implement ourselves or simply do without. In this paper we describe the design of our parallel netCDF (PnetCDF) interface and discuss preliminary performance results. Section 2 reviews some related work. Section 3 presents the design background of netCDF and points out its potential usage in parallel scientific applications. Section 4 describes the design and implementation of our parallel netCDF interface. Section 5 gives experimental performance results. Section 6 concludes the paper.

2. Related Work

Considerable research has been done on data access for scientific applications. Two projects, MPI-IO and HDF, are most closely related to our research. MPI-IO is the parallel I/O interface specified as part of the MPI-2 standard; it is implemented and used on a wide range of platforms. HDF is a file format and software library for storing scientific data, and both of its major versions, HDF4 and HDF5, hide the low-level storage details from the application programmer. Both versions store multidimensional arrays together with ancillary data in portable, self-describing file formats, and both support storing many arrays and their ancillary data in a single file. Unfortunately, this high degree of flexibility can sometimes come at the cost of high performance, as seen in previous studies [6, 11].

3. NetCDF Background

NetCDF supports a view of data as a collection of self-describing, portable objects that can be accessed through a simple interface. It defines a file format as well as a set of programming interfaces for storing and retrieving data in the form of arrays in netCDF files. In this section we describe the netCDF file format and its serial API and then consider various approaches to access netCDF files in parallel computing environments.

File Format

NetCDF stores data in an array-oriented dataset, which contains dimensions, variables, and attributes. Physically, the dataset is divided into a file header and the array data. The header contains all information (or metadata) about dimensions, attributes, and variables except for the variable data itself. The netCDF file header first defines a number of dimensions, each with a name and a length.
The basic units of named data in a netCDF dataset are variables, which are multidimensional arrays. For variable-sized arrays, netCDF first defines a record of an array as a subarray composed of all but the most significant dimension; the record variables share the same most significant dimension (the unlimited dimension) and are expected to grow together along that dimension. The other, less significant dimensions all together define the shape for one record of the variable. Figure 1 illustrates the storage layouts for fixed-sized and variable-sized arrays in a netCDF file: the fixed-size arrays are stored contiguously following the header, while the records of all the record variables are interleaved, with the first record of each record variable stored together, then the second records, and so on.

Figure 1. Storage layouts for fixed-sized and variable-sized (record) arrays in a netCDF file.

In the serial netCDF library, a typical sequence of operations to write a new netCDF dataset is to create the dataset; define the dimensions, variables, and attributes; write variable data; and close the dataset. Reading an existing netCDF dataset involves first opening the dataset; inquiring about dimensions, variables, and attributes; reading variable data; and closing the dataset. Refer to [9] for details of each function in the netCDF library.
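A minimal sketch of the write sequence just described, using the standard serial netCDF-3 C API (the file name, dimension sizes, and variable name are illustrative, and error checking is omitted for brevity):

#include <netcdf.h>

#define NY 4                /* illustrative dimension sizes */
#define NX 6

/* Create a dataset, define one 2-D float variable, write it, and close. */
int write_example(const float *data)          /* data holds NY*NX values */
{
    int ncid, dimids[2], varid;
    size_t start[2] = {0, 0};
    size_t count[2] = {NY, NX};

    nc_create("example.nc", NC_CLOBBER, &ncid);             /* create the dataset */
    nc_def_dim(ncid, "y", NY, &dimids[0]);                   /* define dimensions  */
    nc_def_dim(ncid, "x", NX, &dimids[1]);
    nc_def_var(ncid, "temp", NC_FLOAT, 2, dimids, &varid);   /* define a variable  */
    nc_enddef(ncid);                                         /* leave define mode  */
    nc_put_vara_float(ncid, varid, start, count, data);      /* write the data     */
    return nc_close(ncid);                                   /* close the dataset  */
}

Reading follows the inverse pattern with nc_open, the nc_inq_* inquiry calls, nc_get_vara_float, and nc_close.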
The netCDF design and optimization techniques described above are suitable for serial access but are not efficient, or even not possible, for parallel access, nor do they allow multiple processes to write to a single netCDF file concurrently. Today, however, most scientific applications are programmed to run in parallel environments because of their increasing computational and data requirements, so we extend netCDF with a parallel interface and provide a prototype implementation. Since a large number of existing users are running their applications over netCDF, our parallel netCDF design retains the original file format and closely follows the original API. Before presenting our PnetCDF design, we discuss current approaches for using netCDF in parallel programs in a message-passing environment.
One approach, shown in Figure 2(a), is to have all processes pass their data to a single process that then performs the netCDF operations. The drawback of this approach is that all data must be funneled through one process, so access is serialized and performance suffers. Another approach, shown in Figure 2(b), is for each process to write its own netCDF file. In this case, all netCDF operations can proceed independently on each process. However, managing a netCDF dataset is more difficult when it is spread across multiple files, and this approach also violates the netCDF design goal of easy data integration and management. Our approach instead provides the processes with parallel access to a single netCDF file, preserving ease of management without sacrificing performance. We discuss the details of this parallel netCDF design and implementation in the next section.

4. Parallel netCDF

Figure 3 describes the overall architecture for our design. In PnetCDF a file is opened, operated, and closed by the participating processes in a communication group, and all operations on the file take place within this collective open-close scope. An MPI Info object is also added to the open and create calls to pass user access hints to the implementation for further optimization. In order for these processes to operate on the same file space, especially on the structural information contained in the file header, a number of functions are made collective.
For instance, all processes must call the define mode functions with the same values to get consistent dataset definitions.

Figure 3. PnetCDF is a library between user space and file system space. It processes parallel netCDF requests from the user compute nodes and, after optimization, performs the I/O over the end storage on behalf of the users.

One drawback of the original netCDF interface, and of our high-level one, is that only contiguous memory regions may be described in a data access call. We therefore also provide a flexible data access interface. Specifically, the flexible API provides the user with the ability to describe noncontiguous regions in memory, which is missing from the original interface; the file regions are still described by using the start, count, stride, and imap vectors. All our high-level data access routines are actually written using this interface.
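As an illustration of this point, the sketch below uses an MPI derived datatype to write only the interior of a local array that carries a surrounding layer of ghost cells, so the data in memory are noncontiguous. The function ncmpi_put_vara_all and its argument order are taken from the current public PnetCDF C API, which may differ in detail from the interface described here; the array sizes are made up for the example.

#include <mpi.h>
#include <pnetcdf.h>

#define NY 10                 /* interior size held by this process */
#define NX 10
#define NG 2                  /* width of the ghost-cell layer      */

/* Write the NY x NX interior of a (NY+2*NG) x (NX+2*NG) local array to
 * variable varid of an open file ncid; start[] is this process's offset
 * in the global array. Error checking is omitted.                      */
void write_interior(int ncid, int varid, const MPI_Offset start[2],
                    float local[NY + 2*NG][NX + 2*NG])
{
    MPI_Offset count[2] = {NY, NX};
    int sizes[2]    = {NY + 2*NG, NX + 2*NG};
    int subsizes[2] = {NY, NX};
    int starts[2]   = {NG, NG};
    MPI_Datatype interior;

    /* Describe the noncontiguous interior region of the memory buffer. */
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_FLOAT, &interior);
    MPI_Type_commit(&interior);

    /* Flexible collective write: one element of the derived type supplies
     * exactly the NY x NX values selected in the file.                    */
    ncmpi_put_vara_all(ncid, varid, start, count, &local[0][0], 1, interior);

    MPI_Type_free(&interior);
}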
The most important change from the original netCDF interface with respect to data access functions is the split of data mode into two distinct modes: collective and noncollective data modes. Similar to MPI-IO, the collective functions must be called by all the processes in the communicator associated with the opened netCDF file, while the noncollective functions do not have this constraint. The collective nature of these PnetCDF operations provides the underlying PnetCDF implementation an opportunity to further optimize access to the netCDF file. These optimizations are performed without further intervention by the application programmer and have been proven to provide dramatic performance improvement in multidimensional dataset access [17].
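In the current public PnetCDF C API this split appears as paired function variants, plus explicit calls to switch into and out of the independent (noncollective) data mode; a brief sketch, assuming ncid, varid, start, count, and buf are already set up:

#include <mpi.h>
#include <pnetcdf.h>

/* The same subarray write issued in collective and in noncollective mode. */
void write_both_ways(int ncid, int varid,
                     const MPI_Offset start[2], const MPI_Offset count[2],
                     const float *buf)
{
    /* Collective data mode (the default after ncmpi_enddef): every process
     * in the file's communicator must participate in this call.           */
    ncmpi_put_vara_float_all(ncid, varid, start, count, buf);

    /* Noncollective (independent) data mode: a process may call on its own,
     * but the file must first be switched into independent mode.          */
    ncmpi_begin_indep_data(ncid);
    ncmpi_put_vara_float(ncid, varid, start, count, buf);
    ncmpi_end_indep_data(ncid);
}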
Access hints are passed to PnetCDF through the MPI Info argument of the open and create calls; MPI_INFO_NULL can be passed in, indicating no hints. However, hints provide users the ability to deliver information about the specific platform and the expected low-level access pattern, such as enabling or disabling certain algorithms or adjusting internal buffer sizes and policies. In addition, hints can be used to describe expected access patterns at the netCDF level of abstraction, in terms of variables and records. These hints can be interpreted by the PnetCDF implementation and either used internally or converted into appropriate MPI-IO hints. For example, given a hint indicating that only a certain small set of variables is going to be accessed, the implementation can optimize specifically for access to those variables. For applications that pull a small amount of data from a large number of separate netCDF files, this type of optimization could be a big win, but it is only possible with this additional information.
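To make the mechanism concrete, here is a sketch of passing hints when a dataset is created collectively. The keys shown (cb_buffer_size, romio_cb_write) are standard ROMIO hints and the values are illustrative; whether a given hint is honored depends on the underlying MPI-IO implementation, and netCDF-level hints such as those described above would be supplied the same way under whatever key names the PnetCDF implementation defines.

#include <mpi.h>
#include <pnetcdf.h>

/* Create a dataset collectively, passing access hints via MPI_Info. */
int create_with_hints(MPI_Comm comm, const char *path, int *ncidp)
{
    MPI_Info info;
    int err;

    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB collective buffer    */
    MPI_Info_set(info, "romio_cb_write", "enable");    /* force collective buffering */

    /* MPI_INFO_NULL could be passed here instead, indicating no hints. */
    err = ncmpi_create(comm, path, NC_CLOBBER, info, ncidp);

    MPI_Info_free(&info);
    return err;
}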
Parallel Implementation

Based on our parallel interface design, we provide an implementation built on top of MPI-IO.
We first describe our implementation strategies for the dataset functions, define mode functions, attribute functions, and inquiry functions that access the netCDF file header; we then describe the data access functions, which take the start[], count[], stride[], imap[], and MPI datatype arguments. The header functions provide the same functionality as the original ones. These functions are also made collective to guarantee consistency of the dataset structure among the participating processes in the same MPI communicator. Since they are all in-memory operations not involving file transfers, being collective does not necessarily imply interprocess synchronization, and experienced users can avoid unnecessary synchronization overhead as long as the arguments passed by all processes match. We implement our user hints in PnetCDF as extensions to the MPI Info mechanism; in such cases, more optimization information is available to the underlying implementation. We build the data access functions over MPI-IO so that they have better portability and provide more optimization opportunities.

Figure 4. Example of using PnetCDF (typically there are four main steps).
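The content of the original example figure is not preserved in this copy, so the sketch below stands in for it. It shows the four typical steps with the current public PnetCDF C API: collective creation of the dataset, the define mode, a collective write in data mode, and the collective close. The process-grid arrangement, sizes, and names are illustrative only, and error checking is omitted.

#include <mpi.h>
#include <pnetcdf.h>

#define NPY 2               /* illustrative 2 x 2 process grid */
#define NPX 2
#define LY  100             /* local block size per process    */
#define LX  100

void pnetcdf_example(MPI_Comm comm, const float *local /* LY*LX values */)
{
    int rank, ncid, dimids[2], varid;
    MPI_Offset start[2], count[2] = {LY, LX};

    MPI_Comm_rank(comm, &rank);

    /* Step 1: collectively create the dataset. */
    ncmpi_create(comm, "output.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);

    /* Step 2: define mode -- every process passes the same definitions. */
    ncmpi_def_dim(ncid, "y", NPY * LY, &dimids[0]);
    ncmpi_def_dim(ncid, "x", NPX * LX, &dimids[1]);
    ncmpi_def_var(ncid, "var", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Step 3: data mode -- each process writes its own block collectively. */
    start[0] = (MPI_Offset)(rank / NPX) * LY;
    start[1] = (MPI_Offset)(rank % NPX) * LX;
    ncmpi_put_vara_float_all(ncid, varid, start, count, local);

    /* Step 4: collectively close the dataset. */
    ncmpi_close(ncid);
}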
Our design and implementation of PnetCDF offers a number of advantages, as compared with related work such as HDF5. The basic difference lies in the file organization. The netCDF format uses a linear data layout, in which the data arrays are either stored in contiguous space in a predefined order or interleaved in a regular pattern. Since it lays out the data in a linear buffer, the information needed for access (file view, MPI Datatype, etc.) can be generated easily; thus, there is very little overhead, and the I/O in the array data area is performed in parallel. The header contains all the information for direct access of each data array, and each array is associated with a predefined, numerical ID that can be efficiently inquired when it is needed to access the array. Since all define mode and attribute functions are collective and require all processes in the communicator to provide the same arguments when adding, removing, or changing the metadata, every process keeps a consistent view of the dataset definition; once the definition is done, each array can be referenced by its permanent ID and accessed at any time by any process.

On the other hand, parallel HDF5 uses a tree-like file structure that is similar to the UNIX file system: the data is irregularly laid out using a superblock, header blocks, data blocks, extended header blocks, and extended data blocks. This irregular layout pattern can make it difficult to pass user access information down to the underlying MPI-IO layer. Instead, parallel HDF5 uses dataspaces and hyperslabs to define the data organization, map and transfer data between memory space and the file space, and does buffer packing and unpacking internally. Its metadata is dispersed in separate header blocks for each object, and, in order to operate on an object, it has to iterate through the entire namespace to get the header information of that object before accessing it. This kind of access method may be expensive, since opening an object is a collective operation, which forces all participating processes to communicate when accessing a single object, not to mention the cost of file access to locate and fetch the header information of that object. Further, HDF5 metadata is updated during data writes in some cases; thus, additional synchronization is necessary at write time in order to maintain metadata consistency. The netCDF format, for its part, lacks some of HDF5's features; for example, netCDF does not support data compression within its file format (although compressed writes must be serialized in HDF5, limiting their usefulness). Fortunately, these features can all be addressed in future extensions of netCDF.

5. Experimental Results

To evaluate the performance and scalability of PnetCDF against that of serial netCDF, we compared the two with a synthetic benchmark. Our test platform is a Power3-based parallel system: each compute node has 4 GB of memory shared among its eight Power3 processors, and all the compute nodes are interconnected by a high-speed network.
Additional runs used a second system with 16 Power3 processors per node. The results of these tests are the average of 20 runs. An early (0.x) release of PnetCDF was used, and no hints were passed to PnetCDF. HDF5 runs were executed both without hints and with sieve_buf_size and alignment hints; these have been helpful in some previous runs on this system but did not appear to be useful in this particular case.

Scalability Analysis

We wrote a test code in C to evaluate the performance of the current implementation of PnetCDF.
This test code was originally developed in Fortran by Woo-Sun Yang and Chris Ding at Lawrence Berkeley National Laboratory. It writes a three-dimensional array that is partitioned among the processes in several different ways, as illustrated in Figure 5; in these arrays X is the least significant dimension. For comparison purposes, we prepared the same test using the original serial netCDF library.

Figure 5. Various 3-D array partitions (ZY, ZX, and YX) on 8 processors.
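To relate the partitions in Figure 5 to the interface, the sketch below computes the start/count arguments that a process would pass to a collective write such as ncmpi_put_vara_float_all for a ZY partition; the global sizes and the 4 x 2 arrangement of the 8 processes are illustrative and are not the benchmark's actual parameters.

#include <mpi.h>

#define NZ 256              /* illustrative global array: NZ x NY x NX */
#define NY 256
#define NX 256

/* ZY partition on 8 processes: split Z among 4 and Y among 2, keep X whole. */
void zy_partition(int rank, MPI_Offset start[3], MPI_Offset count[3])
{
    int pz = 4, py = 2;                 /* 4 x 2 process grid over (Z, Y) */
    int iz = rank / py;
    int iy = rank % py;

    count[0] = NZ / pz;  start[0] = (MPI_Offset)iz * count[0];
    count[1] = NY / py;  start[1] = (MPI_Offset)iy * count[1];
    count[2] = NX;       start[2] = 0;  /* X is left whole */
}

The ZX and YX partitions differ only in which two dimensions are split; partitions that divide the least significant dimension lead to noncontiguous accesses in the file, which is presumably part of what the different cases exercise.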
Our second benchmark is based on the FLASH code. FLASH solves the compressible Euler equations on a block-structured adaptive mesh and incorporates the necessary physics to describe the environment, including the equation of state, reaction network, and diffusion. The code scales well up to thousands of processors, has been ported to over a dozen platforms, and has been used for numerous production runs. The FLASH I/O benchmark recreates the primary data structures in the FLASH code and produces a checkpoint file, a plotfile with centered data, and a plotfile with corner data. Each block carries a layer of guard cells that are left out of the data written to file, and in the simulation 80 of these blocks are held by each processor. For each of the files, the benchmark writes the related arrays. The checkpoint files are the largest of the three output data sets; in the 8x8x8 case each processor outputs approximately 8 MB.
Generally, PnetCDF performance scales with the number of processes. As expected, PnetCDF outperforms the original serial netCDF as the number of processes increases. Part of the overhead we observed for HDF5 appears to come from its recursive handling of the hyperslab used for parallel access, which makes the packing of the hyperslabs into contiguous buffers take a relatively long time.

6. Conclusion and Future Work

In this work we extend the serial netCDF interface to support efficient parallel access to netCDF files, and we provide a prototype implementation on top of MPI-IO. Testing on alternative platforms and with additional benchmarks is ongoing; in particular, we are interested in seeing whether read performance is more comparable. We also need to develop a mechanism for matching the file organization to access patterns, and we need to develop cross-file optimizations for addressing common data access patterns.