DataSpaces 2.x documentation:
In progress, to be hosted at dspaces.readthedocs.io
DataSpaces 1.x documentation (deprecated):
Overview
DataSpaces is a programming system targeted at current large-scale systems and designed to support dynamic interaction and coordination between scientific applications. DataSpaces essentially provides a semantically specialized shared-space abstraction built on a set of staging nodes. This abstraction derives from the tuple-space model and can be associatively accessed by the interacting applications of a simulation workflow. DataSpaces also provides services including a distributed in-memory associative object store, scalable messaging, and runtime mapping and scheduling of online data analysis operations.
Download and Install DataSpaces
DataSpaces can be downloaded from this page. Currently, DataSpaces supports the following platforms: Cray Gemini, IBM DCMF, IBM PAMI and InfiniBand. You can follow these examples to configure DataSpaces on the supported architectures. You will need to customize the configure command to your specific system configuration and programming environment.
Untar and prepare the source package
$ tar zxf dataspaces-1.3.0.tar.gz
$ ./autogen.sh
Cray XE and XK series
1. Setting up the protection domain to use.
A protection domain is a unique communication domain that all the applications in a job can connect to. When installing DataSpaces, a usable protection domain has to be created and used. Once a protection domain has been created, it is identified by a unique (ptag, cookie) pair. You may either use an existing system protection domain or create a user-defined one for your job.
To check existing usable protection domain information, please run:
$ apstat -P
You may see the following information.
PDomainID                 TYPE    Uid   Ptag  Cookie
ADIOS                     system  0     250   0x5420000
CCI                       system  0     251   0x5430000
To create a user-defined protection domain, please run:
$ apmgr pdomain -c USER_DEFINED_DOMAIN_NAME
When you check the usable protection domain information again, you will see the new entry:
PDomainID                 TYPE    Uid   Ptag  Cookie
ADIOS                     system  0     250   0x5420000
CCI                       system  0     251   0x5430000
USER_DEFINED_DOMAIN_NAME  user    6444  253   0xec6a0000
To release a user-defined protection domain, please run:
$ apmgr pdomain -r USER_DEFINED_DOMAIN_NAME
For more help, please refer to man apmgr.
2. Setting up DataSpaces using the (ptag, cookie) pair.
There are two ways to pass the (ptag, cookie) pair of a protection domain to DataSpaces: (a) through environment variables; (b) through DataSpaces configure options.
Through environment variables: set the environment variables DSPACES_GNI_PTAG and DSPACES_GNI_COOKIE in your job script to the corresponding values of the (ptag, cookie) pair. For example, if you are using the existing system protection domain ADIOS, then you should set:
export DSPACES_GNI_PTAG=250
export DSPACES_GNI_COOKIE=0x5420000
Through DataSpaces configure options: configure DataSpaces with the following options:
--with-gni-ptag=<ptag decimal value>
--with-gni-cookie=<cookie hexadecimal value>
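For example, a sketch for the Cray compiler wrappers that reuses the ptag and cookie of the ADIOS system protection domain shown above (substitute the values of your own domain):
$ ./configure CC=cc FC=ftn --with-gni-ptag=250 --with-gni-cookie=0x5420000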
3. Configure DataSpaces.
$ ./configure CC=cc FC=ftn
InfiniBand cluster
$ ./configure CC=mpicc FC=mpif90
IBM BlueGene/P
$ ./configure CC=mpixlc FC=mpixlf90 CFLAGS="-g -O0 -qarch=450 -qtune=450" --with-dcmf="/bgsys/drivers/ppcfloor"
IBM BlueGene/Q
$ ./configure CC=/bgsys/drivers/ppcfloor/comm/xl/bin/mpixlc_r FC=/bgsys/drivers/ppcfloor/comm/xl/bin/mpixlf90 CFLAGS="-O0 -g -qlanglvl=extc99 -O3 -qarch=qp -qtune=qp -qfullpath"
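After configuring for your platform, the build and install follow the usual autotools sequence. A minimal sketch; passing an install prefix to configure (e.g. --prefix=$HOME/dataspaces) is optional, and that path is only an assumption:
$ make
$ make install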
If you have problems configuring and building DataSpaces, please refer to this FAQ or contact us directly.
Building Application with DataSpaces
Let us look at an example of a simple but complete DataSpaces application workflow. This workflow consists of two applications: a writer that puts data into the space and a reader that gets it back.
- dataspaces_server.c: DataSpaces staging server
- put.c: Writer application
- get.c: Reader application
Look inside put.c
/* put.c : Example 1: DataSpaces put tutorial
 * This example will show you the simplest way
 * to put a 1D array of 3 elements into the DataSpace.
 * */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <time.h>
#include "dataspaces.h"
#include "mpi.h"

int main(int argc, char **argv)
{
    int err;
    int nprocs, rank;
    MPI_Comm gcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    gcomm = MPI_COMM_WORLD;

    // Initialize DataSpaces
    // # of Peers, Application ID, ptr MPI comm, additional parameters
    // # Peers: Number of connecting clients to the DS server
    // Application ID: Unique identifier (integer) for application
    // Pointer to the MPI Communicator, allows DS Layer to use MPI barrier func
    // Addt'l parameters: Placeholder for future arguments, currently NULL.
    dspaces_init(1, 1, &gcomm, NULL);

    int timestep = 0;

    while (timestep < 10) {
        timestep++;
        sleep(2);

        // DataSpaces: Lock Mechanism
        // Usage: Prevent other process from modifying
        //        data at the same time as ours
        dspaces_lock_on_write("my_test_lock", &gcomm);

        // Name the Data that will be written
        char var_name[128];
        sprintf(var_name, "ex1_sample_data");

        // Create integer array, size 3
        int *data = malloc(3 * sizeof(int));

        // Initialize Random Number Generator
        srand(time(NULL));

        // Populate data array with random values from 0 to 99
        data[0] = rand() % 100;
        data[1] = rand() % 100;
        data[2] = rand() % 100;

        printf("Timestep %d: put data %d %d %d\n",
               timestep, data[0], data[1], data[2]);

        // ndim: Dimensions for application data domain
        // In this case, our data array is 1 dimensional
        int ndim = 1;

        // Prepare LOWER and UPPER bound dimensions
        // In this example, we will put all data into a
        // small box at the origin: upper bound = lower bound = (0,0,0)
        // In further examples, we will expand this concept.
        uint64_t lb[3] = {0}, ub[3] = {0};

        // DataSpaces: Put data array into the space
        // Usage: dspaces_put(Name of variable, version num,
        //        size (in bytes of each element), dimensions for bounding box,
        //        lower bound coordinates, upper bound coordinates,
        //        ptr to data buffer)
        dspaces_put(var_name, timestep, 3 * sizeof(int), ndim, lb, ub, data);

        // DataSpaces: Release our lock on the data
        dspaces_unlock_on_write("my_test_lock", &gcomm);
    }

    // DataSpaces: Finalize and clean up DS process
    dspaces_finalize();

    MPI_Barrier(gcomm);
    MPI_Finalize();

    return 0;
}
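The comments in put.c note that the bounding box will be expanded in later examples. As a minimal sketch of that idea, suppose each MPI rank owns one NX x NY block of a 2D array of doubles; NX, NY, xi, and yi are hypothetical names for the block size and the rank's block coordinates, not part of the tutorial code:

#include <stdint.h>
#include "dataspaces.h"

/* Sketch: each rank puts its own NX x NY block of a 2D double array.
 * The bounding box is given in global index space, and each element
 * of the box is one double, so the element size is sizeof(double). */
void put_block(const char *var_name, int timestep,
               int xi, int yi, int NX, int NY, double *block)
{
    uint64_t lb[3] = {0}, ub[3] = {0};
    int ndim = 2;

    lb[0] = (uint64_t)(xi * NX);
    ub[0] = lb[0] + NX - 1;
    lb[1] = (uint64_t)(yi * NY);
    ub[1] = lb[1] + NY - 1;

    dspaces_put(var_name, timestep, sizeof(double), ndim, lb, ub, block);
}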
Look inside get.c
/* get.c : Example 1: DataSpaces get tutorial
 * This example will show you the simplest way
 * to get a 1D array of 3 elements out of the DataSpace
 * and store it in a local variable.
 * */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "dataspaces.h"
#include "mpi.h"

int main(int argc, char **argv)
{
    int err;
    int nprocs, rank;
    MPI_Comm gcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    gcomm = MPI_COMM_WORLD;

    // DataSpaces: Initialize and identify application
    // Usage: dspaces_init(num_peers, appid, Ptr to MPI comm, parameters)
    // Note: appid for get.c is 2 [for put.c, it was 1]
    dspaces_init(1, 2, &gcomm, NULL);

    int timestep = 0;

    while (timestep < 10) {
        timestep++;

        // DataSpaces: Read-Lock Mechanism
        // Usage: Prevent other processes from changing the
        //        data while we are working with it
        dspaces_lock_on_read("my_test_lock", &gcomm);

        // Name our data.
        char var_name[128];
        sprintf(var_name, "ex1_sample_data");

        // Create integer array, size 3
        // We will store the data we get out of the DataSpace
        // in this array.
        int *data = malloc(3 * sizeof(int));

        // Define the dimensionality of the data to be received
        int ndim = 1;

        // Prepare LOWER and UPPER bound dimensions
        uint64_t lb[3] = {0}, ub[3] = {0};

        // DataSpaces: Get data array from the space
        // Usage: dspaces_get(Name of variable, version num,
        //        size (in bytes of each element), dimensions for bounding box,
        //        lower bound coordinates, upper bound coordinates,
        //        ptr to data buffer)
        dspaces_get(var_name, timestep, 3 * sizeof(int), ndim, lb, ub, data);

        printf("Timestep %d: get data %d %d %d\n",
               timestep, data[0], data[1], data[2]);

        // DataSpaces: Release our lock on the data
        dspaces_unlock_on_read("my_test_lock", &gcomm);
    }

    // DataSpaces: Finalize and clean up DS process
    dspaces_finalize();

    MPI_Barrier(gcomm);
    MPI_Finalize();

    return 0;
}
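Once DataSpaces is installed, put.c and get.c are compiled and linked against the DataSpaces libraries. A hedged sketch: DATASPACES_DIR is a hypothetical variable standing for your install prefix, and the exact compiler wrapper and library list (here -ldspaces -ldscommon -ldart) depend on your installation and platform:
$ cc put.c -o put -I$DATASPACES_DIR/include -L$DATASPACES_DIR/lib -ldspaces -ldscommon -ldart
$ cc get.c -o get -I$DATASPACES_DIR/include -L$DATASPACES_DIR/lib -ldspaces -ldscommon -ldart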
Before running the DataSpaces staging servers, the user needs to create a configuration file named dataspaces.conf. Here is an example dataspaces.conf file. In this example, we specify a 3D domain with an xyz size of 128x128x128. Ten versions of each variable are kept in the staging servers at any time. Also, only one application can read the data at a time.
## Config file for DataSpaces
ndim = 3
dims = 128,128,128
#
max_versions = 10
max_readers = 1
# Lock type: 1 - generic, 2 - custom
lock_type = 2
# Hash function used to map the indexing of data domain to servers:
# 1 - Use Hilbert space-filling curve to linearize the data domain, decompose and
#     map the linearized 1D domain to the servers.
# 2 - Decompose the data domain into 2^ceil(log(n)) regions where n is the number of
#     servers, and map them to the servers.
hash_version = 1
There are two types of locks in DataSpaces.
- Generic lock: With the generic lock, the Producer does not wait for the data to be consumed by the Consumer; it progresses at its own pace. The Consumer may try to retrieve data before the data has been produced, so it is the user's responsibility to check whether the data is ready before reading it (a sketch of this check follows the list).
- Custom lock: The custom lock is a Producer/Consumer type of lock. The Consumer blocks until the Producer puts (produces) data, and the Producer blocks until the Consumer gets (consumes) the data. The process is repeated for the next timestep.
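For the generic lock, a minimal sketch of the consumer-side readiness check, assuming dspaces_get() returns a negative value when the requested version is not yet available in the space (the variable and lock names follow the put.c/get.c examples above):

#include <stdint.h>
#include "dataspaces.h"
#include "mpi.h"

/* Sketch: try to read one version under the generic lock and report
 * whether the data was ready; the caller decides whether to retry later. */
int try_get_version(int version, int *buf, MPI_Comm *comm)
{
    uint64_t lb[3] = {0}, ub[3] = {0};
    int err;

    dspaces_lock_on_read("my_test_lock", comm);
    /* With the generic lock this call does not wait for the producer;
     * a negative return value is treated here as "data not ready yet". */
    err = dspaces_get("ex1_sample_data", version, 3 * sizeof(int), 1, lb, ub, buf);
    dspaces_unlock_on_read("my_test_lock", comm);

    return err;
}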
The DataSpaces server executable has three command line options:
--server, -s    number of server instances / staging nodes
--cnodes, -c    number of compute nodes
--conf, -f      path to the configuration file [Optional]
The following commands run 1 DataSpaces server and 2 clients:
$ ./dataspaces_server -s 1 -c 2
$ ./put
$ ./get
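In practice the server must be started before the clients so that they can find its connection information; a minimal interactive sketch, assuming all three executables are launched from the same working directory as dataspaces.conf (the sleep is only a crude way to give the server time to start):
$ ./dataspaces_server -s 1 -c 2 &
$ sleep 5
$ ./put &
$ ./get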
You should see matching "put data ..." and "get data ..." lines printed by the writer and reader for each timestep.
Please visit our FAQ page if you experience problems compiling and running DataSpaces.