2.2.2 Waveform data and formats

SEISAN works with various waveform formats including SEISAN, GSE2.0, SEED/MINISEED, GURALP GCF (single channel files), Helmberger format, SAC binary and SAC ASCII. The SEISAN format is described in Appendix B, while for a format description of GSE and SAC the user is referred to GSETT-3 (1997) and Goldstein (1999), respectively. The SEED format is described in IRIS Consortium (1993). The GSE reading routines are based on the codeco routines written by Urs Kradolfer, Klaus Stammler and Karl Koch. The routines read GSE2.0 only, not GSE2.1. The format description of GSE2.0 is given in: INF/structure-of-seisan.tex. The different formats can be used in parallel by several programs. With MULPLT, for example, it is possible to plot data in the four formats at the same time. Other formats can be added by writing reading routines and adding the respective calls to LIB/wave.for. Note that SAC binary files can also be used on Windows from SEISAN version 8.2. To use other formats, a conversion program must be used first, see section 19.

Numbers smaller than 1.0 and real numbers

In general, binary waveform data is written as integers. However, sample values can sometimes be smaller than 1.0. This can only be handled by the SEISAN and SAC formats. In the SEISAN format, small numbers can be written by using the scaling constant (see SEISAN waveform format): the small numbers are written as integers and the scaling constant is then applied when reading the data. Most programs in SEISAN can read this data (like MULPLT) but few can write it. Currently WAVETOOL and SEISEI can write small numbers in SEISAN format. SAC small number data can also be read by most programs but probably only written by WAVETOOL. Several ASCII formats use real numbers, so numbers of any size can be written and read. The most important is the Helmberger format written by the OutW command in MULPLT. Small numbers can be generated by SEISAN if data corrected for a filter or corrected for response is written out.
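The idea of storing small real values as integers plus a scaling constant can be illustrated with a minimal sketch. It is not the SEISAN reading or writing code: the function names, the choice of a 32-bit integer range, and the convention that a stored integer is multiplied by the constant on reading are all assumptions made for illustration. The actual header layout is given in Appendix B.

```python
def encode_scaled(samples, max_int=2**31 - 1):
    """Return (integers, scale) so that integer * scale approximates each sample.

    Assumption: the scale is chosen so the largest sample uses the full
    integer range; the real format may choose it differently.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return [0] * len(samples), 1.0
    scale = peak / max_int
    return [round(s / scale) for s in samples], scale


def decode_scaled(integers, scale):
    """Recover real-valued samples from stored integers and the scaling constant."""
    return [i * scale for i in integers]


# Example: response-corrected data with amplitudes far below 1.0
samples = [0.001, -0.0005, 0.00025]
integers, scale = encode_scaled(samples)
restored = decode_scaled(integers, scale)
```

Without the scaling constant, all three samples in the example would be written as the integer 0.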

In general it is recommended to keep the waveform data in one format only, mainly for simplicity and maintenance reasons. There may be different arguments for or against one or the other format depending on the user's preferences and requirements. SAC and GSE are widely used formats and therefore may be attractive. SEISAN is a multi-trace binary format with direct read access to individual traces. SAC is a single trace binary or ASCII format with a large number of header parameters. The SAC format is widely used in research-oriented programs. GSE is a multi-trace ASCII waveform format that includes various sub-formats. It is widely used for data exchange. Although the GSE format can keep any number of traces, it is recommended to include no more than 3 traces in a single file depending on the number of samples, since when reading a particular trace, the whole file may have to be read.

The MINISEED format is probably the best option since most data centers use it and almost all instrument manufacturers provide it. SEISAN cannot read SEED files using all options possible in SEED, but data from the largest data centers as well as many observatories have been used for testing. With MINISEED there are fewer problems since MINISEED is simpler than SEED. SEISAN can also write MINISEED (program WAVETOOL), but cannot write SEED (unless GSE2SEED is used).

The WAV directory contains files with digital waveform data. The directory normally has no subdirectories or any other organization. However, in case of large databases, WAV can be subdivided, see below. In addition, any directory can contain waveform data, provided it is specified in SEISAN.DEF (section 3.13).

The amount of data that can be stored is only limited by the disk size. The analysis system will always look in WAV for particular files if they are not in the user's own directory. Waveform files will automatically be transferred to WAV on initial registration into the database (see MULPLT). Registration is the process of automatically creating an S-file in the database with the name of the waveform file and header information. Phase picking is done later, see section 8.

There is normally no requirement for particular filenames for the waveform files in WAV or elsewhere. However, many programs will make file names like:

yyyy-mm-dd-hhmm-ssT.NETWO_nnn e.g. 1995-01-23-1230-20M.BERGE_013

With the abbreviations yyyy: year, mm (after yyyy): month, dd: day, hh: hour, mm (after hh): minute, ss: second, T: file type indicator (normally M or S), NETWO: network code of at most 5 letters and nnn: number of channels.

Recommended file type indicators are:
S: Standard SEISAN
R: Resampled
A: Appended
M: Miniseed/SEED
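The naming convention above can be sketched in a few lines of Python. The helpers `make_name` and `parse_name` are hypothetical and not part of SEISAN. The regular expression assumes a single upper-case type letter and a three-digit channel count, as in the example name. How SEISAN pads short network codes is not described here, so the sketch only truncates to 5 characters.

```python
import re
from datetime import datetime

# yyyy-mm-dd-hhmm-ssT.NETWO_nnn
NAME_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})-(\d{2})(\d{2})-(\d{2})([A-Z])\.(.{1,5})_(\d{3})$"
)


def make_name(start, file_type, network, nchan):
    """Build a waveform file name from start time, type letter, network and channel count."""
    return f"{start:%Y-%m-%d-%H%M-%S}{file_type}.{network[:5]}_{nchan:03d}"


def parse_name(name):
    """Split a file name into its parts; return None if it does not follow the pattern."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    y, mo, d, h, mi, s, file_type, network, nchan = m.groups()
    return {
        "start": datetime(int(y), int(mo), int(d), int(h), int(mi), int(s)),
        "type": file_type,
        "network": network,
        "channels": int(nchan),
    }


name = make_name(datetime(1995, 1, 23, 12, 30, 20), "M", "BERGE", 13)
info = parse_name(name)
```

Since file names are not mandatory in this form, a caller should be prepared for `parse_name` to return `None`.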

WAV database: If a large amount of waveform data is stored, it can be an advantage to split the WAV directory into subdirectories. This is done in the same way as in the REA directory, e.g. waveform files for BER from July 1994 would be found in WAV/BER__/1994/07. Programs that use waveform files will automatically search, in order, the current directory, TMP, WAV and the monthly WAV directory (for TMP, see compression below). However, for all programs running outside EEV, the waveform data must be in the default database since only that one is searched. When storing in the WAV database, the waveform file names must by default start with either yymm (like 9902), yyyymmdd (like 19990101) or yyyy-mm (like 1999-02). If this is not the case, the position in the file name of year (including century) and month must be specified in SEISAN.DEF, see parameter CONT_YEAR_MONTH_POSTION_FILE. In this case all the waveform files in the WAV structure must have the same type of file name.
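The search order and the monthly directory layout can be sketched as follows. This is an illustration only, not the SEISAN implementation. All function names are hypothetical. Padding the database name to 5 characters with underscores is inferred from the BER__ example. The century pivot for two-digit years and the order in which the three default prefixes are tested are assumptions, since the rule SEISAN uses to tell them apart is not described here.

```python
def year_month_from_name(name, pivot=50):
    """Extract (year, month) from a name starting with yyyy-mm, yyyymmdd or yymm."""
    if len(name) >= 7 and name[:4].isdigit() and name[4] == "-" and name[5:7].isdigit():
        return int(name[:4]), int(name[5:7])        # yyyy-mm
    if len(name) >= 8 and name[:8].isdigit():
        return int(name[:4]), int(name[4:6])        # yyyymmdd
    if len(name) >= 4 and name[:4].isdigit():
        yy, mm = int(name[:2]), int(name[2:4])      # yymm
        century = 1900 if yy >= pivot else 2000     # assumed pivot, not from the manual
        return century + yy, mm
    return None


def wav_month_dir(base, year, month, wav_top="WAV"):
    """Monthly subdirectory, e.g. WAV/BER__/1994/07."""
    return "/".join([wav_top, base.ljust(5, "_"), f"{year:04d}", f"{month:02d}"])


def search_order(name, base, wav_top="WAV", tmp_dir="TMP"):
    """Candidate paths in the documented order: current directory, TMP, WAV, monthly WAV."""
    paths = [name, tmp_dir + "/" + name, wav_top + "/" + name]
    ym = year_month_from_name(name)
    if ym is not None:
        paths.append(wav_month_dir(base, *ym, wav_top=wav_top) + "/" + name)
    return paths


candidates = search_order("1994-07-12-0815-30S.BER___003", "BER")
```

A file name that starts with none of the default prefixes yields only the first three candidates, which corresponds to the case where the year and month positions must be given in SEISAN.DEF.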

The SEISAN binary waveform format is explained in Appendix B. The files are written and read with the same Fortran statements on all platforms; however, the internal structure and byte order are different. As of SEISAN version 5.1, files written on any of these platforms can be read on the others, so no conversion is needed when binary waveform files are moved between Sun, Linux, MacOSX and Windows.

Compression of waveform data

Waveform files can be stored in compressed format. The compression must be done by the user. Programs that access the compressed waveform files copy the file to the TMP directory and uncompress it there. The uncompressed file remains afterwards and will be found the next time one of the programs looks for the same waveform file. The content of the TMP directory has to be deleted manually. On Unix, you may automatically delete the content of the TMP directory with a cron job, see the manual pages on crontab. On Unix, the supported compression formats include gzip, compress, bzip2 and zip. On Windows, only gzip is supported (gz files). For this to work, the command gunzip must be available. This can be provided by gzip, which can also decompress. First install gzip if it is not already there. Then create a bat file in your path (for example in COM) with the name gunzip.bat and the content

gzip -d %1

With the introduction of the SEED format, there is less need for external compression since SEED data is usually compressed and is decompressed on the fly when read. Also, disk space is now usually not a problem when dealing with individual events.
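The copy-and-uncompress behaviour described in this section can be sketched as below for gzip files only. It is a hedged illustration, not SEISAN code: the function name `uncompressed_copy` is hypothetical, and the sketch uses Python's standard gzip module rather than calling an external gunzip command as SEISAN does.

```python
import gzip
import os
import shutil


def uncompressed_copy(gz_path, tmp_dir):
    """Return the path of an uncompressed copy of gz_path in tmp_dir.

    The copy is created only if it is not already there, so a later
    request for the same waveform file reuses the earlier result.
    """
    base = os.path.basename(gz_path)
    if base.endswith(".gz"):
        base = base[:-3]
    target = os.path.join(tmp_dir, base)
    if not os.path.exists(target):
        os.makedirs(tmp_dir, exist_ok=True)
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

Because nothing in the sketch removes the uncompressed copies, the TMP directory grows until it is cleaned, which is why the manual or cron-based cleanup mentioned above is needed.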

Component codes

Before version 8.2, the SEISAN waveform format used 4 characters for the component code. SEISAN now follows the SEED component code (see Appendix A in the SEED manual: http://www.fdsn.org/pdf/SEEDManual_V2.4.pdf). In the format before 8.2, the first character indicates the type of sensor, for example `B' for broadband, `S' for short-period or `L' for long-period. For acceleration data the first character has to be `A' because SEISAN assumes that the corresponding response has been given as acceleration response. The fourth character has to give the channel orientation: `Z' is used for vertical, `E' for east-west and `N' for north-south. Other orientations of the horizontal components are possible in the GSE, SEED and SEISAN formats but are not understood by SEISAN. If data are rotated, `T' is used for transverse and `R' for radial. The second and third characters can be chosen by the user. From SEISAN version 8.2, only 3 characters are used, the first 2 and the last. These 3 characters are then defined according to the SEED standard. SEED location codes and network codes are now also stored in the SEISAN format and are displayed when plotting the traces with MULPLT. From SEISAN version 12, several programs use location and network codes, particularly if the new format, Nordic2, is used. The component code is part of the response filename and is used to find the response corresponding to a given station and component. The network code is not part of the response files except for the SEED response files, so if using a SEED response file, network and location codes must match. Program WAVFIX can be used to change station and/or component codes for most formats (but not yet network and location codes).

The Nordic format only has space for two characters for the component code. The definition in SEISAN is that these are the first and fourth character of the waveform component code. This means that the relation between the component code in the Nordic file and the waveform data is non-unique. In the new format, Nordic2, there is room for a 3 character component code as well as location and network codes. The GSE and SEED waveform formats have three characters for the channel code, see GSETT-3 (1997) and IRIS Consortium (1993) for the detailed definition of the component codes. When reading waveform data in either GSE or SEED format, SEISAN internally keeps the first two characters and moves the third to the fourth position, so for example `BHZ' becomes `BH Z'; however, the user will only see the name as BHZ in many programs. Data files in SEED also have a location code, which makes it possible to distinguish, for example, between two `BHZ' components (for example a 30 second and a 120 second sensor with the same sampling rate and high gain) at the same site. When converting between SEISAN and SEED/MiniSEED, station, network and location codes are preserved, while SAC and GSE can only partly store this information. SAC has more than four characters for the component code and the sacsei.def parameter file in DAT has to be used to define the conversion. However, normally SAC data will have three character component codes as well. Conversion of component codes from SEISAN to SAC is also defined in sacsei.def.
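The relations between the three-character SEED or GSE channel code, the internal four-character code and the two-character Nordic code can be written out as a short sketch. The function names are hypothetical and the code only restates the character positions given in the text; it is not taken from the SEISAN source.

```python
def seed_to_internal(channel):
    """Three-character SEED/GSE code to four-character internal code: 'BHZ' -> 'BH Z'."""
    if len(channel) != 3:
        raise ValueError("expected a three-character channel code")
    return channel[:2] + " " + channel[2]


def internal_to_seed(component):
    """Four-character internal code to three characters (first 2 and last): 'BH Z' -> 'BHZ'."""
    if len(component) != 4:
        raise ValueError("expected a four-character component code")
    return component[:2] + component[3]


def internal_to_nordic(component):
    """Two-character Nordic code: first and fourth character, 'BH Z' -> 'BZ'."""
    if len(component) != 4:
        raise ValueError("expected a four-character component code")
    return component[0] + component[3]


# Two different channels collapse to the same Nordic code,
# which is the non-uniqueness mentioned in the text.
same = internal_to_nordic(seed_to_internal("BHZ")) == internal_to_nordic(seed_to_internal("BLZ"))
```

The last line evaluates to `True`: both `BHZ` and `BLZ` map to `BZ` in the two-character Nordic code.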

When converting between SEISAN and other waveform formats, component conversion is defined in the respective definition files, see section on conversion programs.