Basic reading a
variable from one
file
and saving it to another NetCDF file, also concatenating
files.
We will
learn all about reading basic column ASCII data, defining a
MV array from this data, setting it's dimension and writing it to a
NetCDF file
How to run the tutorial:Start python or cdat by typing 'python'
or 'cdat'
at a shell prompt, then copy and paste the text in blue,
thetext in red is
what you will see as output.
You
could also loop through the lines
one by one, specially if the file is too big to read the whole file at
once. You can use
f.readline() to read one line at a time.
2. Define data as a
list
data=[]#will hold
the list of data values
3.
Go through lines one by one
and try to convert third column to a float value and then append to the
data list
for line in lines: # Splits the
line with space or tabs or return
sp=line.split()
# Now try to convert the 3rd column to a number
try:
val=float(sp[2]) # third column
data.append(val) # append to data
except: # if not a number,
then do nothing
pass
If
there is no
third clumn in the line that was read, then nothing will be done
(pass), also if there is a third column but the data cannot be
converted to a floating point data, the conversion will fail and you
will fall back to the except: branch from the try: block, and again
nothing will be done ('pass' executed). Hence the data will contain
only the data points we are interested in.
4. Convert
data list to an array
We need to convert the data to
a 'MV'
array, so that we can attach some attributes to it, like units and
names, so as to create a descent NetCDF variable.
data = MV.array(data)
a) give it
an id == name and units
Let's give the variable
attributes of 'id' and 'units'
data.id
= 'data'
data.units = 'cm' # for example 'cm'
b)
create a dimention
Now we will create an axis,
that will simply be a 'count'
variable, that is unitless.
Let'ssee how the variable 'data'
looks like now: data.info()
*** Description of Slab data ***
id: data
shape: (3,)
filename:
missing_value: None
comments:
grid_name: N/A
grid_type: N/A
time_statistic:
long_name:
units: cm
No grid present.
** Dimension 1 **
id: count
units:
Length: 3
First: 1
Last: 3
Python id: 0xb7b828ecL
*** End of description for data ***
5.
Write this data to a
NetCDF file
We will write the data to a
NetCDF file that will be created in the current directory
g =
cdms.open('output.nc','w')
g.write(data)
g.close()
Concatenating NetcDF Files
To concatenate files, you need to first import the cdms CDAT module,
then define a list that will hold all the files you want to concatenate
import cdms,sys
path=sys.prefix+'/sample_data/'
#
define a list that holds all the files to concatenate files
= [ path+'u_2000.nc', path+'u_2001.nc', path+'u_2002.nc'
]
#
Now let's open the output file for writing (and possibly erase existing
file) fout=cdms.open('u_concatenated.nc','w')
Now
in the for
loop, we will go into each of the files and read the variable
we
want to concatenate, say it is a horizonatal velocity 'u', we are going
to read it from each file and then immediately write to our output file
# for each file, read
'u' and immediately write to the output file
for file in files:
f=cdms.open(file)
u=f('u')
f.close()
fout.write(u)
# close
the output file, it contains the concatenated variable 'u' fout.close()
Virtually concatenating NetcDF Files
You can also create a dataset that will be a collection of the files
that you want to concatenate. In order to define a dataset run the 'cdscan' script
included in the CDAT installation. We are assuming your python resides
in /usr/local/cdat.
This will create an XML file u.xml that
will have a description of all the data files (u_*.nc) belonging to the
dataset u.xml. In CDAT you can now open the 'u.xml' file, just like you
would open a normal file.
'u'
contains the data from all the files that belong to the 'u.xml' dataset.
You can
learn more about other ways of querying the files, variables,
dimensions
and axes by scanning through the Quick
Reference Guide to CDMS or by following the CDMS
Basics tutorial.
To
learn more about the file manipulation, see the Tips and Tricks page FileI/O Tips and Tricks
Advanced
Tech Tips
Reading
ASCII data, creating axes and saving all to a NetCDF file.
Function to read all lines in the ASCII file,
if you think the file is too big to read as a whole, you should change
it to read chunks of the data.
Import all needed modules first.
Function
to read all floating
value data by splitting a line (with blanks, return or
line end) and converting all the discovered data into floating
point
numbers until the error in conversion occurs or there is no more data.
#-----------------------------------------
# define numerical values
until error in conversion def
read_float_values(ln):
values=[]
try:
while 1:
l=ln[0]
sp=l.split()
try:
for v in sp:
values.append(float(v))
l = ln.pop(0) #
remove the line
except:
return values
except:
return values
return values #-----------------------------------------
Read the variables data
consisting of a
description line that has
a variable's name, and other annotations, followed by the
floating
point data values, that will be read with the read_float_values()
function. The variable annotation is given in the following way: T_skin[Surf. skin
temperature](C)
#-----------------------------------------
def
read_float_variable(ln,vars,var_names):
try:
# extract variable name, long_name and
units from first line
line = ln.pop(0)
line =line.strip()
v_name=line.split('[')[0].strip()
v_long_name=line.split('[')[1].split(']')[0]
v_units=line.split('(')
v_units=v_units[len(v_units)-1][:-1]
if v_units == ')' : v_units = ''
# read all numercial values as floats,
put into the list 'Values' values =
read_float_values(ln)
vars[v_name]={'units':v_units,'long name':v_long_name ,'value':values,
'dim':len(values)}
var_names.append(v_name)
except:
print '***Error in read_float_variable(ln,vars,var_names)'
return 1
return 0
Define a dictionary that will
hold all the variables, their names, and all other
annotations together with the data in dictionary structure keyed on the
variables name. Assume there is a title on the first line, second line
has number of variable's ('nvars') and a third number of samples of
each
variable ('nt'). Then there is a data consisting of a
description line that has a variable's name, and other
annotations, followed by the floating point data values
#-----------------------------------------
def get_variables(file):
try:
print
'---------------------------------------------------------'
print 'reading surface file:
"',file,'"'
print '' #
read all the lines
err, ln = read_all_lines(file)
if (err) : return 1,{},{}
#
define dictionary "vars"
vars ={}
vars['filename'] = file
# read header, assume first
line is a titile
vars['title'] = ln[0] #
second line has the number of variables to read and number of them in a
second line
nvars = int(float(ln[1]))
vars['nvars'] = nvars
vars['nt'] = int(float(ln[2]))
#
kill header from the data to be read
ln=ln[3:]
#
define ordered list of variables to be
written to netCDF file
var1D_names=[]
for i
in range(nvars):
err = read_float_variable(ln,vars,var1D_names) var_names={}
var_names['var1D']=var1D_names
print '-------- finished reading ---------'
except:
print
'***Error in get_varibales(file)'
print
' returning...'
return
1,{},{}
Write all the
data to the NetCDF file, assume that there were following variables
read: 'year','month','day','hour','minute'. From this data we
are going to construct the time axis using a cdtime.comptime routine
time_vars =['time','year','month','day','hour','minute']
year = var['year'].get('value')
day = var['day'].get('value')
month = var['month'].get('value')
hours = var['hour'].get('value')
minutes = var['minute'].get('value')
# create the time axis
t0=cdtime.comptime(int(year[0]),int(month[0]),int(day[0]))
time_units='days since '+str(int(year[0])-1)+'-12-31'
base_t0 =
cdtime.comptime(int(year[0]),int(month[0]),int(day[0]))
print 'base_t0', base_t0
time_units_offset='seconds since '+str(t0)
taxis=[]
taxis_offset=[]
for i in range(len(year)): #
time_units
t=cdtime.comptime(int(year[i]),int(month[i]),int(day[i]),int(hours[i]),int(minutes[i]))
taxis.append(t.torel(time_units).value)
taxis_offset.append(t.torel(time_units_offset).value)
time=cdms.createAxis(taxis,id='time')
time.units=time_units
time.designateTime()
time.long_name='calendar day fraction of the year
'+str(int(year[0]))
time_offset=MV.array(taxis_offset)
time_offset.id='time_offset'
time_offset.long_name='Time offset from base_time'
time_offset.units=time_units_offset
time_offset.setAxis(0,time)
del(time_offset.name)
vbaseTime=MV.array(base_time.value)
vbaseTime.id='base_time'
vbaseTime.long_name='Base time in Epoch'
vbaseTime.units=base_time.units
vbaseTime.string=str(t0)+' GMT'
del(vbaseTime.name)
# write
year,month,day,hour,minute variables
for var_name in time_vars:
if var_name == 'time': continue
v = MV.array(var[var_name].get('value'))
v=v.astype('i')
dim = var[var_name].get('dim')
if dim>1 :
v.setAxis(0,time)
v.id = var_name
v.units = var[var_name].get('units')
# define
level dimension variable = axis
# set to notheing if not in the set of variables
if (len(var1D) < 7) :
lev=cdms.createAxis([],id='lev')
else:
pressure = var['lev'].get('value') #print
pressure
lev=cdms.createAxis(pressure,id='lev')
lev.units=var['lev'].get('units')
lev.long_name='preassure levels'
# non-time variables
for var_name in
var1D:
if var_name in
time_vars : continue
if (var_name ==
'lev'): continue
dim =
var[var_name].get('dim')
value =
var[var_name].get('value')
v = MV.array(value)
v=v.astype('f')
v.id = var_name
v.units =
var[var_name].get('units')
v.long_name =
var[var_name].get('long name')
del(v.name)
v.setMissing(-9999.)
fout.write(v)
# define some global
attributes
fout.title=var['title']
from time import asctime
current_date = asctime()
fout.date_created=current_date
fout.software='python with CDAT program, R.B.McCoy, LLNL,
renata@llnl.gov'
fout.program_name=cmd[0]
params = cmd[1]+' '+cmd[2]
fout.program_parameters=params
return 0,time, lev #-----------------------------------------
#----------------------------------------- def
main(argv):
err, file = get_filename(argv)
if (err) : return
print 'file = ',file # save
command that run that script for later reference
cmd = [argv[0],file]
# call a function
that reads a file and returns variables names and a structure with all
variables info
err, names, vars = get_variables(file)
if (err ) : return
# write all to the NetCDF file
err = write_to_netcdf(names, vars, cmd)
if(err) : return