Data from the UK storms project

As part of a study of severe storms across the British Isles, some observations from the Daily Weather Reports (DWR) were (commercially) transcribed in the early 2000s. This transcription project covered the period 1919-1960 but transcribed only part of the observations: only pressure and pressure tendencies, and only from stations in the British Isles reporting four times a day. The ‘late arrivals and corrections’ section of the reports was also not included.

The raw transcription output has not survived, but the transcriptions were processed into station files containing up to eight mean-sea-level pressure (MSLP) observations a day: four observations, plus another four constructed from the 3-hour pressure tendencies.
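The exact recipe used to build the four extra values is not recorded here. The sketch below shows one plausible way of turning four observations and their 3-hour tendencies into eight values. It assumes each tendency is reported as p(observation time) minus p(3 hours earlier), in hPa, and that the constructed value is placed just before its observation; `expand_day` is a hypothetical helper, not part of the original processing.

```python
from typing import List, Optional, Sequence


def expand_day(pressures: Sequence[Optional[float]],
               tendencies: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Hypothetical: build eight MSLP values from four observations and
    their 3-hour tendencies.

    Assumes tendency = p(obs time) - p(3 hours earlier), in hPa, and puts
    the reconstructed earlier value immediately before its observation.
    Missing inputs are passed through as None.
    """
    values: List[Optional[float]] = []
    for p, t in zip(pressures, tendencies):
        # Reconstruct the pressure 3 hours before the observation
        earlier = None if p is None or t is None else round(p - t, 1)
        values.extend([earlier, p])
    return values


# Illustrative inputs only; the third observation is missing
day = expand_day([1000.0, 1001.5, None, 998.2], [-1.0, 0.5, None, 2.0])
print(day)
```

If the tendencies were reported with the opposite sign convention, only the subtraction in `expand_day` would need to change.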

Aberdeen                 1920  4  1 1010.3 1008.3 1008.4 1007.4 9999.9 9999.9 1003.3 1005.3
Aberdeen                 1920  4  2 1005.1 1009.1 1008.0 1012.0 9999.9 9999.9 1012.7 1013.7
Aberdeen                 1920  4  3 1013.4 1014.4 1014.5 1014.5 9999.9 9999.9 1013.6 1012.6
Aberdeen                 1920  4  4 1012.7 1011.7 9999.9 9999.9 9999.9 9999.9 1010.7 1011.7
Aberdeen                 1920  4  5 1011.9 1011.9 1012.5 1012.5 9999.9 9999.9 1012.2 1012.2
Aberdeen                 1920  4  6 1011.8 1010.8 1010.1 1009.1 9999.9 9999.9 1004.5  999.5
Aberdeen                 1920  4  7  995.5  997.5  995.6 1001.6 9999.9 9999.9 1004.2 1006.2
Aberdeen                 1920  4  8 1006.1 1007.1 1007.4 1006.4 9999.9 9999.9 1003.9 1002.9
Aberdeen                 1920  4  9 1002.3 1002.3 1001.6 1004.6 9999.9 9999.9 1004.1 1005.1
Aberdeen                 1920  4 10 1004.9 1006.9 1006.9 1007.9 9999.9 9999.9 1006.5 1006.5
Aberdeen                 1920  4 11 1005.7 1004.7 1004.2 1003.2 9999.9 9999.9  999.7  998.7
Aberdeen                 1920  4 12  998.9  995.9  995.3  993.3 9999.9 9999.9  989.2  989.2
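Each line holds a station name, year, month, day, and eight pressure values, with 9999.9 marking a missing value. A minimal sketch of reading this layout with pandas, reusing the column widths from the conversion script (24, 5, 3, 3, then eight fields of 7); the column names and the `read_station_file` helper are illustrative assumptions.

```python
import io

import pandas

# Column widths as used in the conversion script: name, year, month, day,
#  then eight 7-character pressure fields
WIDTHS = (24, 5, 3, 3) + (7,) * 8
NAMES = ["station", "year", "month", "day"] + ["p%d" % i for i in range(8)]


def read_station_file(source):
    """Hypothetical helper: read a station file, turning 9999.9 into NaN."""
    df = pandas.read_fwf(source, widths=WIDTHS, header=None, names=NAMES)
    pcols = NAMES[4:]
    # Same missing-data test as the script (values above 9000 are missing)
    df[pcols] = df[pcols].mask(df[pcols] > 9000)
    return df


# Two records in the same layout as the Aberdeen sample above
ROWS = [
    " 1920  4  1 1010.3 1008.3 1008.4 1007.4 9999.9 9999.9 1003.3 1005.3",
    " 1920  4  4 1012.7 1011.7 9999.9 9999.9 9999.9 9999.9 1010.7 1011.7",
]
SAMPLE = "".join("Aberdeen".ljust(24) + row + "\n" for row in ROWS)

df = read_station_file(io.StringIO(SAMPLE))
print(df)
```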

One complication with the DWR is that exact observation locations are not given, only approximate ones (e.g. ‘Aberdeen’). Exact locations are not vital for MSLP observations, so a nominal location has been assigned to each station.

Another difficulty with the station files is that they don’t give exact times for the observations (they weren’t needed by the original study). The eight values in the station files for each day are the values on the DWR page for that day, and both the format of the page and the times of observations changed over the period 1919-1960.
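One way to picture the time assignment is as a date-keyed table of offset regimes, each giving an hour offset for the eight daily values. The sketch below restates the offsets from `get_time_offset` in the conversion script as such a table; the April 1921 start date follows the comment on that branch of the script and is an assumption, as is starting the first regime on 1 January 1919. `slot_time` is a hypothetical helper.

```python
import bisect
import datetime

# (first date of regime, hour offsets for the eight daily values).
# Offsets copied from get_time_offset; the 1921-04-01 date is taken from
#  the comment on that branch and is an assumption.
REGIMES = [
    (datetime.date(1919, 1, 1), (-14, -14, -15, -15, -14, -14, -14, -14)),
    (datetime.date(1921, 4, 1), (10, 10, 9, 9, 10, 10, 10, 10)),
    (datetime.date(1944, 8, 1), (9, 9, 9, 9, 9, 9, 9, 9)),
]
STARTS = [start for start, _ in REGIMES]


def slot_time(day: datetime.date, slot: int) -> datetime.datetime:
    """Hypothetical: GMT time of value `slot` (0-7) on the DWR page for `day`."""
    # Index of the last regime starting on or before `day`
    i = bisect.bisect_right(STARTS, day) - 1
    if i < 0:
        raise ValueError("date before first regime: %s" % day)
    offset = REGIMES[i][1][slot]
    base = datetime.datetime(day.year, day.month, day.day)
    # Nominal slot hours are 0,3,...,21, shifted by the regime offset
    return base + datetime.timedelta(hours=3 * slot + offset)


print(slot_time(datetime.date(1920, 4, 1), 0))
print(slot_time(datetime.date(1950, 1, 1), 0))
```

Keeping the regimes in one table makes each change of page format a single new row, rather than another chain of date comparisons.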

To make the data easily usable, the station files have been converted into the same monthly data format used for weatherrescue.org data, assigning latitudes, longitudes, and times in the process.
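Each output record is one line per observation: year, month, day, hour, minute, latitude, longitude, pressure, and station name, using the format string from the conversion script below. A sketch of writing and reading such a line; the helper names, coordinates, and station name are placeholders, and `parse_record` assumes the name contains no spaces.

```python
import datetime


def format_record(when, lat, lon, value, name):
    """Format one observation with the layout used by the conversion script."""
    return ("%04d %02d %02d %02d %02d %6.2f %7.2f %6.1f %16s\n" %
            (when.year, when.month, when.day, when.hour, when.minute,
             lat, lon, value, name))


def parse_record(line):
    """Hypothetical inverse of format_record (station name without spaces)."""
    y, mo, d, h, mi, lat, lon, value, name = line.split()
    return (datetime.datetime(int(y), int(mo), int(d), int(h), int(mi)),
            float(lat), float(lon), float(value), name)


# Placeholder location and name, not taken from the station metadata
line = format_record(datetime.datetime(1920, 3, 31, 10), 57.25, -2.5,
                     1010.3, "ABERDEEN")
print(repr(line))
print(parse_record(line))
```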

#!/usr/bin/env python

# Convert the digitised data from Lisa's files to the format Ed is
#  using.

import os
import os.path
import pandas
import glob
import datetime
import sys

# Get script directory
sd=os.path.dirname(os.path.abspath(__file__))

# Load the Station names and locations
md=pandas.read_csv("%s/../metadata/names.csv" % sd,
                   header=None)


# The data in Lisa's files is stored as 8 values/day, but the mapping of
#  the values onto GMT times is peculiar and time varying. Return the
#  time offset (hours) for the given date and value, from the date given
#  in the file (assuming times are 0,3,..,21)
def get_time_offset(date,hours):
    # Before April 1921 - at 1,7,13,18 and back by 1/2 day
    offsets=(-14,-14,-15,-15,-14,-14,-14,-14)
    # At beginning of April 1921, move forward by 1 day
    if date.year>1921 or (date.year==1921 and date.month>3):
        offsets=(10,10,9,9,10,10,10,10)
    # At beginning of August 1944, switched to 0,6,12,18
    if date.year>1944 or (date.year==1944 and date.month>7):
        offsets=(9,9,9,9,9,9,9,9)
    # hours is one of 0,3,...,21; integer division gives the index 0-7
    return offsets[hours//3]

# Convert Lisa's data
Lf=glob.glob("%s/../raw.data/*.dat" % sd)
for stfile in Lf:
    std=pandas.read_fwf(stfile,
                        widths=(24,5,3,3,7,7,7,7,7,7,7,7),
                        header=None)
    # Append each line to the new output file
    #  slow - but so what.
    for ln in range(len(std)):
        mdl=md[md.iloc[:,0].str.lower()==std.iloc[ln,0].lower()]
        if mdl.empty:
            raise ValueError("No station %s in metadata" %
                                                std.iloc[ln,0])
        LastF=''
        opfile=None
        ob_tbase=datetime.datetime(std.iloc[ln,1],
                                   std.iloc[ln,2],
                                   std.iloc[ln,3],0)
        for hri in range(1,9):
            # Skip the missing data
            if std.iloc[ln,hri+3]>9000: continue
            # Ob time and date
            ob_time=ob_tbase+datetime.timedelta(hours=
               (hri-1)*3 + get_time_offset(ob_tbase,(hri-1)*3))
            # Output value in Ed's format
            Of=("%s/../../data_from_Lisa/%04d/%02d/prmsl.txt" %
                             (sd,ob_time.year,ob_time.month))
            if Of!=LastF:
                dn=os.path.dirname(Of)
                if not os.path.isdir(dn):
                    os.makedirs(dn)
                if opfile is not None:
                    opfile.close()
                    if os.path.getsize(LastF)==0:
                        os.remove(LastF)
                        dn=os.path.dirname(LastF)
                        if not os.listdir(dn):
                            os.rmdir(dn)
                opfile=open(Of, "a")
                LastF=Of

            opfile.write(("%04d %02d %02d %02d %02d %6.2f "+
                          "%7.2f %6.1f %16s\n") %
                         (ob_time.year,ob_time.month,
                          ob_time.day,ob_time.hour,
                          ob_time.minute,
                          mdl.iloc[0,2],mdl.iloc[0,3], #latlon
                          std.iloc[ln,hri+3],          # ob value
                          mdl.iloc[0,1]))              # name
        # Close the last output file opened for this input line
        if opfile is not None:
            opfile.close()
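The script above opens a monthly file in append mode for each value it writes, which its own comment admits is slow. A possible alternative, sketched here under the assumption that formatted lines and their observation times are already available, is to group lines by (year, month) first and append each month's lines in a single `open()` call. `write_monthly` is a hypothetical helper that mirrors the script's `YYYY/MM/prmsl.txt` layout.

```python
import collections
import datetime
import os
import tempfile


def write_monthly(records, root):
    """Hypothetical: append formatted lines to root/YYYY/MM/prmsl.txt,
    opening each monthly file once.

    `records` is an iterable of (datetime, formatted line) pairs.
    Returns the sorted list of (year, month) keys written.
    """
    by_month = collections.defaultdict(list)
    for when, line in records:
        by_month[(when.year, when.month)].append(line)
    for (year, month), lines in sorted(by_month.items()):
        path = os.path.join(root, "%04d" % year, "%02d" % month, "prmsl.txt")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "a") as f:
            f.writelines(lines)
    return sorted(by_month)


# Usage with dummy lines in a temporary directory
with tempfile.TemporaryDirectory() as root:
    recs = [(datetime.datetime(1920, 3, 31, 10), "a\n"),
            (datetime.datetime(1920, 4, 1, 10), "b\n"),
            (datetime.datetime(1920, 4, 2, 10), "c\n")]
    months = write_monthly(recs, root)
    with open(os.path.join(root, "1920", "04", "prmsl.txt")) as f:
        april = f.read()
print(months, repr(april))
```

Grouping in memory trades a little RAM (one station file at a time is small) for far fewer file opens; like the original, it still appends, so rerunning it would duplicate lines.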