Example of WDf usage with multi data segments and multi-processing¶

author: Elena Cuoco

Cuoco et al., Wavelet-Based Classification of Transient Signals for Gravitational Wave Detectors DO - 10.23919/EUSIPCO.2018.8553393

Please note that many packages, as graphic one or logging are not part of WDF docker, but you can install them locally or use your preferred ones

[1]:

# import libraries
import time
import os
from pytsa.tsa import *
from pytsa.tsa import SeqView_double_t as SV
from wdf.config.Parameters import *
from wdf.processes.wdfUnitDSWorker import *
from wdf.processes.wdfUnitWorker import *
import logging
import coloredlogs
#select level of logging
coloredlogs.install(isatty=True)

logging.basicConfig(level=logging.DEBUG)

[2]:

new_json_config_file = True    # set to True if you want to create new Configuration

if new_json_config_file==True:
    configuration = {
  "window":1024,
  "overlap":768,
  "threshold": 0.2,
  "file": "./data/test.gwf",
  "channel": "H1:GWOSC-4KHZ_R1_STRAIN",
  "run":"offLine",
  "len":10.0,
  "gps":1167559608,
  "segments":[[1167559008,1167559408],[1167559408,1167560008],[1167560008,1167560308]],
  #"segments":[[1167559608,1167560008] ],
  "outdir": "local_dir/",
  "dir":"local_dir/",
  "ID":"WDF_test",
  "ARorder": 1000,
  "learn": 200,
  "preWhite":2,
  "ResamplingFactor":2,
  "LowFrequencyCut":12,
  "FilterOrder":6,
  "nproc":4
}

    filejson = os.path.join(os.getcwd(),"inputWDF.json")
    file_json = open(filejson, "w+")
    json.dump(configuration, file_json)
    file_json.close()

logging.info("read parameters from JSON file")
par = Parameters()

filejson = "inputWDF.json"
try:
    par.load(filejson)
except IOError:
    logging.error("Cannot find resource file " + filejson)
    quit()

par.print()

2023-03-29 14:18:51 hal2022 root[4076457] INFO read parameters from JSON file

{   'ARorder': 1000,
    'FilterOrder': 6,
    'ID': 'WDF_test',
    'LowFrequencyCut': 12,
    'ResamplingFactor': 2,
    'channel': 'H1:GWOSC-4KHZ_R1_STRAIN',
    'dir': 'local_dir/',
    'file': './data/test.gwf',
    'gps': 1167559608,
    'learn': 200,
    'len': 10.0,
    'nproc': 4,
    'outdir': 'local_dir/',
    'overlap': 768,
    'preWhite': 2,
    'run': 'offLine',
    'segments': [   [1167559008, 1167559408],
                    [1167559408, 1167560008],
                    [1167560008, 1167560308]],
    'threshold': 0.2,
    'window': 1024}

It is important that you define correctly the parameters in the configuration files. WDf is a pipeline which performs a series of steps before producing triggers for transient signals in your data. Here it is the list of parameters you can fix in your configuration file.

ARorder = order of AutoRegressive model for whitening
ID = identification numer for your run
ResamplingFactor = the ratio between the original sampling frequency and the downsampled one
channel = the name of the channel in you .gwf o .ffl file you want to analyze
dir = where to find the parameters
file = file to be anlyzed
gps = the starting time for for analysis (overwritten by values in segments)
learn = the length in seconds of the data you will use to estimate AR parameters
len = the time window in second of data loaded in you loop
nproc = the number of processors you will use
outdir = where you want to save the results
overlap = overlapping number between 2 consecutives windows for WDF analysis
prewhite = the number of iter to pre-heat the whitening procedure (leave as it is)
run = additional tag for your data run
segments = 1 or more segments defined as [start time, end time] where you will run WDF, usually 1 segment/processor
threshold = the minimum value for WDF snr to identify a trigger
window = the analyzing window in point for WDF. It should be a power of 2

If you set the par.dir or par.outdir as relative path, we need to give the absolute path.

[3]:

import os
par.dir=os.getcwd()+'/'+par.dir
par.outdir=os.getcwd()+'/'+par.outdir

Load information for sampling frequency¶

[4]:

strInfo = FrameIChannel(par.file, par.channel, 1.0, par.gps)
Info = SV()
strInfo.GetData(Info)
par.sampling = int(1.0 / Info.GetSampling())
if par.ResamplingFactor!=None:
    par.resampling = int(par.sampling / par.ResamplingFactor)
    logging.info("sampling frequency= %s, resampled frequency= %s" %(par.sampling, par.resampling))
del Info, strInfo

2023-03-29 14:18:52 hal2022 root[4076457] INFO sampling frequency= 4096, resampled frequency= 2048

Launch WDF runs¶

The fullPrint option is important to save information about the WDF triggers¶

fullPrint = 0 –> you save only the metaparameters for the triggers
fullPrint = 1 –> you save the metaparameters and the wavelet coefficients for that trigger
fullPrint = 2 –> you save the metaparameters and the reconstructed waveform for that trigger
fullPring = 3 –> you save the metaparameters, the wavelet coefficients and the reconstructed waveform for that trigger in the ‘window’ time (window/sampling frequency)

[5]:

import multiprocessing as mp
print("Number of processors: ", mp.cpu_count())
pool = mp.Pool(par.nproc)

wdf=wdfUnitDSWorker(par,fullPrint=2)
pool.map(wdf.segmentProcess, [segment for segment in par.segments])
pool.close()

Number of processors:  32

2023-03-29 14:18:52 hal2022 root[4076576] INFO Analyzing segment: 1167560008-1167560308 for channel H1:GWOSC-4KHZ_R1_STRAIN downsampled at 2048Hz
2023-03-29 14:18:52 hal2022 root[4076575] INFO Analyzing segment: 1167559408-1167560008 for channel H1:GWOSC-4KHZ_R1_STRAIN downsampled at 2048Hz
2023-03-29 14:18:52 hal2022 root[4076574] INFO Analyzing segment: 1167559008-1167559408 for channel H1:GWOSC-4KHZ_R1_STRAIN downsampled at 2048Hz
2023-03-29 14:18:52 hal2022 root[4076575] INFO Start AR parameter estimation
2023-03-29 14:18:52 hal2022 root[4076574] INFO Start AR parameter estimation
2023-03-29 14:18:52 hal2022 root[4076576] INFO Start AR parameter estimation
2023-03-29 14:19:27 hal2022 root[4076575] INFO Estimated sigma= 4.0675451444151897e-22
2023-03-29 14:19:27 hal2022 root[4076576] INFO Estimated sigma= 4.0062637269449494e-22
2023-03-29 14:19:27 hal2022 root[4076574] INFO Estimated sigma= 4.0023147527036696e-22
2023-03-29 14:19:32 hal2022 root[4076575] INFO Starting detection loop
2023-03-29 14:19:32 hal2022 root[4076576] INFO Starting detection loop
2023-03-29 14:19:32 hal2022 root[4076574] INFO Starting detection loop
2023-03-29 14:22:25 hal2022 root[4076576] INFO analyzed 300 seconds in 212.9326889514923 seconds
2023-03-29 14:23:22 hal2022 root[4076574] INFO analyzed 400 seconds in 270.102082490921 seconds
2023-03-29 14:25:12 hal2022 root[4076575] INFO analyzed 600 seconds in 379.5118489265442 seconds

In the output dir with ‘run’ tag you will find the estimated AR coefficients, and a .csv files containing the trigger lists

Let’s have a look at the results¶

[6]:

import pandas as pd

import glob

dirName = par.outdir # use your path
all_files = glob.glob(os.path.join(dirName, "*","*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent
df_from_each_file = (pd.read_csv(f) for f in all_files)
triggers  = pd.concat(df_from_each_file, ignore_index=True)

[7]:

triggers.shape

[7]:

(5810, 1035)

[8]:

import matplotlib.pylab as plt
pd.set_option('display.max_rows', 999)
pd.set_option('max_colwidth',100)
pd.set_option('display.float_format', lambda x: '%.3f' % x)
import numpy as np

%matplotlib inline
# Alternatives include bmh, fivethirtyeight, ggplot,
# dark_background, seaborn-deep, etc
plt.style.use('ggplot')
plt.rcParams['font.monospace'] = 'Ubuntu Mono'
plt.rcParams['font.size'] = 10
plt.rcParams['axes.labelsize'] = 10
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['xtick.labelsize'] = 8
plt.rcParams['ytick.labelsize'] = 8
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['figure.titlesize'] = 12
plt.rcParams['figure.figsize'] = (14, 10)

colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
colors = np.hstack([colors] * 20)
plt.figure(0)
print (triggers.shape)

triggers.head(10)

(5810, 1035)

[8]:

	gps	gpsPeak	duration	EnWDF	snrMean	snrPeak	freqMin	freqMean	freqMax	freqPeak	...	rw1014	rw1015	rw1016	rw1017	rw1018	rw1019	rw1020	rw1021	rw1022	rw1023
0	1167559009.000	1167559009.421	0.453	0.290	0.256	1.786	62.000	148.346	250.000	136.000	...	0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	0.000
1	1167559009.125	1167559009.421	0.500	0.331	0.275	1.809	62.000	155.500	274.000	116.000	...	-0.000	-0.000	-0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
2	1167559009.250	1167559009.421	0.473	0.275	0.248	1.734	58.000	153.423	266.000	116.000	...	-0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	0.000
3	1167559009.375	1167559009.421	0.477	0.235	0.220	1.628	48.000	154.846	272.000	140.000	...	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000	-0.000	-0.000
4	1167559011.375	1167559011.782	0.500	0.305	0.219	1.788	88.000	173.769	282.000	112.000	...	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000	-0.000	-0.000
5	1167559011.500	1167559011.782	0.404	0.311	0.224	1.782	86.000	169.269	280.000	124.000	...	-0.000	-0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
6	1167559011.625	1167559011.782	0.500	0.350	0.255	1.769	78.000	170.000	302.000	122.000	...	-0.000	-0.000	-0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
7	1167559011.750	1167559011.782	0.473	0.369	0.278	1.794	70.000	168.385	298.000	138.000	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000
8	1167559011.875	1167559012.096	0.461	0.252	0.221	1.293	56.000	164.808	318.000	156.000	...	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000	-0.000	-0.000
9	1167559012.000	1167559012.096	0.458	0.246	0.219	1.269	54.000	174.000	356.000	156.000	...	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	0.000	0.000	0.000

10 rows × 1035 columns

<Figure size 1400x1000 with 0 Axes>

[9]:

ax2 = triggers.plot.scatter(x='gps',
                      y='freqPeak',
                      c='snrPeak',
                      colormap='jet')

../../_images/structure_examples_Multi-WDF_20_0.png

[10]:

df=triggers[triggers['snrPeak']>4]

[11]:

plt.figure(0)
print (df.shape)

from matplotlib.ticker import FormatStrFormatter

sc = plt.scatter(df.gpsPeak,
                 df.freqPeak,c=df.EnWDF, cmap='jet')

# legend
cbar = plt.colorbar(sc)
cbar.ax.set_yticklabels(['5','10', '20', '30', '40', '50', '60','>60'])
cbar.set_label('SNR value', rotation=270)

plt.ylim(50, 1000)
plt.yscale('log')
plt.gca().get_xaxis().get_major_formatter().set_useOffset(False)
plt.gca().get_xaxis().get_major_formatter().set_scientific(False)
plt.xlabel("time")
plt.ylabel("Frequency")
plt.title("Frequency versus time for triggers")

(12, 1035)

/tmp/ipykernel_4076457/2239143324.py:11: UserWarning: FixedFormatter should only be used together with FixedLocator
  cbar.ax.set_yticklabels(['5','10', '20', '30', '40', '50', '60','>60'])

[11]:

Text(0.5, 1.0, 'Frequency versus time for triggers')

../../_images/structure_examples_Multi-WDF_22_3.png

[12]:

df=triggers.sort_values('EnWDF', ascending=False)

[13]:

df.head(10)

[13]:

	gps	gpsPeak	duration	EnWDF	snrMean	snrPeak	freqMin	freqMean	freqMax	freqPeak	...	rw1014	rw1015	rw1016	rw1017	rw1018	rw1019	rw1020	rw1021	rw1022	rw1023
4255	1167559535.375	1167559535.639	0.019	6.242	4.661	70.169	76.000	152.577	262.000	160.000	...	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000	-0.000
4254	1167559535.250	1167559535.639	0.019	6.219	4.659	70.154	74.000	146.962	260.000	160.000	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
4256	1167559535.500	1167559535.639	0.019	6.170	4.658	70.118	76.000	150.308	262.000	160.000	...	-0.000	-0.000	-0.000	-0.000	-0.000	0.000	0.000	0.000	0.000	0.000
4257	1167559535.625	1167559535.639	0.019	5.798	4.624	69.640	80.000	152.192	264.000	160.000	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	0.000	0.000
4341	1167559560.875	1167559561.331	0.276	0.706	0.529	8.874	58.000	156.846	276.000	140.000	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000
4343	1167559561.125	1167559561.331	0.107	0.687	0.517	8.830	58.000	152.923	278.000	136.000	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000
1673	1167559322.500	1167559322.928	0.497	0.685	0.474	7.734	54.000	146.731	240.000	154.000	...	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000	0.000	0.000
4344	1167559561.250	1167559561.331	0.018	0.679	0.506	8.830	68.000	149.731	222.000	170.000	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000
4342	1167559561.000	1167559561.331	0.276	0.675	0.522	8.674	58.000	155.308	276.000	136.000	...	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000	-0.000
1674	1167559322.625	1167559322.928	0.411	0.648	0.462	7.642	62.000	150.154	240.000	154.000	...	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.000	-0.000	-0.000

10 rows × 1035 columns

[14]:

wav=np.array(triggers.loc[triggers['EnWDF'].idxmax()][11:].values)

[15]:

import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['agg.path.chunksize']=10000
%matplotlib inline

plt.figure(figsize=(10,4)),
plt.plot(wav,'gray',label='h'),

[15]:

([<matplotlib.lines.Line2D at 0x7fc742d42310>],)

../../_images/structure_examples_Multi-WDF_26_1.png

[16]:

wav=np.array(triggers.loc[11][11:].values)

plt.figure(figsize=(10,4)),
plt.plot(wav,'gray',label='h'),

[16]:

([<matplotlib.lines.Line2D at 0x7fc742d950d0>],)

../../_images/structure_examples_Multi-WDF_27_1.png

[ ]: