Welcome to Wavelet Detection Filter’s documentation!

The library covers the application of the Wavelet Detection Filter (WDF) on the time-series data. The core part of the wdf is based on the WDF implementation p4TSA developed by Elena Cuoco(see p4TSA github repository https://github.com/elenacuoco/p4TSA/). p4TSA can be wrapped in Python and within this library it is denoted by pytsa.

Table of content

Introduction

Wavelet detection filter (WDF) is a python library which wraps some of the routines in C++ of p4TSA and its python wrapper pytsa. In WDF library you can find the WDF itself, that is a pipeline able to detect transient signal, using a wavelet decomposition of the data, followed by a denoising procedure, and later on by the estimation of the energy content of the signal. In the same library you can find the whitening, and double whitneing (equivalent to dividing by the PSD the data) based on the AutoRegressive fit, which is implemented in time domain. You can find also other functionalities linked to parametrice modeling (ARMA, AR, MA) or downsampling filter.

Requirements

  • p4TSA
  • pytsa
  • numpy
  • scipy (upgraded version)
  • docker with p4TSA and WDF environment

Installation

To install the wdf library, one has to run setup.py script from the main directory of the library.

python setup.py install

Howto

For a quick start, you can have a look at the Tutorial section

Contacts

If you find any issues, please contact:

Tutorials

The following tutorials presents few examples of the WDF and other tool usage.

Whitening

Whitening procedure with AutoRegressive (AR) model

author: Elena Cuoco

We can whitening the data in time domain, using the Autoregressive parameters we estimated on a given chunck of data in frame format.

Double whitening refers to the procedure applied in the time domain of data whitening, using the inverse of PSD. However, the method used in pytsa is based on the parametric estimation (AR) of the PSD and the Lattice Filter implementation in the time domain.

[1]:
import time
import os
import json
from pytsa.tsa import SeqView_double_t as SV
from wdf.config.Parameters import Parameters
from wdf.processes.Whitening import Whitening
from wdf.processes.DWhitening import  DWhitening
from pytsa.tsa import FrameIChannel
import logging, sys

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.debug("info")

new_json_config_file = True    # set to True if you want to create new Configuration
if new_json_config_file==True:
    configuration = {
      "file": "./data/test.gwf",
      "channel": "H1:GWOSC-4KHZ_R1_STRAIN",
      "len":1.0,
      "gps":1167559200,
      "outdir": "./",
      "dir":"./",
      "ARorder": 1000,
      "learn": 200,
      "preWhite":4
    }

    filejson = os.path.join(os.getcwd(),"parameters.json")
    file_json = open(filejson, "w+")
    json.dump(configuration, file_json)
    file_json.close()
logging.info("read parameters from JSON file")

par = Parameters()
filejson = "parameters.json"
try:
    par.load(filejson)
except IOError:
    logging.error("Cannot find resource file " + filejson)
    quit()

strInfo = FrameIChannel(par.file, par.channel, 1.0, par.gps)
Info = SV()
strInfo.GetData(Info)
par.sampling = int(1.0 / Info.GetSampling())
logging.info("channel= %s at sampling frequency= %s" %(par.channel, par.sampling))

whiten=Whitening(par.ARorder)
par.ARfile = "./ARcoeff-AR%s-fs%s-%s.txt" % (
                par.ARorder, par.sampling, par.channel)
par.LVfile ="./LVcoeff-AR%s-fs%s-%s.txt" % (
                par.ARorder, par.sampling, par.channel)

if os.path.isfile(par.ARfile) and os.path.isfile(par.LVfile):
    logging.info('Load AR parameters')
    whiten.ParametersLoad(par.ARfile, par.LVfile)

else:
    logging.info('Start AR parameter estimation')
    ######## read data for AR estimation###############
    strLearn = FrameIChannel(par.file, par.channel, par.learn, par.gps)
    Learn = SV()
    strLearn.GetData(Learn)
    whiten.ParametersEstimate(Learn)
    whiten.ParametersSave(par.ARfile, par.LVfile)
INFO:root:read parameters from JSON file
INFO:root:channel= H1:GWOSC-4KHZ_R1_STRAIN at sampling frequency= 4096
INFO:root:Load AR parameters
[2]:
# sigma for the noise
par.sigma = whiten.GetSigma()
logging.info('Estimated sigma= %s' % par.sigma)
INFO:root:Estimated sigma= 5.09281e-22

We use some chunck of data to pre-heating the whitening procedure and avoiding the filter tail.

[3]:
#Initialize the loop for the whitening and double whitening
data = SV()
dataw = SV()
dataww =SV()

streaming = FrameIChannel(par.file, par.channel, par.len, par.gps)
streaming.GetData(data)
N=data.GetSize()

Dwhiten=DWhitening(whiten.LV,N,0)
if os.path.isfile(par.LVfile):
    logging.info('Load LV parameters')
    Dwhiten.ParametersLoad(par.LVfile)
###---whitening preheating---###
for i in range(par.preWhite):
    streaming.GetData(data)
    whiten.Process(data, dataw)
    Dwhiten.Process(data, dataww)
INFO:root:Load LV parameters
[4]:
# data to be plotted
streaming.GetData(data)
whiten.Process(data, dataw)
Dwhiten.Process(data, dataww)
Plot: raw, whitened and double-whitened data
Time-domain
[5]:
import numpy as np

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

plt.rcParams['figure.figsize'] = (15.0, 10.0)
mpl_logger = logging.getLogger("matplotlib")
mpl_logger.setLevel(logging.WARNING)

x=np.zeros(data.GetSize())
y=np.zeros(data.GetSize())
yw=np.zeros(data.GetSize())
yww=np.zeros(data.GetSize())

for i in range(data.GetSize()):
    x[i]=data.GetX(i)
    y[i]=data.GetY(0,i)
    yw[i]=dataw.GetY(0,i)
    yww[i]=dataww.GetY(0,i)

plt.figure(figsize=(10,4))
plt.plot(x, y,  label='Raw data')
plt.plot(x, yw, label='Whitened data')
plt.plot(x, yww, label='Double whitened data')

plt.legend()
plt.show()

_images/structure_examples_Whitening_10_0.png
Frequency domain (PSD)
[6]:
from scipy import signal
f, Pxx_den = signal.welch(y, par.sampling, nperseg=2048)
f, Pxx_denW = signal.welch(yw, par.sampling, nperseg=2048)
f, Pxx_denWW = signal.welch(yww, par.sampling, nperseg=2048)
fig, ax = plt.subplots()
ax.semilogy(f, Pxx_den)
ax.semilogy(f, Pxx_denW)
ax.semilogy(f, Pxx_denWW)

plt.xlabel('frequency [Hz]')
plt.ylabel('PSD [V**2/Hz]')
plt.show()
_images/structure_examples_Whitening_12_0.png
Time-Frequency domain
[7]:
plt.figure(figsize=(10,4)),
freqs, times, spectrogram = signal.spectrogram(y)

plt.figure(figsize=(5, 4))
plt.imshow(spectrogram, aspect='auto', cmap='jet', origin='lower')
plt.title('Original Spectrogram')
plt.ylabel('Frequency band')
plt.xlabel('Time window')
plt.tight_layout()
<Figure size 1000x400 with 0 Axes>
_images/structure_examples_Whitening_14_1.png
[8]:
plt.figure(figsize=(10,4)),
freqs, times, spectrogram = signal.spectrogram(yw)

plt.figure(figsize=(5, 4))
plt.imshow(spectrogram, aspect='auto', cmap='jet', origin='lower')
plt.title('Whitened Spectrogram')
plt.ylabel('Frequency band')
plt.xlabel('Time window')
plt.tight_layout()
<Figure size 1000x400 with 0 Axes>
_images/structure_examples_Whitening_15_1.png
[9]:
plt.figure(figsize=(10,4)),
freqs, times, spectrogram = signal.spectrogram(yww)

plt.figure(figsize=(5, 4))
plt.imshow(spectrogram, aspect='auto', cmap='jet', origin='lower')
plt.title('Double whitened Spectrogram')
plt.ylabel('Frequency band')
plt.xlabel('Time window')
plt.tight_layout()
<Figure size 1000x400 with 0 Axes>
_images/structure_examples_Whitening_16_1.png

InverseWhitening

Inverse Whitening procedure with AutoRegressive (AR) model

author: Elena Cuoco

  • We can ‘color’ the data which have been whitened, using the P AR parameters and an ARMA(P,1) filter
[1]:
import time
import os
import pytsa
from pytsa.tsa import *
from pytsa.tsa import SeqView_double_t as SV
from wdf.config.Parameters import *
from wdf.processes.Whitening import  *
from wdf.processes.DWhitening import  *
import logging, sys

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.debug("info")

new_json_config_file = True    # set to True if you want to create new Configuration
if new_json_config_file==True:
    configuration = {
      "file": "./data/test.gwf",
      "channel": "H1:GWOSC-4KHZ_R1_STRAIN",
      "len":1.0,
      "gps":1167559100,
      "outdir": "./",
      "dir":"./",
      "ARorder": 2000,
      "learn": 200,
      "preWhite":4
    }

    filejson = os.path.join(os.getcwd(),"InputParameters.json")
    file_json = open(filejson, "w+")
    json.dump(configuration, file_json)
    file_json.close()
logging.info("read parameters from JSON file")

par = Parameters()
filejson = "InputParameters.json"
try:
    par.load(filejson)
except IOError:
    logging.error("Cannot find resource file " + filejson)
    quit()

strInfo = FrameIChannel(par.file, par.channel, 1.0, par.gps)
Info = SV()
strInfo.GetData(Info)
par.sampling = int(1.0 / Info.GetSampling())
logging.info("channel= %s at sampling frequency= %s" %(par.channel, par.sampling))

whiten=Whitening(par.ARorder)
par.ARfile = "./ARcoeff-AR%s-fs%s-%s.txt" % (
                par.ARorder, par.sampling, par.channel)
par.LVfile ="./LVcoeff-AR%s-fs%s-%s.txt" % (
                par.ARorder, par.sampling, par.channel)

if os.path.isfile(par.ARfile) and os.path.isfile(par.LVfile):
    logging.info('Load AR parameters')
    whiten.ParametersLoad(par.ARfile, par.LVfile)
else:
    logging.info('Start AR parameter estimation')
    ######## read data for AR estimation###############
    strLearn = FrameIChannel(par.file, par.channel, par.learn, par.gps)
    Learn = SV()
    strLearn.GetData(Learn)
    whiten.ParametersEstimate(Learn)
    whiten.ParametersSave(par.ARfile, par.LVfile)
INFO:root:read parameters from JSON file
INFO:root:channel= H1:GWOSC-4KHZ_R1_STRAIN at sampling frequency= 4096
INFO:root:Start AR parameter estimation
[2]:
data = SV()
dataw = SV()
streaming = FrameIChannel(par.file, par.channel, par.len, par.gps)

streaming.GetData(data)
N=data.GetSize()

Dwhiten=DWhitening(whiten.LV,N,0)

for i in range(par.preWhite):
    streaming.GetData(data)
    Dwhiten.Process(data, dataw)

(D)Whiten the data

How whiten your data are depends on a series of factors: the stationarity of the noise, the number of AR parameters you used, the lenght of the sequence of data you used to estimate the parameters

[3]:
import numpy as np
import matplotlib

import matplotlib.pyplot as plt

%matplotlib inline

plt.rcParams['figure.figsize'] = (15.0, 10.0)
mpl_logger = logging.getLogger("matplotlib")
mpl_logger.setLevel(logging.WARNING)


x=np.zeros(data.GetSize())
y=np.zeros(data.GetSize())
yw=np.zeros(dataw.GetSize())

for i in range(data.GetSize()):
    x[i]=data.GetX(i)
    y[i]=data.GetY(0,i)
    yw[i]=dataw.GetY(0,i)
plt.figure(figsize=(10,4))


plt.plot(x, y, 'b',label='Raw data')
plt.plot(x, yw, 'g',label='whitened data')
plt.legend()
plt.show()
_images/structure_examples_InverseWhitening_6_0.png
[4]:
plt.figure(figsize=(10,4))


plt.plot(x, yw, 'g', label='whitened data')

plt.legend()
plt.show()
_images/structure_examples_InverseWhitening_7_0.png

Recoloring data using an ARMA (P,Q) filter

P= the numer of AR parameters, Q=1

In order to take into account the transient response of the filter, we need to do a ‘preaheating for the filter’ and so go first in a loop to get good result

[5]:
from wdf.processes.Coloring import *
from wdf.structures.array2SeqView import *


datac = SV()

dataw=SV()

Colored=Coloring(par.ARorder)
Colored.ParametersLoad(par.ARfile)
for j in range(5):
    streaming.GetData(data)
    whiten.Process(data, dataw)
    Colored.Process(dataw, datac)

[6]:
%matplotlib notebook
[7]:
x=np.zeros(data.GetSize())
y=np.zeros(data.GetSize())
yc=np.zeros(datac.GetSize())


for i in range(data.GetSize()):
    x[i]=data.GetX(i)
    y[i]=data.GetY(0,i)
    yc[i]=datac.GetY(0,i)

plt.figure(figsize=(10,4))

plt.plot(x, y,  label='Raw data')
plt.plot(x, yc, label='Recolored-data')

plt.legend()
plt.show()
TO BE DOCUMENTED

Multi-WDF

Example of WDf usage with multi data segments and multi-processing

author: Elena Cuoco

Cuoco et al., Wavelet-Based Classification of Transient Signals for Gravitational Wave Detectors DO - 10.23919/EUSIPCO.2018.8553393

Please note that many packages, as graphic one or logging are not part of WDF docker, but you can install them locally or use your preferred ones

[1]:
# import libraries
import time
import os
from pytsa.tsa import *
from pytsa.tsa import SeqView_double_t as SV
from wdf.config.Parameters import *
from wdf.processes.wdfUnitDSWorker import *
from wdf.processes.wdfUnitWorker import *
import logging
import coloredlogs
#select level of logging
coloredlogs.install(isatty=True)

logging.basicConfig(level=logging.DEBUG)


[2]:
new_json_config_file = True    # set to True if you want to create new Configuration

if new_json_config_file==True:
    configuration = {
  "window":1024,
  "overlap":768,
  "threshold": 0.2,
  "file": "./data/test.gwf",
  "channel": "H1:GWOSC-4KHZ_R1_STRAIN",
  "run":"offLine",
  "len":10.0,
  "gps":1167559608,
  "segments":[[1167559008,1167559408],[1167559408,1167560008],[1167560008,1167560308]],
  #"segments":[[1167559608,1167560008] ],
  "outdir": "local_dir/",
  "dir":"local_dir/",
  "ID":"WDF_test",
  "ARorder": 1000,
  "learn": 200,
  "preWhite":2,
  "ResamplingFactor":2,
  "LowFrequencyCut":12,
  "FilterOrder":6,
  "nproc":4
}

    filejson = os.path.join(os.getcwd(),"inputWDF.json")
    file_json = open(filejson, "w+")
    json.dump(configuration, file_json)
    file_json.close()

logging.info("read parameters from JSON file")
par = Parameters()

filejson = "inputWDF.json"
try:
    par.load(filejson)
except IOError:
    logging.error("Cannot find resource file " + filejson)
    quit()

par.print()
2023-03-29 14:18:51 hal2022 root[4076457] INFO read parameters from JSON file
{   'ARorder': 1000,
    'FilterOrder': 6,
    'ID': 'WDF_test',
    'LowFrequencyCut': 12,
    'ResamplingFactor': 2,
    'channel': 'H1:GWOSC-4KHZ_R1_STRAIN',
    'dir': 'local_dir/',
    'file': './data/test.gwf',
    'gps': 1167559608,
    'learn': 200,
    'len': 10.0,
    'nproc': 4,
    'outdir': 'local_dir/',
    'overlap': 768,
    'preWhite': 2,
    'run': 'offLine',
    'segments': [   [1167559008, 1167559408],
                    [1167559408, 1167560008],
                    [1167560008, 1167560308]],
    'threshold': 0.2,
    'window': 1024}

It is important that you define correctly the parameters in the configuration files. WDf is a pipeline which performs a series of steps before producing triggers for transient signals in your data. Here it is the list of parameters you can fix in your configuration file.

  • ARorder = order of AutoRegressive model for whitening
  • ID = identification numer for your run
  • ResamplingFactor = the ratio between the original sampling frequency and the downsampled one
  • channel = the name of the channel in you .gwf o .ffl file you want to analyze
  • dir = where to find the parameters
  • file = file to be anlyzed
  • gps = the starting time for for analysis (overwritten by values in segments)
  • learn = the length in seconds of the data you will use to estimate AR parameters
  • len = the time window in second of data loaded in you loop
  • nproc = the number of processors you will use
  • outdir = where you want to save the results
  • overlap = overlapping number between 2 consecutives windows for WDF analysis
  • prewhite = the number of iter to pre-heat the whitening procedure (leave as it is)
  • run = additional tag for your data run
  • segments = 1 or more segments defined as [start time, end time] where you will run WDF, usually 1 segment/processor
  • threshold = the minimum value for WDF snr to identify a trigger
  • window = the analyzing window in point for WDF. It should be a power of 2

If you set the par.dir or par.outdir as relative path, we need to give the absolute path.

[3]:
import os
par.dir=os.getcwd()+'/'+par.dir
par.outdir=os.getcwd()+'/'+par.outdir
Load information for sampling frequency
[4]:
strInfo = FrameIChannel(par.file, par.channel, 1.0, par.gps)
Info = SV()
strInfo.GetData(Info)
par.sampling = int(1.0 / Info.GetSampling())
if par.ResamplingFactor!=None:
    par.resampling = int(par.sampling / par.ResamplingFactor)
    logging.info("sampling frequency= %s, resampled frequency= %s" %(par.sampling, par.resampling))
del Info, strInfo

2023-03-29 14:18:52 hal2022 root[4076457] INFO sampling frequency= 4096, resampled frequency= 2048
Launch WDF runs
The fullPrint option is important to save information about the WDF triggers
  • fullPrint = 0 –> you save only the metaparameters for the triggers
  • fullPrint = 1 –> you save the metaparameters and the wavelet coefficients for that trigger
  • fullPrint = 2 –> you save the metaparameters and the reconstructed waveform for that trigger
  • fullPring = 3 –> you save the metaparameters, the wavelet coefficients and the reconstructed waveform for that trigger in the ‘window’ time (window/sampling frequency)
[5]:
import multiprocessing as mp
print("Number of processors: ", mp.cpu_count())
pool = mp.Pool(par.nproc)

wdf=wdfUnitDSWorker(par,fullPrint=2)
pool.map(wdf.segmentProcess, [segment for segment in par.segments])
pool.close()
Number of processors:  32
2023-03-29 14:18:52 hal2022 root[4076576] INFO Analyzing segment: 1167560008-1167560308 for channel H1:GWOSC-4KHZ_R1_STRAIN downsampled at 2048Hz
2023-03-29 14:18:52 hal2022 root[4076575] INFO Analyzing segment: 1167559408-1167560008 for channel H1:GWOSC-4KHZ_R1_STRAIN downsampled at 2048Hz
2023-03-29 14:18:52 hal2022 root[4076574] INFO Analyzing segment: 1167559008-1167559408 for channel H1:GWOSC-4KHZ_R1_STRAIN downsampled at 2048Hz
2023-03-29 14:18:52 hal2022 root[4076575] INFO Start AR parameter estimation
2023-03-29 14:18:52 hal2022 root[4076574] INFO Start AR parameter estimation
2023-03-29 14:18:52 hal2022 root[4076576] INFO Start AR parameter estimation
2023-03-29 14:19:27 hal2022 root[4076575] INFO Estimated sigma= 4.0675451444151897e-22
2023-03-29 14:19:27 hal2022 root[4076576] INFO Estimated sigma= 4.0062637269449494e-22
2023-03-29 14:19:27 hal2022 root[4076574] INFO Estimated sigma= 4.0023147527036696e-22
2023-03-29 14:19:32 hal2022 root[4076575] INFO Starting detection loop
2023-03-29 14:19:32 hal2022 root[4076576] INFO Starting detection loop
2023-03-29 14:19:32 hal2022 root[4076574] INFO Starting detection loop
2023-03-29 14:22:25 hal2022 root[4076576] INFO analyzed 300 seconds in 212.9326889514923 seconds
2023-03-29 14:23:22 hal2022 root[4076574] INFO analyzed 400 seconds in 270.102082490921 seconds
2023-03-29 14:25:12 hal2022 root[4076575] INFO analyzed 600 seconds in 379.5118489265442 seconds

In the output dir with ‘run’ tag you will find the estimated AR coefficients, and a .csv files containing the trigger lists

Let’s have a look at the results
[6]:
import pandas as pd

import glob

dirName = par.outdir # use your path
all_files = glob.glob(os.path.join(dirName, "*","*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent
df_from_each_file = (pd.read_csv(f) for f in all_files)
triggers  = pd.concat(df_from_each_file, ignore_index=True)
[7]:
triggers.shape
[7]:
(5810, 1035)
[8]:
import matplotlib.pylab as plt
pd.set_option('display.max_rows', 999)
pd.set_option('max_colwidth',100)
pd.set_option('display.float_format', lambda x: '%.3f' % x)
import numpy as np

%matplotlib inline
# Alternatives include bmh, fivethirtyeight, ggplot,
# dark_background, seaborn-deep, etc
plt.style.use('ggplot')
plt.rcParams['font.monospace'] = 'Ubuntu Mono'
plt.rcParams['font.size'] = 10
plt.rcParams['axes.labelsize'] = 10
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['xtick.labelsize'] = 8
plt.rcParams['ytick.labelsize'] = 8
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['figure.titlesize'] = 12
plt.rcParams['figure.figsize'] = (14, 10)

colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
colors = np.hstack([colors] * 20)
plt.figure(0)
print (triggers.shape)

triggers.head(10)

(5810, 1035)
[8]:
gps gpsPeak duration EnWDF snrMean snrPeak freqMin freqMean freqMax freqPeak ... rw1014 rw1015 rw1016 rw1017 rw1018 rw1019 rw1020 rw1021 rw1022 rw1023
0 1167559009.000 1167559009.421 0.453 0.290 0.256 1.786 62.000 148.346 250.000 136.000 ... 0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 0.000
1 1167559009.125 1167559009.421 0.500 0.331 0.275 1.809 62.000 155.500 274.000 116.000 ... -0.000 -0.000 -0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
2 1167559009.250 1167559009.421 0.473 0.275 0.248 1.734 58.000 153.423 266.000 116.000 ... -0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 0.000
3 1167559009.375 1167559009.421 0.477 0.235 0.220 1.628 48.000 154.846 272.000 140.000 ... 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000 -0.000
4 1167559011.375 1167559011.782 0.500 0.305 0.219 1.788 88.000 173.769 282.000 112.000 ... 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000 -0.000
5 1167559011.500 1167559011.782 0.404 0.311 0.224 1.782 86.000 169.269 280.000 124.000 ... -0.000 -0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
6 1167559011.625 1167559011.782 0.500 0.350 0.255 1.769 78.000 170.000 302.000 122.000 ... -0.000 -0.000 -0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
7 1167559011.750 1167559011.782 0.473 0.369 0.278 1.794 70.000 168.385 298.000 138.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000
8 1167559011.875 1167559012.096 0.461 0.252 0.221 1.293 56.000 164.808 318.000 156.000 ... 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000 -0.000
9 1167559012.000 1167559012.096 0.458 0.246 0.219 1.269 54.000 174.000 356.000 156.000 ... -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 0.000 0.000 0.000

10 rows × 1035 columns

<Figure size 1400x1000 with 0 Axes>
[9]:
ax2 = triggers.plot.scatter(x='gps',
                      y='freqPeak',
                      c='snrPeak',
                      colormap='jet')

_images/structure_examples_Multi-WDF_20_0.png
[10]:
df=triggers[triggers['snrPeak']>4]
[11]:
plt.figure(0)
print (df.shape)

from matplotlib.ticker import FormatStrFormatter

sc = plt.scatter(df.gpsPeak,
                 df.freqPeak,c=df.EnWDF, cmap='jet')

# legend
cbar = plt.colorbar(sc)
cbar.ax.set_yticklabels(['5','10', '20', '30', '40', '50', '60','>60'])
cbar.set_label('SNR value', rotation=270)

plt.ylim(50, 1000)
plt.yscale('log')
plt.gca().get_xaxis().get_major_formatter().set_useOffset(False)
plt.gca().get_xaxis().get_major_formatter().set_scientific(False)
plt.xlabel("time")
plt.ylabel("Frequency")
plt.title("Frequency versus time for triggers")
(12, 1035)
/tmp/ipykernel_4076457/2239143324.py:11: UserWarning: FixedFormatter should only be used together with FixedLocator
  cbar.ax.set_yticklabels(['5','10', '20', '30', '40', '50', '60','>60'])
[11]:
Text(0.5, 1.0, 'Frequency versus time for triggers')
_images/structure_examples_Multi-WDF_22_3.png
[12]:
df=triggers.sort_values('EnWDF', ascending=False)
[13]:
df.head(10)
[13]:
gps gpsPeak duration EnWDF snrMean snrPeak freqMin freqMean freqMax freqPeak ... rw1014 rw1015 rw1016 rw1017 rw1018 rw1019 rw1020 rw1021 rw1022 rw1023
4255 1167559535.375 1167559535.639 0.019 6.242 4.661 70.169 76.000 152.577 262.000 160.000 ... -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000
4254 1167559535.250 1167559535.639 0.019 6.219 4.659 70.154 74.000 146.962 260.000 160.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
4256 1167559535.500 1167559535.639 0.019 6.170 4.658 70.118 76.000 150.308 262.000 160.000 ... -0.000 -0.000 -0.000 -0.000 -0.000 0.000 0.000 0.000 0.000 0.000
4257 1167559535.625 1167559535.639 0.019 5.798 4.624 69.640 80.000 152.192 264.000 160.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 0.000 0.000
4341 1167559560.875 1167559561.331 0.276 0.706 0.529 8.874 58.000 156.846 276.000 140.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000
4343 1167559561.125 1167559561.331 0.107 0.687 0.517 8.830 58.000 152.923 278.000 136.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.000
1673 1167559322.500 1167559322.928 0.497 0.685 0.474 7.734 54.000 146.731 240.000 154.000 ... 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 0.000 0.000
4344 1167559561.250 1167559561.331 0.018 0.679 0.506 8.830 68.000 149.731 222.000 170.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000
4342 1167559561.000 1167559561.331 0.276 0.675 0.522 8.674 58.000 155.308 276.000 136.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000
1674 1167559322.625 1167559322.928 0.411 0.648 0.462 7.642 62.000 150.154 240.000 154.000 ... 0.000 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000

10 rows × 1035 columns

[14]:
wav=np.array(triggers.loc[triggers['EnWDF'].idxmax()][11:].values)
[15]:
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['agg.path.chunksize']=10000
%matplotlib inline

plt.figure(figsize=(10,4)),
plt.plot(wav,'gray',label='h'),
[15]:
([<matplotlib.lines.Line2D at 0x7fc742d42310>],)
_images/structure_examples_Multi-WDF_26_1.png
[16]:
wav=np.array(triggers.loc[11][11:].values)

plt.figure(figsize=(10,4)),
plt.plot(wav,'gray',label='h'),
[16]:
([<matplotlib.lines.Line2D at 0x7fc742d950d0>],)
_images/structure_examples_Multi-WDF_27_1.png
[ ]:

SignalFiltering

Example of data filtering, using utility functions in wdf

author: Elena Cuoco

Sometime you may need to high pass, low pass or filter your data, while you are in a loop with SeqView of data. We will show how you can do this with SeqView structures, using utility functions which rely on scipy signal library

[1]:
import time
import os
from pytsa.tsa import *
from pytsa.tsa import SeqView_double_t as SV
from wdf.config.Parameters import *
import numpy as np

import logging, sys
logging.disable(sys.maxsize)
logging.basicConfig(level=logging.DEBUG)
new_json_config_file = True    # set to True if you want to create new Configuration

new_json_config_file = True    # set to True if you want to create new Configuration
if new_json_config_file==True:
    configuration = {
      "file": "./data/test.gwf",
      "channel": "H1:GWOSC-4KHZ_R1_STRAIN",
      "len":1.0,
      "gps":1167559100,
      "outdir": "./",
      "dir":"./",
      "ARorder": 2000,
      "learn": 200,
      "preWhite":4
    }

    filejson = os.path.join(os.getcwd(),"InputParameters.json")
    file_json = open(filejson, "w+")
    json.dump(configuration, file_json)
    file_json.close()
logging.info("read parameters from JSON file")

par = Parameters()
filejson = "InputParameters.json"
try:
    par.load(filejson)
except IOError:
    logging.error("Cannot find resource file " + filejson)
    quit()

strInfo = FrameIChannel(par.file, par.channel, 1.0, par.gps)
Info = SV()
strInfo.GetData(Info)
par.sampling = int(1.0 / Info.GetSampling())
logging.info("channel= %s at sampling frequency= %s" %(par.channel, par.sampling))
Butterworth Filter

Often it is useful to cut off low frequency in your data, because the noise is too high and you are not looking for signal in that region of frequency. We can do using a Butterworth filter from scipy library. It is advised that you make a design study offline to decide better the order of your filter.

[2]:
from wdf.utility.Filters import *
from wdf.utility.HighPassFilter import *
frequency=20
sampling=par.sampling
order=5
filtertype='high'
bf=Butterworth(frequency, sampling, order, filtertype)
hp=HighPassFilter(frequency, sampling,order)
Pre-heating of the filter

We use some chunck of data to pre-heating the filtering procedure and avoiding the filter tail.

[3]:

data = SV() dataf = SV() datafl = SV() streaming = FrameIChannel(par.file, par.channel, par.len, par.gps) ###---filter preheating---### for i in range(1): streaming.GetData(data) dataf=bf.ProcessSeq(data) datafl=hp.Process(data)
Filtering
[4]:
#filtering
streaming.GetData(data)
dataf=bf.ProcessSeq(data)
datafl=hp.Process(data)
Plot: raw and filtered data
Time-domain
[5]:
import matplotlib.pylab as plt
import numpy as np
import logging
%matplotlib widget
mpl_logger = logging.getLogger("matplotlib")
mpl_logger.setLevel(logging.WARNING)
x=np.zeros(data.GetSize())
y=np.zeros(data.GetSize())
yf=np.zeros(dataf.GetSize())
yfl=np.zeros(dataf.GetSize())


for i in range(data.GetSize()):
    x[i]=data.GetX(i)
    y[i]=data.GetY(0,i)
    yf[i]=dataf.GetY(0,i)
    yfl[i]=datafl.GetY(0,i)
plt.figure(figsize=(10,8))

plt.plot(x, y/20,'b', label='Raw data')
plt.plot(x, yf, 'g', label='Filtered data with filtfilt')
plt.plot(x, yfl, 'r',label='Filtered data with lfilter')

plt.legend()
plt.show()

Frequency domain (PSD)
[6]:
from scipy import signal
f, Pxx_den = signal.welch(y, par.sampling, nperseg=2048)
f, Pxx_denW = signal.welch(yf, par.sampling, nperseg=2048)
f, Pxx_denHP= signal.welch(yfl, par.sampling, nperseg=2048)
fig, ax = plt.subplots()
ax.loglog(f, Pxx_den,'b')
ax.loglog(f, Pxx_denW,'g')
ax.loglog(f, Pxx_denHP,'r')
plt.xlabel('frequency [Hz]')
plt.ylabel('PSD [V**2/Hz]')
plt.show()

WaveformRecon

Waveform denoised and reconstructed

author: Elena Cuoco

We want to show you how a gravitational wave signal becomes more apparent after whitening and double whitening of the data. The data are not downsampled

Double whitening refers to the procedure applied in the time domain of data whitening, using the inverse of PSD. However, the method used in pytsa is based on the parametric estimation (AR) of the PSD and the Lattice Filter implementation in the time domain.

[1]:
import time
import os
from pytsa.tsa import *
from pytsa.tsa import SeqView_double_t as SV
from wdf.config.Parameters import *
from wdf.processes.Whitening import  *
from wdf.processes.DWhitening import  *

import logging, sys
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.debug("info")

new_json_config_file = True    # set to True if you want to create new Configuration
if new_json_config_file==True:
    configuration = {
      "file": "./data/test.gwf",
      "channel": "H1:GWOSC-4KHZ_R1_STRAIN",
      "len":1.0,
      "gps":1167559536,
      "outdir": "./",
      "dir":"./",
      "ARorder": 3000,
      "learn": 300,
      "preWhite":4
    }

    filejson = os.path.join(os.getcwd(),"WavRec.json")
    file_json = open(filejson, "w+")
    json.dump(configuration, file_json)
    file_json.close()
logging.info("read parameters from JSON file")

par = Parameters()
filejson = "WavRec.json"
try:
    par.load(filejson)
except IOError:
    logging.error("Cannot find resource file " + filejson)
    quit()

strInfo = FrameIChannel(par.file, par.channel, 1.0, par.gps)
Info = SV()
strInfo.GetData(Info)
par.sampling = int(1.0 / Info.GetSampling())
logging.info("channel= %s at sampling frequency= %s" %(par.channel, par.sampling))

whiten=Whitening(par.ARorder)
par.ARfile = "./ARcoeff-AR%s-fs%s-%s.txt" % (
                par.ARorder, par.sampling, par.channel)
par.LVfile ="./LVcoeff-AR%s-fs%s-%s.txt" % (
                par.ARorder, par.sampling, par.channel)

if os.path.isfile(par.ARfile) and os.path.isfile(par.LVfile):
    logging.info('Load AR parameters')
    whiten.ParametersLoad(par.ARfile, par.LVfile)
else:
    logging.info('Start AR parameter estimation')
    ######## read data for AR estimation###############
    strLearn = FrameIChannel(par.file, par.channel, par.learn, par.gps)
    Learn = SV()
    strLearn.GetData(Learn)
    whiten.ParametersEstimate(Learn)
    whiten.ParametersSave(par.ARfile, par.LVfile)
INFO:root:read parameters from JSON file
INFO:root:channel= H1:GWOSC-4KHZ_R1_STRAIN at sampling frequency= 4096
INFO:root:Start AR parameter estimation
[2]:
# sigma for the noise
par.sigma = whiten.GetSigma()
print('Estimated sigma= %s' % par.sigma)
Estimated sigma= 4.951028717156321e-22

We use some chunck of data to pre-heating the whitening procedure and avoiding the filter tail.

[3]:
#Try to center 1sec beore and 1 after the event
lenS=2.0
gpsEvent=1167559936.6
gps=gpsEvent-1.0-par.preWhite*lenS
data = SV()
dataw = SV()
dataww = SV()
N=int(par.sampling*lenS)
streaming = FrameIChannel(par.file, par.channel, lenS, gps)
Dwhiten=DWhitening(whiten.LV ,N,0)

###---whitening preheating---###
for i in range(par.preWhite):
    streaming.GetData(data)
    whiten.Process(data, dataw)
    Dwhiten.Process(data, dataww)
[4]:
print(par.file, par.channel, lenS, gps,dataw.GetStart())
./data/test.gwf H1:GWOSC-4KHZ_R1_STRAIN 2.0 1167559927.6 1167559933.6
[5]:
streaming.GetData(data)
whiten.Process(data, dataw)
Dwhiten.Process(data, dataww)
[6]:
print(dataw.GetStart(),dataw.GetY(0,0))
1167559935.6 5.647725652827097e-23
[7]:
print(dataw.GetSize()/par.sampling)
2.0
Plot: raw and whitened data
time-domain
[8]:
import numpy as np
import logging
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pylab
import os
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12.0, 8.0)
mpl_logger = logging.getLogger("matplotlib")
mpl_logger.setLevel(logging.WARNING)
x=np.zeros(data.GetSize())
y=np.zeros(data.GetSize())
yw=np.zeros(dataw.GetSize())
yww=np.zeros(dataww.GetSize())

for i in range(dataw.GetSize()):
    x[i]=data.GetX(i)
    y[i]=data.GetY(0,i)
    yw[i]=dataw.GetY(0,i)
    yww[i]=dataww.GetY(0,i)

fig, ax = plt.subplots()

ax.plot(x, y,  label='Raw data')
ax.plot(x, yww, label='D-whitened data')
ax.plot(x, yw, label='whitened data')

ax.legend()
plt.show()

_images/structure_examples_WaveformRecon_14_0.png
[9]:
fig, ax = plt.subplots()
ax.plot(x, yww, label='D-whitened data')
ax.plot(x, yw, label='whitened data')
[9]:
[<matplotlib.lines.Line2D at 0x7ff8dbc57ad0>]
_images/structure_examples_WaveformRecon_15_1.png
[20]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy import signal
from matplotlib.colors import LogNorm

def prepareImage_gw(x,y,fs,title="title"):
    w = 10.
    freq = np.linspace(1, fs/2, int(fs/2))
    widths = w*fs / (2*freq*np.pi)
    z = np.abs(signal.cwt(y, signal.morlet2, widths, w=w))**2

    plt.pcolormesh(x, freq,z,cmap='Spectral',shading='gouraud',alpha=0.95,norm=LogNorm())
    plt.yscale('log')
    plt.ylim(10, 1000)
    plt.title(str(title))
    plt.show()

    return
[21]:
prepareImage_gw(x,y,par.sampling,"time-frequency plot on raw data")
_images/structure_examples_WaveformRecon_17_0.png
[23]:
prepareImage_gw(x,yw,par.sampling,"time-frequency plot on whitened data")
_images/structure_examples_WaveformRecon_18_0.png
[24]:
prepareImage_gw(x,yww,par.sampling,"time-frequency plot on double-whitened data")
_images/structure_examples_WaveformRecon_19_0.png
[25]:
datasize=data.GetSize()
yr=np.zeros(data.GetSize())
sigma=whiten.GetSigma()

wt = WaveletTransform.BsplineC309
WT = WaveletTransform(datasize, wt)
t =WaveletThreshold.dohonojohnston
wavthres = WaveletThreshold(datasize, 1, sigma);

WT.Forward(dataw);
wavthres(dataw, t);
WT.Inverse(dataw);
for i in range(data.GetSize()):
    x[i]=data.GetX(i)
    yr[i]=dataw.GetY(0,i)

[26]:


fig, ax = plt.subplots() ax.plot(x, yw, label='whitened data') ax.plot(x, yr, label='denoised data') ax.legend() plt.show()
_images/structure_examples_WaveformRecon_21_0.png
[27]:
prepareImage_gw(x,yr,par.sampling,"time-frequency plot on whitened denoised data")
_images/structure_examples_WaveformRecon_22_0.png
[28]:
datasize=data.GetSize()
yr=np.zeros(data.GetSize())
sigma=whiten.GetSigma()

wt = WaveletTransform.BsplineC309
WT = WaveletTransform(datasize, wt)
t =WaveletThreshold.dohonojohnston
wavthres = WaveletThreshold(datasize, 1, sigma);

WT.Forward(dataww);
wavthres(dataww, t);
WT.Inverse(dataww);
for i in range(data.GetSize()):
    x[i]=data.GetX(i)
    yr[i]=dataww.GetY(0,i)
[29]:
prepareImage_gw(x,yr,par.sampling,"time-frequency plot on double-whitened denoised data")
_images/structure_examples_WaveformRecon_24_0.png
[30]:
fig, ax = plt.subplots()
ax.plot(x,yww, label='D-whitened data')

ax.plot(x,yr, label='denoised  data')

ax.legend()
plt.show()
_images/structure_examples_WaveformRecon_25_0.png
[ ]:

WDF source code

The library’s design splits the WDF pipeline into four directories: config, observers, structures, processes

WDF source code

The config functions

parameters
parametersFilePersistence

The observers functions

observer
class observers.observer.Observer[source]

The observer class

observable
class observers.observable.Observable[source]

This class registers and updates various observers

register(observer)[source]

This method registers an observer

Parameters:observer (object) – An observer to be registered
unregister(observer)[source]

This methods unregisters an observer

Parameters:observer (object) – An observer to be unregistered
unregister_all()[source]

This method unregisters all observers

update_observers(*args, **kwargs)[source]

This method calls an update function for each observers with various parameters

Parameters:
  • args (object) – First parameter of the update function for the given observer
  • kwargs (object) – The following parameters of the update function for the given observer
Returns:

The object with triggers; type of object depends on the observer

PrintFileObserver
class observers.PrintFileObserver.PrintTriggers(par)[source]
PrintFilePEObserver
ParameterEstimationObserver
SegmentsObserver
SingleEventPrintFileObserver
class observers.SingleEventPrintFileObserver.SingleEventPrintTriggers(par, fullPrint=0)[source]

The class defining methods to save single event

update(CEV)[source]

This methods saves the triggers to the csv file

Parameters:
  • eventPE (pytsa object) – Metadata, wavelet coefficients and reconstructed wavelets of the trigger
  • CEV (pytsa object) – pytsa object that contains metadata, wavelet coefficients and reconstructed wavelets of the trigger.
wdfWorkerObserver

The structures functions

array2SeqView
ClusteredEvent
eventPE
class structures.eventPE.eventPE(gps, gpsPeak, duration, EnWDF, snrMean, snrPeak, freqMin, freqMean, freqMax, freqPeak, wave, coeff, Icoeff)[source]

This class stands for the encapsulation of the trigger data into one object

evCopy(ev)[source]

This method copies the parameter of the ev, eventPE object

Parameters:
  • ev (eventPE) – The eventPE object to copy parameters from
  • gps (float) – GPS time of the trigger denoting the first gps of analyzing window
  • gps – GPS time of the trigger denoting the moment it appeared at maximum SNR
  • EnWDF (float) – The Signal to Noise Ratio of the trigger statistics of WDF
  • snrMean (float) – The estimated mean Signal to Noise Ratio of the trigger
  • snrPeak (float) – The estimated Signal to Noise Ratio of the trigger at its peak
  • freqMin (float) – The minimum frequency of the trigger
  • freqMax (float) – The maximum frequency of the trigger
  • freqMean (float) – The mean frequency of the trigger
  • freqPeak (float) – The frequency at the peak of the trigger
  • duration (float) – The time duration of the trigger
  • wave (str) – The type of the wavelet
  • coeff (list) – The list containing wavelet coefficients of the trigger
  • Icoeff (list) – The list containing raw wavelet coefficients of the trigger
update(gps, gpsPeak, duration, EnWDF, snrMean, snrPeak, freqMin, freqMean, freqMax, freqPeak, wave, coeff, Icoeff)[source]

This method updates the eventPE object with new parameters

Parameters:
  • gps (float) – GPS time of the trigger denoting the first gps of analyzing window
  • gps – GPS time of the trigger denoting the moment it appeared at maximum SNR
  • EnWDF (float) – The Signal to Noise Ratio of the trigger statistics of WDF
  • snrMean (float) – The estimated mean Signal to Noise Ratio of the trigger
  • snrPeak (float) – The estimated Signal to Noise Ratio of the trigger at its peak
  • freqMin (float) – The minimum frequency of the trigger
  • freqMax (float) – The maximum frequency of the trigger
  • freqMean (float) – The mean frequency of the trigger
  • freqPeak (float) – The frequency at the peak of the trigger
  • duration (float) – The time duration of the trigger
  • wave (str) – The type of the wavelet
  • coeff (list) – The list containing wavelet coefficients of the trigger
  • Icoeff (list) – The list containing raw wavelet coefficients of the trigger
segment

The processes functions

AdaptiveWhitening

Whitening

createsegments
createsegmentsMinMax
class processes.createsegmentsMinMax.createSegmentsMinMax(parameters)[source]
DWhitening

Whitening

StateVectorSegments
class processes.StateVectorSegments.createSegments(parameters)[source]
wdf_reconstruct
class processes.wdf_reconstruct.wdf_reconstruct(parameters, wTh=<sphinx.ext.autodoc.importer._MockObject object>)[source]

The main WDF class responsible for the communication with the p4TSA library regarding the application of WDF onto data

FindEvents()[source]

This method calls wdf2reconstruct function from pytsa to search for triggers in the data

Returns:trigger
Process()[source]

This method calls wdf2reconstruct function from pytsa to search for triggers in the data If the triggers are found, they are stored in tosend_triggers variable that is later on used for further processing

SetData(data)[source]

This methods sets sets the data for the p4TSA wdf2reconstruct class for further search of triggers

Parameters:data (pytsa.SeqViewDouble) – An pytsa.SeqViewDouble object storing data to be processed
wdf
class processes.wdf.wdf(WdfParams: <sphinx.ext.autodoc.importer._MockObject object at 0x7f39899eee90>, wTh=<sphinx.ext.autodoc.importer._MockObject object>)[source]

The main WDF class responsible for the communication with the p4TSA library regarding the application of WDF onto data

FindEvents()[source]

This method calls wdf2classify function from pytsa to search for triggers in the data

Returns:trigger
Process()[source]

This method calls wdf2classify function from pytsa to search for triggers in the data If the triggers are found, they are stored in tosend_triggers variable that is later on used for further processing

SetData(data)[source]

This methods sets sets the data for the p4TSA wdf2classify class for further search of triggers

Parameters:data (pytsa.SeqViewDouble) – An pytsa.SeqViewDouble object storing data to be processed
wdfUnitWorker
wdfUnitDSWorker
Whitening
class processes.Whitening.Whitening(ARorder)[source]

This class is responsible for the communiction with whitening functions from pytsa

GetLV()[source]

This method returns LV object

Returns:LV object
GetSigma()[source]

This method returns the sigma parameter of the Whitening process

Returns:The sigma parameter of the whitened data
ParametersEstimate(data)[source]

This method estimates parameters of data by calling proper methods from pytsa

Parameters:data (pytsa.SeqViewDouble) – The Sequence View object containing the data to be processed
ParametersLoad(ARfile, LVfile)[source]

This method loads the calculated AR and LV parameter from the file

Parameters:
  • ARfile (basestring) – file for AutoRegressive parameters
  • LVfile (basestring) – file for Lattice View parameters
Returns:

Autoregressive and Lattice View

ParametersSave(ARfile, LVfile)[source]

This method saves the calculated AR and LV parameter to the file

Parameters:
  • ARfile (basestring) – file for AutoRegressive parameters
  • LVfile (basestring) – file for Lattice View parameters
Process(data, dataw)[source]

This method whitens the data by calling proper function from pytsa

Parameters:
  • data – pytsa.SeqViewDouble
  • dataw – pytsa.SeqViewDouble
DWhitening

Whitening

The utility functions

Filters

Indices and tables