How to extract Features from Signals

Matteo Gambera
5 min read · Feb 15, 2021

Theory concepts & Python code to speed up your project.

This article explains how to extract features from a signal in the statistical time domain and in the frequency domain. (Features can also be extracted in the time-frequency domain with the Short-Time Fourier Transform or wavelet decomposition, but those need a separate article to be explained well.)
This article covers two points:
1. Basic Concepts of Feature Extraction
2. Feature Extraction in Python

1. Basic Concepts of Feature Extraction

To explain what we will talk about, I borrow a definition from DeepAI:

“Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing.”

a. Why do we need to reduce the amount of data to be processed?
The problem is the computational load imposed by a large amount of data.
Feature extraction reduces the amount of data to be processed by transforming it into another, much smaller data set that still retains most of the relevant information contained in the original data set.

One example of feature extraction that all of us can relate to is spam detection software.
If we had a large collection of emails and the keywords contained in these emails, then a feature extraction process could find correlations among the various keywords. For example, the words Biden and election may appear to be correlated. Thus, the set of emails can now be described using a far smaller number of words and phrases than what we started out with. For example, you can tell whether an email is a current news item about the U.S. presidential election or an attempt by someone to scam you.

b. How important is the feature extraction phase?
Feature extraction is perhaps the most important step in the entire Machine Learning pipeline. Good features, which represent the data in the most suitable way, help in building effective Machine Learning models. In fact, more often than not it is the features rather than the algorithms that determine the effectiveness of the model. In simple words, good features give good models. It is commonly estimated that a data scientist spends around 70% to 80% of their time on data processing, wrangling, and feature engineering when building a Machine Learning model.

2. Feature Extraction in Python

In a complete project, feature extraction is only one of many steps. The main ones can be grouped into four macro phases, each with critical issues that must be recognized and solved in order to obtain a well-performing machine learning model.

  • Dataset Analysis
  • Preprocessing
  • Feature Extraction & Feature Selection
  • Normalization

Our goal is to extract features from a generic signal, so I won’t go through all the steps.

[Figure credit: M. Barandas, D. Folgado et al. / SoftwareX 11 (2020) 100456]

For this example I chose, from a public dataset, an accelerometer acquisition used in the “Human Activity Recognition” experiment, whose goal is to determine the activity a person is performing using a mobile phone.
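A minimal sketch of such a loading step is shown here. It uses a synthetic signal as a stand-in for the public dataset, so the sampling rate, the column names and the shape of the signal are all assumptions made for illustration and are not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Assumed values for illustration only (not taken from the real dataset)
FS = 50          # sampling rate in Hz
DURATION = 10    # length of the acquisition in seconds

rng = np.random.default_rng(seed=0)
t = np.arange(FS * DURATION) / FS   # time axis in seconds

# Synthetic stand-in for one accelerometer axis: a 2 Hz component,
# a weaker 5 Hz component and some Gaussian noise.
acc_x = (
    1.0 * np.sin(2 * np.pi * 2 * t)
    + 0.3 * np.sin(2 * np.pi * 5 * t)
    + 0.1 * rng.standard_normal(t.size)
)

# Hypothetical column names; a real dataset will use its own
df = pd.DataFrame({"time_s": t, "acc_x": acc_x})
print(df.head())
```

With a real dataset, the synthetic part would typically be replaced by a call such as `pd.read_csv` on the downloaded file.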

This piece of code should give you a dataframe containing an accelerometer acquisition.
In this case we are not interested in checking whether there are outliers, whether the data is already normalized, or whether something is wrong.
All of these analyses are essential when the data is needed for a real project, but since I only want to show how to extract features from a generic signal, I am not going to evaluate the reliability, integrity and consistency of the data.

In this case it might be interesting to divide the signal into windows and extract the respective features for each window, but I prefer to avoid doing too much at once.
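For readers who do want to try windowing, one possible approach is sketched below. The helper name `split_into_windows` and its parameters are hypothetical, and the window size and step are arbitrary example values:

```python
import numpy as np

def split_into_windows(signal, window_size, step):
    """Return a 2D array whose rows are (possibly overlapping) windows.

    Trailing samples that do not fill a whole window are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - window_size + 1, step)
    return np.array([signal[s:s + window_size] for s in starts])

# Toy signal: 0, 1, ..., 9
signal = np.arange(10.0)

# Windows of 4 samples with 50% overlap (step of 2 samples)
windows = split_into_windows(signal, window_size=4, step=2)

# One feature per window, here simply the mean
window_means = windows.mean(axis=1)
print(windows.shape, window_means)
```

Any feature function can then be applied row by row, giving one feature vector per window instead of one per signal.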

The second part of the code defines a function that takes a list of values as input and returns a table of features.
The features to be analyzed are divided into two types:

  • Time-domain features: a time-domain graph shows how a signal changes with time.
  • Frequency-domain features: a frequency-domain graph shows how much of the signal lies within each frequency band over a range of frequencies.

Time-Domain Features
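As a rough sketch of the time-domain side, a function of this kind could compute a handful of common statistical descriptors. The selection of features and the name `time_domain_features` are assumptions made for illustration, not necessarily the set used in the original script:

```python
import numpy as np

def time_domain_features(values):
    """Compute a few common statistical features of a 1D signal."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std = x.std()                       # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))      # root mean square
    peak = np.max(np.abs(x))
    centered = x - mean
    # Standardized moments; set to 0 for a constant signal (std == 0)
    skewness = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0
    kurtosis = np.mean(centered ** 4) / std ** 4 if std > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "min": x.min(),
        "max": x.max(),
        "peak_to_peak": x.max() - x.min(),
        "rms": rms,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "crest_factor": peak / rms if rms > 0 else 0.0,
    }

features = time_domain_features([1, 2, 3, 4, 5])
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Returning a dictionary keeps the function easy to extend: a new feature is just one more key.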

For the frequency-domain features, you must first compute the FFT of the signal and the corresponding power spectrum.
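This step can be sketched with NumPy's real-input FFT. The function name `power_spectrum`, the removal of the mean and the scaling by the number of samples are choices made for this example (other normalizations are equally valid), and the sampling rate is an assumed value:

```python
import numpy as np

def power_spectrum(values, fs):
    """Return (frequencies in Hz, power) for a real-valued 1D signal."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                          # remove the DC component
    spectrum = np.fft.rfft(x)                 # one-sided FFT
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(spectrum) ** 2 / x.size    # simple periodogram-style scaling
    return freqs, power

# Test signal: a 5 Hz sine sampled at an assumed rate of 100 Hz for 2 seconds
fs = 100
t = np.arange(200) / fs
x = np.sin(2 * np.pi * 5 * t)

freqs, power = power_spectrum(x, fs)
dominant = freqs[np.argmax(power)]
print(f"Dominant frequency: {dominant} Hz")
```

For a real signal the spectrum is symmetric, so `rfft` returns only the non-negative frequencies and halves the amount of data to handle.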

Frequency-domain Features
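A possible sketch of the frequency-domain side follows. The chosen features (peak frequency, spectral centroid, spectral spread, spectral entropy, total power) and the function name `frequency_domain_features` are assumptions for illustration, not necessarily those of the original script:

```python
import numpy as np

def frequency_domain_features(values, fs):
    """Compute a few common spectral features of a real-valued 1D signal."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                              # remove the DC component
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    total = power.sum()
    if total == 0:                                # constant signal: no spectral content
        return {"peak_frequency": 0.0, "spectral_centroid": 0.0,
                "spectral_spread": 0.0, "spectral_entropy": 0.0,
                "total_power": 0.0}
    p = power / total                             # power as a probability distribution
    centroid = np.sum(freqs * p)                  # power-weighted mean frequency
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero)) # in bits
    return {
        "peak_frequency": freqs[np.argmax(power)],
        "spectral_centroid": centroid,
        "spectral_spread": spread,
        "spectral_entropy": entropy,
        "total_power": total,
    }

# Test signal: a 5 Hz sine sampled at an assumed rate of 100 Hz for 2 seconds
fs = 100
t = np.arange(200) / fs
spectral = frequency_domain_features(np.sin(2 * np.pi * 5 * t), fs)
for name, value in spectral.items():
    print(f"{name}: {value:.4f}")
```

For a pure tone nearly all the power sits in one frequency bin, so the centroid coincides with the peak frequency and the spread and entropy are close to zero; broadband signals give larger values.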

This function therefore yields a fair number of features from a single input signal.
Many other features can be exploited, and they can easily be added to the script without modifying it much.

Once the features for each analyzed signal have been obtained, they can be normalized so that a machine learning algorithm does not give too much weight to any particular one. It is also advisable to apply techniques such as PCA (principal component analysis) or PCC (Pearson correlation coefficient) to reduce the number of features and thus the computational load.
Note that in this case the output data structure is not designed to be fed conveniently into classic ML algorithms.
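These post-processing steps can be sketched with NumPy alone, assuming the features have first been arranged in a matrix with one row per signal and one column per feature. The helper names, the 0.95 correlation threshold and the toy data are all assumptions made for this example:

```python
import numpy as np

def zscore(features):
    """Scale each column to zero mean and unit standard deviation."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0            # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std

def drop_correlated(X, threshold=0.95):
    """Return indices of columns to keep after a greedy Pearson-correlation filter."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not strongly correlated with a kept one
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def pca(X, n_components):
    """Project X onto its first principal components using the SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Toy feature matrix: 6 signals, 3 features; the third is a linear copy of the first
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X = np.column_stack([a, b, 2 * a + 1])

Xn = zscore(X)
keep = drop_correlated(Xn)         # the redundant third column is dropped
reduced = pca(Xn[:, keep], n_components=1)
print(keep, reduced.shape)
```

The correlation filter keeps original, interpretable features, while PCA builds new combined ones; which of the two is preferable depends on whether interpretability matters for the project.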

Conclusion

The intent of this article was to provide a very simple and immediate guide for those who are just starting out with extracting features from any type of signal (vibrations, acoustics, etc.). Unfortunately, many topics have been left unexplained or not considered at all.
If any part is not clear or needs a more detailed explanation, please let me know, so that I can revisit the concept by simplifying it or going deeper. Thank you.

Matteo Gambera

Automation Engineer & Founder @Stema. I talk about data applied to problems and decisions, and I also write about startups and team management.