Feature Engineering and Machine Learning
A beginner's guide to how feature selection, feature extraction, and feature transformation work...
Feature engineering is a crucial step in building machine learning models. It involves using practical, statistical, and data science knowledge to select, transform, or extract features (attributes) from raw data. When building a predictive model, you have to consider both the predictor variables and the outcome variable. Feature engineering is the process in which you create or extract the most useful predictor variables for your model.
Categories of Feature Engineering.
There are three main categories of feature engineering. They are:
Feature selection
Feature transformation
Feature extraction
Feature selection: This is the process of reducing the volume of your dataset by keeping only the relevant features. Here, you identify and drop redundant features so that only the ones you actually need go into building your model.
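For instance, here is a minimal sketch of feature selection using scikit-learn's SelectKBest. The toy columns (age, income, shoe_size, bought) and the choice of k=2 are illustrative assumptions, not something from this article.

```python
# Minimal feature-selection sketch (illustrative data and column names).
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.DataFrame({
    "age":       [25, 32, 47, 51, 62, 23, 44, 36],
    "income":    [30, 42, 80, 95, 110, 28, 70, 55],
    "shoe_size": [40, 42, 43, 41, 44, 39, 42, 41],  # likely irrelevant to the outcome
    "bought":    [0, 0, 1, 1, 1, 0, 1, 0],          # outcome variable
})

X, y = df.drop(columns="bought"), df["bought"]

# Keep only the 2 features most strongly associated with the outcome
selector = SelectKBest(score_func=f_classif, k=2)
selector.fit(X, y)
print(X.columns[selector.get_support()].tolist())  # names of the selected features
```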
Feature Extraction: Here, you derive new features from the existing raw data, so that a smaller set of constructed features still carries the information your model needs.
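As a rough sketch, one common way to extract features is principal component analysis (PCA), which builds a small number of new features out of the original columns. The random data and the choice of n_components=2 below are illustrative assumptions.

```python
# Minimal feature-extraction sketch using PCA (illustrative data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 rows with 5 raw features

pca = PCA(n_components=2)              # derive 2 new combined features
X_extracted = pca.fit_transform(X)

print(X_extracted.shape)               # (100, 2)
print(pca.explained_variance_ratio_)   # how much variance each new feature keeps
```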
Feature Transformation: Here, you take the features you have already selected and alter them so they're best suited for your model.
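Here is a minimal sketch of one possible transformation, assuming a skewed numeric column named income: a log transform to reduce the skew, followed by standard scaling. Both the column name and the choice of transforms are illustrative assumptions.

```python
# Minimal feature-transformation sketch (illustrative column and transforms).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"income": [30, 42, 80, 95, 110, 28, 70, 55]})

# Log transform to reduce skew, then scale to zero mean and unit variance
df["income_log"] = np.log1p(df["income"])
df["income_scaled"] = StandardScaler().fit_transform(df[["income_log"]]).ravel()

print(df.round(2))
```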
Benefits of Feature Engineering:
It gives better model performance. By selecting and transforming the right features, you can increase your model's accuracy.
You get to understand more about the variables that affect the model. This gives you a deeper understanding of the features you need to build your model.
It reduces the amount of data you have to analyze. This makes your work easier and leads to faster, more effective models.
There is a saying that whatever has advantages also has disadvantages, and feature engineering is no exception. As beautiful and helpful as feature engineering is, there are a few reasons why you might be discouraged from using this very important process. But before I continue, you should know that these disadvantages are not automatic. They don't happen just because you used feature engineering; they only arise when it is done without enough expertise. That being said, let's dive into the reasons why feature engineering might be a problem.
It demands an advanced technical skill set.
It is time-consuming.
Statistical Tests and Class Balance Considerations in Feature Engineering.
While building your model, you also need to consider how balanced your dataset is. A class imbalance occurs when the outcome variable in your dataset contains many more instances of one class than another. When this happens, the model tends to be biased towards the majority class, which leads to poor classification of the minority class. You should not expect your dataset to be an exact 50-50 split, because that is a very rare occurrence. However, that doesn't mean a 50-50 split is undesirable; when it does occur, it is actually the ideal case.
A severe class imbalance is commonly said to have occurred when the majority class makes up 90% or more of the dataset. So if this happens, what can be done about it?
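Before deciding how to handle an imbalance, it helps to measure it first. The sketch below assumes a hypothetical outcome column called churned and simply counts how often each class appears.

```python
# Minimal sketch for checking class balance (hypothetical 'churned' outcome).
import pandas as pd

y = pd.Series([0] * 95 + [1] * 5, name="churned")

counts = y.value_counts(normalize=True)
print(counts)  # share of each class, e.g. 0 -> 0.95, 1 -> 0.05

if counts.max() >= 0.90:
    print("Severe class imbalance: the majority class is 90% or more of the data.")
```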
There are a few ways to handle such an issue. Two of those ways, sketched in the code below, are:
Downsampling
Upsampling
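Here is a minimal sketch of both approaches using sklearn.utils.resample; the DataFrame, column names, and 90/10 class split are illustrative assumptions. Downsampling throws away majority-class rows, while upsampling duplicates minority-class rows, so the right choice depends on how much data you can afford to lose or repeat.

```python
# Minimal downsampling/upsampling sketch (illustrative data and column names).
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({
    "feature": range(100),
    "label":   [0] * 90 + [1] * 10,   # 90% majority class, 10% minority class
})
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Downsampling: shrink the majority class to the size of the minority class
majority_down = resample(majority, replace=False, n_samples=len(minority), random_state=42)
balanced_down = pd.concat([majority_down, minority])

# Upsampling: duplicate minority rows (sampling with replacement) up to the majority size
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced_up = pd.concat([majority, minority_up])

print(balanced_down["label"].value_counts().to_dict())  # both classes now have 10 rows
print(balanced_up["label"].value_counts().to_dict())    # both classes now have 90 rows
```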
Now you understand what feature engineering is all about, why it is important, and the different categories of feature engineering. You might want to continue exploring other machine learning techniques and learn more about how you can improve your model's performance.