### Date : 2023-12-08 17:10
### Topic : Feature engineering #marketing #study #dataanalytics #machinelearning #engineering
----
### Feature engineering
> Feature engineering is a crucial step in data preprocessing and machine learning in which you create new features (variables) from existing data to improve the performance of a predictive model. It involves transforming, selecting, or combining the available data attributes to make them more informative and relevant for the task at hand. Well-engineered features enhance the predictive power of machine learning models and help extract valuable insights from complex datasets. Here's a deeper dive into feature engineering:
**1. Feature Transformation:**
- **Scaling:** Transform numerical features to have similar scales, preventing some features from dominating others. Common techniques include standardization (mean=0, variance=1) and min-max scaling (scaling values between 0 and 1).
- **Log Transformation:** Apply logarithmic transformations to data to make it more linear or to normalize it, especially when dealing with skewed distributions.
- **Binning:** Group continuous numerical values into discrete bins, making it easier to capture nonlinear relationships.
- **One-Hot Encoding:** Convert categorical variables into binary (0/1) vectors for machine learning algorithms to process. Each category becomes a separate binary feature.
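The transformations above can be sketched with pandas. This is a minimal example on toy data; the column names (`income`, `channel`) are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): a right-skewed numeric column and a categorical one.
df = pd.DataFrame({
    "income": [20_000, 35_000, 50_000, 120_000, 900_000],
    "channel": ["email", "ads", "email", "organic", "ads"],
})

# Log transformation: compress the skewed income distribution.
df["log_income"] = np.log1p(df["income"])

# Binning: group continuous income into three quantile-based buckets.
df["income_bin"] = pd.qcut(df["income"], q=3, labels=["low", "mid", "high"])

# One-hot encoding: each channel category becomes its own 0/1 column.
df = pd.get_dummies(df, columns=["channel"], prefix="channel")

print(df.columns.tolist())
```

`np.log1p` (log of 1 + x) is used instead of a plain log so that zero values remain defined.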
**2. Feature Creation:**
- **Polynomial Features:** Create polynomial features by raising numerical attributes to various powers. For example, adding squared or cubed terms can help capture nonlinear relationships.
- **Interaction Features:** Combine two or more existing features to capture interactions between them. For instance, in e-commerce, combining "item price" and "quantity purchased" can yield a "total purchase value" feature.
- **Date and Time Features:** Extract meaningful information from date and time attributes, such as day of the week, month, year, or time since a particular event.
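A small sketch of these three creation patterns, using the e-commerce example from above (the `orders` DataFrame and its columns are hypothetical):

```python
import pandas as pd

# Hypothetical e-commerce orders with a timestamp, price, and quantity.
orders = pd.DataFrame({
    "order_time": pd.to_datetime(["2023-11-24 09:15", "2023-12-01 18:40"]),
    "item_price": [19.99, 5.50],
    "quantity": [2, 10],
})

# Interaction feature: price x quantity yields total purchase value.
orders["total_value"] = orders["item_price"] * orders["quantity"]

# Polynomial feature: a squared term can help capture nonlinear price effects.
orders["price_sq"] = orders["item_price"] ** 2

# Date/time features: decompose the timestamp into model-friendly parts.
orders["day_of_week"] = orders["order_time"].dt.dayofweek  # Monday = 0
orders["month"] = orders["order_time"].dt.month
orders["is_weekend"] = orders["day_of_week"] >= 5
```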
**3. Feature Selection:**
- **Univariate Selection:** Select features based on statistical tests like chi-squared, ANOVA, or mutual information to identify the most relevant ones.
- **Feature Importance:** Use ensemble methods like Random Forest or XGBoost to rank features by their contribution to predictive accuracy.
- **Recursive Feature Elimination:** Iteratively remove the least important features from the dataset until the desired number remains.
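All three selection strategies are available in scikit-learn. A rough sketch on synthetic data (sample sizes and `k=5` are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif

# Synthetic data: 20 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Univariate selection: keep the 5 features with the best ANOVA F-scores.
kbest = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Feature importance: rank features by a Random Forest's impurity decrease.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = forest.feature_importances_.argsort()[::-1]

# Recursive feature elimination: repeatedly refit, dropping the weakest
# feature each round, until 5 remain.
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=5).fit(X, y)

print("k-best:", sorted(kbest.get_support(indices=True)))
print("top-5 by importance:", sorted(ranked[:5]))
print("RFE:", sorted(rfe.get_support(indices=True)))
```

The three methods often, but not always, agree; comparing their selections is a useful sanity check.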
**4. Handling Missing Values:**
- Address missing data by filling in missing values with appropriate statistics like the mean, median, or mode.
- Create binary indicator features to mark the presence or absence of missing data in a particular attribute.
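Both steps can be sketched in pandas; the `age` column is a hypothetical example. Note the indicator is created *before* imputation, while the gaps are still visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in the "age" column.
df = pd.DataFrame({"age": [25, np.nan, 40, np.nan, 33]})

# Indicator feature first: record where the value was originally missing.
df["age_missing"] = df["age"].isna().astype(int)

# Then impute: fill the gaps with the median of the observed values.
df["age"] = df["age"].fillna(df["age"].median())
```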
**5. Encoding Categorical Variables:**
- Use techniques like label encoding or one-hot encoding to convert categorical variables into a format suitable for machine learning algorithms.
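A brief sketch contrasting the two encoders in scikit-learn (the `sizes` data is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

sizes = np.array(["small", "large", "medium", "small"])

# Label encoding: one integer per category. Classes are sorted
# alphabetically, so the integers imply an arbitrary order; this suits
# targets or true ordinals, not nominal input features.
labels = LabelEncoder().fit_transform(sizes)

# One-hot encoding: one binary column per category, no implied order.
onehot = OneHotEncoder().fit_transform(sizes.reshape(-1, 1)).toarray()
```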
**6. Feature Scaling:**
- Ensure that features are on a similar scale to prevent some features from dominating others during model training.
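The two common scalers from point 1 can be applied in a couple of lines (the feature values here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales: income vs. number of visits.
X = np.array([[30_000.0, 2], [60_000.0, 8], [90_000.0, 5]])

# Standardization: each column rescaled to mean 0, variance 1.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column rescaled into the [0, 1] range.
X_mm = MinMaxScaler().fit_transform(X)
```

Distance-based models (k-NN, SVMs) and gradient-based training are sensitive to scale; tree-based models generally are not.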
**7. Domain-Specific Features:**
- Engineer features based on domain knowledge and expertise, which can lead to more meaningful attributes that reflect the problem's nuances.
Feature engineering is both an art and a science, requiring a deep understanding of the data, the problem, and the algorithms being used. Well-crafted features can significantly enhance the performance of machine learning models, making them more accurate and interpretable. It is an iterative process that typically involves experimentation and refinement to find the most effective features for a given task.
### Sources (References)
-
### Linked Documents
- [[Predictive Modeling]]
- [[9.5 Analytics and Data-Driven Marketing]]