Dataset Preparation and Preprocessing
Splitting: To support fair training and to obtain performance estimates that generalize, ProFAB provides three splitting methods: random splitting, similarity-based splitting, and temporal splitting.
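To make the difference between two of these strategies concrete, the following is a minimal sketch of random and temporal splitting. It does not use ProFAB's own API; the function names, the record layout, and the `year` field are assumptions chosen for illustration. Similarity-based splitting is omitted because it additionally requires a sequence-similarity clustering step.

```python
import random

def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(records, cutoff_year):
    """Train on records dated before `cutoff_year`, test on the rest."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

# Hypothetical toy records; the identifiers and years are made up.
records = [
    {"id": "P1", "year": 2015},
    {"id": "P2", "year": 2017},
    {"id": "P3", "year": 2019},
    {"id": "P4", "year": 2020},
    {"id": "P5", "year": 2021},
]

rand_train, rand_test = random_split(records, test_fraction=0.2, seed=0)
temp_train, temp_test = temporal_split(records, cutoff_year=2019)
```

A random split ignores when a record was annotated, whereas the temporal split only ever evaluates on records that are newer than everything seen in training.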
Featurization: ProFAB provides featurization and scaling methods to create numerical feature vectors and to preprocess the data. The methods in this module convert protein sequences into numerical feature vectors that can be used to train predictive models.
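As an illustration of what converting a sequence into a numerical feature vector means, the sketch below computes the amino acid composition of a protein, one of the simplest sequence descriptors. Which descriptors ProFAB actually offers is not stated here, so the descriptor choice, the function name, and the example sequence are assumptions.

```python
# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard amino acids are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical short sequence used only for demonstration.
features = amino_acid_composition("MKTAYIAKQR")
```

Every sequence maps to a vector of the same length regardless of how long the protein is, which is what allows the vectors to be stacked into a feature matrix for model training.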
Scaling: ProFAB lets users scale the feature vectors with five different methods. For example, the standard scaler transforms each feature by subtracting its mean and scaling to unit variance, while the normalizer works sample-wise rather than feature-wise, rescaling each sample individually.
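The feature-wise versus sample-wise distinction can be sketched in plain Python as follows. This is not ProFAB's implementation: the function names are hypothetical, and the sample-wise variant is assumed to rescale each sample to unit L2 norm, which is the default behavior of scikit-learn's `Normalizer`.

```python
import math

def standard_scale(X):
    """Feature-wise: subtract each column's mean, divide by its standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; constant columns fall back to 1.0
    # to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def normalize_rows(X):
    """Sample-wise: divide each row by its L2 norm (assumed norm choice)."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out

# Toy matrix: 3 samples (rows) x 2 features (columns).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standard_scale(X)
normalized = normalize_rows(X)
```

After standard scaling, every column has zero mean and unit variance. After row normalization, every sample has unit length, but the columns are not centered.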