Consider the dataset shown below (not reproduced here).

You apply KNN with k = 3 to this dataset, which is a two-class classification problem. What is the leave-one-out cross-validation accuracy for 3-NN (3-nearest neighbors)?

0
0.4
0.8
1
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
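
A minimal sketch of how this can be checked, assuming scikit-learn; the arrays X and y below are placeholders, since the dataset from the question is not shown.

```python
# Leave-one-out cross-validation accuracy for 3-NN (placeholder two-class data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

X = np.array([[1, 1], [2, 1], [1, 2], [6, 5], [7, 5], [6, 6]])  # placeholder features
y = np.array([0, 0, 0, 1, 1, 1])                                # placeholder labels

scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print(scores.mean())  # fraction of held-out points classified correctly
```
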
You apply k-NN with Euclidean distance as the metric and decide to normalize the features. What effect will this have on the distance calculations?
No effect on distance calculations
Increase the distances between points
Decrease the distances between points
Equalize the contribution of each feature to the distance
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
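
A sketch with made-up values, assuming scikit-learn: one feature spans thousands while the other spans single digits, so the raw Euclidean distance is dominated by the first feature until the features are standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1000.0, 1.0], [3000.0, 2.0], [2000.0, 9.0], [4000.0, 5.0]])

print(np.linalg.norm(X[0] - X[1]))    # ~2000, driven almost entirely by feature 0

Xs = StandardScaler().fit_transform(X)
print(np.linalg.norm(Xs[0] - Xs[1]))  # after scaling, both features contribute comparably
```
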
After performing PCA on a dataset, we find that the first four principal components have eigenvalues of 6, 4, 2, and 1. What percentage of the total variance is explained by the first principal component?
50%
46%
41%
39%
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
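
A one-line check of the arithmetic (the first eigenvalue divided by the total):

```python
eigenvalues = [6, 4, 2, 1]
print(eigenvalues[0] / sum(eigenvalues))  # 6 / 13 ≈ 0.4615, i.e. about 46%
```
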

You perform 5-fold cross-validation on a dataset. In one fold the model's accuracy is 80%, in another it is 90%, and the remaining three folds each have an accuracy of 85%. What is the overall cross-validated accuracy, as a percentage?
70%
80%
85%
91%
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
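
A quick check of the arithmetic: the cross-validated accuracy is the mean of the per-fold accuracies.

```python
fold_accuracies = [80, 90, 85, 85, 85]
print(sum(fold_accuracies) / len(fold_accuracies))  # 85.0, i.e. 85%
```
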
A model with a high degree of flexibility is trained on a dataset. During cross-validation, it shows very low training error but high validation error. This scenario indicates:
High bias, low variance
Low bias, high variance
Low bias, low variance
High bias, high variance
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
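
A sketch of this scenario on made-up data, assuming scikit-learn: a degree-15 polynomial (a very flexible model) fit to a small noisy sample typically shows near-zero training error and a much larger validation error, i.e. low bias and high variance.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)           # made-up inputs
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)  # noisy non-linear target

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X_tr, y_tr)

print(mean_squared_error(y_tr, model.predict(X_tr)))  # very small training error
print(mean_squared_error(y_va, model.predict(X_va)))  # typically much larger validation error
```
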
You perform PCA on a dataset with 10 features and get the following eigenvalues for the principal components: [5, 2.5, 1.5, 1, 0.7, 0.5, 0.4, 0.3, 0.1, 0.05]. What is the minimum number of principal components required to retain at least 90% of the variance?
6
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
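
A sketch of the calculation, assuming NumPy: compute the cumulative explained-variance ratio and find the first component count at which it reaches 90%.

```python
import numpy as np

eigenvalues = np.array([5, 2.5, 1.5, 1, 0.7, 0.5, 0.4, 0.3, 0.1, 0.05])
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
print(np.round(cumulative, 3))            # 0.415, 0.622, 0.747, 0.830, 0.888, 0.929, ...
print(np.argmax(cumulative >= 0.90) + 1)  # smallest number of components reaching 90%
```
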
In k-Nearest Neighbors (k-NN), what effect does increasing the value of k typically have on the decision boundary?
The decision boundary becomes more complex and wiggly.
The decision boundary becomes smoother and less complex.
The decision boundary becomes more sensitive to outliers.
The decision boundary becomes random and unpredictable.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
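
A sketch on synthetic data, assuming scikit-learn: with k = 1 the classifier memorizes the training set (a complex, wiggly boundary), while a larger k averages over more neighbors and produces a smoother, less complex boundary.

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)  # noisy two-class data

for k in (1, 25):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.score(X, y))  # k=1 scores 1.0 on the training set; k=25 does not
```
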
In the context of bias-variance trade-off, which of the following models is most likely to have high bias and low variance?
Decision tree with many splits
k-Nearest Neighbors with a small k.
Linear regression on a non-linear dataset.
Deep neural network with many layers.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
In PCA, what is the effect of centering the data by subtracting the mean before computing the covariance matrix?
It maximizes the variance of the principal components.
It ensures that the first principal component explains the largest variance.
It shifts the data points closer to the origin without affecting the covariance structure.
It reduces the dimensionality of the data.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
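
A sketch, assuming NumPy, of the underlying fact: the sample covariance matrix is shift-invariant, so subtracting the mean moves the point cloud to the origin without changing the covariance structure (and hence its eigenvalues and eigenvectors).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=2.0, size=(50, 3))  # made-up data far from the origin
Xc = X - X.mean(axis=0)                            # centered copy

print(np.allclose(np.cov(X, rowvar=False), np.cov(Xc, rowvar=False)))  # True
```
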
In Leave-One-Out (LOO) cross-validation, what is the main disadvantage compared to k-fold cross-validation?
It requires less computational power.
It leads to high bias in the model.
It can be computationally expensive, especially for large datasets.
It reduces variance more effectively than k-fold cross-validation.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
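
A sketch, assuming scikit-learn: LOO-CV is n-fold cross-validation with n equal to the number of samples, so it requires one model fit per sample.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, KFold

X = np.zeros((10_000, 1))                # placeholder data with 10,000 samples
print(LeaveOneOut().get_n_splits(X))     # 10000 model fits
print(KFold(n_splits=5).get_n_splits())  # 5 model fits
```
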
We have a dataset with 10,000 features. We apply PCA and reduce the dimensionality to 100 components. We then apply a classification algorithm, which performs poorly. What is the most likely reason for the poor performance?
PCA did not reduce enough dimensions
PCA reduced too many dimensions
The classification algorithm is not suitable for the data
The dataset has high bias
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
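
One way to investigate such a case, sketched with scikit-learn (the random matrix below is only a stand-in for the real data): inspect how much variance the retained components actually explain.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 1_000))  # placeholder high-dimensional data
pca = PCA(n_components=100).fit(X)
print(pca.explained_variance_ratio_.sum())  # if this fraction is low, too much variance was discarded
```
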
Consider the following statements:

Statement I: KNN assumes that the given dataset is non-linear.

Statement II: The KNN algorithm has quadratic space complexity, i.e., O(N^2), where N is the total number of data points.

Only I is correct
Only II is correct
Both I and II are correct
Both I and II are incorrect
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
This question refers to a figure (a k-NN regression fit to a noisy dataset) that is not reproduced here. Which of the following is the most appropriate action?
We can increase the value of k, as the fitted line is affected by noise.
We can decrease the value of k, as the fitted line is not fitting the data well.
If there is noise, it can't be handled by changing the value of k
The fitted line is fine now, no need to change anything
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Which of the following are true about principal component analysis (PCA)? Assume that no two eigenvectors of the sample covariance matrix have the same eigenvalue. [MSQ]
Appending a 1 to the end of every sample point doesn’t change the results of performing PCA (except that the useful principal component vectors have an extra 0 at the end, and there’s one extra useless component with eigenvalue zero)
If you use PCA to project d-dimensional points down to j principal coordinates, and then you run PCA again to project those j-dimensional coordinates down to k principal coordinates, with d > j > k, you always get the same result as if you had just used PCA to project the d-dimensional points directly down to k principal coordinates.
If you perform an arbitrary rigid rotation of the sample points as a group in feature space before performing PCA, the principal component directions do not change.
If you perform an arbitrary rigid rotation of the sample points as a group in feature space before performing PCA, the largest eigenvalue of the sample covariance matrix does not change.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
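
A sketch, assuming NumPy, that checks the rotation behavior numerically: rigidly rotating the sample points leaves the eigenvalues of the sample covariance matrix unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2)) @ np.diag([3.0, 1.0])  # anisotropic point cloud

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # rigid rotation

eig_before = np.linalg.eigvalsh(np.cov(X, rowvar=False))
eig_after = np.linalg.eigvalsh(np.cov(X @ R.T, rowvar=False))
print(np.allclose(eig_before, eig_after))  # True: eigenvalues are rotation-invariant
```
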
For a dataset with 100 samples, we decide to use Leave-One-Out Cross-Validation (LOO-CV). How many times will the model be trained and evaluated?
100
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
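
A quick check, assuming scikit-learn: with 100 samples, LOO-CV produces exactly 100 train/evaluate splits.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

print(LeaveOneOut().get_n_splits(np.zeros((100, 1))))  # 100
```
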