In bottom-up clustering, if you start with 5 individual points, how many merges are required to form a single cluster?
3
4
5
6
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
To merge 5 individual points into a single cluster, you need (n-1) merges, where (n) is the number of points. So, (5-1 = 4) merges are required.
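As a sanity check, the (n−1)-merge rule can be simulated with a tiny bottom-up clustering loop (a sketch with hypothetical 1-D points, single linkage assumed):

```python
# Sketch: count merges in bottom-up (agglomerative) clustering.
# Hypothetical 1-D points; single-linkage via closest-pair merging.
def count_merges(points):
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = 0
    while len(clusters) > 1:
        # find the pair of clusters with the smallest inter-point distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
        merges += 1
    return merges

print(count_merges([1, 3, 8, 10, 12]))  # 5 points -> 4 merges
```

Each merge reduces the number of clusters by exactly one, so going from n clusters to 1 always takes n−1 merges, regardless of the linkage used.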

  1. K- Means clustering algorithm
  2. Agglomerative clustering algorithm
  3. Divisive clustering algorithm
1 only
2 and 3
1 and 3
All of the above
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
Which of the following options is/are true for K-fold cross-validation?

  1. An increase in K will result in a higher time required to cross-validate the result.
  2. Higher values of K will result in higher confidence in the cross-validation result as compared to a lower value of K.
  3. If K=N, then it is called Leave-One-Out cross-validation (LOOCV), where N is the number of observations.
1 and 2
2 and 3
1 and 3
1,2 and 3
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
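All three statements can be illustrated with a minimal K-fold splitter (a sketch, not a library implementation): with K = N every fold holds out exactly one observation, which is LOOCV, and more folds means more model fits and hence more time.

```python
# Sketch: K-fold splitting; with K = N every fold holds out exactly one
# observation, i.e. Leave-One-Out cross-validation (LOOCV).
def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i not in test]
        yield train, test
        start += size

n = 6
loo = list(kfold_indices(n, n))  # K = N  ->  LOOCV
print(len(loo))                  # N folds, so N model fits: higher K costs more time
print(all(len(test) == 1 for _, test in loo))
```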
Suppose you are given the following data, where x and y are the two input variables and Class is the dependent variable.

Suppose you want to predict the class of the new data point x=1, y=1 using Euclidean distance in 3-NN. To which class does this data point belong?

+ Class
– Class
Can’t say
None
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
Suppose you are given the following data, where x and y are the two input variables and Class is the dependent variable.

Suppose you now want to use 7-NN instead of 3-NN. To which class will the data point x=1, y=1 belong?

+ Class
– Class
Can’t say
None of the Above
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
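The data table for the two questions above is an image and is not reproduced here. A minimal k-NN sketch on hypothetical points shows how the majority vote, and hence the predicted class, can flip between 3-NN and 7-NN:

```python
from collections import Counter

# Sketch: k-NN majority vote with Euclidean distance.
# The question's table is an image; these training points are hypothetical.
train = [((0, 0), '+'), ((0, 2), '+'), ((2, 0), '+'),
         ((2, 2), '-'), ((3, 1), '-'), ((1, 3), '-'), ((3, 3), '-')]

def knn_predict(query, k):
    # sort by squared Euclidean distance to the query point
    ranked = sorted(train, key=lambda p: (p[0][0] - query[0]) ** 2
                                         + (p[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1, 1), 3))  # '+' : the 3 nearest neighbours are all '+'
print(knn_predict((1, 1), 7))  # '-' : with all 7 points voting, '-' wins 4 to 3
```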
In k-NN, what happens when you increase or decrease the value of k?
The boundary becomes smoother with increasing value of K
The boundary becomes smoother with decreasing value of K
The smoothness of the boundary does not depend on the value of K
None of these
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
We represent the SVM optimization problem as Min[ ½·C·WᵀW + K·Σᵢ max(1 − yᵢ(Wᵀxᵢ + b), 0) ], where C and K are hyperparameters, and we optimize using gradient descent methods. Interpret the above loss and choose the correct option.

1. If C increases, the margin width decreases

2. If C increases, margin width increases

3. If C increases, the number of support vectors may decrease.

4. If K increases, the margin width decreases

1, 3
2, 4
1, 4
3, 4
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
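To see the roles of C and K, the stated objective can be evaluated directly (a sketch; W, b and the data points are made up). Increasing K makes margin violations costlier, pushing the optimizer toward a narrower margin, while increasing C strengthens the ½·WᵀW term, favoring a smaller ‖W‖ and hence a wider margin.

```python
# Sketch: evaluate the stated SVM objective on hypothetical data.
# Loss = 0.5*C*||W||^2 + K * sum(max(1 - y_i*(W.x_i + b), 0))
def svm_loss(W, b, data, C, K):
    reg = 0.5 * C * sum(w * w for w in W)
    hinge = sum(max(1 - y * (sum(w * x for w, x in zip(W, xs)) + b), 0)
                for xs, y in data)
    return reg + K * hinge

data = [((2, 0), 1), ((-2, 0), -1), ((0.5, 0), 1)]  # last point violates the margin
W, b = (1, 0), 0

print(svm_loss(W, b, data, C=1, K=1))   # 0.5 (regulariser) + 0.5 (violation) = 1.0
print(svm_loss(W, b, data, C=1, K=10))  # larger K: the violation term dominates
```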
Consider the following statements:

Statement I: PCA works well for non-linearly correlated data as well.

Statement II: It is not necessary that the principal components found out by PCA should always be orthogonal.

Only I is correct
Only II is correct
Both I and II are correct
Both I and II are incorrect
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
Imagine you are solving a classification problem (cancer detection) with a highly imbalanced class; the majority class is observed 99% of the time in the training data.

Your model has 99% accuracy after taking the predictions on the test data.

Which of the following is true in such a case? [MSQ]

The accuracy metric is not a good idea for imbalanced class problems
The accuracy metric is a good idea for imbalanced class problems
The F1 score is a good metric for imbalanced class problems
Precision and recall metrics are not good for imbalanced class problems
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.00
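The scenario can be reproduced in a few lines (a sketch with fabricated labels): a model that always predicts the majority class reaches 99% accuracy yet has zero recall, and hence zero F1, on the minority class.

```python
# Sketch: a "predict the majority class always" model on 99%-imbalanced data.
# Accuracy looks great; recall/F1 on the minority (cancer) class is 0.
y_true = [0] * 99 + [1]   # 1 = cancer (minority class)
y_pred = [0] * 100        # model always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy)  # 0.99 despite the model never detecting cancer
print(f1)        # 0.0
```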
A 2-dimensional data is classified in two classes using support vector machines as shown in the image below:

Given the above image, where points 1, 2, 3, 4, 5 are support vectors, which of the following statements are true?

If point 6 is removed from the dataset, the margin passing through point 5 will move toward the right.
If point 7 is removed from the dataset, all the margins (both π+ and π−) and the decision boundary will remain unchanged.
If point 5 is removed from the dataset, both the positive margin and the negative margin (π+ and π−) will be affected.
If point 4 is removed from the dataset, all the margins (both π+ and π−) and the decision boundary will remain unchanged.
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
Consider the dataset S given below.

Elevation, Road Type and Speed Limit are the features, and Speed is the target label that we want to predict.

Find the entropy of the dataset, S as given above:

1
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.00
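The dataset table is an image and is not reproduced here; the stated answer of 1 corresponds to a target column split 50/50 between its two values. A small entropy helper (a sketch, with hypothetical labels) confirms this:

```python
from math import log2

# Sketch: entropy of a label column. The question's table is an image;
# a dataset whose target is split 50/50 has entropy exactly 1 bit.
def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy(['Slow', 'Slow', 'Fast', 'Fast']))  # 1.0
print(entropy(['Slow', 'Slow', 'Slow', 'Slow']))  # 0.0: a pure node
```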
w1 = 1, w2 = 1, b = -2
w1 = 1, w2 = 0, b = -2
w1 = 0, w2 = 1, b = -2
w1 = 2, w2 = 1, b = 4
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
We want to predict whether a user would watch a movie or not. Each movie has a certain number of features, each of which is explained in the image.

Now take the case of the movie Avatar, with feature vector [9, 1, 0, 5]. According to an algorithm, these features are assigned the weights [0.8, 0.2, 0.5, 0.4] and bias = −10. For a user X, predict whether he will watch the movie if the threshold value (θ) is 10.

Note: If the output of the neuron is greater than θ then the user will watch the movie otherwise not.

Yes, the user will watch the movie with neuron output = -0.6
No, the user will not watch the movie with neuron output = -0.6
No, the user will watch the movie with neuron output = 2.5
Yes, the user will watch the movie with neuron output = 2.5
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
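The arithmetic can be checked directly (values taken from the question):

```python
# Check: neuron output for Avatar's feature vector from the question.
features = [9, 1, 0, 5]
weights = [0.8, 0.2, 0.5, 0.4]
bias = -10

output = sum(w * x for w, x in zip(weights, features)) + bias
print(output)        # 7.2 + 0.2 + 0 + 2.0 - 10, i.e. approximately -0.6
print(output > 10)   # False: below the threshold, the user does not watch
```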
Consider the dataset that is given. It has 3 features f1, f2 and f3 and output variables as Y.

Assume that we are using Naive Bayes to find Y given the features. What is the probability that Y=1 given that f1=1, f2=1 and f3=0?

0.5
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.00
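The data table is an image and is not reproduced here, so the rows below are hypothetical; the sketch only illustrates how the Naive Bayes posterior P(Y=1 | f1=1, f2=1, f3=0) is assembled from the class prior and the per-feature likelihoods:

```python
# Sketch: Naive Bayes posterior with a hypothetical table
# (the question's actual data is an image and not reproduced here).
rows = [  # (f1, f2, f3, Y)
    (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 0, 1),
    (1, 1, 0, 0), (0, 0, 1, 0), (1, 0, 0, 0),
]

def nb_score(y, query):
    subset = [r for r in rows if r[3] == y]
    prior = len(subset) / len(rows)
    lik = 1.0
    for i, v in enumerate(query):
        lik *= sum(r[i] == v for r in subset) / len(subset)
    return prior * lik  # proportional to P(Y=y | query)

query = (1, 1, 0)  # f1=1, f2=1, f3=0
s1, s0 = nb_score(1, query), nb_score(0, query)
posterior = s1 / (s1 + s0)  # normalise over both classes
print(posterior)
```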
Consider this sample test data with two features f1, f2 and a response variable y.

You are given two equations; choose the most suitable option, considering SSE (Sum of Squared Errors) as the error.

Y = 2f1 + f2
Y = f1 + 1.5f2
Either a or b
None
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
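The test data is an image and is not reproduced here; with hypothetical rows, the comparison amounts to computing the SSE of each candidate equation and picking the smaller one (a sketch):

```python
# Sketch: compare the two candidate equations by SSE on hypothetical data
# (the question's table is an image and not reproduced here).
data = [(1, 1, 3.0), (2, 2, 6.0), (3, 1, 7.0)]  # (f1, f2, y)

def sse(model):
    return sum((y - model(f1, f2)) ** 2 for f1, f2, y in data)

sse_a = sse(lambda f1, f2: 2 * f1 + f2)    # Y = 2*f1 + f2
sse_b = sse(lambda f1, f2: f1 + 1.5 * f2)  # Y = f1 + 1.5*f2
print(sse_a, sse_b)  # choose the equation with the smaller SSE
```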

p-(i), q-(ii), r-(iii)
p-(iii), q-(ii), r-(i)
p-(ii), q-(iii), r-(i)
p-(iii), q-(i), r-(ii)
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
In the K-means clustering algorithm with k = 3, suppose the two points (1, 1) and (−1, 1) are clustered together, with Euclidean distance as the metric. Find the point that surely lies in the same cluster.
(0, 1)
(2, 0)
(0, 2)
(0, 0)
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
You have a simple linear regression model where the estimated relationship between the dependent variable y and the independent variable x is given by:

ŷ = 3 + 2x

You are given the following data points:

x=[1,2,3,4,5],y=[5,7,9,11,13]

What is the Sum of Squared Errors (SSE) for this model?

0
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
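The answer can be verified directly (values taken from the question):

```python
# Check: SSE for y_hat = 3 + 2x on the given data points.
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]

sse = sum((y - (3 + 2 * x)) ** 2 for x, y in zip(xs, ys))
print(sse)  # 0: every point lies exactly on the fitted line
```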
A Support Vector Machine with a linear kernel is applied to a dataset with two classes. The following data points are support vectors: (1,1), (2,2), (3,3) from Class 1 and (−1,−1), (−2,−2), (−3,−3) from Class 2. What is the equation of the decision boundary?
x1+x2=0
x1−x2=0
x1+2x2=0
2x1+x2=0
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Given the following covariance matrix for a dataset:

Σ = [[4, 2], [2, 3]]

Calculate the eigenvalues and determine the percentage of variance explained by the first principal component.

5, 2.5; 66.67%
6, 1; 75%
5, 2; 60%
4, 3; 50%
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Consider a k-nearest neighbors classifier with k=3 applied to the following dataset for binary classification:

A new data point (3,4) needs to be classified. What will be the predicted class for this data point?

Class 0
Class 1
Cannot determine
Tie
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
In decision trees, pruning is used to prevent overfitting. Which of the following statements is true regarding pruning methods?
Pre-pruning involves stopping the tree growth before it reaches full depth based on a criterion.
Post-pruning involves growing a full tree and then removing nodes that do not provide significant information gain.
Both A and B are true.
Neither A nor B are true.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
In a multi-layer perceptron, which of the following statements about weight initialization is true?
Initializing all weights to zero leads to the best training performance.
Initializing weights to small random values breaks symmetry and is generally preferred.
Weights should be initialized to large random values for faster convergence.
Initializing weights to the same value for all layers ensures equal learning.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Consider a decision tree where a node contains 20 instances from Class A and 30 instances from Class B. Calculate the Gini index of this node. (Up to 2 decimals)
0.48
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
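The value follows from Gini = 1 − Σ pᵢ² (a quick check):

```python
# Check: Gini index for a node with 20 Class-A and 30 Class-B instances.
a, b = 20, 30
n = a + b
gini = 1 - (a / n) ** 2 - (b / n) ** 2  # 1 - 0.4^2 - 0.6^2
print(round(gini, 2))  # 0.48
```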
In a binary decision tree, you have a node with 30 positive and 10 negative samples. What is the entropy of this node? (Up to two decimals)
0.81
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
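The value follows from H = −p·log₂(p) − (1−p)·log₂(1−p) with p = 30/40 (a quick check):

```python
from math import log2

# Check: entropy of a node with 30 positive and 10 negative samples.
p = 30 / 40
entropy = -p * log2(p) - (1 - p) * log2(1 - p)
print(round(entropy, 2))  # 0.81
```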
Why is the ReLU (Rectified Linear Unit) activation function preferred over the sigmoid activation function in deep neural networks?
ReLU is more computationally expensive than sigmoid.
ReLU does not suffer from the vanishing gradient problem, unlike the sigmoid function.
ReLU is non-differentiable, making it harder to optimize.
ReLU outputs values only between 0 and 1.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Which of the following best describes the role of the kernel function in a Support Vector Machine (SVM)?
It reduces the dimensionality of the input space.
It transforms the input data into a higher-dimensional space to make it linearly separable.
It normalizes the input data to have zero mean and unit variance.
It ensures that the margin is maximized in the input space.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Given 10-dimensional data, you are required to apply a dimensionality reduction algorithm and transform it to 2 dimensions. This is done using Principal Component Analysis. Based on this information, which of the following might be the two principal components found?
[1,2,3],[5,-2,2]
[4,5,6],[-2,5,-4]
[1,2,3],[-7,2,1]
[6,7,8],[2,4,5]
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
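Principal components are mutually orthogonal, so (taking the printed 3-dimensional vectors at face value) the valid pair is the one whose dot product is zero (a quick check):

```python
# Check: principal components must be orthogonal (zero dot product).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

pairs = {
    'A': ([1, 2, 3], [5, -2, 2]),
    'B': ([4, 5, 6], [-2, 5, -4]),
    'C': ([1, 2, 3], [-7, 2, 1]),
    'D': ([6, 7, 8], [2, 4, 5]),
}
for name, (u, v) in pairs.items():
    print(name, dot(u, v))  # only the pair with dot product 0 is valid
```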
Imagine you are solving a classification problem with a highly imbalanced class.

The majority class is observed 99% of the time in the training data. Your model has 99% accuracy after taking the predictions on the test set. Which of the following is true in such a case?

  1. The accuracy metric is not a good idea for imbalanced class problems.
  2. The accuracy metric is a good idea for imbalanced class problems.
  3. Precision and recall metrics are good for imbalanced class problems.
  4. Precision and recall metrics aren’t good for imbalanced class problems.
1 and 3
1 and 4
2 and 3
2 and 4
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
In a dataset with 1000 points, you apply the K-medoids algorithm with K=4. If each iteration takes 0.5 seconds and the algorithm converges in 20 iterations, how much time does it take (in seconds)?
10
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
In hierarchical clustering, which method tends to create elongated clusters by linking the closest points between clusters?
Complete Linkage
Average Linkage
Ward's Method
Single Linkage
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Suppose you have five data points in one-dimensional space: 1,3,8,10,12. You initialize the centroids for K-means clustering as 3 and 10. What will be the new centroid for the first cluster after one iteration?
2
4
6.5
7
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
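One assignment-then-update step can be traced directly (values taken from the question):

```python
# Check: one K-means iteration on 1-D points with initial centroids 3 and 10.
points = [1, 3, 8, 10, 12]
centroids = [3, 10]

# Assignment step: each point goes to its nearest centroid.
clusters = [[], []]
for p in points:
    nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
    clusters[nearest].append(p)

# Update step: each centroid moves to the mean of its cluster.
new_centroids = [sum(c) / len(c) for c in clusters]
print(clusters)       # [[1, 3], [8, 10, 12]]
print(new_centroids)  # first cluster's new centroid: (1 + 3) / 2 = 2.0
```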
Which of the following assumptions is made by the Naive Bayes classifier?
Features are dependent on each other.
Features are independent given the class label.
The data is normally distributed.
There is no noise in the data.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66