Statement I: In decision trees, the more balanced the class distribution is at a node, the higher the entropy and the lower the information gain.

Statement II: A high value of entropy shows that we have a balanced node and that the impurity or disorder at that node is minimal.

Both I and II are Correct
Only I is Correct
Only II is Correct
Neither of them is Correct
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
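To see Statement I concretely, here is a minimal NumPy sketch (my own illustration, not part of the original question): a perfectly balanced binary node has maximal entropy, while a skewed node has lower entropy.

```python
import numpy as np

def entropy(class_probs):
    """Shannon entropy (base 2) of a class distribution."""
    p = np.asarray(class_probs, dtype=float)
    p = p[p > 0]  # convention: 0 * log(0) = 0
    return float(-(p * np.log2(p)).sum())

print(entropy([0.5, 0.5]))  # 1.0   -> balanced node, maximum impurity
print(entropy([0.9, 0.1]))  # ~0.47 -> skewed node, lower impurity
```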
Say we have linearly separable data and there is a low-latency requirement, and assume that we have trained the four models below. Which ordering ranks them from most to least suitable for this setting?

i. Logistic regression with L1 regularization

ii. Support vector classifier with linear kernel

iii. Linear SVM (plane found by minimizing hinge loss) without any regularization

iv. Support Vector classifier with Polynomial kernel

i >= iii > ii >= iv
iv ~= ii > i >= iii
iii ~= ii >= i ~= iv
iv ~= ii > i ~= iii
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
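A rough scikit-learn sketch (library and synthetic data are my choices, not from the original) of why the linear models are preferred here: a fitted linear model predicts with a single dot product, while a kernelized SVC must evaluate its kernel against every stored support vector at prediction time.

```python
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "i.   logreg + L1":        LogisticRegression(penalty="l1", solver="liblinear"),
    "ii.  SVC, linear kernel": SVC(kernel="linear"),  # kernelized: stores support vectors
    "iii. LinearSVC":          LinearSVC(),           # plane found by minimizing hinge loss
    "iv.  SVC, poly kernel":   SVC(kernel="poly"),
}

for name, model in models.items():
    model.fit(X, y)
    start = time.perf_counter()
    for _ in range(100):
        model.predict(X)  # repeated so timing differences are visible
    print(f"{name}: {time.perf_counter() - start:.3f}s")
```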
Why do we take the logarithm of the likelihood in the logistic regression cost function?

a) It makes our cost function monotonic, so that finding the minimum or maximum is faster even with low learning rates
b) We cannot apply gradient descent if we don't take the log, as the un-logged cost function is non-convex
Both a and b
Neither a nor b
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
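As background for the options above (standard result, not part of the original question): taking the log of the Bernoulli likelihood turns a product into a sum and yields the cross-entropy cost, which is convex in the parameters, whereas squared error composed with the sigmoid is non-convex.

```latex
J(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \sigma(z_i) + (1-y_i)\log\big(1-\sigma(z_i)\big)\Big],
\qquad z_i = \mathbf{w}^{\top}\mathbf{x}_i + b,\quad \sigma(z)=\frac{1}{1+e^{-z}}
```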
Derive the gradient of the logistic regression optimization objective with L2 regularization with respect to the bias term.
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
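A standard derivation, assuming the usual convention that the bias itself is not regularized (background, not part of the original question): the L2 term does not involve b, so the bias gradient is the same as in the unregularized case.

```latex
J(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\Big] + \frac{\lambda}{2}\lVert\mathbf{w}\rVert_2^2,
\qquad \hat{y}_i = \sigma(\mathbf{w}^{\top}\mathbf{x}_i + b)

\frac{\partial J}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}\big(\hat{y}_i - y_i\big)
```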
Which of the following statements is not true about the decision tree algorithm?
It starts with a tree consisting of a single leaf and assigns this leaf a label according to a majority vote among all labels over the training set
It performs a series of iterations, and on each iteration it examines the effect of splitting a single leaf
It defines some gain measure that quantifies the improvement due to the split
Among all possible splits, it either chooses the one that minimizes the gain and performs it, or chooses not to split the leaf at all
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
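A minimal Python sketch (my own, not from the source) of the step the options describe: on each iteration, examine every candidate split of a leaf, score each with an information-gain measure, and keep the best. Note that the correct algorithm maximizes the gain.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(X, y):
    """Return (gain, (feature, threshold)) for the split that MAXIMIZES
    information gain, or (0.0, None) if no split improves on the leaf."""
    base, n = entropy(y), len(y)
    best = (0.0, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left = X[:, j] <= t
            gain = base - left.sum() / n * entropy(y[left]) \
                        - (~left).sum() / n * entropy(y[~left])
            if gain > best[0]:
                best = (gain, (j, t))
    return best

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # perfect split at x <= 2.0, gain 1.0
```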
Given the entropy after a split, E_split = 0.24, and the entropy before the split, E_before = 1, what is the information gain for the split? (up to 2 decimals)
0.76
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
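The stated answer follows directly from the definition of information gain:

```latex
\mathrm{IG} = E_{\text{before}} - E_{\text{split}} = 1 - 0.24 = 0.76
```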
In a logistic regression problem there are 200 instances: 170 people voted and 30 people did not cast their votes. What is the probability of finding a person who cast a vote? (up to two decimals)
0.85 (accepted range: 0.85 to 0.90)
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
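The estimate is just the empirical frequency:

```latex
P(\text{cast vote}) = \frac{170}{200} = 0.85
```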
In a logistic regression problem, an instance is similar to 80 positive instances and 30 negative instances, and dissimilar to 20 positive instances and 70 negative instances. What kind of instance is this?
Negative instance
Positive instance
Cannot be determined, even if the threshold is given
Can be determined, if the threshold is given
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Which of the following statements is not true about SVMs?
It has regularization capabilities
It handles non-linear data efficiently
It has much improved stability
Choosing an appropriate kernel function is easy
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Consider the three linearly separable two-dimensional input vectors in the following figure. Find the linear SVM that optimally separates the classes by maximizing the margin.

What is the equation of the classifier?

x+2=0
y+2=0
y-2=0
x-2=0
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
The effectiveness of an SVM depends upon __________
Selection of the kernel
Kernel Parameters
Soft Margin Parameter C
All of the above
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
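All three factors map directly to constructor arguments of, for example, scikit-learn's SVC (the library is my choice, not named in the original):

```python
from sklearn.svm import SVC

clf = SVC(
    kernel="rbf",  # choice of kernel: "linear", "poly", "rbf", "sigmoid", ...
    gamma=0.1,     # kernel parameter (coefficient for rbf/poly/sigmoid kernels)
    C=1.0,         # soft-margin parameter: small C widens the margin,
)                  # large C penalizes margin violations more heavily
```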
Consider a dataset with N different classes. You are trying to build a decision tree on this dataset. What will be the maximum entropy value for the complete dataset?
N log N
-log N
log N
(1/N) log N
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
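For reference (standard result, not part of the original question): entropy is maximized by the uniform class distribution, where every class has probability 1/N.

```latex
H = -\sum_{k=1}^{N} p_k \log p_k \;\le\; \log N,
\qquad H_{\max} = -\sum_{k=1}^{N} \frac{1}{N}\log\frac{1}{N} = \log N \quad \text{at } p_k = \frac{1}{N}
```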
Which of the following techniques is commonly used to prevent overfitting in neural networks?
Increasing the number of hidden layers
Reducing the size of the training dataset
Dropout
Using a linear activation function
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
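A minimal PyTorch sketch (the framework is my choice) showing dropout in practice: it randomly zeroes activations during training, which discourages co-adaptation of neurons, and it is disabled at evaluation time.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # each hidden activation is zeroed with probability 0.5
    nn.Linear(64, 2),
)

x = torch.randn(8, 20)
model.train()   # dropout active: random activations are dropped
out_train = model(x)
model.eval()    # dropout disabled for inference
out_eval = model(x)
```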
What is the primary purpose of using backpropagation in neural networks?
To initialize the weights of the network
To perform forward propagation
To update the weights based on the error
To calculate the loss function
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
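A minimal PyTorch sketch (framework choice is mine) of a single training step: the forward pass computes a prediction and the loss, `loss.backward()` performs backpropagation, and the optimizer then updates the weights based on the error.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(16, 4), torch.randn(16, 1)

prediction = model(x)          # forward propagation
loss = loss_fn(prediction, y)  # loss function measures the error
loss.backward()                # backpropagation: gradients w.r.t. the weights
optimizer.step()               # weights updated based on the error
optimizer.zero_grad()          # clear gradients for the next step
```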
Consider a neural network that uses batch normalization. What is the primary purpose of batch normalization in this context?
To prevent overfitting by randomly dropping neurons
To normalize the input data to a fixed range
To standardize the inputs to a layer, improving training stability and convergence speed
To introduce non-linearity into the model
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
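A minimal NumPy sketch (my own illustration) of the core operation: batch normalization standardizes a layer's inputs per feature over the batch, then applies a learnable scale and shift.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature over the batch to zero mean / unit variance,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = 5.0 * np.random.randn(32, 8) + 3.0  # poorly scaled activations
normed = batch_norm(batch)
print(normed.mean(axis=0).round(3))  # ~0 per feature
print(normed.std(axis=0).round(3))   # ~1 per feature
```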