Data reduction strategies include dimensionality reduction, numerosity reduction, and data compression.
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.00
For an input vector of length n, the fast discrete wavelet transform (DWT) algorithm has a time complexity of
O(n)
O(log(n))
O(nlog(n))
O(n^2)
Correct Answer
Option 1
Solution
The DWT matrix can be factored into a product of a few sparse matrices, which greatly reduces the computational burden. Applying these sparse factors in sequence gives the fast DWT algorithm, which has linear complexity O(n) for an input vector of length n and is therefore efficient for large datasets.
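As a rough illustration of why the cost is linear, here is a minimal sketch using the Haar wavelet. The choice of Haar, the function name haar_dwt, and the pair counter are assumptions made for this example, and the input length is assumed to be a power of two. Each pass works on half as many values as the previous one, so the total work is n/2 + n/4 + ... + 1 = n - 1 pair operations.

```python
import math

def haar_dwt(values):
    """Fast Haar wavelet transform (illustrative sketch, not a library API).

    Assumes len(values) is a power of two. Returns the coefficient list
    and the number of value pairs processed, to make the O(n) cost visible.
    """
    data = [float(v) for v in values]
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    pairs_processed = 0
    length = n
    while length > 1:
        half = length // 2
        smooth = [0.0] * half
        detail = [0.0] * half
        for i in range(half):
            a, b = data[2 * i], data[2 * i + 1]
            smooth[i] = (a + b) / math.sqrt(2)  # weighted average
            detail[i] = (a - b) / math.sqrt(2)  # weighted difference
            pairs_processed += 1
        # The smoothed half moves to the front and is processed again;
        # the detail coefficients are final and stay where they are.
        data[:half] = smooth
        data[half:length] = detail
        length = half
    return data, pairs_processed

coeffs, pairs = haar_dwt([2, 2, 0, 2, 3, 5, 4, 4])
print(pairs)  # 7 pairs for n = 8, i.e. n - 1
```

Doubling the input length roughly doubles the pair count, which is the linear behaviour the solution describes.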
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
Which of the following is/are TRUE regarding Principal Component Analysis (PCA)? [MSQ]
PCA can be applied to ordered attributes.
PCA can be applied to unordered attributes.
PCA can handle sparse data and skewed data.
PCA cannot handle multidimensional data.
Correct Answer
Option 1,2,3
Solution
PCA can be applied to ordered and unordered attributes, and can handle sparse data and skewed data. Multidimensional data of more than two dimensions can be handled by reducing the problem to two dimensions, so the statement that PCA cannot handle multidimensional data is false. Principal components may be used as inputs to multiple regression and cluster analysis.
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.00
Which of the following is not a Data Transformation Strategy?
Smoothing
Attribute construction
Discretization
None of the above
Correct Answer
Option 4
Solution
Strategies for data transformation include the following: Smoothing, Attribute construction, Discretization, Normalization, Aggregation, Hierarchy generation.
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
What is the main goal of min-max normalization in data preprocessing?
To standardize data to have zero mean and unit variance
To scale data to a fixed range, typically [0, 1]
To convert categorical data into numerical data
To reduce data dimensionality
Correct Answer
Option 2
Solution
Min-max normalization is specifically designed to transform features to lie within a specified range, most commonly [0, 1]. This scaling helps in ensuring that each feature contributes equally to the model. Standardization (zero mean and unit variance) is a different technique, and min-max normalization does not address categorical data conversion or dimensionality reduction.
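The scaling described above can be sketched in a few lines of Python. The function name min_max_normalize and its new_min / new_max parameters are hypothetical names chosen for this example; the sketch assumes the feature has at least two distinct values.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all values are identical; the range is zero")
    return [
        (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
        for v in values
    ]

print(min_max_normalize([5, 10, 15, 20, 25]))
# [0.0, 0.25, 0.5, 0.75, 1.0]
```

The order and relative spacing of the values are preserved; only the scale changes.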
Difficulty Level: 1
Positive Marks: 1.00
Negative Marks: 0.33
Given a dataset feature x with values ranging from 10 to 100, it is required to normalize it to the range [0, 1]. The normalized value of x = 55 is ______. (Round off to one decimal place)
0.5
Correct Answer
Option 1
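Solution
Using min-max normalization, x' = (x - min) / (max - min) = (55 - 10) / (100 - 10) = 45 / 90 = 0.5.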
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
Which of the following statements is true about min-max normalization?
It is sensitive to outliers because it scales based on the minimum and maximum values.
It is insensitive to the minimum and maximum values of the feature.
It converts data to a normal distribution.
It reduces the dimensionality of the data.
Correct Answer
Option 1
Solution
Min-max normalization can be significantly affected by outliers because it relies on the minimum and maximum values of the feature. Extreme values can compress the range of the rest of the data into a very narrow interval, affecting the normalization process. It does not convert data to a normal distribution or reduce dimensionality.
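A small experiment makes the compression effect concrete. The data below are made up for illustration, and min_max is a hypothetical helper that scales to [0, 1].

```python
def min_max(values):
    """Scale values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

clean = [10, 20, 30, 40, 50]
with_outlier = clean + [1000]  # hypothetical extreme value

print(min_max(clean))
# [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(v, 3) for v in min_max(with_outlier)])
# The first five values are now all below 0.05; only the outlier reaches 1.0.
```

A single extreme value stretches the denominator (max - min), so the remaining values end up packed into a narrow interval near 0.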
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
If you apply min-max normalization to a dataset where the original values are [5, 10, 15, 20, 25], and the normalized values are [0, 0.25, 0.5, 0.75, 1], what is the original minimum value and maximum value of the feature?
Minimum: 5, Maximum: 25
Minimum: 0, Maximum: 1
Minimum: 10, Maximum: 20
Minimum: 1, Maximum: 5
Correct Answer
Option 1
Solution
In min-max normalization, if the normalized values range from 0 to 1, then the original minimum corresponds to the normalized value 0 and the original maximum corresponds to the normalized value 1.
If normalized values are [0, 0.25, 0.5, 0.75, 1], then:
The original minimum (corresponding to 0) is 5.
The original maximum (corresponding to 1) is 25.
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.66
Suppose that the minimum and maximum values for the attribute income are $12,000 and $98,000, respectively. We would like to map income to the range [0.0, 1.0]. By min-max normalization, a value of $73,600 for income is transformed to _______ (Round off to 3 decimal places)
0.716
Correct Answer
Option 1
Solution
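Min-max normalization maps a value x to x' = (x - min) / (max - min) × (new_max - new_min) + new_min, where
x is the value to be normalized ($73,600),
min is the minimum value of the attribute ($12,000),
max is the maximum value of the attribute ($98,000),
and [new_min, new_max] is the target range [0.0, 1.0].
x' = (73,600 - 12,000) / (98,000 - 12,000) × (1.0 - 0.0) + 0.0 = 61,600 / 86,000 ≈ 0.716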
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
Suppose that the mean and standard deviation of the values for the attribute income are $54,000 and $16,000, respectively. With z-score normalization, a value of $73,600 for income is transformed to _________ (Up to 3 decimal places)
1.225
Correct Answer
Option 1
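Solution
In z-score normalization, a value x is transformed to x' = (x - mean) / (standard deviation).
x' = (73,600 - 54,000) / 16,000 = 19,600 / 16,000 = 1.225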
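The z-score calculation can be checked with a short Python sketch. The function names z_score and z_scores are hypothetical. The list version assumes the population standard deviation (statistics.pstdev); whether to use the population or the sample formula is a choice the question does not specify.

```python
import statistics

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std_dev

def z_scores(values):
    """Z-score every value, estimating mean and std dev from the data.

    Assumption: the population standard deviation is used.
    """
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return [z_score(v, mean, std_dev) for v in values]

print(z_score(73_600, 54_000, 16_000))  # 1.225
```

Unlike min-max normalization, the result is not confined to a fixed range; a value above the mean gives a positive score and a value below it gives a negative one.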
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
Consider the sorted data for price (in dollars): 4, 8, 15, 21, 21, 24, 25, 28, 34. The data are partitioned into equal-frequency bins of size 3: Bin 1 holds 4, 8, 15; Bin 2 holds 21, 21, 24; Bin 3 holds 25, 28, 34.
Smoothing by bin means is performed. Identify which of the following are correct.
Bin 1: 9, 9, 9
Bin 2: 22, 22, 22
Bin 3: 29, 29, 29
Bin 3: 28, 28, 28
Correct Answer
Option 1,2,3
Solution
In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. The mean of the values 4, 8, and 15 in Bin 1 is 9. The mean of the values 21, 21, and 24 in Bin 2 is 22, and the mean of the values 25, 28, and 34 in Bin 3 is 29.
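The bin means can be reproduced with a minimal Python sketch. The helper names equal_frequency_bins and smooth_by_means are hypothetical, and the sketch assumes the data are already sorted and split into equal-frequency bins of three values each.

```python
def equal_frequency_bins(sorted_values, bin_size):
    """Split sorted values into consecutive bins of bin_size values each."""
    return [
        sorted_values[i:i + bin_size]
        for i in range(0, len(sorted_values), bin_size)
    ]

def smooth_by_means(bins):
    """Replace every value in a bin by that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(smooth_by_means(bins))
# [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
```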
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
Consider the sorted data for price (in dollars): 4, 8, 15, 21, 21, 24, 25, 28, 34. The data are partitioned into equal-frequency bins of size 3: Bin 1 holds 4, 8, 15; Bin 2 holds 21, 21, 24; Bin 3 holds 25, 28, 34.
Smoothing by bin boundaries is performed. Identify which of the following are correct.
Bin 1: 4, 4, 15
Bin 2: 21, 21, 24
Bin 3: 25, 25, 34
Bin 3: 25, 28, 28
Correct Answer
Option 1,2,3
Solution
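In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value. In Bin 1 (4, 8, 15), the value 8 is closer to 4 than to 15, giving 4, 4, 15. In Bin 2 (21, 21, 24), every value is already a boundary, giving 21, 21, 24. In Bin 3 (25, 28, 34), the value 28 is closer to 25 than to 34, giving 25, 25, 34.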
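A short Python sketch of the same rule, assuming the three equal-frequency bins used in this question. The function name smooth_by_boundaries is hypothetical, and sending ties to the lower boundary is an arbitrary choice made here because the question does not cover that case.

```python
def smooth_by_boundaries(bins):
    """Replace each value by the nearer of its bin's min and max.

    Assumption: a value exactly halfway between the boundaries
    is assigned to the lower one.
    """
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_boundaries(bins))
# [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```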
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
You have a population of 10 students, and you want to select a simple random sample of size 4 without replacement. The students are numbered from 1 to 10. What is the probability of selecting students 3, 6, 7, and 9 in the sample? _______ (Up to 4 decimal places)
(0.0047, 0.0048)
Correct Answer
Option 1
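Solution
A simple random sample of size 4 drawn without replacement from 10 students is equally likely to be any of the C(10, 4) = 210 possible subsets. The probability of obtaining the specific subset {3, 6, 7, 9} is therefore 1/210 ≈ 0.0048.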
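The count of possible samples can be verified by brute force in Python. This is only a sanity check under the assumption that every unordered subset of 4 students is equally likely; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

population = range(1, 11)  # students numbered 1 to 10
target = {3, 6, 7, 9}

n_samples = comb(10, 4)  # number of unordered samples of size 4
matches = sum(1 for s in combinations(population, 4) if set(s) == target)
probability = matches / n_samples

print(n_samples, round(probability, 4))  # 210 0.0048
```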
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
You have a population of 5 distinct items: A, B, C, D, and E. You want to select a simple random sample of size 3 with replacement. What is the probability of selecting the items A, B, and C in that order? (Up to 3 decimal places)
0.008
Correct Answer
Option 1
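Solution
With replacement, the draws are independent and each item has probability 1/5 of being selected on every draw. The probability of drawing A, then B, then C is (1/5) × (1/5) × (1/5) = 1/125 = 0.008.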
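Enumerating every ordered sample gives the same figure. The sketch below assumes each ordered triple drawn with replacement is equally likely; the variable names are illustrative.

```python
from itertools import product

items = "ABCDE"

# All ordered samples of size 3 drawn with replacement: 5 ** 3 of them.
ordered_samples = list(product(items, repeat=3))
matches = ordered_samples.count(("A", "B", "C"))
probability = matches / len(ordered_samples)

print(len(ordered_samples), probability)  # 125 0.008
```

Because order matters here and repeats are allowed, the sample space has 5^3 outcomes instead of the C(5, 3) unordered subsets that would apply without replacement.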
Difficulty Level: 1
Positive Marks: 2.00
Negative Marks: 0.00
A university has 20 departments, and you want to select a sample of 5 departments to survey. If you randomly select 5 departments out of the 20, what is the probability of selecting the specific departments 1, 4, 7, 10, and 13?