Module 6 Assignment

A. Consider a population consisting of the following values, which represent the number of ice cream purchases during the academic year for each of the five housemates: 8, 14, 16, 10, 11

a) Compute the mean of this population




I created a vector called ice_cream_purchase from the values that were provided and calculated their mean using the mean() function. I called set.seed() so that the random results later in the assignment are reproducible. The mean of this population is 11.8.
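A minimal sketch of this step in R. The vector name ice_cream_purchase comes from the description above; the seed value 123 is an assumption, since the seed actually used is not stated, and it has no effect on mean().

```r
# Population: ice cream purchases for the five housemates
ice_cream_purchase <- c(8, 14, 16, 10, 11)

# Assumed seed value; it only matters for the random sampling done later
set.seed(123)

# Population mean: (8 + 14 + 16 + 10 + 11) / 5
population_mean <- mean(ice_cream_purchase)
print(population_mean)  # 11.8
```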



I drew a random sample of size 2 from ice_cream_purchase using the sample() function and stored it as ice.cream.sample. The sampled values were 16 and 14.
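The sampling step might look like the sketch below. The seed value is an assumption, so the two values drawn here will generally differ from the 16 and 14 reported above, which came from the original run.

```r
ice_cream_purchase <- c(8, 14, 16, 10, 11)

# Assumed seed value (the original seed is not stated)
set.seed(123)

# sample() draws without replacement by default, so the two values
# come from two different housemates
ice.cream.sample <- sample(ice_cream_purchase, size = 2)
print(ice.cream.sample)
```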





I computed the mean and standard deviation of the sample using the mean() and sd() functions. The sample mean is 15 and the sample standard deviation is 1.414214.
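As a quick check, the two statistics can be reproduced from the sampled values themselves. The sample is entered by hand here so the result does not depend on a seed; the names sample_mean and sample_sd are chosen for illustration.

```r
# The two values reported for the random sample
ice.cream.sample <- c(16, 14)

sample_mean <- mean(ice.cream.sample)  # (16 + 14) / 2 = 15
sample_sd <- sd(ice.cream.sample)      # sqrt(2), about 1.414214

print(sample_mean)
print(sample_sd)
```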




I calculated the population mean, which is 11.8, and the standard deviation of the population, which sd() reports as 3.193744 (sd() divides by n - 1). Comparing the two means, the sample mean (15) is greater than the population mean (11.8). Comparing the two standard deviations, the sample standard deviation (1.41) is lower than the population standard deviation (3.19).
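The comparison can be sketched as follows. One point worth flagging: sd() always uses the n - 1 denominator, so the 3.193744 figure is that version. The divide-by-N population formula is shown next to it for reference; the variable names are illustrative, not from the original script.

```r
ice_cream_purchase <- c(8, 14, 16, 10, 11)
ice.cream.sample <- c(16, 14)

population_mean <- mean(ice_cream_purchase)  # 11.8
sample_mean <- mean(ice.cream.sample)        # 15

# sd() divides the sum of squared deviations by n - 1
sd_n_minus_1 <- sd(ice_cream_purchase)       # about 3.193744

# Population formula: divide by N instead of N - 1
sd_population <- sqrt(mean((ice_cream_purchase - population_mean)^2))  # about 2.8566

sample_sd <- sd(ice.cream.sample)            # about 1.414214

# Both comparisons described in the text
print(sample_mean > population_mean)
print(sample_sd < sd_n_minus_1)
```

Under either denominator, the sample standard deviation is the smaller of the two.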

B. Suppose that the sample size n = 100 and the population proportion p = .95.

  1. Does the sample proportion p have approximately a normal distribution? Explain: The sample proportion p may not be approximately normal here, because the usual rule-of-thumb condition for the normal approximation is not fully met. Both np and n(1-p) should be at least 10. (10 is the commonly used threshold, though some texts use 5 or 15.) With n = 100 and p = .95, np = 100 * 0.95 = 95 and n(1-p) = 100 * 0.05 = 5. So np satisfies the requirement but n(1-p) does not, and the approximation to normality is not guaranteed.
  2. What is the smallest value of n for which the sampling distribution of p is approximately normal? Using the same rule of thumb, both np and n(1-p) must be at least 10. With p = 0.95, the binding condition is n(1-p) >= 10, so n must be at least 10 / 0.05 = 200. At n = 200, n(1-p) = 200 * 0.05 = 10 and np = 190, so n = 200 is the minimum sample size for the normal approximation to be considered valid under this threshold.
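Both answers can be checked with a short R sketch. The threshold of 10 is the rule of thumb chosen above, and the variable names are illustrative.

```r
p <- 0.95
threshold <- 10  # rule-of-thumb cutoff used in the answers above

# Question 1: check both conditions at n = 100
n <- 100
expected_successes <- n * p        # 95
expected_failures <- n * (1 - p)   # 5
normal_ok <- expected_successes >= threshold && expected_failures >= threshold
print(normal_ok)  # FALSE

# Question 2: search upward for the smallest n meeting both conditions
smallest_n <- 1
while (smallest_n * p < threshold || smallest_n * (1 - p) < threshold) {
  smallest_n <- smallest_n + 1
}
print(smallest_n)  # 200
```

Changing threshold to 5 or 15 shows how the answer depends on which cutoff is adopted.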
a. The population mean can be calculated as (8+14+16+10+11) / 5, which equals 11.8.
b. The sample size is 5 because the population consists of five separate values.
c.














I created a vector for the population and then sampled it four times. Next, I found the mean of each sample and calculated the mean of the sample means by adding the four means together and dividing by 4. This gave me 50.2 / 4, which equals 12.55.
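A sketch of the repeated-sampling step is below. Several details are assumptions rather than the original code: the seed, the use of replicate(), and sampling with replacement at size 5 (the sample size given in part b). The individual means will therefore not reproduce the 50.2 total from the original run.

```r
population <- c(8, 14, 16, 10, 11)

# Assumed seed value
set.seed(123)

# Draw four samples and keep the mean of each.
# Assumption: size 5 with replacement; without replacement every
# size-5 sample would just be the whole population.
sample_means <- replicate(4, mean(sample(population, size = 5, replace = TRUE)))
print(sample_means)

# Mean of the four sample means: sum of the means divided by their count
mean_of_sample_means <- sum(sample_means) / length(sample_means)
print(mean_of_sample_means)
```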
d.












I found that the standard error "Qm" equals 1.96774 when Q = 4.4 and n = 5, using the equation Qm = Q / sqrt(n).

Next, I created a table with three columns: x, u, and (x-u)^2, where 'x' is a population value, 'u' is the mean of the population, and (x-u)^2 is the squared difference between them. The resulting table is shown above. The population mean stays the same in all five rows, whereas the population value and the squared difference change from row to row.

Finally, I checked whether the sample proportion p has an approximately normal distribution when p = 0.95 and q = 0.05. Using 10 as the benchmark value, p*n is greater than 10 but q*n is lower, which indicates that the conditions needed for the sample proportion to have a normal distribution are not met.
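The standard error and the squared-difference table can be reproduced with the sketch below. It assumes x is the five-housemate population from part A, and the data frame and column names are illustrative. Q = 4.4 is taken as given in the text.

```r
# Standard error: Qm = Q / sqrt(n)
Q <- 4.4
n <- 5
Qm <- Q / sqrt(n)
print(Qm)  # about 1.96774

# Squared-difference table; x is assumed to be the population from part A
x <- c(8, 14, 16, 10, 11)
u <- mean(x)  # 11.8, repeated on every row of the table
squared_diff_table <- data.frame(x = x, u = u, squared_diff = (x - u)^2)
print(squared_diff_table)
```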

C. Simulated coin tossing: is probability better done using function called rbinom than using function called sample? Explain:

The rbinom function is better for simulating coin tossing than the sample function because rbinom is designed to produce random draws from a binomial distribution, which is exactly what a coin toss follows. For a coin toss, rbinom takes the number of tosses and the probability of heads and returns the heads/tails outcomes, so it can generate many samples in a short amount of time. The sample function, however, is not designed with this specific purpose in mind; it takes more setup to simulate the toss and is less convenient for generating many coin tosses. In short, rbinom is easier to use and more efficient for simulating coin tossing.
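For comparison, here is a sketch of both approaches for ten tosses of a fair coin. The seed, the number of tosses, and the "H"/"T" labels are assumptions made for illustration.

```r
# Assumed seed value
set.seed(123)
n_tosses <- 10

# rbinom: ten Bernoulli trials, coded 1 = heads and 0 = tails
tosses_rbinom <- rbinom(n_tosses, size = 1, prob = 0.5)
print(tosses_rbinom)

# rbinom can also return the number of heads in ten tosses directly
heads_count <- rbinom(1, size = n_tosses, prob = 0.5)
print(heads_count)

# sample: the outcomes must be spelled out and drawn with replacement
tosses_sample <- sample(c("H", "T"), size = n_tosses, replace = TRUE)
print(tosses_sample)
```

With rbinom, a biased coin only needs a different prob value, while sample would need its prob argument set for each outcome.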
