Friday, August 31, 2012

Sources, Types of Data and Scales of Measure

Data
Facts and figures collected, analysed and summarized for presentation and interpretation.

Sources of Data
Existing Sources: Reports, records, public databases, census etc
Surveys
Experiments
Observational studies

Categories of Data

Primary Data: Directly collected by the investigator. Data is collected through direct interviews, surveys, experiments etc.

Secondary Data: Data that is collected from another source. Such data has already been collected and analysed by some agency and is reused by the researcher. Secondary data is available in research papers, journals, reports, census etc.

Classification of Data

Qualitative Data
-Nominal: There is no meaningful sequence in the data; it gives only a qualitative understanding. For example male/female, married/unmarried. Nominal data usually gives categories in the data.
-Ordinal: An order exists in the data. For example, questionnaire responses of the type: very bad, bad, neutral, good, very good etc.

Quantitative Data
-Discrete Data: Data that can be counted. Usually, there is only a finite number of possible values of discrete data.
-Continuous Data: Data obtained by measurement. Usually anything you have to use a measuring device for is continuous data, so this type of data is usually associated with some sort of physical measurement.

Scales of Measure

Nominal Scale: Scale for grouping into categories. There is no intrinsic order in data. e.g. eye color. It is qualitative data.
Basic empirical operations-determination of equality.

Ordinal Scale: There is rank ordering in the data but it does not give relative size or degree of difference between the items measured. It is a scale for ordering observations from low to high with lack of measurement sensitivity.
Basic empirical operations- determination of greater or less.

Interval Scale: Scale with fixed or defined intervals between values, e.g. temperature in degrees Celsius, calendar time etc. Attributes are measured numerically on this scale. Zero is arbitrarily set in this scale and negative values can also be used. Ratios of numbers on this scale are not meaningful; the differences, however, can be compared.
Basic empirical operations-determination of equality of intervals or differences

Ratio Scale: Data at the ratio level possess all of the features of the interval level, in addition to an absolute zero value (no numbers exist below zero). Because the zero is absolute, the ratios of measurements can be compared. Most of the data in physical sciences and engineering belong to this scale.
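
The difference between the interval and ratio scales can be checked with a little arithmetic. The following is a minimal sketch in C; the function names are illustrative assumptions and are not part of the original notes. It converts the same Celsius readings to Fahrenheit (another interval scale) and to Kelvin (a ratio scale).

```c
#include <assert.h>

/* Celsius to Fahrenheit: both are interval scales, the zero is arbitrary. */
double celsius_to_fahrenheit(double c) {
    return c * 9.0 / 5.0 + 32.0;
}

/* Celsius to Kelvin: Kelvin is a ratio scale, its zero is absolute. */
double celsius_to_kelvin(double c) {
    return c + 273.15;
}

/* Ratio of two readings; only meaningful when the scale has a true zero. */
double ratio(double a, double b) {
    assert(b != 0.0);
    return a / b;
}
```

For 20 and 10 degrees Celsius the ratio is 2, but the same two temperatures in Fahrenheit are 68 and 50, whose ratio is 1.36, so "twice as hot" has no physical meaning on an interval scale. Comparisons of differences do survive the change of unit: (30 - 20)/(20 - 10) in Celsius and (86 - 68)/(68 - 50) in Fahrenheit are both 1.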


PS: Suggestions/Corrections/Comments are welcome.

Statistics: Definition, Functions, Scope and Limitations

In very brief:

Definition of Statistics
Statistics is the science of collection, presentation, analysis and interpretation of numerical data.

Functions of Statistics
Condensation of Data: Statistics helps in reducing the complexity of data and consequently in understanding vast amounts of data. Classification and tabulation are two important methods to condense data.
Facilitates Comparison: Statistics helps in comparing data obtained from different sources. Measures of central tendency, measures of dispersion, correlation coefficients, graphs etc. are tools that are used for comparison.
Draw Inferences: Statistics helps in drawing inferences about a population from the analysis of a sample drawn from that population.
Testing of Hypothesis: Tests whether a statement made on a population based on the available information is valid or not.
Forecasting: Analysis of time series and regression are some of the tools that are used to predict or estimate beforehand.

Scope of Statistics:
Industry: Control charts and inspection charts.
Commerce: Stocking of material based on the estimated demand.
Agriculture: ANOVA is used in agriculture to test the equality of different population means.
Medicine: t-tests are used to test effectiveness of a new drug or compare the effectiveness of two drugs.
Economics: Index numbers, time series analysis, hypothesis testing are widely used in economics.
Planning: Statistical data related to production, consumption, demand, investment, expenditure and advanced statistical techniques are used for making policy decisions.

Limitations:
Statistics should only be used as a supportive tool in the study of a population.
Statistical laws are not exact.
It studies the characteristics of a population, not of individuals.
Not suitable for the study of qualitative phenomena. However, if such a phenomenon can be suitably and accurately represented in numerical form, then statistical methods can be used to study it.

PS: Corrections/Additions/Comments are welcome.

How Binomial and Poisson Distributions Vary

Plotting a Binomial Distribution in Excel:
Use the function =BINOMDIST(x, n, p, TRUE/FALSE)
You have to use either TRUE or FALSE as the last parameter.
TRUE is used when you want to calculate the cumulative probability (x or fewer successes), and FALSE if you want to calculate the probability of exactly x successes.
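
For readers without Excel, the two modes of the function can be sketched in C. This is a minimal sketch under my own naming (binomial_pmf and binomial_cdf are assumptions, not Excel or library functions), and it has not been compared against Excel's own implementation.

```c
#include <assert.h>

/* P(X = x) for X ~ Binomial(n, p); the FALSE mode of BINOMDIST. */
double binomial_pmf(int x, int n, double p) {
    assert(n >= 0 && p >= 0.0 && p <= 1.0);
    if (x < 0 || x > n) {
        return 0.0;
    }
    double prob = 1.0;
    /* Build C(n, x) * p^x one factor at a time to avoid big factorials. */
    for (int i = 1; i <= x; i++) {
        prob *= (double)(n - x + i) / i * p;
    }
    /* Multiply by (1 - p)^(n - x). */
    for (int i = 0; i < n - x; i++) {
        prob *= 1.0 - p;
    }
    return prob;
}

/* P(X <= x); the TRUE (cumulative) mode of BINOMDIST. */
double binomial_cdf(int x, int n, double p) {
    double sum = 0.0;
    for (int k = 0; k <= x && k <= n; k++) {
        sum += binomial_pmf(k, n, p);
    }
    return sum;
}
```

For example, binomial_pmf(2, 4, 0.5) is 6/16 = 0.375 and binomial_cdf(1, 4, 0.5) is 5/16 = 0.3125.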

I used FALSE since I wanted to calculate individual probabilities for different values of x. Here is the result of what I plotted:


Plotting a Poisson Distribution in Excel:
Use the function =POISSON(x, λ, TRUE/FALSE)
Again, TRUE is used when we want to calculate the cumulative probability. If you want to calculate the probability for a particular value of x, use FALSE.
I used False to calculate the probabilities for different values of x. Here is the result:



Conclusions you can draw from these plots:
1. For p = 0.5, the binomial distribution is symmetric.
2. As p increases for fixed n, the binomial distribution appears to shift towards the right.
3. For large values of λ, the Poisson distribution tends to behave like a bell curve (Normal Distribution).

PS1: Excel sheet is attached in case you want to experiment and learn more.
Link:-  Probability Distributions Binomial and Poisson

PS2: Corrections/suggestions are welcome.

Thursday, August 30, 2012

Probability Distributions Used In Acceptance Sampling

In basic acceptance sampling techniques, the four most common distributions used are:
Hypergeometric: Parameters N, X, n
Binomial: Parameters n,p
Poisson: Parameter λ
Normal: Parameter µ, σ

Approximations

1. When N is large relative to n (n/N < 0.1) then Hypergeometric (N, X, n) → Binomial (n, p = X/N)
2. When n → ∞ and p → 0  such that np → λ (some constant) then Binomial (n, p) → Poisson(λ)
3. For large values of λ Poisson → Normal

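A hypergeometric distribution applies to sampling without replacement from a finite population. A binomial distribution applies when the population is infinite or very large, or when the experiment is done on a finite population with replacement. A Poisson distribution applies when the number of chances for an occurrence is effectively unlimited while each occurrence is rare.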

PS: Corrections/Additions/Suggestions are welcome.

Infix, Prefix, Postfix

What is the need of prefix and postfix notation?
A computer can apply operator precedence, but in infix notation the order of evaluation depends on both precedence and parentheses. In a big expression like:
y = a + b*c + k/{d + e/(a+g+h*d)} - g + (e+ f/(s+t*k/(blah*klah)))/u
with lots of parentheses, the machine cannot tell which part of the expression to evaluate first from a single left-to-right scan.

So, to make it easier for a machine to interpret and evaluate expressions according to the rules of mathematics which we understand implicitly, postfix and prefix notations are used. In these notations the position of each operator fixes the order of evaluation, so no parentheses are needed.

Infix: 
The expressions that we have been seeing since kindergarten are all infix expressions. Just to refresh your memory:
3+5*6 -9 
is an infix expression. You can see all the operators in infix expression are in-between the operands.

Prefix Notation:
In prefix notation the operator is written before its operands in the expression. So
A+B will be written as +AB.

How to write Prefix expressions?
Take the expression (here $ stands for exponentiation, which has the highest precedence):
A-B/(C*D$E)

Step 1: Identify the small calculations you have to make in order to evaluate this expression correctly. Let's show them with parentheses:

A-(B/(C*(D$E)))

Step 2: Now starting with the innermost parenthesis start writing the prefix expression. I'll show it in steps to make it easier to understand:
(i) A-(B/(C*(D$E)))
(ii) A-(B/(C*($DE)))
(iii) A-(B/(*C($DE)))
(iv) A-(/B(*C($DE)))
(v) -A(/B(*C($DE)))

Step 3: Remove all the parentheses after you are done with converting the expression into prefix. We get:
-A/B*C$DE
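
The conversion can also be done by a program. The following is a minimal sketch in C that uses a recursive-descent parser rather than the hand method above; the grammar, the function names and the assumption of single-letter operands without spaces are all mine. It does not validate malformed input and keeps its position in a file-level variable, so it is not reentrant.

```c
#include <assert.h>
#include <string.h>

/* Grammar assumed for this sketch:
 *   expr   := term   { ('+' | '-') term }     left-associative
 *   term   := factor { ('*' | '/') factor }   left-associative
 *   factor := atom [ '$' factor ]             right-associative
 *   atom   := letter | '(' expr ')'
 */
#define MAX_LEN 128

static const char *cursor;            /* next unread input character */

static void parse_expr(char *out);

/* Write op, then left, then right into out (the prefix form). */
static void join(char *out, char op, const char *left, const char *right) {
    size_t n = 0;
    out[n++] = op;
    for (; *left != '\0'; left++) out[n++] = *left;
    for (; *right != '\0'; right++) out[n++] = *right;
    out[n] = '\0';
}

static void parse_atom(char *out) {
    if (*cursor == '(') {
        cursor++;                      /* skip '(' */
        parse_expr(out);
        if (*cursor == ')') cursor++;  /* skip ')' */
    } else if (*cursor != '\0') {
        out[0] = *cursor++;            /* single-letter operand */
        out[1] = '\0';
    } else {
        out[0] = '\0';                 /* ran out of input */
    }
}

static void parse_factor(char *out) {
    char left[MAX_LEN], right[MAX_LEN];
    parse_atom(left);
    if (*cursor == '$') {
        cursor++;
        parse_factor(right);           /* recursion: '$' groups to the right */
        join(out, '$', left, right);
    } else {
        strcpy(out, left);
    }
}

static void parse_term(char *out) {
    char left[MAX_LEN], right[MAX_LEN];
    parse_factor(left);
    while (*cursor == '*' || *cursor == '/') {
        char op = *cursor++;
        parse_factor(right);
        join(out, op, left, right);
        strcpy(left, out);             /* result becomes the new left side */
    }
    strcpy(out, left);
}

static void parse_expr(char *out) {
    char left[MAX_LEN], right[MAX_LEN];
    parse_term(left);
    while (*cursor == '+' || *cursor == '-') {
        char op = *cursor++;
        parse_term(right);
        join(out, op, left, right);
        strcpy(left, out);
    }
    strcpy(out, left);
}

/* out must have room for MAX_LEN characters. */
void infix_to_prefix(const char *infix, char *out) {
    assert(strlen(infix) < MAX_LEN);
    cursor = infix;
    parse_expr(out);
}
```

Calling infix_to_prefix("A-B/(C*D$E)", buf) with a 128-character buffer leaves -A/B*C$DE in buf, the same result as the steps above.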

Postfix Notation
Operators are written after the operands. Simply put:
A+B will be written as AB+

How to write Postfix expressions?
We will take the same example as above.
A-B/(C*D$E)

Step 1: Identify the small calculations you have to make in order to evaluate this expression correctly. Let's show them with parentheses:

A-(B/(C*(D$E)))

Step 2: Now starting with the innermost parenthesis start writing the postfix form. I'll show it in steps to make it easier to understand:
(i) A-(B/(C*(D$E)))
(ii) A-(B/(C*(DE$)))
(iii) A-(B/(C(DE$)*))
(iv) A-(B(C(DE$)*)/)
(v) A(B(C(DE$)*)/)-

Step 3: Remove all the parentheses after you are done with converting the expression into postfix. We get:
ABCDE$*/-
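
A program usually does this conversion with an operator stack instead of the parenthesising steps above. The following is a minimal sketch in C of that stack-based approach (commonly called the shunting-yard method). The function names, the precedence table and the assumption of single-character operands without spaces are mine, and the sketch does not validate unbalanced parentheses.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* '$' (power) binds tightest, then '*' and '/', then '+' and '-'. */
static int precedence(char op) {
    switch (op) {
    case '$': return 3;
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:  return 0;
    }
}

/* Convert infix to postfix. out needs room for strlen(infix) + 1 chars. */
void infix_to_postfix(const char *infix, char *out) {
    char ops[128];                      /* operator stack */
    int top = -1;
    int n = 0;
    assert(strlen(infix) < sizeof ops);

    for (const char *c = infix; *c != '\0'; c++) {
        if (isalnum((unsigned char)*c)) {
            out[n++] = *c;              /* operands go straight to the output */
        } else if (*c == '(') {
            ops[++top] = *c;
        } else if (*c == ')') {
            while (top >= 0 && ops[top] != '(') {
                out[n++] = ops[top--];  /* flush back to the matching '(' */
            }
            if (top >= 0) {
                top--;                  /* discard the '(' itself */
            }
        } else {
            /* Pop operators that bind at least as tightly. '$' is
             * right-associative, so an equal '$' stays on the stack. */
            while (top >= 0 && ops[top] != '(' &&
                   (precedence(ops[top]) > precedence(*c) ||
                    (precedence(ops[top]) == precedence(*c) && *c != '$'))) {
                out[n++] = ops[top--];
            }
            ops[++top] = *c;
        }
    }
    while (top >= 0) {
        out[n++] = ops[top--];          /* flush the remaining operators */
    }
    out[n] = '\0';
}
```

Calling infix_to_postfix("A-B/(C*D$E)", buf) leaves ABCDE$*/- in buf, matching the result above.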


PS1: Better and alternate ways of thinking are always there. Do suggest some in your comments.
PS2: For reference, the document is attached: Infix, Prefix, Postfix

Wednesday, August 22, 2012

Pre-Increment, Pre-Decrement, Post-Increment & Post-Decrement in C

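In our context:
Pre means the variable is incremented or decremented before its value is used.
Post means the variable is incremented or decremented after its value is used.
When one statement applies several of these operators to the same variable, a compiler may apply every pre operation before the statement is executed and every post operation after it. That is the model followed below.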

So,

i = ++i + i++ + i-- + ++i;
      (a)     (b)    (c)     (d)
is equivalent to the following set of statements:
i = i+1; (a)
i = i +1; (d)
i = i + i + i + i; (the statement)
i = i + 1; (b)
i = i -1; (c)

Case 1: 
So if initially the value of i was i = 10, then the value of i after the statement is executed will be:
i = 10 + 1;
i = 11 + 1;
i = 12 + 12 + 12 + 12;
i = 48 + 1;
i = 49-1;

Therefore if you use printf("%d", i) after the statement you will get 48 as the answer.

Case 2:
Now consider the case: 
j = ++i + i++ + i-- + ++i;
printf("%d", j);

Then these statements are equivalent to:
i = i+1;
i = i +1;
j = i + i + i + i; (the statement)
i = i + 1;
i = i -1;
printf("%d", j);

Therefore if i is 10 initially, 
i = 10 + 1;
i = 11 + 1;
j = 12 + 12 + 12 + 12;
i = 12 + 1;
i = 13 - 1;
printf("%d", j); will give j = 48. The final value of i in this case is 12.

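PS: In standard C and C++, a statement that modifies the same variable more than once, like the ones above, has undefined behaviour. This logic therefore describes the output of one compiler, and another compiler may print a different value. Java evaluates operands strictly from left to right, so its result differs.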
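
Each operator on its own is well defined as long as a statement changes the variable only once. The following is a minimal sketch in C with illustrative function names of my own. The last function splits the arithmetic into one change per statement in strict left-to-right order, which is only one possible ordering and not what the one-line statement is guaranteed to do.

```c
#include <assert.h>

/* Pre-increment: *i is incremented first, the new value is the result. */
int pre_increment(int *i) {
    assert(i != 0);
    int result = ++(*i);
    return result;
}

/* Post-increment: the old value is the result, then *i is incremented. */
int post_increment(int *i) {
    assert(i != 0);
    int result = (*i)++;
    return result;
}

/* The four terms evaluated left to right, one change to *i per statement. */
int sum_left_to_right(int *i) {
    assert(i != 0);
    int a = ++(*i);   /* starting from 10: *i becomes 11, a is 11 */
    int b = (*i)++;   /* b is 11, *i becomes 12 */
    int c = (*i)--;   /* c is 12, *i becomes 11 */
    int d = ++(*i);   /* *i becomes 12, d is 12 */
    return a + b + c + d;
}
```

Starting from 10, sum_left_to_right returns 46 and leaves the variable at 12, while the model in this post gives 48. Both are answers a compiler could produce for the one-line statement, which is why such statements are best avoided.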

Sunday, August 19, 2012

Implementation Of Stack Using Arrays In C

Language: C
Compiler: Turbo C

#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
void main(){
clrscr();
int stack[10],top=-1,  choice =0, entry=0, pop, i;
//insert, delete, display, underflow, overflow
while(choice!=4){
    printf("\nChoose from one of the following options:\nPress 1 to insert a number.\nPress 2 to delete a number.\nPress 3 to display a number.\nPress 4 to exit.\n\nYour choice is? ");
    scanf("%d", &choice);
    switch(choice){

    case 1:
    clrscr();
    //check for overflow before asking for a number
    if(top==9){
    printf("\nOverflow! The stack is full.\n");
    }
    else{
    printf("\nPlease enter the number: ");
    scanf("%d", &entry);
    stack[++top]=entry;
    clrscr();
    printf("\nYour number has been entered!\n");
    }
    break;

    case 2:
    clrscr();
    if(top==-1){
    printf("\nThe stack is empty.\n");
    }
    else{
    pop = stack[top];
    top--;
    printf("\nThe number %d has been deleted.\n", pop);
    }
    break;

    case 3:
    clrscr();
    if(top!=-1){
        for(i=0;i<=top;i++){
        printf("%d ", stack[i]);
        }
    }
    else{
    printf("\nThe stack is empty.\n");
    }
    break;

    case 4:
    exit(0);

    default:
    clrscr();
    printf("\nInvalid Entry!\n");
    //choice =0;
    break;
    }//end of switch

}//end of while
}//end of main
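
One possible improvement is to separate the stack operations from the menu. The following is a minimal sketch in standard C rather than Turbo C; the Stack type and the function names are my own assumptions and are not taken from the program above.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_CAPACITY 10

typedef struct {
    int items[STACK_CAPACITY];
    int top;                 /* index of the top element, -1 when empty */
} Stack;

void stack_init(Stack *s) {
    assert(s != 0);
    s->top = -1;
}

bool stack_is_empty(const Stack *s) {
    return s->top == -1;
}

bool stack_is_full(const Stack *s) {
    return s->top == STACK_CAPACITY - 1;
}

/* Push value; returns false on overflow and leaves the stack unchanged. */
bool stack_push(Stack *s, int value) {
    if (stack_is_full(s)) {
        return false;
    }
    s->items[++s->top] = value;
    return true;
}

/* Pop into *value; returns false on underflow. */
bool stack_pop(Stack *s, int *value) {
    if (stack_is_empty(s)) {
        return false;
    }
    *value = s->items[s->top--];
    return true;
}
```

The menu loop then only has to call these functions and print a message based on the returned flag, so the overflow and underflow checks live in one place.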

PS: Many improvements are possible in the program. You are welcome to suggest them in comments! :)

Binary Search & Bubble Sort Using C

Language: C
Compiler: Turbo C

#include<stdio.h>
#include<conio.h>


void main(){
clrscr();
int array[] = {12,3,4,67,89,2,4,67,7,8}, num=0, upper =0, lower=0, limit = 10, search = 0;
printf("Enter the number to search:");
scanf("%d",&num);

//sorting
int i=0, j =0, sort = 0, swap=0;
for(i=0;i<9;i++){
    sort=1;
    for(j=1;j<10;j++){
        if(array[j]<array[j-1]){
        swap = array[j];
        array[j]=array[j-1];
        array[j-1]=swap;
        sort=0;
        }
    }
    if(sort==1){
    break;
    }

} //end of sorting
//for(i=0;i<10;i++){
//    printf("%d ",array[i]);

//}
//getch();
//algorithm for binary search
upper = limit-1;
lower=0;
while(upper>=lower){
    if(array[(upper+lower)/2]==num){
        printf("Your search is successful.");
        search =1;
        break;
    }
    else if(array[(upper+lower)/2]>num){
        upper = (upper+lower)/2-1;
           //lower = 0;
    }
    else if(array[(upper+lower)/2]<num){
        lower=(upper+lower)/2+1;
    }
}
if(search ==0){
printf("Number not found.");
}
getch();
}//end of main
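
The search loop can also be written as a reusable function. The following is a minimal sketch in standard C; the function name and signature are my own assumptions. It computes the midpoint as lower + (upper - lower) / 2, which avoids overflow when upper + lower is very large.

```c
#include <assert.h>

/* Return the index of target in the ascending array a[0..n-1], or -1. */
int binary_search(const int a[], int n, int target) {
    assert(n >= 0);
    int lower = 0;
    int upper = n - 1;
    while (lower <= upper) {
        int mid = lower + (upper - lower) / 2;  /* safe midpoint */
        if (a[mid] == target) {
            return mid;
        }
        if (a[mid] > target) {
            upper = mid - 1;                    /* search the lower half */
        } else {
            lower = mid + 1;                    /* search the upper half */
        }
    }
    return -1;                                  /* not found */
}
```

The array must already be sorted in ascending order, as it is after the bubble sort above. When a value occurs more than once, the function returns the index of one of the matches, not necessarily the first.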


PS: Many improvements are possible in the program. You are welcome to suggest them in comments! :)

A Bubble Sort Program in C

This program just demonstrates what a Bubble Sorting program looks like:
Compiler used: Turbo C
Language: C
 
#include<stdio.h>
#include<conio.h>

void main(){
clrscr();
int arr[]={10,20,30,3,6,23,4}, i=0, swap=0, sort=0, j=0;
for(i=0;i<6;i++){
    sort=1;
    for(j=1;j<7;j++){
        if(arr[j]<arr[j-1]){
        swap=arr[j];
        arr[j]=arr[j-1];
        arr[j-1]=swap;
        sort = 0;
        }//end of if statement
    }
    if(sort==1){
    break;
    }
}
for(i=0;i<7;i++){
    printf("%d ",arr[i]);
}

getch();
}
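
One possible improvement is to shorten the inner loop after every pass. The following is a minimal sketch in standard C with a function name and signature of my own choosing. It keeps the early exit of the program above and works for any array length.

```c
#include <assert.h>
#include <stdbool.h>

/* Sort a[0..n-1] into ascending order in place. */
void bubble_sort(int a[], int n) {
    assert(n >= 0);
    for (int pass = 0; pass < n - 1; pass++) {
        bool swapped = false;
        /* After each pass the largest remaining value sits at the end,
         * so the inner loop can stop one element earlier each time. */
        for (int j = 1; j < n - pass; j++) {
            if (a[j] < a[j - 1]) {
                int tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped) {
            break;              /* no swaps: the array is already sorted */
        }
    }
}
```

Calling bubble_sort(arr, 7) on the array used above leaves it as 3 4 6 10 20 23 30.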

PS: Many improvements are possible in the program. You are welcome to suggest them in comments! :)

Implementation Of Circular Queue Using Arrays


Compiler used: Turbo C
Language: C

#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
void main(){
clrscr();
//queue deletion from front, insertion from rear
//check for overflow/underflow
int queue[10], front=-1, rear=-1, choice=0, entry=0, i;
while(choice!=4){
    printf("\nChoose from the following options:\nPress 1 to insert.\nPress 2 to delete.\nPress 3 to display.\nPress 4 to exit.\n\nYour choice is? ");
    scanf("%d",&choice);

    switch(choice){

        case 1:

        clrscr();
        if((rear==9 && front==0) || rear==front-1){
            printf("\nOverflow! Queue is full.\n");
        }
        else if(front==-1 && rear==-1){
            printf("\nPlease enter your number: ");
            scanf("%d",&entry);
            queue[++rear] = entry;
            front =0;
            printf("\nYour number has been entered!\n");
        }
        else{   printf("\nPlease enter your number: ");
            scanf("%d",&entry);
            ++rear;
            rear=rear%10;
            queue[rear]=entry;
            printf("\nYour number has been entered.\n");
        }
        break;

        case 2:
        clrscr();
        if(rear==-1){
        printf("\nUnderflow! Queue is empty.\n");
        }
        else if(front==rear){
        printf("\nThe number has been deleted!\n");
        front=-1;
        rear=-1;
        }
        else{
        front++;
        front = front%10;
        printf("\nThe number has been deleted from the queue!\n");
        }
        break;

        case 3:
        clrscr();
        if(rear==-1){
        printf("\nThe queue is empty.\n");
        }
        else if(front<=rear){
        for(i=front;i<=rear;i++){
        printf("%d ",queue[i]);
        }
        }
        else{
        for(i=front;i<=9;i++)
        printf("%d ", queue[i]);

        for(i=0;i<=rear;i++)
        printf("%d ", queue[i]);
        }
        break;

        case 4:
        exit(0);

        default:
        clrscr();
        printf("\nInvalid Choice!\n");
        break;

    }
}//end of while
}//end of main
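
One possible improvement is to move the queue logic into functions. The following is a minimal sketch in standard C. It uses a different bookkeeping scheme from the program above: instead of front and rear indices that start at -1, it stores the front index and a count of elements, so full and empty are easy to tell apart. The Queue type and the function names are my own assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 10

typedef struct {
    int items[QUEUE_CAPACITY];
    int front;               /* index of the oldest element */
    int count;               /* number of stored elements */
} Queue;

void queue_init(Queue *q) {
    assert(q != 0);
    q->front = 0;
    q->count = 0;
}

/* Insert at the rear; returns false on overflow. */
bool queue_enqueue(Queue *q, int value) {
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    int rear = (q->front + q->count) % QUEUE_CAPACITY;  /* wraps around */
    q->items[rear] = value;
    q->count++;
    return true;
}

/* Remove from the front into *value; returns false on underflow. */
bool queue_dequeue(Queue *q, int *value) {
    if (q->count == 0) {
        return false;
    }
    *value = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;         /* wraps around */
    q->count--;
    return true;
}
```

Because the rear position is derived from front and count with the modulo operator, slots freed at the start of the array are reused automatically once the rear reaches the end.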


PS: Many improvements are possible in the program. You are welcome to suggest them in comments! :)