COMP316A: Artificial Intelligence Techniques and Applications

- Assignment 6: Version Space Learning -

This assignment is marked out of 30, with a maximum of 25 marks for the first part and a maximum of 5 marks for the second part.

 

  1. Write a program called CandidateElimination that implements the candidate elimination algorithm for version space learning. The algorithm must be able to learn conjunctions of positive literals (e.g. A1 /\ A2 /\ A5, where A1, A2, and A5 are three of the attributes describing the examples). The program must accept two command-line arguments: the training set file first and the test set file second. For example,

    java CandidateElimination train.data test.data

    must read the training data from train.data and the test data from test.data.

    The data files contain only 0/1 values. Each row is one example: its attribute values, separated by spaces, followed by the target value as the last value in the row. Every row has the same number of values.

    Your program must process the training data one line (one example) at a time and, each time an example updates the version space, output the new G and S sets. Hypotheses must be output in the format shown above, e.g. A1 /\ A2 /\ A5.

    Once the training examples have been processed, and provided the version space has not collapsed, your program must process the test examples. For each test example it must output 1, 0, or unclassified. It should also output the total number of correct and incorrect classifications on the test data. If the version space collapses during training, your program must output "Version space has collapsed".

    Here are an example training set and an example test set, each with three attributes followed by the target attribute.

    Training set:

    0 1 1 1
    0 0 0 0
    1 0 1 0
    1 1 0 1

    Test set:

    0 0 1 0
    1 0 0 0
    0 1 0 1
    1 1 1 1

     

  2. Extend your program so that hypotheses can also contain negated literals, allowing it to learn hypotheses such as -A2 /\ A4.
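
A minimal Java sketch of the first part follows. It is one possible approach written under assumptions the assignment leaves open, not a reference solution. The class name PositiveLiteralCE and all of its methods are hypothetical. A hypothesis is stored as the set of 0-based attribute indices in its conjunction, so S starts as A1 /\ ... /\ An and G as the empty conjunction, which the sketch prints as "true" (an assumed notation). A positive example drops from the S hypothesis every literal the example falsifies; a negative example specializes each covering member of G by adding one literal that is false in the example, keeping only specializations that are still more general than S. A test example is labelled 1 when every member of S covers it (so every hypothesis in the version space does), 0 when no member of G covers it, and unclassified otherwise. The sketch prints S and G after every training example and counts unclassified test examples as neither correct nor incorrect; both choices are assumptions. It needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.stream.Collectors;

// Sketch (hypothetical names): candidate elimination over conjunctions of
// positive literals. A hypothesis is the set of 0-based attribute indices in
// its conjunction; the empty set is the empty, always-true conjunction.
public class PositiveLiteralCE {
    List<Set<Integer>> s = new ArrayList<>(); // specific boundary S
    List<Set<Integer>> g = new ArrayList<>(); // general boundary G

    PositiveLiteralCE(int n) {
        Set<Integer> all = new TreeSet<>();
        for (int i = 0; i < n; i++) all.add(i);
        s.add(all);             // most specific: A1 /\ ... /\ An
        g.add(new TreeSet<>()); // most general: empty conjunction
    }

    static boolean covers(Set<Integer> h, int[] x) { return h.stream().allMatch(i -> x[i] == 1); }
    // a is at least as general as b iff a's literals are a subset of b's.
    static boolean moreGeneralOrEqual(Set<Integer> a, Set<Integer> b) { return b.containsAll(a); }

    void update(int[] x, int target) {
        if (target == 1) {
            g.removeIf(h -> !covers(h, x));
            List<Set<Integer>> next = new ArrayList<>();
            for (Set<Integer> h : s) {
                // Unique minimal generalization: drop every literal that x falsifies.
                Set<Integer> gen = new TreeSet<>(h);
                gen.removeIf(i -> x[i] != 1);
                if (g.stream().anyMatch(gh -> moreGeneralOrEqual(gh, gen))) next.add(gen);
            }
            s = next; // S never holds more than one hypothesis in this language
        } else {
            s.removeIf(h -> covers(h, x));
            Set<Set<Integer>> next = new LinkedHashSet<>();
            for (Set<Integer> h : g) {
                if (!covers(h, x)) { next.add(h); continue; }
                // Minimal specializations: add one literal Ai that is false in x.
                for (int i = 0; i < x.length; i++) {
                    if (x[i] != 0 || h.contains(i)) continue;
                    Set<Integer> spec = new TreeSet<>(h);
                    spec.add(i);
                    if (s.stream().anyMatch(sh -> moreGeneralOrEqual(spec, sh))) next.add(spec);
                }
            }
            // Keep only the maximally general members of G.
            List<Set<Integer>> all = new ArrayList<>(next);
            g = new ArrayList<>();
            for (Set<Integer> h : all)
                if (all.stream().noneMatch(o -> !o.equals(h) && moreGeneralOrEqual(o, h))) g.add(h);
        }
    }

    // 1 if every member of S covers x, 0 if no member of G covers x, else unclassified.
    String classify(int[] x) {
        if (s.stream().allMatch(h -> covers(h, x))) return "1";
        if (g.stream().noneMatch(h -> covers(h, x))) return "0";
        return "unclassified";
    }

    static String format(Set<Integer> h) {
        if (h.isEmpty()) return "true"; // assumed notation; the assignment defines none
        return h.stream().map(i -> "A" + (i + 1)).collect(Collectors.joining(" /\\ "));
    }
    static String show(List<Set<Integer>> hs) {
        return hs.stream().map(PositiveLiteralCE::format).collect(Collectors.joining(", ", "{", "}"));
    }
    static List<int[]> read(Path p) throws IOException {
        return Files.readAllLines(p).stream().filter(line -> !line.isBlank())
                .map(line -> Arrays.stream(line.trim().split("\\s+")).mapToInt(Integer::parseInt).toArray())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<int[]> train = read(Path.of(args[0]));
        List<int[]> test = read(Path.of(args[1]));
        int n = train.get(0).length - 1; // the last value in each row is the target
        PositiveLiteralCE ce = new PositiveLiteralCE(n);
        for (int[] row : train) {
            ce.update(Arrays.copyOf(row, n), row[n]);
            if (ce.s.isEmpty() || ce.g.isEmpty()) { System.out.println("Version space has collapsed"); return; }
            System.out.println("S = " + show(ce.s) + "  G = " + show(ce.g));
        }
        int correct = 0, incorrect = 0;
        for (int[] row : test) {
            String label = ce.classify(Arrays.copyOf(row, n));
            System.out.println(label);
            if (label.equals(String.valueOf(row[n]))) correct++;
            else if (!label.equals("unclassified")) incorrect++; // assumption: unclassified is neither
        }
        System.out.println("Correct: " + correct + ", incorrect: " + incorrect);
    }
}
```

Run as java PositiveLiteralCE train.data test.data on the example sets above, the sketch ends with S = {A2} and G = {A2} and labels all four test examples correctly.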
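
For the second part, one possible representation (again a hypothetical sketch, not the required design) gives every attribute one of three constraints: Ai, -Ai, or absent. In this language A1 /\ ... /\ An is no longer the most specific hypothesis (A1 /\ -A1 covers nothing at all), so the sketch starts S at a marker, BOTTOM, standing for a hypothesis that covers no example; the first positive example generalizes it to the conjunction of all literals true in that example. Specializing a member of G against a negative example adds, for one unconstrained attribute, the literal the example falsifies, which is negated when that attribute is 1 in the example. The class name NegatedLiteralCE, the string encoding, and the notations "false" for BOTTOM and "true" for the empty conjunction are all assumptions. File reading and test-set classification are left out to keep the sketch short; the demo main runs the example training set. It needs Java 11 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch (hypothetical names): candidate elimination over conjunctions of positive
// and negated literals. A hypothesis is a String with one char per attribute:
// '1' = Ai, '0' = -Ai, '?' = attribute not in the conjunction. BOTTOM is a
// hypothetical marker for the hypothesis that covers no example at all.
public class NegatedLiteralCE {
    static final String BOTTOM = "#";
    List<String> s = new ArrayList<>(List.of(BOTTOM)); // specific boundary S
    List<String> g;                                     // general boundary G

    NegatedLiteralCE(int n) { g = new ArrayList<>(List.of("?".repeat(n))); } // empty conjunction

    static boolean covers(String h, int[] x) {
        if (h.equals(BOTTOM)) return false;
        for (int i = 0; i < x.length; i++)
            if (h.charAt(i) != '?' && h.charAt(i) - '0' != x[i]) return false;
        return true;
    }

    // a is at least as general as b iff every literal of a also appears in b.
    static boolean moreGeneralOrEqual(String a, String b) {
        if (b.equals(BOTTOM)) return true;
        if (a.equals(BOTTOM)) return false;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != '?' && a.charAt(i) != b.charAt(i)) return false;
        return true;
    }

    // Unique minimal generalization of h that covers the positive example x:
    // BOTTOM becomes x itself; otherwise each literal that x falsifies becomes '?'.
    static String generalize(String h, int[] x) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < x.length; i++) {
            char xi = (char) ('0' + x[i]);
            sb.append(h.equals(BOTTOM) || h.charAt(i) == xi ? xi : '?');
        }
        return sb.toString();
    }

    void update(int[] x, int target) {
        if (target == 1) {
            g.removeIf(h -> !covers(h, x));
            List<String> next = new ArrayList<>();
            for (String h : s) {
                String gen = generalize(h, x);
                if (g.stream().anyMatch(gh -> moreGeneralOrEqual(gh, gen))) next.add(gen);
            }
            s = next;
        } else {
            s.removeIf(h -> covers(h, x));
            Set<String> next = new LinkedHashSet<>();
            for (String h : g) {
                if (!covers(h, x)) { next.add(h); continue; }
                for (int i = 0; i < x.length; i++) {
                    if (h.charAt(i) != '?') continue;
                    // Add the literal x falsifies on attribute i: -Ai if x[i] = 1, Ai if x[i] = 0.
                    char[] c = h.toCharArray();
                    c[i] = (char) ('0' + (1 - x[i]));
                    String spec = new String(c);
                    if (s.stream().anyMatch(sh -> moreGeneralOrEqual(spec, sh))) next.add(spec);
                }
            }
            // Keep only the maximally general members of G.
            List<String> all = new ArrayList<>(next);
            g = new ArrayList<>();
            for (String h : all)
                if (all.stream().noneMatch(o -> !o.equals(h) && moreGeneralOrEqual(o, h))) g.add(h);
        }
    }

    static String format(String h) {
        if (h.equals(BOTTOM)) return "false"; // assumed notation; the assignment defines none
        List<String> lits = new ArrayList<>();
        for (int i = 0; i < h.length(); i++)
            if (h.charAt(i) != '?') lits.add((h.charAt(i) == '0' ? "-A" : "A") + (i + 1));
        return lits.isEmpty() ? "true" : String.join(" /\\ ", lits); // "true" is also assumed
    }
    static String show(List<String> hs) {
        return hs.stream().map(NegatedLiteralCE::format).collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        // The example training set from the assignment: three attributes, then the target.
        int[][] train = {{0, 1, 1, 1}, {0, 0, 0, 0}, {1, 0, 1, 0}, {1, 1, 0, 1}};
        NegatedLiteralCE ce = new NegatedLiteralCE(3);
        for (int[] row : train) {
            ce.update(Arrays.copyOf(row, 3), row[3]);
            System.out.println("S = " + show(ce.s) + "  G = " + show(ce.g));
        }
    }
}
```

On the example training set, G holds A2 and -A1 /\ A3 after the third example, a hypothesis the positive-literal language cannot express; after the fourth example, S and G both converge to A2.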

Other Information

Value: 11% of the final grade
Due date: Wednesday, 11 June, midday

No extensions will be granted except for sound, documented, medical reasons.