Saturday, April 5, 2008

Questions about final projects

- In Problem #1, do we have to come up with a new method, or can we use
an already existing one (I was thinking about k-means, for example)?

You can use an existing method, provided that you are happy with how it performs on the problem. The webpage describing the dataset has a list of test error rates achievable by different methods, so you can see how well you are doing.

- How about the code? Do we have to provide it as well? Does it have to be
100% original, or can we use (and maybe adapt) toolboxes? (I saw, for
example, that you pointed us to Weka.) Are there restrictions on the language?

You have to provide the code. You can use toolboxes. The code doesn't have to be original if you can make it work well on the problem. The grade *will* depend on the performance of other students. There is no restriction on the programming language as long as you make it easy for me to run your solution to verify the reported test error rates. (Again, if you somehow use the test set to tune your solution, you will automatically get 0 points.)
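One way to stay clear of the test-set rule is to tune only on a validation split carved out of the training data. The following is a minimal sketch, not a required procedure: the helper names `split_train_validation` and `pick_best`, the 20% validation fraction, and the fixed seed are all illustrative assumptions.

```python
import random


def split_train_validation(n_train, validation_fraction=0.2, seed=0):
    """Return (train_idx, val_idx): disjoint index lists covering range(n_train).

    Hypothetical helper: only indices of the *training* set are shuffled,
    so the test set is never involved in tuning.
    """
    indices = list(range(n_train))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_val = int(n_train * validation_fraction)
    return indices[n_val:], indices[:n_val]


def pick_best(candidates, validation_error):
    """Return the candidate setting with the lowest validation error.

    `validation_error` is any function you supply that trains on the
    training indices and measures the error on the validation indices.
    """
    return min(candidates, key=validation_error)
```

For example, `train_idx, val_idx = split_train_validation(60000)` would split the 60,000 MNIST training images, and `pick_best([1, 3, 5], my_validation_error)` would choose among a few candidate settings, where `my_validation_error` is a function you write yourself. The test set is then evaluated once, at the very end, to produce the reported error rate.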

- If we turn in some projects before May 1st, will they be graded
earlier (so that we get an idea of whether we should attempt others)?

Yes, but every student will be given only one additional attempt.

An important note: If you choose to do a reading assignment with a quiz, I will subtract points if you clearly don't understand an important concept from the paper. So choose this option only if you are serious about it.


trizzlor said...

Question for the Programming Assignment #1:

Each "data-point" in the MNIST set effectively consists of a 28 x 28 pixel array of points, so the kind of distance function we use for our classifier will make a big difference in the overall error rates (in the sample classifier tests on the MNIST site, kNN performs quite differently for various Euclidean distances) and will probably affect the motivations for prototype selection. Is there a specific distance function you expect us to use for kNN testing, or one that you will be using to verify our results? Should we put much effort into developing a good distance function (e.g., factoring in rotation and skew) or focus solely on the prototype selection?

Thank you.
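To make the question concrete, here is a minimal sketch of a kNN classifier in which the distance function is a swappable parameter, so the prototype set and the metric can be varied independently. It is only an illustration: the names `minkowski` and `knn_predict` are hypothetical, and the Minkowski family (p = 2 gives the ordinary Euclidean distance) is an assumed example of a metric, not the one the assignment prescribes.

```python
from collections import Counter


def minkowski(p):
    """Build an L_p distance function; p=2 is Euclidean, p=1 is Manhattan."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    return dist


def knn_predict(query, prototypes, labels, k=3, distance=minkowski(2)):
    """Classify `query` by majority vote among its k nearest prototypes.

    `prototypes` is a list of equal-length numeric sequences (for MNIST,
    each image flattened to 784 pixel values); `labels` holds their classes.
    Vote ties go to the label that appears first among the nearest neighbours.
    """
    order = sorted(range(len(prototypes)),
                   key=lambda i: distance(query, prototypes[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Calling `knn_predict(image, prototypes, labels, k=1, distance=minkowski(3))` would swap in a different metric without touching the prototype set. Invariance to rotation or skew would need a more elaborate distance function than this sketch covers.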

Weiwei said...

Can we just use another, easier dataset, such as points in d dimensions?
Many of us have no background in computer vision...