Archive for misc

Now I know how to mine data

// January 5th, 2010 // 1 Comment » // misc, open source

There are just too many techniques for doing data mining. In grad school we only got to know one technique: using an artificial neural network. There’s a tool for this – Weka. Download and install it, or sudo apt-get it (yes, imagine my surprise when I found out about this).

What’s data mining?

Well, basically it’s a process for getting knowledge (e.g. if customers buy diapers, they will most likely buy beer too) out of your data (imagine spreadsheets filled with numbers and strings). You may have thousands of records, but how do you make sense of them? You mine the data and find a pattern. From that pattern you get knowledge. So from this you can put beer next to diapers in supermarkets to increase sales. That’s the general idea.
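To make the diapers-and-beer idea a bit more concrete, here is a minimal sketch in Python of how you could measure such a pattern. The baskets are made up for illustration, and the two measures (support and confidence) are the usual ones for this kind of "people who buy X also buy Y" rule, not something specific to Weka.

```python
# Hypothetical toy baskets; the items are invented for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# How often diapers and beer appear together, and how often
# a diapers basket also has beer in it.
rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(rule_support, rule_confidence)
```

With these five baskets, two contain both items and three contain diapers, so the rule "diapers, therefore beer" holds for two out of three diaper buyers.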

There’s a 90% chance that the data you get is not clean, i.e. there are missing values, some data is inconsistent (e.g. sex can only be F or M, yet there’s a value Q?), and some data is just plain rubbish. The data needs to go through a preprocessing stage. For missing values, you have to fill them in, using either the mean or the median, whichever is best for your data. The same goes for inconsistent data. This is where working with experts in the domain you’re working in is very important. You don’t want to remove something you think is rubbish when it actually means something.
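Here is a rough sketch of that cleaning step in Python. The records, the field names and the helper functions are all hypothetical; the point is only to show filling a missing value with the median (or mean) and flagging a value like Q for a domain expert to look at, rather than deleting it outright.

```python
from statistics import mean, median

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "sex": "F"},
    {"age": None, "sex": "M"},
    {"age": 40, "sex": "Q"},  # "Q" is not one of the expected codes
    {"age": 31, "sex": "M"},
]

VALID_SEX = {"F", "M"}

def fill_missing(records, field, strategy=median):
    """Return copies of the records with missing `field` values
    replaced by strategy(known values)."""
    known = [r[field] for r in records if r[field] is not None]
    fill = strategy(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]

def find_inconsistent(records, field, valid):
    """Return the indices of records whose `field` is not in the valid set."""
    return [i for i, r in enumerate(records) if r[field] not in valid]

cleaned = fill_missing(records, "age")                      # median of 25, 40, 31
cleaned_mean = fill_missing(records, "age", strategy=mean)  # mean instead
suspect = find_inconsistent(cleaned, "sex", VALID_SEX)      # rows to show an expert
```

Whether the median or the mean is the better fill depends on the data; the median is less affected by a few extreme values.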

Then we have the data discretization process, where you reduce that huge amount of data (e.g. by grouping continuous values into a handful of ranges) while it still carries the same meaning. Afterwards we normalize the data. Once this is done, the data is ready for the modelling exercise.
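As a small sketch of these two steps in Python: the post doesn’t say which methods to use, so equal-width binning and min-max scaling are assumed here simply because they are the easiest to show. The ages are made up.

```python
def discretize(values, n_bins):
    """Map each value to an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 25, 33, 47, 60]            # hypothetical ages
age_bins = discretize(ages, 3)         # three age groups instead of raw ages
age_scaled = min_max_normalize(ages)   # same ages squeezed into 0..1
```

Here the five ages collapse into three groups, and the scaled version keeps their order but puts them on the same 0 to 1 range as any other normalized attribute.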

Classification

One of the benefits you get from data mining is classification. This is where, given some data (e.g. male, married, doesn’t have children, has regularly purchased beer for the past year), you can predict whether a beer purchase is likely. So it will classify to something like purchase_beer equals 0 (no) or 1 (yes). In order for the model to predict 0 or 1, the model itself has to be trained with a lot of data. Train it until it reaches the accuracy we want; above 90% is good. A trained model with very high accuracy is going to be an asset, as you can feed it data and it will spit out what we want.
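To give a feel for what "training" means, here is a tiny sketch in Python of a single-neuron perceptron, which is about the simplest relative of a neural network. This is not what Weka does internally, and the rows, labels and function names are all made up for illustration.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0

def train_perceptron(rows, labels, epochs=20, lr=1.0):
    """Nudge the weights every time a training row is misclassified."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

def accuracy(weights, bias, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    correct = sum(predict(weights, bias, x) == y for x, y in zip(rows, labels))
    return correct / len(rows)

# Hypothetical rows: (male, married, has_children, regular_beer_buyer)
train_rows = [
    (1, 1, 0, 1),
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (0, 0, 0, 0),
    (1, 1, 1, 0),
    (0, 1, 0, 1),
]
train_labels = [1, 1, 0, 0, 0, 1]  # purchase_beer: 1 = yes, 0 = no

weights, bias = train_perceptron(train_rows, train_labels)
train_accuracy = accuracy(weights, bias, train_rows, train_labels)
```

After training, predict(weights, bias, (1, 1, 0, 1)) gives the purchase_beer guess for a new customer. Note that the accuracy here is measured on the same rows the model was trained on; with real data you would hold some rows back and measure on those instead.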

The most difficult part in doing this, for me, is the data preprocessing. You need quality, clean data to produce quality results. You have to carefully select which data is relevant to your goal (e.g. would you want to include one’s job as one of the attributes considered for the diapers-beer example?), which is why having domain experts is important. They also have to determine how much each attribute should affect the result (e.g. job probably contributes 10% but marital status contributes 80% towards a diapers-beer purchase).

I just love AI

There are just so many preprocessing methods and hybrid data mining techniques that have already been researched by academics. I’m just so overwhelmed by the number of technical papers on this. They’re probably not as IT-savvy as us, the implementors as they call us, but they definitely have the brains for that part of the world. We just have to scour through this massive body of work and get it to run in our apps. Well, should you need it, that is. Coz processing thousands of records can take hours or days, sometimes months, depending on your machine spec.

Open data

While doing a data mining assignment over the last 2 weeks, I became frustrated with the unavailability of data in Malaysia. Sad. Maybe it’s still difficult for us to see it now, but I’m already imagining all the stuff we could do with data like that held by JPA, MOHR, MOH, MOHE – fuhhhh I’m all shuddery now. Even in OSCC, those training feedback forms and MyGOSSCON feedback forms – hmm whatever happened to those?

There are concerns about exposing private data, I guess. Well, if you ask me, make the data anonymous, as I couldn’t care less who got promoted last year. I only want the ‘spec’ of the person who got promoted – age, sex, location, department, salary, is he respected by peers, does he drive, etc. That kind of stuff, you get the idea.

Hmm.. this will take another 10 years to realise, I think.

Dessert for today

// December 28th, 2009 // No Comments » // misc

Made 3 of these today.

OooOOooohhh. Yummiest.



I walked away for only a few seconds and it was still in pristine condition. I came back and found out that somebody had been naughty!

A quick post on chocolate drink

// December 16th, 2009 // 2 Comments » // misc

I don’t really like chocolate much – I don’t go gaga over it like other girls do. But these… add a little sugar… oohhh… come to mama!

Do you know how delicious these taste?


Cup of sweet sweet tea

// December 4th, 2009 // 1 Comment » // misc

Well, I’m going to have cups and cups of tea! It’s been raining here, and what else could make the days better? I love them sweet, just sugar and nothing else added. Sometimes I put peach halves in there – yum!

There are a lot of types of tea out there. I normally drink the normal one. What I mean is, I’m not into those flavoured ones. The only one that I’ve tried and liked is chamomile tea, which is just so calming. There’s something about it that’s so peaceful and serene. Like I said, I like it sweet, for now. Tea brings you many benefits as long as you cut the sugar down. But I’m a rebel – I want it the way I like it :P

Tea is known as nature’s ‘wonder drug’. Of late, tea and its healthy benefits have been receiving wide attention in the media. The ability of tea to promote good health has long been believed in many countries, especially Japan, China, India, and even England. – http://www.teabenefits.com/

What tea is good for (from http://www.farsinet.com/hottea/medicalbenefits.html):

  • Arthritis – tea drinkers are 60 percent less likely to develop rheumatoid arthritis
  • Bone density – stronger bones
  • Cancer – green tea
  • Flu – boost your fight against the flu with black tea
  • Heart disease – two cups of tea a day decreased the risk of death following a heart attack by 44 percent
  • High blood pressure – green or oolong tea
  • Parkinson’s disease
  • Oral health – prevent cavities and gum disease

That’s just a bit of info. I’m sure there are hordes and hoards of info online.

Here’s to good health!

I love my charm bracelet

// November 23rd, 2009 // No Comments » // misc

Makes me want to get another one. That’s going to cost me another RM100. I don’t care! I want another one, and another one, and another one!

Pretty pretty thing!
