Archive for open source

virtualenv

// March 2nd, 2010 // open source, programming

I keep coming back to virtualenv, and every time I completely forget how to use it and have to start googling. Every time. Frustrating. This usually happens when Plone is involved.

So.

$ sudo apt-get install python-virtualenv
$ virtualenv -p /usr/bin/python2.4 /home/cawanpink/python2.4
$ source /home/cawanpink/python2.4/bin/activate
(python2.4)$ deactivate

That’s:

  • install virtualenv
  • create the Python 2.4 virtual environment
  • activate the environment
  • deactivate (get out of) the environment
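If you can never remember whether the environment is actually active, a quick check from inside Python helps. This is a small sketch of my own, not part of virtualenv: the function name `in_virtualenv` is hypothetical, and it relies on older virtualenv releases setting `sys.real_prefix`, while newer releases (and Python 3’s `venv`) make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a virtualenv/venv."""
    # Older virtualenv releases add sys.real_prefix pointing at the system Python.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(sys.prefix)
print(in_virtualenv())
```

Run it with the environment’s `python` after `source .../bin/activate` and it should print the environment’s path and `True`; after `deactivate`, the system Python should report `False`.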

Now I know how to mine data

// January 5th, 2010 // misc, open source

There are just too many data mining techniques. In grad school we only learned one: artificial neural networks. There’s a tool for this, Weka. Download and install it, or just sudo apt-get it (yes, imagine my surprise when I found out about this).

What’s data mining?

Basically, it’s the process of extracting knowledge (e.g. customers who buy diapers will most likely buy beer too) from your data (imagine spreadsheets filled with numbers and strings). You may have thousands of records, but how do you make sense of them? You mine the data to find patterns, and from those patterns you get knowledge. Based on that, a supermarket could put beer next to diapers to increase sales. That’s the general idea.
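The diapers-and-beer pattern is usually written as an association rule with two numbers: support (how often diapers and beer appear together across all transactions) and confidence (of the transactions with diapers, how many also contain beer). Here’s a minimal sketch on a handful of made-up shopping baskets. Note that this is association rule mining, a different technique from the neural networks I learned in grad school, and the transactions are purely hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain every item in `itemset`."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


print(support({"diapers", "beer"}, transactions))       # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
```

So in this toy data, 3 of the 4 diaper buyers also bought beer. A rule with high support and confidence is the kind of "knowledge" that justifies moving the beer shelf.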

Chances are (I’d say 90%) the data you have is not clean: there are missing values, inconsistent data (e.g. the sex attribute should only be F or M, but there’s a Q in there), and some data that are just plain rubbish. So the data need to go through a preprocessing stage. Missing values have to be filled in, using either the mean or the median, whichever suits your data best. Inconsistent data have to be dealt with too. This is where working with experts in your domain is very important: you don’t want to remove something you think is rubbish when it actually means something.
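A minimal sketch of those two cleaning steps in plain Python, assuming the records are a list of dicts with a hypothetical `age` field (`None` means missing) and a `sex` field that should only be F or M. Rather than deleting odd values, it only flags them, so you can check with the domain expert first.

```python
from statistics import mean, median

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "sex": "M"},
    {"age": None, "sex": "F"},
    {"age": 40, "sex": "Q"},  # inconsistent: sex should be F or M
    {"age": 31, "sex": "M"},
]

VALID_SEX = {"F", "M"}


def fill_missing(rows, field, strategy=median):
    """Replace None in `field` with the mean or median of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = strategy(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def find_inconsistent(rows, field, allowed):
    """Return the indexes of rows whose `field` is outside the allowed set."""
    return [i for i, r in enumerate(rows) if r[field] not in allowed]


cleaned = fill_missing(records, "age", strategy=median)
print([r["age"] for r in cleaned])                   # [25, 31, 40, 31]
print(find_inconsistent(cleaned, "sex", VALID_SEX))  # [2]
```

Passing `strategy=mean` fills the gap with 32 instead; which one is "best" depends on how skewed the column is, which is exactly the judgment call mentioned above.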

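Next comes data discretization, where continuous values are grouped into a smaller number of intervals (e.g. turning exact ages into a few age bands), so there is less data to deal with but it still carries essentially the same information. Afterwards we normalize the data, rescaling numeric attributes to a common range such as 0 to 1. Once this is done, the data is ready for modelling.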
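A minimal sketch of both steps, assuming equal-width binning for discretization (one of several discretization methods) and min-max scaling for normalization. The `ages` list is made up for illustration.

```python
def discretize_equal_width(values, bins):
    """Map each value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        labels.append(min(idx, bins - 1))  # the maximum value goes in the last bin
    return labels


def min_max_normalize(values):
    """Rescale values linearly so the minimum becomes 0.0 and the maximum 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [18, 25, 33, 47, 52, 68]  # hypothetical ages
print(discretize_equal_width(ages, 3))  # [0, 0, 0, 1, 2, 2]
print(min_max_normalize(ages))          # [0.0, 0.14, 0.3, 0.58, 0.68, 1.0]
```

The three bins here are roughly 18 to 34, 34 to 51 and 51 to 68, so six distinct ages collapse into three "young / middle / older" labels.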

Classification

One of the benefits of data mining is classification. Given some data about a customer (e.g. male, married, no children, has regularly purchased beer over the past year), you can predict whether they are likely to purchase beer. The model classifies each record into something like purchase_beer = 0 (no) or 1 (yes). For the model to predict 0 or 1, it first has to be trained with a lot of data. Train it until it reaches the accuracy you want, measured on data it wasn’t trained on; above 90% is good. A trained model with very high accuracy is an asset, as you can feed it new data and it will spit out what you want.
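To make "train it until it reaches the accuracy you want" concrete, here’s a sketch of the smallest possible neural network: a single neuron (a perceptron) trained with the classic perceptron update rule. The feature layout and the eight records are hypothetical, and real tools such as Weka use multi-layer networks, so treat this as an illustration of the idea rather than how Weka does it.

```python
# Hypothetical training records:
# features = [male, married, has_children, buys_beer_regularly]
# label    = purchase_beer (0 = no, 1 = yes)
data = [
    ([1, 1, 0, 1], 1),
    ([1, 0, 0, 1], 1),
    ([0, 1, 1, 1], 1),
    ([0, 0, 0, 1], 1),
    ([1, 1, 1, 0], 0),
    ([0, 1, 0, 0], 0),
    ([1, 0, 1, 0], 0),
    ([0, 0, 1, 0], 0),
]


def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs is above zero, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron rule: adjust the weights only when a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def accuracy(weights, bias, samples):
    """Fraction of samples the model classifies correctly."""
    correct = sum(predict(weights, bias, x) == y for x, y in samples)
    return correct / len(samples)


weights, bias = train_perceptron(data)
print(accuracy(weights, bias, data))         # 1.0 on this toy training set
print(predict(weights, bias, [1, 1, 0, 1]))  # male, married, no kids, regular buyer -> 1
```

The accuracy printed here is measured on the same records used for training, which flatters the model. In practice you would hold back some records as a test set and report the accuracy on those.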

For me, the most difficult part is the data preprocessing. You need quality, clean data to produce quality results. You have to carefully select which attributes are relevant to your goal (e.g. would you include a person’s job as one of the attributes for the diapers-beer example?), which is why having domain experts is important. They also have to determine how much each attribute should affect the result (e.g. job probably contributes 10% but marital status 80% towards a diapers-beer purchase).

I just love AI

There are so many preprocessing methods and hybrid data mining techniques that academics have already researched. I’m overwhelmed by the number of technical papers on this. They’re probably not as IT savvy as us, the implementors as they call us, but they definitely have the brains in that part of the world. We just have to scour this massive body of research and get it running in our apps. Well, should you need it, that is, because processing thousands of records can take hours, days, or even months, depending on your machine’s specs.

Open data

While doing a data mining assignment over the last 2 weeks, I became frustrated with the lack of available data in Malaysia. Sad. Maybe it’s still difficult for us to see the value now, but I’m already imagining all the stuff we could do with data from JPA, MOHR, MOH and MOHE. Fuhhhh, I’m all shuddery now. Even at OSCC, there are those training feedback forms and the MyGOSSCON feedback forms. Hmm, whatever happened to those?

There are concerns about exposing private data, I guess. Well, if you ask me, make the data anonymous; I couldn’t care less who got promoted last year. I only want the ‘spec’ of the person who got promoted: age, sex, location, department, salary, whether he is respected by peers, whether he drives, etc. You get the idea.
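Here’s a minimal sketch of that kind of anonymization: drop the fields that directly identify a person and keep only the ‘spec’. All field names and values below are hypothetical.

```python
# Hypothetical fields that directly identify a person.
IDENTIFYING_FIELDS = {"name", "staff_id", "email"}


def anonymize(record, identifying=IDENTIFYING_FIELDS):
    """Return a copy of `record` without the directly identifying fields."""
    return {k: v for k, v in record.items() if k not in identifying}


# Hypothetical promotion record.
staff = {
    "name": "Someone",
    "staff_id": "A0001",
    "email": "someone@example.com",
    "age": 35,
    "sex": "M",
    "location": "Putrajaya",
    "department": "IT",
    "salary": 4500,
    "promoted": True,
}

print(anonymize(staff))
```

Dropping names is the bare minimum: a rare enough combination of the remaining attributes (say, the only 35-year-old in a small department) can still point to one person, so real releases usually also coarsen values such as age or salary into bands.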

Hmm.. this will take another 10 years to realise, I think.

Committing to SVN using Bazaar

// December 22nd, 2009 // open source, tech

Ejat and I were working out how to do this for the MyMeeting code, and we did it! I asked Ejat to write it up on his blog, but sadly his blog isn’t available anymore (come on Ejat, finding hosting isn’t that hard... if it’s such a hassle, just host it yourself at home :P )

MyMeeting is also on Launchpad, which uses Bazaar. We wanted a way to send changes to both its main repository (SVN) and Launchpad.

We’ve been using SVN for MyMeeting, hosted at OSCC. Our typical workflow looks like this:

$ svn co https://svn.oscc.org.my/mymeeting/trunk trunk
$ cd trunk
(hack hack hack...)
$ svn status #see our changes
$ svn ci -m 'added feature ABC' #commit to SVN repository

To use Bazaar with an SVN repository, you have to install bzr and bzr-svn. There’s excellent documentation on bzr-svn here.

$ sudo apt-get install bzr bzr-svn

SVN-like

With Bazaar, the workflow looks something like this:

$ mkdir dev
$ bzr init-repo --default-rich-root dev
$ cd dev
$ bzr co https://svn.oscc.org.my/mymeeting/trunk trunk
$ cd trunk
(hack hack hack...)
$ bzr update #get changes done by others
$ bzr ci -m 'added form for feature ABC' #commit to SVN repository
$ bzr push lp:mymeeting #push to Launchpad, only have to provide location once
(hack hack hack...)
$ bzr update #get changes done by others
$ bzr ci -m 'added list for feature ABC' #commit to SVN repository
$ bzr push

Decentralized Bazaar way

To take advantage of Bazaar’s decentralised model (so you can work offline, for example), it goes like this:

$ mkdir dev
$ bzr init-repo --default-rich-root dev
$ cd dev
$ bzr co https://svn.oscc.org.my/mymeeting/trunk trunk #our copy of trunk
$ bzr branch trunk working #make a local branch to hack on
$ cd working
(hack hack hack...working offline)
$ bzr ci -m 'added form for feature ABC' #commit to local branch
(hack hack hack...working offline)
$ bzr ci -m 'added list for feature ABC' #commit to local branch
 
(when you get your connection back)
$ cd ../trunk
$ bzr update #get changes done by others to our copy of trunk 
$ cd ../working
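$ bzr merge #merge trunk's new changes into our local branch (bzr pull refuses once the branches have diverged)
$ bzr status #see our changes plus the pending merge
$ bzr ci -m 'merged latest trunk' #record the merge in our local branch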
$ cd ../trunk
$ bzr merge ../working
$ bzr ci -m 'added feature ABC'
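The trunk side of the "when you get your connection back" steps (update the trunk checkout, merge ../working into it, commit) is easy to script. Here’s a hedged sketch using Python’s subprocess module. It assumes the dev/trunk and dev/working layout above and that bzr with bzr-svn is on your PATH; the function name `sync_to_svn` is hypothetical, and by default it only prints the commands (`dry_run=True`).

```python
import shlex
import subprocess
from pathlib import Path


def sync_to_svn(dev_dir, message, dry_run=True):
    """Merge the local 'working' branch into the 'trunk' checkout and commit.

    Assumes dev/trunk is a bound checkout of the SVN trunk and dev/working
    is a local branch made from it.
    """
    trunk = Path(dev_dir) / "trunk"
    steps = [
        ["bzr", "update"],               # get changes done by others
        ["bzr", "merge", "../working"],  # bring in our local commits
        ["bzr", "ci", "-m", message],    # commit to the SVN repository
    ]
    for cmd in steps:
        if dry_run:
            print(f"(cd {trunk} && {shlex.join(cmd)})")
        else:
            # check=True raises CalledProcessError if bzr exits non-zero,
            # so we never commit after a failed update or merge.
            subprocess.run(cmd, cwd=trunk, check=True)
    return steps


sync_to_svn("dev", "added feature ABC")  # dry run: only prints the commands
```

Once the printed commands look right, call it with `dry_run=False` to actually run them.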

Personally, I like the centralised approach because it’s similar to SVN. Local branches are a great feature for when I have to work offline. And while I can use Bazaar from now on, the rest of the team doesn’t have to switch tools. That’s great!

Some pics on 24 Hour OSS Webdev Competition

// November 5th, 2009 // events, open source, programming

We provided them with food, unlimited coffee, a PC with Ubuntu installed, 2 wired connections, 2 wireless connections, a huge desk, and enough chairs and couches. Everything else (gadgets, devices, books, cables, wires) they brought themselves.

Stopwatch and huge clock on screen. The contest would start at 11am. Arm did this; "fuyyo" (wow!) was my first reaction when I first saw it.

24-Hour OSS WebDev Contest

// October 7th, 2009 // events, open source

It will be held in conjunction with MyGOSSCON 2009. For geeks, I believe it’s going to be a walk in the park.
This is the first time the conference is holding this contest, so I believe it’s going to be easy. Adda, quickly find yourself a team; if you join, you’ll definitely win. Then you’ll have to treat me, ok hahaha…

1st prize: RM5,000 cash
2nd prize: RM3,000 cash
3rd prize: RM1,500 cash

Anyway, yes, it’s open to all, and there’s an additional prize for the best student team. If you’re at an IPTA/IPTS (public/private institution of higher learning), you could end up RM2k richer.

More information here.