mathieuen Mathieu
Very cool: hash trick + cross-product features = polynomial kernel-like features in feature space

April 15, 2011     2 retweets #

atpassos_ml Alexandre Passos
@mathieuen The only sad thing is you still need quadratic/cubic computation to enumerate them all

April 15, 2011 #

ogrisel Olivier Grisel
@atpassos_ml @mathieuen you can use a moving window for text tokens (cross product only inside 100 consecutive words). Mahout can do this.

April 15, 2011 #
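
A minimal sketch of the moving-window idea, assuming a document is already a list of tokens. The function name `windowed_pairs` and the `window` parameter are made up for illustration; this is not Mahout's API. Each token is paired only with the tokens that follow it inside the window, so the cost is roughly (number of tokens) x (window size) rather than quadratic in document length.

```python
def windowed_pairs(tokens, window=100):
    """Yield (left, right) token pairs whose positions lie within `window` consecutive words."""
    for i, left in enumerate(tokens):
        # tokens[i + 1 : i + window] covers the next (window - 1) tokens,
        # so both members of a pair fall inside one window of `window` words.
        for right in tokens[i + 1 : i + window]:
            yield (left, right)


if __name__ == "__main__":
    doc = "the quick brown fox jumps".split()
    for pair in windowed_pairs(doc, window=3):
        print(pair)
```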

mathieuen Mathieu
@atpassos_ml @ogrisel restricting to promising features, grouping features in some way etc should help computationally

April 15, 2011 #

ogrisel Olivier Grisel
@mathieuen @atpassos_ml promising cross products == collocations: need unsupervised statistical tests to identify them: http://bit.ly/eO8COs

April 15, 2011 #

atpassos_ml Alexandre Passos
@ogrisel @mathieuen I don't think statistical tests are a good idea, we should be able to do this as fast as lexing and trust regularization

April 15, 2011 #

atpassos_ml Alexandre Passos
@ogrisel @mathieuen to fix things and ignore the irrelevant collocations

April 15, 2011 #

atpassos_ml Alexandre Passos
@ogrisel @mathieuen One thought: if one uses an incremental hash function then one can add arbitrary cross-products easily

April 15, 2011 #

atpassos_ml Alexandre Passos
@ogrisel @mathieuen just, when adding a new feature, go through every nonzero feature in the vector and continue the hash from there

April 15, 2011 #

atpassos_ml Alexandre Passos
@ogrisel this should have lots of collisions (and can't handle redundant hashes as-is), but should be fast to get n-th order cross-products

April 15, 2011 #
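
One possible reading of the incremental-hash idea, as a sketch only. It assumes `zlib.crc32` as the incremental hash (its second argument continues a running checksum), and the names `incremental_cross_features`, `n_buckets`, and `max_order` are hypothetical. For every new token, the hash is continued from each feature already emitted, which yields cross-products up to `max_order` without building them as strings first.

```python
import zlib
from collections import Counter


def incremental_cross_features(tokens, n_buckets=2**20, max_order=2):
    """Hash tokens and their cross-products by continuing running CRC32 states."""
    states = []  # (running crc32 state, order) for every feature emitted so far
    vector = Counter()  # sparse vector: bucket index -> count
    for token in tokens:
        # The trailing NUL byte is a separator so "ab"+"c" and "a"+"bc" differ.
        data = token.encode("utf-8") + b"\x00"
        new_states = [(zlib.crc32(data), 1)]
        for state, order in states:
            if order < max_order:
                # Continue the hash from an existing feature's state.
                new_states.append((zlib.crc32(data, state), order + 1))
        for state, _ in new_states:
            vector[state % n_buckets] += 1
        states.extend(new_states)
    return vector
```

In this sketch the hash of "a then b" differs from the hash of "b then a", so the same pair seen in a different order lands in a different bucket, and the number of states grows quickly unless `max_order` is kept small.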

ogrisel Olivier Grisel
@mathieuen would be great to have this implemented efficiently in the scikit as an alternative to the vocabulary-based vectorizer

April 15, 2011 #

ogrisel Olivier Grisel
@mathieuen also, we should investigate Google's http://bit.ly/cityhash (BTW, Mahout uses MurmurHash implemented in Java)

April 15, 2011 #

mikentweets Michael Nute
@mathieuen how are you using the cross product on features? And what's the hash trick? So many questions and so few characters... #stats

April 15, 2011 #

mathieuen Mathieu
.@mikentweets hash trick: use a hash function to get a mapping to an index in your weight vector

April 15, 2011 #
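
A small sketch of the hash trick as described above, assuming string features and `zlib.crc32` as the hash function (any stable hash would do; the function name `hashed_vector` and the bucket count are illustrative). No vocabulary is stored: the hash alone decides which slot of the vector a feature updates, and distinct features may collide in the same slot.

```python
import zlib


def hashed_vector(tokens, n_buckets=16):
    """Map each token to an index via a hash and accumulate counts at that index."""
    vec = [0.0] * n_buckets
    for token in tokens:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % n_buckets
        vec[index] += 1.0
    return vec


if __name__ == "__main__":
    print(hashed_vector(["spam", "ham", "spam"]))
```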

mathieuen Mathieu
.@mikentweets cross-product features: create pseudo features from the combination of other features

April 15, 2011 #
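
To make the cross-product idea concrete, here is a sketch that builds pairwise pseudo features from a set of string features. The `*` separator and the function name `cross_product_features` are assumptions chosen for illustration. Enumerating all pairs is quadratic in the number of active features, which is the cost raised earlier in the thread.

```python
from itertools import combinations


def cross_product_features(features):
    """Return one pseudo feature per unordered pair of distinct input features."""
    unique = sorted(set(features))  # sort so each pair has a single canonical name
    return [a + "*" + b for a, b in combinations(unique, 2)]


if __name__ == "__main__":
    print(cross_product_features(["new", "york", "city"]))
```

The resulting strings can be fed to a hashing step like any other feature, which is how the two techniques combine.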