
The first term may be more accurate than you might think. Elizabeth I of England ruled as Elizabeth Rex, where Rex is Latin for "King" and Regina is the female term.

Technically, in England "Queen" is an ambiguous term: a queen regnant is the actual ruler, a queen dowager is the widow of a king, etc. But, again, several European and other female rulers used the male term.

Hatshepsut crowned herself Pharaoh and maintained an elaborate legal fiction of maleness, because ya know she was in charge and could do what she wanted.

Hungary had two female kings: Mary of the House of Anjou and Maria Theresa.

Etc etc.



Poland had King Jadwiga, although in common language people still call her a queen (as she actually became one a few years after she was crowned).


Good point! I would see this rather as yet another argument for why you should simply give the actual output of the NLP algorithm.

So if people actually do the calculation King-Man+Woman and it comes closest to King, then they should report "King-Man+Woman~=King" and not "King-Man+Woman=Queen" (only because that's what they expected).


To be honest, I think the idea that we should expect ML algorithms to give a single, certain answer is misguided. I would expect the output from this algorithm to be "King - Man + Woman = King (90%), Queen (83%), Prince (70%)" or something like that, i.e. a list of answers with some measure of how "good" those answers are. Then again, I work in a field that doesn't really have categorical answers so maybe I'm missing something obvious.


That's pretty much correct. You would typically calculate a vector for "King-Man+Woman" and then do a query on this based on a cosine distance (or similar measure) over the entire vocabulary.

The query would give you a ranked list of the closest word vectors with scores that indicate how good the match is.
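
For what it's worth, here is a minimal sketch of that kind of query in plain Python. The 3-dimensional vectors are invented for illustration and do not come from any trained model, and the helper names (analogy_vector, rank_neighbors) are hypothetical. A real setup would use learned embeddings over a much larger vocabulary.

```python
import math

# Toy embeddings, made up for illustration only (not from a trained model).
TOY_VECTORS = {
    "king":   [0.9, 0.5, 0.3],
    "queen":  [0.9, 0.2, 0.6],
    "man":    [0.2, 0.5, 0.4],
    "woman":  [0.2, 0.4, 0.5],
    "prince": [0.8, 0.7, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy_vector(vectors, positive, negative):
    """Add the 'positive' word vectors and subtract the 'negative' ones."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for word in positive:
        result = [r + v for r, v in zip(result, vectors[word])]
    for word in negative:
        result = [r - v for r, v in zip(result, vectors[word])]
    return result

def rank_neighbors(vectors, query, top_n=3):
    """Return the top_n (word, score) pairs, best match first."""
    scored = [(word, cosine_similarity(query, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# King - Man + Woman, queried against the whole (toy) vocabulary.
query = analogy_vector(TOY_VECTORS, positive=["king", "woman"], negative=["man"])
ranked = rank_neighbors(TOY_VECTORS, query)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

With these particular toy numbers the input word "king" itself ranks first and "queen" second, which is the situation described upthread. That ordering is a property of the made-up vectors, not evidence about any real model.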


But the example is only performing vector operations. You could perhaps normalize the distances of a number of vectors with a softmax or something to produce a probability across a set, but what's being presented in the paper is the "closest" vector following the operations in terms of cosine distance.
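
As a rough sketch of the softmax idea: the scores below are the illustrative numbers quoted upthread (0.90, 0.83, 0.70), treated here as cosine similarities, and the temperature parameter is my own assumption. The resulting values depend entirely on which candidates you include and on the temperature, so they are not calibrated probabilities.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw scores into weights that sum to 1 (temperature is an assumed knob)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical similarity scores, borrowed from the example numbers upthread.
candidates = {"king": 0.90, "queen": 0.83, "prince": 0.70}
probs = softmax(list(candidates.values()), temperature=0.1)
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.2f}")
```

A lower temperature sharpens the distribution toward the top match, while a higher one flattens it. The ranking itself is unchanged either way.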


In the end, it doesn't matter what transformation you do, as long as you do it consistently rather than in an ad hoc manner. If excluding the original term always leads to useful results, it is a useful transformation.

The problems materialize when you're just cherry picking for results.
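
To make the "exclude the original terms" rule concrete, here is a small sketch with made-up 3-dimensional vectors (not from a trained model) and a hypothetical nearest() helper. The point is only that the exclusion is applied as a fixed rule and that both answers are reported.

```python
import math

# Toy embeddings, invented for illustration only.
TOY_VECTORS = {
    "king":   [0.9, 0.5, 0.3],
    "queen":  [0.9, 0.2, 0.6],
    "man":    [0.2, 0.5, 0.4],
    "woman":  [0.2, 0.4, 0.5],
    "prince": [0.8, 0.7, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(vectors, query, exclude=()):
    """Most similar word to query, skipping any word listed in exclude."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))

inputs = ("king", "man", "woman")
# King - Man + Woman, computed component by component.
query = [k - m + w for k, m, w in zip(*(TOY_VECTORS[name] for name in inputs))]

raw_answer = nearest(TOY_VECTORS, query)                        # nothing excluded
filtered_answer = nearest(TOY_VECTORS, query, exclude=inputs)   # inputs always excluded
print(f"raw: {raw_answer}, inputs excluded: {filtered_answer}")
```

With these toy numbers the raw nearest word is "king" and the filtered one is "queen". Reporting both, and applying the same exclusion to every query, is what keeps the transformation from becoming cherry picking.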


See also: Drag kings (https://en.wikipedia.org/wiki/Drag_king). (Usually) female people called "king".



