Discussion: Word sense disambiguation
Cristian Petroaca
2018-02-17 20:52:17 UTC
Hi,

I'm interested in word sense disambiguation (particularly based on
WordNet). I noticed that the latest OpenNLP version doesn't include it, but I
remember that a couple of years ago somebody was working on implementing
it. Why isn't it in the official OpenNLP jar? Is there a timeline for
adding it?

Thanks,
Cristian
Anthony Beylerian
2018-02-20 06:09:27 UTC
Hi Cristian,

Thank you for your interest.

The WSD module is currently experimental, so as far as I am aware there is
no timeline for it.

You can find the sandboxed version here:
https://github.com/apache/opennlp-sandbox/tree/master/opennlp-wsd

I personally haven't had the time to revisit this for a while, and there
are still some details to work out.
But if you are really interested, you are welcome to discuss and contribute.
I will assist as much as possible.

Best,

Anthony

Cristian Petroaca
2018-02-20 20:26:04 UTC
Hi Anthony,

I'd be interested in discussing this further.
What WSD methods are used? Any links to papers?
How does the module perform when evaluated against Senseval?

How much work do you think is necessary to get a functioning WSD module
into OpenNLP?

Thanks,
Cristian
Anthony Beylerian
2018-02-24 17:49:26 UTC
Hey Cristian,

We have tried different approaches such as:

- Lesk (original) [1]
- Most frequent sense from the data (MFS)
- Extended Lesk (with different scoring functions)
- It Makes Sense (IMS) [2]
- A sense clustering approach (I don't immediately recall the reference)

Lesk and MFS are meant to be used as baselines for evaluation purposes only.
The extended version of Lesk is an effort to improve on the original by
adding information from semantic relationships.
Although it's not very accurate, it could be useful since it is an
unsupervised method (no need for large training data).
However, there were some caveats: both approaches need to pre-load
dictionaries as well as score a semantic graph from WordNet at runtime.
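
For illustration, the core of Lesk is just gloss overlap: pick the sense
whose dictionary definition shares the most words with the target's
context. Here is a minimal, self-contained sketch of that idea (the "bank"
sense inventory is hard-coded and hypothetical; this is not the sandbox
code, and a real implementation would pull glosses from WordNet, filter
stop words and use stemming):

import java.util.*;

// Minimal gloss-overlap (Lesk) sketch: picks the sense whose gloss shares
// the most tokens with the surrounding context. Toy inventory, illustrative only.
public class SimplifiedLesk {

    // Hypothetical sense inventory for "bank"; real glosses would come from WordNet.
    static final Map<String, String> BANK_SENSES = Map.of(
            "bank%1", "a financial institution that accepts deposits and channels money into lending",
            "bank%2", "sloping land beside a body of water such as a river");

    static String disambiguate(Map<String, String> senses, String[] context) {
        Set<String> contextWords = new HashSet<>();
        for (String w : context) contextWords.add(w.toLowerCase());

        String best = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> sense : senses.entrySet()) {
            int overlap = 0;
            // Count how many gloss tokens also appear in the context.
            for (String glossWord : sense.getValue().toLowerCase().split("\\s+")) {
                if (contextWords.contains(glossWord)) overlap++;
            }
            if (overlap > bestOverlap) {
                bestOverlap = overlap;
                best = sense.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] context = "he deposited the money at the bank across the street".split(" ");
        System.out.println(disambiguate(BANK_SENSES, context)); // bank%1
    }
}

The extended Lesk variant essentially enlarges each gloss with the glosses
of related synsets (hypernyms, hyponyms, etc.) before computing the
overlap, which is where the semantic-graph scoring at runtime comes in.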

IMS is a supervised method which we were mainly hoping to use, since it
scored around 80% accuracy on SemEval, but only in the coarse-grained case.
In reality, words have various degrees of polysemy, and when tested on the
fine-grained case the results were much lower.
We also experimented with a simple clustering approach, but the
improvements were not considerable as far as I remember.
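
To give a better idea of what IMS does under the hood: for each target
word it extracts surrounding words, the POS tags of neighbouring tokens
and a few local collocations, and trains a linear SVM per word type over
those features. Below is a simplified sketch of the feature extraction
only (the exact feature templates in the paper differ; this is just to
show the shape of the features):

import java.util.*;

// Rough illustration of IMS-style features: surrounding words, POS tags of
// neighbouring tokens, and local collocations around the target word.
// The real system feeds such features into a linear SVM; this only builds them.
public class ImsFeatures {

    static List<String> extract(String[] tokens, String[] posTags, int target, int window) {
        List<String> features = new ArrayList<>();
        // Surrounding words within the window (bag of words).
        for (int i = Math.max(0, target - window); i <= Math.min(tokens.length - 1, target + window); i++) {
            if (i != target) features.add("word=" + tokens[i].toLowerCase());
        }
        // POS tags of the three tokens before and after the target.
        for (int offset = -3; offset <= 3; offset++) {
            int i = target + offset;
            if (offset != 0 && i >= 0 && i < posTags.length) {
                features.add("pos[" + offset + "]=" + posTags[i]);
            }
        }
        // A couple of local collocations: the bigrams adjacent to the target.
        if (target > 0) features.add("coll[-1,0]=" + tokens[target - 1] + "_" + tokens[target]);
        if (target < tokens.length - 1) features.add("coll[0,+1]=" + tokens[target] + "_" + tokens[target + 1]);
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "sat", "on", "the", "bank", "of", "the", "river"};
        String[] pos    = {"PRP", "VBD", "IN", "DT", "NN", "IN", "DT", "NN"};
        System.out.println(extract(tokens, pos, 4, 3));
    }
}

A real implementation then maps these string features to a sparse vector
and trains one classifier per ambiguous word from sense-annotated data.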

I just checked the latest results on SemEval-2015 [3] and they look a bit
better on the fine-grained case (~65% F1).
However, the accuracy seems to increase in some particular domains, so it
could depend on the use case.

On the other hand, there may be more recent work that yields better
results, but that would need further investigation.

There are also some other issues, such as the lack of direct multilingual
support in WordNet, missing sense definitions, etc.
We were also still looking for a better source of sense definitions back
then.
In any case, I believe it would be better to reach higher performance
before putting this in the official distribution, though that ultimately
depends on the team.
Beyond that, different parts of the code just need some simple refactoring.

Best,

Anthony

[1] : M. Lesk, Automatic sense disambiguation using machine readable
dictionaries
[2] : https://www.comp.nus.edu.sg/~nght/pubs/ims.pdf
[3] : http://alt.qcri.org/semeval2015/task13/index.php?id=results

Cristian Petroaca
2018-02-27 20:22:41 UTC
I agree with you. WSD should be included in OpenNLP once it has reasonably
good performance.
On the other hand, I have seen few libraries or APIs doing WSD and almost
none doing it right. That may be indicative of how hard the problem is.

The only promising API I found is Babelfy: http://babelfy.org/about. It
uses a graph-based model built on their BabelNet knowledge base to predict
word senses. I think it's based on this paper:
http://www.aclweb.org/anthology/Q14-1019. Any thoughts on this?

Rodrigo Agerri
2018-02-28 06:47:56 UTC
Hello,

Babelfy is not open-source software. DBpedia Spotlight (Apache License
2.0) performs named entity disambiguation, UKB (GPL) does WSD and obtains
very good results, and the IMS system is available for download. There
will be others, I am sure; I am just talking off the top of my head.

HTH

R

Cristian Petroaca
2018-03-01 19:39:14 UTC
I know it's not open source. I was referring to replicating their
graph-based model using WordNet.
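
To make that concrete, the simplest version of what I have in mind looks
like the sketch below: collect candidate senses for each content word in
the sentence, connect them with WordNet relations, and pick for each word
the best-connected candidate. This is only a degree-based toy with
hard-coded candidates and edges; as I understand the paper, Babelfy itself
computes random-walk semantic signatures over BabelNet and then applies a
densest-subgraph heuristic, which is considerably more involved:

import java.util.*;

// Very rough degree-based sketch of knowledge-based WSD: for each ambiguous word,
// keep the candidate sense with the most relations to candidate senses of the
// other words in the sentence. Candidates and edges are hypothetical stand-ins
// for what would normally be read from WordNet.
public class GraphWsdSketch {

    public static void main(String[] args) {
        // Hypothetical candidate senses per word.
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("bank", List.of("bank#finance", "bank#river"));
        candidates.put("loan", List.of("loan#credit"));
        candidates.put("deposit", List.of("deposit#money", "deposit#sediment"));

        // Hypothetical relation edges between senses (e.g. WordNet relations / gloss links).
        Map<String, Set<String>> edges = Map.of(
                "bank#finance", Set.of("loan#credit", "deposit#money"),
                "bank#river", Set.of("deposit#sediment"),
                "deposit#money", Set.of("bank#finance", "loan#credit"),
                "deposit#sediment", Set.of("bank#river"),
                "loan#credit", Set.of("bank#finance", "deposit#money"));

        for (Map.Entry<String, List<String>> word : candidates.entrySet()) {
            String best = null;
            int bestScore = -1;
            for (String sense : word.getValue()) {
                int score = 0;
                // Count edges into candidate senses of *other* words only.
                for (Map.Entry<String, List<String>> other : candidates.entrySet()) {
                    if (other.getKey().equals(word.getKey())) continue;
                    for (String otherSense : other.getValue()) {
                        if (edges.getOrDefault(sense, Set.of()).contains(otherSense)) score++;
                    }
                }
                if (score > bestScore) { bestScore = score; best = sense; }
            }
            System.out.println(word.getKey() + " -> " + best);
        }
    }
}

Even this naive version needs the WordNet relation graph loaded in memory,
which ties back to the runtime caveats Anthony mentioned earlier.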