Saturday, 02 May 2020

Monday, 28 October 2019

Training iNat's Artificial Intelligence (AI) - the curious case of Serruria villosa

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - o0o- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

An intriguing situation has sparked my interest. It has to do with an observation of Serruria villosa by another iNaturalist observer, an iNat novice and one of a number of 'new' iNatters who joined during April 2019 to contribute to the City Nature Challenge. The iNaturalist member in question has not been active on iNat for several months.

As it happened

As a little bit of background, the observation in question (click here) has four photos, of four different fynbos plant species. The first (front) photo is of the protea family member Serruria villosa, and the ID for the observation is just that: Serruria villosa. I know this because I was one of the iNatters who identified it during the City Nature Challenge. In this instance, the first photo is really all one needs to identify it to species level, if one is familiar with the fine-leaved, yellow-flowered S. villosa, that is.

The other three species illustrated in the three remaining photos are:

Syncarpha vestita (a daisy with bright white flowers and hairy, silver-grey, broad leaves)
Diastella divaricata (a groundcover member of the Proteaceae, with small pink flowers and relatively broad leaves)
Erica ericoides (a fine-leaved heath with small, clustered flowers)

Each of these photos is also used in its own discrete observation, so there is no loss of information, just this one slightly anomalous observation that includes three extraneous (non-relevant) photographs.

Almost a month after the observation was posted, one of the iNat curators refused to agree to the ID of S. villosa but instead labelled it "State of matter Life".

I thought (and still do think) this was an odd action on the part of the curator and commented to that effect. There are a number of observations with more than one species included in the photos, but the comments section and/or sometimes the description are sufficient (usually) to deal with that issue. One identifies the life form and moves on.

Strict application of the iNaturalist 'house rules' would require that this observation be identified as Magnoliopsida (not 'state of matter: life'), which I have now done in order to accede to those with "heebeejeebies" (see below - spoiler alert). This would/could/should also mean that any number of existing observations would have to be reclassified, not the least of which would be the numerous observations of a landscape with many species present. Ah well ... that is another story (for anyone who has that kind of energy to put into this space) - this is about the AI issue.

Curating the AI

I then thought nothing more of it - until a few days ago when the same curator commented once again on the same observation:

"Please dont identify mixed collections by the first photo. Gives some members on the site the heebeejeebees!! All four pictures are now identified as Serruria villosa and as this is research grade all four will be used to train the Artificial Intelligence on what Serruria villosa looks like.
Clearly that is utterly undesirable!"


So, being curious about the AI training and skills development, I thought I might test the hypothesis that the AI would be misled by having three arbitrary photos included in this single observation.*

*[... again, bearing in mind that a number of iNat observations do in fact have photos of other organisms included in their photo gallery for the observation - for all sorts of reasons - but also bearing in mind that there are already well over 140 observations (thus well over 140 photos) of this rather distinctive species of Serruria on iNat. Why is this last point important? Click here!].

I selected several observations of S. villosa by simply clicking at random on the distribution map (I admittedly discarded one - simply because it rather ironically turned out to be the observation in question - the one with the photos of three other plants, but right at the end I checked that one too - see the postscript).

I then pretended that I had absolutely no idea what species had been observed and clicked on each to find species suggestions.

Top species suggestions for the first three I opened were as follows (in order of recommendation):

1) https://www.inaturalist.org/observations/34684957

-Syncarpha vestita (AHA!)
-Phylica
-Spatalla
-Staavia radiata
-Amphithalea
-Edmondia sesamoides
-Erica imbricata
-Lachnaea

(After this first one I kinda thought the curator might have a point, although to my mind it was possible that the photos were more focused on the bright-white flowered Metalasia in the images, since it 'popped' far more than the Serruria. But I moved on ... many photos in iNaturalist are not great, including most of my own.)

2) https://www.inaturalist.org/observations/10900255
-Erica plukenetii
-Serruria elongata
-Erica sessiliflora
-Aulax umbellata
-Erica coccinea

Hmmmmmm.... desperately seeking silkypuffs, kooigoed and small-flowered heaths ... next?

3) https://www.inaturalist.org/observations/23617516
-Serruria elongata
-Serruria fasciflora
-Brunia noduliflora
-Aulax umbellata

I tried several more randomly selected observations with similar results (although I have not transcribed them all - you are welcome to try this yourself ...).

Selecting relatively good photos

After checking through five or six of these random ones I started to think more strongly that the Syncarpha misidentification from the first observation might really just be because the computer (understandably) assumed that the subject of both photos was the bright white Metalasia. Maybe?

This got me thinking that, because some of them were not great, readily identifiable photos (out of season, etc.), maybe it was all a bit difficult for the poor old AI. Change of plan!

I next selected several quite readily identifiable observations of S. villosa - ones with good close-up photos of the flowers. I again pretended I had no idea what species it was (despite all being research grade with several correct IDs) and clicked on each to find species suggestions. The results were fairly startling IMO!!

Top species suggestions for each were as follows (in order of recommendation):

-Asclepias linaria
-Protea scolymocephala
-Erica sessiliflora
-Peucephyllum schottii

-Serruria fasciflora
-S. elongata
-Pinus mugo (too cute man!!)
-E. sessiliflora, etc.

-Erica sessiliflora
-Leucadendron spissifolium
-Mimetes cucullatus
-E. plukenetii
-Syncarpha vestita (AHA!!! - that little blighter snuck back in here!!)

-E. sessiliflora
-P. scolymocephala
-Serruria elongata
-Leucospermum cuneiforme

-Leucogenes grandiceps (I had to look this up)
-Rhodiola rosea (I confess I had to look that up too)
-Chuquiraga (OK, I had to look that up too - rather fascinating)
-Penaea mucronata

Again - I looked through several more, but have not transcribed the results - which were similar to those above. And, again, you are most welcome to try this yourself, at your leisure.
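For anyone who would rather tally than eyeball, the lists above can be summarised with a few lines of Python. This is only a rough sketch under some loud assumptions: it uses just the suggestions transcribed in this post (some lists are truncated, e.g. by an "etc."), the labels good-1 to good-5 are made-up names for the five unlinked observations, and abbreviations have been expanded (I am assuming "P. scoly" means Protea scolymocephala). It does not talk to iNaturalist at all.

```python
from collections import Counter

TARGET = "Serruria villosa"

# Suggestions as transcribed in this post, in order of recommendation.
# Keys "good-1".."good-5" are hypothetical labels, not real observation IDs.
SUGGESTIONS = {
    "34684957": ["Syncarpha vestita", "Phylica", "Spatalla", "Staavia radiata",
                 "Amphithalea", "Edmondia sesamoides", "Erica imbricata", "Lachnaea"],
    "10900255": ["Erica plukenetii", "Serruria elongata", "Erica sessiliflora",
                 "Aulax umbellata", "Erica coccinea"],
    "23617516": ["Serruria elongata", "Serruria fasciflora", "Brunia noduliflora",
                 "Aulax umbellata"],
    "good-1": ["Asclepias linaria", "Protea scolymocephala", "Erica sessiliflora",
               "Peucephyllum schottii"],
    "good-2": ["Serruria fasciflora", "Serruria elongata", "Pinus mugo",
               "Erica sessiliflora"],
    "good-3": ["Erica sessiliflora", "Leucadendron spissifolium", "Mimetes cucullatus",
               "Erica plukenetii", "Syncarpha vestita"],
    "good-4": ["Erica sessiliflora", "Protea scolymocephala", "Serruria elongata",
               "Leucospermum cuneiforme"],
    "good-5": ["Leucogenes grandiceps", "Rhodiola rosea", "Chuquiraga",
               "Penaea mucronata"],
}


def genus(name):
    """Return the first word of a name, taken here to be the genus."""
    return name.split()[0]


def summarise(suggestions, target):
    """Count lists naming the target species, lists naming its genus, and top picks."""
    species_hits = sum(target in names for names in suggestions.values())
    genus_hits = sum(
        any(genus(n) == genus(target) for n in names)
        for names in suggestions.values()
    )
    top_picks = Counter(names[0] for names in suggestions.values())
    return species_hits, genus_hits, top_picks


species_hits, genus_hits, top_picks = summarise(SUGGESTIONS, TARGET)
print(f"Lists suggesting {TARGET}: {species_hits} of {len(SUGGESTIONS)}")
print(f"Lists suggesting any {genus(TARGET)}: {genus_hits} of {len(SUGGESTIONS)}")
print("Most frequent first suggestion:", top_picks.most_common(1)[0])
```

On the lists as transcribed here, Serruria villosa is never suggested, another Serruria turns up in four of the eight lists, and Erica sessiliflora is the only name to take first place twice.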

What will it take?

Serruria villosa as a species appears to have the iNaturalist AI beat! The AI bounces between families, and at times even continents, searching for a solution.

Yet, hats off to the AI for not being so readily duped as to ID Serruria villosa as Erica ericoides (which the AI also finds a bit difficult to recognise, even with good flower images), Syncarpha vestita (which the AI recognises pretty readily, especially when in flower, even in habitat) or Diastella divaricata (which the AI recognises relatively readily from a reasonable photo - in flower)!

What will it take to train the iNat AI into recognising Serruria villosa (arguably one of the more easily recognisable Serrurias, especially in flower, and one with a fairly narrow distribution)? Maybe it will take a whole lot more observations? At any rate, the AI does not appear to be confused or swayed by the inclusion of these three other species' photos in the mix.

Would/could/should the removal of those three extraneous photos be sufficient to get the iNat AI recognising Serruria villosa? I suspect not. But it would be most interesting to find out.

It must/might be infuriating to some curators that some (or maybe most) of their own taxa of interest are less recognised and recognisable than - for example - a blurry photo of a Harlequin Ladybird at a distance of 100 m, photographed during a sandstorm [trigger warning - satire font may just be on]. Asparagus must be a real blighter! [Note: satire font is definitely off again].

In the meantime, here's hoping the iNat AI upskilling process will eventually afford regular/routine recognition of this impossibly incognito species Serruria villosa! Maybe a dedicated S. villosa bioblitz will bring hundreds more observations into iNat and swing the iNaturalist AI into line?

This would be great to resolve, since I have loved the regularity with which many of my (and other people's) observations are given 'spot on' family-, genus-, and even species-level IDs through AI recognition. Well done iNat IT software-whizzes and the iNat AI!

Moving on, moving forward

Anyhow, I did turn to the iNat Forum, Blog and other bits and pieces in the hopes of finding something constructive. I first found this rather useful forum list in a 'Computer vision clean up wiki'. At first I thought that maybe Serruria villosa can be added to this list of species to which the computer says "no"? However, reading further, I realised that this is not the purpose of that list. I continued hunting. I found further useful information here: Vision model updates.

I also turned to the flags, to see, for example, whether the curator had flagged S. villosa for attention. There, I found that a number of species have similarly confounded the computer ...

A number of flags for 'overconfident computer vision suggestion' indicate that this is something that does occur, providing some mopping up for curators and other hard-working citizen identifiers.

Whatever the technical requirements and practical solutions, I'm sure that the iNaturalist tech team will find a way through. The team at iNaturalist has proven time and again to be most adept at resolving issues and graciously willing to face challenges. The software and computer vision boffs, such as @alexshepard, @pleary and @albullington, will likely be able to explain the reasons for all the challenges, as well as work on the solutions and work-arounds.

Thanks iNat - I have loved this space!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - o0o- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Postscript: I am not sure whether anyone will read this, or even try checking on the Computer vision ID for the observation/species in question, but at present, asking the AI to provide recommendations for this particular 'naughty' observation of state of matter Life (the one which includes the photos of the Serruria, Erica ericoides, Diastella and Syncarpha) yields the following results (at time of writing):

-Berzelia abrotanoides
-Brunia noduliflora
-Erica sessiliflora
-Berzelia intermedia

This suggests to luddite me that none of the three extraneous photos is giving any particularly broad hints to the AI at all. It also makes me think that the most recognisable photo should probably always be the first/front photo of an observation?

Posted on Monday, 28 October 2019 at 07:11 PM by leejones | 3 comments