AI Content Recognition for Text Transcription and Tag Generation

I think this might be a neat use for all this fancy AI stuff we've been working on.

Imagine how much more accessible the site could be if things like transcribing the text in images could be done automatically!

Maybe you want to upload some content with a lot of potential tags, but you don't have the patience or energy to type out each one. You could just click a button, and after a few moments a list of suggested tags would come up for you to pick through.

Of course, there's a technical limit to how far this could go. I wouldn't expect an AI to generate a complete image description with reasonable accuracy.

But if you're uploading a picture of your cat in your bedroom with your 'legally acquired' traffic sign on your wall in the background, it probably wouldn't be too hard to find an AI model that could automatically pick out the "cat", "animal", and "stop sign" tags for you, along with some extra garbage tags that you could just ignore.
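To make the "pick through a list" idea a bit more concrete, here's a rough sketch of what could happen between the AI model and the tag picker. It assumes some image classifier (not named here, purely hypothetical) hands back labels with confidence scores; the `Label` type, `suggest_tags`, `apply_user_choices`, and the 0.5 threshold are all made up for illustration. Low-confidence guesses get dropped, duplicates are merged, and nothing becomes a tag until the poster actually ticks it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """One guess from a hypothetical image classifier (assumed output shape)."""
    name: str
    confidence: float  # assumed to be between 0.0 and 1.0


def suggest_tags(labels, threshold=0.5, limit=10):
    """Turn raw classifier labels into a short list of tag suggestions.

    Labels below `threshold` are dropped as likely garbage. The rest are
    lowercased with whitespace collapsed, deduplicated (keeping the highest
    confidence seen), and returned best-first, at most `limit` of them.
    """
    best = {}
    for label in labels:
        tag = " ".join(label.name.lower().split())
        if not tag or label.confidence < threshold:
            continue
        if label.confidence > best.get(tag, -1.0):
            best[tag] = label.confidence
    # Sort by confidence (highest first), then alphabetically to break ties.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]


def apply_user_choices(suggestions, accepted):
    """Keep only the suggestions the poster explicitly accepted.

    Nothing is tagged automatically; the order of the suggestion list is kept.
    """
    accepted_set = set(accepted)
    return [tag for tag in suggestions if tag in accepted_set]
```

For the cat photo, labels like "Cat" (0.97), "animal" (0.91), "Stop Sign" (0.78), "bedroom" (0.55) and "toaster" (0.12) would come out as the suggestions `cat`, `animal`, `stop sign`, `bedroom`, with the toaster quietly dropped. The threshold is a trade-off: set it higher and you see fewer garbage tags, but you also miss more real ones.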

as much as i'd love to have something like this... as a general idea, i think i'm gonna have to give this a hard pass.

personally, i'd only be okay with this if:
1. this entire process could somehow run on your own device, rather than having to rely on an external service. this is primarily for privacy reasons.
2. it had to be manually activated, which would encourage the post author to review and approve the resulting text.

even if those 2 points could be addressed, i feel like automatically-generated alt text could be really dangerous. modern-day AI image recognition is far from perfect, and it's been known to get things WILDLY wrong. at best, the author would give up and write their own alt text. at worst, though... i wouldn't put it past certain people to just click "caption pls" and call it a day without giving a second thought to whatever mess of words the AI managed to spit out.

Mastodon has an OCR option for image descriptions. It's very handy. I think simple OCR, at the very least, would be nice and safe to have for transcribing actual text; anything more "intelligent" raises all sorts of problems to consider, as JackDotJS points out.

sorry, i'm not too familiar with mastodon. what's OCR?

OCR is optical character recognition, i.e. the ability for a computer to "read" text in an image and produce a transcript. It's very good these days, especially for screenshots and other high-quality images of text.

I'm not sure how useful automatic tag generation would be (would tagging a photo "#stop sign" because it has a stop sign in the background actually be useful for most posters or viewers?) but the ability to automatically generate alt text with OCR would be a fantastic feature.
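One way OCR-generated alt text could still nudge people toward reviewing it: flag the words the OCR wasn't sure about. Below is a rough sketch, assuming the OCR step reports each word with a confidence score (0 to 100) and a line number, which is roughly the shape many OCR engines can output. The `OcrWord` type, `draft_transcript`, the 60-point cutoff, and the `[?]` marker are all hypothetical choices for illustration, not any particular engine's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognized word (assumed format, not a specific OCR engine's API)."""
    text: str
    confidence: float  # assumed to be on a 0-100 scale
    line: int          # index of the text line this word belongs to


def draft_transcript(words, min_confidence=60.0, marker="[?]"):
    """Build a draft alt-text transcript from OCR words.

    Words are grouped by line number; within a line they are kept in the order
    given, which is assumed to be reading order. Words below `min_confidence`
    are kept but suffixed with `marker` so the poster knows to double-check
    them. Returns (text, needs_review).
    """
    lines = {}
    needs_review = False
    for word in words:
        token = word.text.strip()
        if not token:
            continue  # skip empty detections
        if word.confidence < min_confidence:
            token += marker
            needs_review = True
        lines.setdefault(word.line, []).append(token)
    text = "\n".join(" ".join(lines[n]) for n in sorted(lines))
    return text, needs_review
```

If `needs_review` comes back true, the site could refuse to attach the alt text until the poster has edited or confirmed it, rather than silently publishing a transcript with misread words in it.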

