Fri, 24 Mar 2023 06:51:55 -0500

The Human Web

mr's Preposter.us Blog

I’ve been thinking this morning about how there may finally be a market emerging for ways to detect and avoid machine-generated things.  I’m still working on a general-purpose solution for this, but in the meantime I have an idea that could be a step in the right direction: a search engine that only returns pages created by humans.

The basic idea is that publishers (anyone who creates a website) create an account and go through a basic vetting process (with another human) to confirm their humanity.  Once confirmed, they provide the search engine with a list of URLs to index (could be domains, subdomains, domains with paths, etc.) and are given a key.
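As a minimal sketch of the registration step: assuming the key is simply a random per-publisher secret (the post doesn't specify its form), issuance could look like this. The function name and record shape are hypothetical; the human vetting itself happens out-of-band.

```python
import secrets

def register_publisher(name: str, urls: list[str]) -> dict:
    """Issue a signing key to an already-vetted publisher.

    The vetting (confirming humanity) happens human-to-human,
    outside this code; this only records the result.
    """
    key = secrets.token_hex(32)  # 256-bit random secret, hex-encoded
    return {"publisher": name, "urls": urls, "key": key}

record = register_publisher("example.com owner", ["https://example.com/blog/"])
print(len(record["key"]))  # 64 hex characters
```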

To add pages to the index, the key is used to compute a signature by using it in a checksum process along with the page data.  This signature is then embedded in the page as a meta tag.  The next time the search engine crawls the publisher's URLs, it checks these signatures and only adds a page to the index if its signature is valid.
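One way the sign-and-verify round trip could work, assuming HMAC-SHA256 as the "checksum process" (the post leaves the exact scheme open) and a meta tag name I've made up for illustration. Signing only the body content means embedding the tag in the head doesn't invalidate the signature:

```python
import hashlib
import hmac
import re

SIG_TAG = "human-web-signature"  # hypothetical meta tag name

def sign_content(key: str, content: str) -> str:
    # HMAC-SHA256 over the page content; an assumed stand-in for
    # the unspecified checksum process.
    return hmac.new(key.encode(), content.encode(), hashlib.sha256).hexdigest()

def embed_signature(content: str, signature: str) -> str:
    # The signature covers the <body> content only, so adding the
    # meta tag to <head> does not change what was signed.
    head = f'<head><meta name="{SIG_TAG}" content="{signature}"></head>'
    return f"<html>{head}<body>{content}</body></html>"

def verify_page(key: str, page_html: str) -> bool:
    # Crawler side: extract the claimed signature and the body,
    # recompute the signature, and compare in constant time.
    sig = re.search(rf'<meta name="{SIG_TAG}" content="([0-9a-f]+)"', page_html)
    body = re.search(r"<body>(.*)</body>", page_html, re.DOTALL)
    if not sig or not body:
        return False
    return hmac.compare_digest(sig.group(1), sign_content(key, body.group(1)))

key = "publisher-secret-key"
page = embed_signature("Hello, human web.", sign_content(key, "Hello, human web."))
print(verify_page(key, page))          # True
print(verify_page("wrong-key", page))  # False
```

A real crawler would need a canonical form of the page data (whitespace, encoding) agreed between publisher and engine, but the shape of the check is the same.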

In cases where invalid signatures are found, the engine staff contacts the publisher who provided the URLs to see what’s up (maybe it’s a technical error, maybe funny business, etc.)

Periodically, human staff at the search engine review the corpus to see if someone has published something that looks machine-generated, and follow up with the publisher who originally provided the URLs in question.

Some of these validation steps might eventually be automated, but I don’t think they need to be, and I think the results would be best if they are not.  After all, jumping to that conclusion is the sort of thinking that created the problem we’re trying to solve in the first place.  

Also, this work is uniquely suited to humans, and I think it could be made enjoyable if matched to the right person.  Verifying humanity is no harder than having a coffee chat with someone, and checking pages for machine-generated content is basically surfing the web.  Matched to the right people, these tasks might not even feel like “real work”, but they’d still get paid for their time.

Which brings up the obvious question: how do you pay for all this?  The simplest answer is subscriptions and fees, but I’m not exactly sure what would work best yet.  Querying the search engine could require a subscription, and I would personally pay a monthly fee to avoid consuming machine-generated garbage every time I look for a recipe or a DIY article or a programming example.  There could also be a one-time per-query fee for automated systems that want to return only human-created work (references, citations, etc.).

On the supply-side a subscription could be required to register and keep your keys valid for signing new pages, and there could be usage-based costs as well (some basic number of megabytes/gigabytes of index data and additional charges above that, etc.)

There may also be better and more creative ways to finance the operation, but I want to be very thoughtful about anything other than direct exchanges of money and service, because again, getting cute and clever about money is what created the problems we’re trying to solve.  What I absolutely do not want to do is incentivize anything that makes the search engine less valuable to humans.

So I think there’s plenty of ways to fund the engine without compromise, but I need to think about it more to find the best one.

If working on this interests you, let’s chat.