RivkaT (rivkat) wrote in suggestions,

Refinements for robots.txt

Short, concise description of the idea
I would like the ability to customize robots.txt so that my choices are not all-or-nothing. For example, I'd like to be able to allow Google, but not other bots, to index my journal.

Full description of the idea
In general, robots.txt can be customized: it can allow or disallow specific bots, permit indexing of text but not images, put certain directories off-limits, and so on. LJ's "minimize inclusion of results in search engines" feature is all-or-nothing: if I want Google to index my posts, I have to accept indexing by every other bot as well. Given current attempts to target LJ users, media fans in particular, and harvest their information for commercial purposes (see http://nakeisha.livejournal.com/478324.html), I can't keep the convenience and visibility of Google indexing without also accepting much more targeted and intrusive indexing.
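To make the request concrete, here is a rough sketch of the kind of per-bot rules I have in mind, checked with Python's standard urllib.robotparser module. The rules text is only an illustration and not anything LJ currently generates; "ExampleHarvester" is a made-up bot name and can_crawl is a hypothetical helper. Keep in mind that robots.txt is advisory, so it only affects bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not LJ's actual output):
# Googlebot may crawl everything, every other bot is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, path: str = "/") -> bool:
    """Return True if the rules above permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "ExampleHarvester" is a hypothetical bot name used for illustration.
    for bot in ("Googlebot", "ExampleHarvester"):
        print(bot, can_crawl(bot, "/478324.html"))
```

An empty "Disallow:" line means nothing is off-limits for that bot, while "Disallow: /" under "User-agent: *" asks every bot without its own section to skip the whole journal.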

An ordered list of benefits
  • Greater control of privacy.
  • Greater customizability for users who use LJ as a primary site.
An ordered list of problems/issues involved
  • I have no idea how hard this would be to program. I presume it wouldn't be used by very many people because it is an advanced customization feature, but if targeted data harvesting becomes more common, the feature might increase in popularity.
Tags: robots.txt, § no status