RavenBlack (ravenblack) wrote in suggestions,

Improved data-mining to include communities

Short, concise description of the idea
Currently, /misc/interestdata.bml supports collecting data about communities, but /misc/fdata.bml does not. I believe it should.

Full description of the idea
An old-style data-mining bot which parsed userinfo.bml would be able to collect information about friends and communities with pretty much no distinction between the two. For some reason, fdata.bml?user=(some-community) simply returns
! not a person account
rather than giving the most useful information it could, which would be the "watched-by" and "members" information. The same probably goes for RSS feeds.

An ordered list of benefits

  • More data available to data-mining bots.
  • Reduced temptation to resort to parsing userinfo.bml to retrieve the otherwise unavailable data.

An ordered list of problems/issues involved

  • Some work involved, albeit probably not a lot.
  • May result in more information being returned than some bots want.

An organized list, or a few short paragraphs detailing suggestions for implementation

  • To make it easy for bots to avoid collecting information when they don't want it, I think the opening line of non-person accounts should remain "! not a person account". The rest of the output, however, should include the most relevant data possible.
  • For maximum clarity, I think a normal user's fdata.bml should also list the communities the user watches and belongs to, using a slightly different notation from the usual, resulting in a file resembling the following:
    > my_friend
    < person_I_am_friend_of
    } community_I_watch
    { community_I_am_member_of
  • And the output for a community or RSS feed should perhaps be:
    ! not a person account
    ] user_who_is_a_member
    [ user_who_watches_me
  • The different notations would make it very easy for data-miners to ignore lines that don't interest them, and should also prevent existing data miners from being confused by new data. It would be up to their developers whether to include the additional information, rather than having to actively exclude it.
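
To illustrate how a bot might consume the notation suggested above, here is a minimal sketch in Python. It is only an illustration of the proposal, not anything fdata.bml does today: the `}`, `{`, `]` and `[` prefixes are the ones proposed in this post, and the function name `parse_fdata`, the relation names and the `wanted` parameter are made up for the example.

```python
from collections import defaultdict

# ">" and "<" are the usual friend / friend-of markers referred to in the post.
# The other four prefixes are the PROPOSED notation, not an existing format.
PREFIXES = {
    ">": "friends",
    "<": "friend_of",
    "}": "watches",
    "{": "member_of",
    "]": "members",
    "[": "watched_by",
}
NOT_PERSON = "! not a person account"


def parse_fdata(text, wanted=frozenset(PREFIXES)):
    """Parse fdata-style text into (is_person, {relation: [names]}).

    Lines whose prefix is not in `wanted` are skipped, so a miner only
    collects the relations it explicitly asks for.
    """
    is_person = True
    relations = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == NOT_PERSON:
            # Opening line kept for non-person accounts, as suggested above.
            is_person = False
            continue
        prefix, _, name = line.partition(" ")
        if prefix in wanted and prefix in PREFIXES and name:
            relations[PREFIXES[prefix]].append(name.strip())
    return is_person, dict(relations)


user_sample = """> my_friend
< person_I_am_friend_of
} community_I_watch
{ community_I_am_member_of
"""

community_sample = """! not a person account
] user_who_is_a_member
[ user_who_watches_me
"""

# An existing miner that only knows ">" and "<" ignores the community lines.
legacy_view = parse_fdata(user_sample, wanted={">", "<"})
# A newer miner opts in to everything.
community_view = parse_fdata(community_sample)
```

Because unknown or unwanted prefixes are simply skipped, a bot written this way keeps working unchanged if more line types are added later.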
Tags: community membership, data mining, external services, ~ submitted - needs retagging