Thursday, July 15, 2010

A followup on the "Cake Today" blogpost thefts

A couple weeks ago I reported that a bot-generated pseudoblog called "Cake Today" was stealing content from TYWKIWDBI and reposting the material with links to Amazon Cake sales.  I was puzzled in part by the fact that this website was horribly mistranslating my text in the process.  Here, for example, are two sentences I wrote about the American Lady caterpillars:
"Several weeks ago I wrote about the host plant and the eggs of the American Lady. Now I can offer some photos about the young caterpillars... The first instar has a semi-translucent body and is very difficult to see except for that black head."
And here's how those sentences were rendered at the other website:
"Several weeks ago I wrote most the patron plant and the eggs of the dweller Lady. Now I crapper substance whatever photos most the young caterpillars...  The prototypal instar has a semi-translucent embody and is rattling difficult to wager eliminate for that black head."
Obviously the text had been rendered into some other language (??Klingon) - and then retranslated back into English.  But why??  TYWKIWDBI reader Andrew offered this concise explanation: "It's been re-translated to make it look like original content to search engines," and reader Kirk contacted Ed Kohler of The Deets, who explained the process as a type of "black hat SEO" ["search engine optimization"]:
"Since the content isn't really meant to be read (just draw traffic from search that can then leave through ads) it doesn't matter if it's particularly legible. Also, he could publish additional sites in additional languages. Again, not really any more work other than setting it up."
Interesting.  The auto-retranslation might also bypass any legal claims that they are stealing my content.  So what to do about it?  I did ponder the obvious option of trying to sabotage the process by posting something embarrassing to them.  The post I wrote about them stealing material was posted at their site (!), so the process was clearly automated.  I could post anything, and since their site did the repost within minutes I could then delete whatever I wrote from my site.

But then I thought, why bother?  Since TYWK is a nonprofit blog, I'm not actually losing anything.  There is also the principle of not bringing a knife to a gunfight.  Whoever was doing this was a log-power more technically sophisticated than I am, and if they perceived that I was trying to mess with them, they might know of ways to hit back.

The best response seemed to be to report the site, as several of my readers suggested.  I looked into the process, and it seemed to be complicated.   While I was diddling around, TYWKIWDBI-oldtimers soubriquet and Nathan reported the offender on my behalf, and within a couple days the site was taken down by Blogger/Google.

I'm posting this now in case other bloggers encounter a similar situation in the future.   One can reasonably expect that this sort of thing happens all the time, and that in the vast world of the intertubes we wouldn't even be aware of it unless we have occasion to run a search for text we've written or TinEye one of our posted photos.
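For bloggers who want to check a suspect page against their own text, an exact-phrase search may miss a copy that has been round-trip translated, because many individual words get swapped. A minimal sketch of one possible workaround, assuming Python and its standard `difflib` module: compare the two texts word by word and look at how much of the sequence still lines up. The function names and the "unrelated" sample sentence are made up for illustration; the other two samples are the sentences quoted earlier in this post.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def similarity(a, b):
    """Return a ratio between 0 and 1 for how much of the word sequence is shared."""
    matcher = difflib.SequenceMatcher(None, words(a), words(b), autojunk=False)
    return matcher.ratio()


original = (
    "Several weeks ago I wrote about the host plant and the eggs of the "
    "American Lady. Now I can offer some photos about the young caterpillars"
)
mangled = (
    "Several weeks ago I wrote most the patron plant and the eggs of the "
    "dweller Lady. Now I crapper substance whatever photos most the young caterpillars"
)
# Hypothetical control sentence that has nothing to do with the post.
unrelated = "The quick brown fox jumps over the lazy dog"

# The mangled copy should score far higher than the unrelated sentence,
# even though an exact-phrase search for the original would not find it.
print(round(similarity(original, mangled), 2))
print(round(similarity(original, unrelated), 2))
```

A high score does not prove copying, and what counts as "high" is a judgment call; the point is only that word-swapping leaves most of the sentence skeleton intact, so a fuzzy comparison can flag pages worth a closer look.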

My sincere thanks to all the TYWK readers for your technical advice and assistance during this incident.

And now back to our regularly-scheduled programming...
