Celeb Pics

SMI Finishes With Loss

There is no other way to put it; the Social Media Index failed miserably on the final night of the season. Jessica's 2-to-1 lead in social media traffic melted away when faced with this season's White Guy With Guitar (WGWG). SMI has garnered some attention in recent weeks from Jessica's fan base thanks to its success in picking the Bottom 3 AND her high ranking every week. Every week I tried to outline my problems with Jessica's numbers - the astroturfing, the foreign fan base and its attempts to vote, etc. In the end those concerns and doubts turned out to be prophetic.

I worked on those nagging doubts over this past weekend. As the number of votes tallied each week increased so did the SMI raw numbers. Lots of casual fans become more attracted to the show once the wheat is separated from the chaff. Each week I would revisit the magic "500" astroturfing number I came up with back when Jessica was saved from elimination by the judges. I figured that as interest in the show increased so would the number of hardcore fans.

And that indeed did happen. Problem is, so did Phil's hardcore fan base. The astroturfing adjustment for the finale had to be a measurement of Phil's rabid fans subtracted from Jessica's count. And the work I did this weekend just couldn't justify a substantial increase over that 500 number. In the end I think AI's vote-counting people figured out how to shut down the foreign voting, and I didn't have an effective way of eliminating foreign social media traffic from SMI. I can shut off foreign *language* traffic from the Index but not traffic emanating from foreign countries.

Changes To The Model

Well I can't really say I'm making changes to the model for next season. I don't really have a model yet. I just threw some scripts together to measure social media traffic and slapped a boring name on it. Next season I *will* have a real model. A model is needed to properly judge the value of various changes to SMI that I've talked about here. Factors such as non-voting period traffic, textual analysis ("vote for jessica"), and even the sampling period itself need a formal model to help give a yea or nay on SMI inclusion.

Some of the details have yet to be worked out but I'm certain I'll be using logistic regression. Results on AI are binary in nature. You're either in or out every week. Logistic regression is built for those types of outcomes. Now if AI released vote totals I'd probably pick something else. But AI is never going to do that and more importantly hasn't done that in the past. All of the available data I have to draw on is of the "in/out" variety.

Logistic regression will allow SMI to do some pretty neat things. For starters SMI will be able to give probability estimates. No more "SMI picks Skylar Laine to go home". I'll say things like "According to SMI Skylar Laine has a 55% chance of being sent home". I especially want to be able to put probability estimates on Bottom 3 membership. Earlier in the competition picking the loser is really tough and getting the Bottom 3 right is of greater import. Plus Bottom 3 membership allows for a much bigger dataset.
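
To make the idea concrete, here's a rough sketch of a logistic regression fit by plain gradient descent. This is not the SMI model (which doesn't exist yet). The single feature, a contestant's share of the week's social media traffic, is an assumption, and every number in the toy data is made up purely for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=20000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # predicted probability minus outcome
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Estimated probability of the '1' outcome for feature value x."""
    return sigmoid(w * x + b)

# Hypothetical toy data, NOT real SMI numbers:
# x = share of the week's traffic, y = 1 if the contestant landed in the Bottom 3
toy_share = [0.02, 0.04, 0.05, 0.07, 0.08, 0.10, 0.12, 0.15, 0.17, 0.20]
toy_bottom3 = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

w, b = fit_logistic(toy_share, toy_bottom3)
print(f"Estimated Bottom 3 chance at a 6% traffic share: {predict(w, b, 0.06):.0%}")
```

The output is a probability rather than a flat pick, which is exactly the kind of "X% chance" statement described above. A real version would need real back data and probably more than one feature.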

Pairwise Comparison And The Bottom 3

Every week the AI results show is one big pairwise comparison. In last week's results show it was revealed that Jessica beat Joshua Ledet, and that Phil beat Josh as well. We didn't get any data on the Jessica/Phil matchup.

If these were the only data points available every week I don't know that I would bother with this thing. But the Bottom 3 designations greatly increase the amount of data available. With, say, 10 contestants left in the competition, the Bottom 3 data gives us a total of (10 - 3) * 3 = 21 data points before the loser is even considered. Each member of the safe 7 can be said to have bested each member of the Bottom 3. The larger the number of remaining contestants, the more data points I can feed to the model.
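
The counting is easy to sketch in code. The helper below is a hypothetical illustration (the function name and the placeholder contestant names are mine, not part of SMI): it pairs every safe contestant with every member of the bottom group, which is all the results show really tells us.

```python
from itertools import product

def pairwise_results(contestants, bottom):
    """Return (winner, loser) pairs: each safe contestant beat each bottom member."""
    bottom_set = set(bottom)
    safe = [c for c in contestants if c not in bottom_set]
    return list(product(safe, bottom))

# Placeholder names for a 10-contestant week, not real results
contestants = [f"contestant_{i}" for i in range(1, 11)]
bottom3 = contestants[-3:]
pairs = pairwise_results(contestants, bottom3)
print(len(pairs))  # (10 - 3) * 3 = 21

# Last week's results show: Jessica and Phil each beat Joshua,
# and there is no Jessica/Phil pair in the output
finale = pairwise_results(["Jessica", "Phil", "Joshua"], ["Joshua"])
print(finale)
```

Note that the pairs say nothing about ordering within the safe group or within the Bottom 3, so the model only ever sees "safe beat bottom" comparisons.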

This is a big deal because social media like Twitter and Facebook didn't exist back in Season 1 and didn't become relevant until maybe the last four seasons. The bigger the dataset the more reliable the model.

SMI And The X Factor

I'd like to try out the new and improved SMI on The X Factor this fall. XF doesn't have the kind of back data that AI has. And the structure of the show might be just different enough to throw SMI a curve ball. But it would be nice to have something of a dry run for the upgraded SMI. My biggest constraint here is really time. XF starts in the fall and the fall is a pretty busy time with lots of other projects competing for attention.

I'll post updates here during the summer and fall on the progress of the SMI. I've got a lot of learning to do and new code to write. If that interests you please stick around. If all you're interested in are the predictions themselves you might want to check back in the fall and see if I'm doing X Factor. Thanks for reading!

