A more neutral topic curation algorithm
May 16, 2016


On Monday, Gizmodo published an article about the curation practices behind Facebook’s Trending module, and the extent to which curators’ personal biases affect what’s shown to Facebook’s billion-plus active users. (As you can imagine, this has caused some controversy.) Although Gizmodo focused on the role of human curators, those of us who work with algorithms and machine learning have had to confront the fact that biases can end up deeply encoded into supposedly objective systems. That includes Klout’s own topic classifier -- the system that identifies your Expert topics, as well as the relevant articles for the Explore tab. Here’s a quick roundup of how we on Klout’s Data Science team think about keeping the topic system as unbiased as possible.


Klout’s topic classifier in 30 seconds or less

The major inputs to our topic system are:

  1. The classifications we’re applying:
    1. An ontology of nearly 10,000 human-curated topics
    2. An underlying dictionary of over a million named entities and concepts
  2. The data being classified, which includes:
    1. Social media profiles (to be analyzed for topic expertise and interest)
    2. URLs published on social media and elsewhere (to be analyzed for topical content and served in the Explore tab)
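
To make those two classification inputs concrete, here is a minimal sketch of how an ontology and an entity dictionary could fit together. It is not Klout’s actual data model: the `Topic` and `Ontology` classes, their fields, and the word-matching lookup are all hypothetical, chosen only to illustrate the idea of curated topics sitting on top of a much larger dictionary of entities.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Topic:
    """One human-curated node in the ontology (hypothetical shape)."""
    name: str
    parent: Optional[str] = None  # name of the broader topic, if any


@dataclass
class Ontology:
    """Curated topics plus a dictionary mapping entities to those topics."""
    topics: Dict[str, Topic] = field(default_factory=dict)
    entity_to_topics: Dict[str, Set[str]] = field(default_factory=dict)

    def add_topic(self, name, parent=None):
        self.topics[name] = Topic(name, parent)

    def link_entity(self, entity, topic):
        # Entities may only point at topics a curator has already defined.
        if topic not in self.topics:
            raise KeyError(f"unknown topic: {topic}")
        self.entity_to_topics.setdefault(entity.lower(), set()).add(topic)

    def topics_for(self, text):
        """Naive lookup: topics whose entities appear as whole words in text."""
        found = set()
        for word in text.lower().split():
            found |= self.entity_to_topics.get(word, set())
        return found
```

Even in a toy like this, the value choices are visible: someone has to decide which topics exist, which parent each one gets, and which entities count as evidence for it.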


Even without getting into the weeds of the pipeline that brings those two types of input together -- more on that below -- critical readers will already be able to spot a few areas where we’re vulnerable to bias. Let’s walk through them.


Any classification system contains value choices

Should “Autism” be placed under “Diseases” or “Neurology”? Is “the Tea Party movement” distinct from “conservative politics”? Is “Wizard Rock” really a thing? How we answer these questions shapes the experience of our users, and inevitably gives our ontology a point of view.


Longtime users may remember the early days of Klout, when topics were a, shall we say, messy combination of user-submitted tags and data-mined concepts. As the Data Science team has worked on regularizing and improving the ontology, we’ve relied on the following principles:


  1. The ontology is a living thing; we should always have tools in place for updates
  2. The ontology should always have up-to-date guidelines defining:
    1. Scope -- what portion of the world we’re describing. (In our case, as much of it as possible.)
    2. Granularity -- at what level of detail we’re describing it. (Do we need to include every actor? Television show? Heavy metal subgenre?)
    3. Voice -- the tone we use in describing it. (Do we use scientific names? Full legal names for persons? Slang?)
  3. A little redundancy won’t hurt -- our system can support topics with some conceptual overlap, so err on the side of inclusion. (For example, both “Gun Rights” and “Gun Control” are topics in the Klout ontology.)
  4. Users are the best source of feedback -- our users have a broader range of perspectives than we do; make it easy for them to alert us to problems


Even so, any time your application or audience changes, it’s important to reassess your classification scheme. One major flaw in Klout’s topic ontology is that it was developed for a U.S. audience, and still needs significant work for other countries and languages.


Staying alert for sins of omission

In addition to the human-curated topics in the ontology, we also use a dictionary of concepts and entities derived from Freebase. Freebase is a widely used resource in the data science world, but “widely used” is by no means the same thing as “perfect”. The biggest issue with Freebase is what it leaves out. Like Wikipedia, it was collectively sourced, and like Wikipedia, it’s biased toward the interests of its editors. It is sparse in areas like cosmetic products and fashion terms, so we’ve had to develop ways to supplement the dictionary. The moral of the story: it pays to look critically at any pre-packaged data set you plan to use.
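
One way to look critically at a pre-packaged dictionary is to measure how much of your own data it actually covers, category by category. The sketch below is an illustration rather than our production tooling: the function name `coverage_by_category` and the shape of its inputs are assumptions made for the example.

```python
from collections import defaultdict


def coverage_by_category(observed, dictionary):
    """Return, per category, the fraction of observed terms the dictionary knows.

    observed: iterable of (term, category) pairs sampled from real data.
    dictionary: set of lowercase terms in the pre-packaged data set.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for term, category in observed:
        totals[category] += 1
        if term.lower() in dictionary:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}
```

A category whose coverage sits well below the others is a candidate sin of omission, and a good place to start supplementing the dictionary.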


Boosting inclusivity

Next, let’s consider the URLs we collect for the Explore tab. The majority are URLs that have been shared on social media, which means they are dominated by the topics most discussed on social media: politics, celebrity news, music, etc. What we sometimes call “niche topics”, like molecular biology, or Wicca, or wheelchairs, naturally appear in fewer URLs. Does that count as a bias? It’s unclear, but it makes for a poor end-user experience and risks making some users feel marginalized. As a result, we’ve had to develop backup strategies to increase coverage for less common topics.
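
As a rough illustration of what a backup strategy can look like, the sketch below tops up any topic that falls short of a minimum number of URLs from a secondary source. This is a hypothetical example, not a description of how the Explore tab works: `select_feed`, `backup_urls`, and `min_per_topic` are names invented for the illustration.

```python
def select_feed(social_urls, backup_urls, min_per_topic=2):
    """Fill each topic from social URLs first, topping up from backup_urls.

    Both arguments map a topic name to a list of URLs. Popular topics are
    left untouched; only topics below min_per_topic receive backup items.
    """
    feed = {}
    for topic in set(social_urls) | set(backup_urls):
        items = list(social_urls.get(topic, []))
        shortfall = min_per_topic - len(items)
        if shortfall > 0:
            # Skip backup URLs that are already in the feed for this topic.
            extras = [u for u in backup_urls.get(topic, []) if u not in items]
            items.extend(extras[:shortfall])
        feed[topic] = items
    return feed
```

The design choice worth noticing is that the floor is per topic, so boosting niche topics never takes anything away from the heavily shared ones.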


The fuzzy line between human bias and business logic

One of the more ironic tidbits in Gizmodo’s article was that Facebook’s curators were told to suppress news about Facebook -- that is, to interfere with the Trending algorithm to avoid the appearance that Facebook was interfering with the Trending algorithm. But that kind of decision is probably familiar to the product managers in the audience, whose goal it is to preserve the user experience. Similarly, a discovery feed like our Explore tab might recommend porn, or spam, or hate speech, and need to be tuned or overridden. To make it even more complicated, the definition of porn, or spam, or hate speech may change from region to region. Keeping those decisions from being made inconsistently or thoughtlessly is really difficult, but our approach has been to define a single owner who both documents the rules and is accessible to discuss individual cases. As others have pointed out, Facebook’s mistake may not have been having curatorial tools, but isolating the employees using them.


Fine, but what about the actual topic algorithm?

Eagle-eyed readers will have noticed that we haven’t touched on the nuts and bolts of how Klout’s topic system actually assigns topics. The challenges of data modeling and debugging machine learning algorithms are pretty well surveyed elsewhere, and how we handle those challenges at Klout would require a dedicated blog post. However, there’s less discussion of how to handle human biases when collecting training or validation data -- how people’s points of view get encoded into the data a given algorithm is trying to approximate. The two approaches most often recommended could be described as micromanaging versus crowdsourcing: either a) have an in-house process that includes well-defined guidelines, trained judges, and a reconciliation process for disagreements, or b) have lightweight guidelines but a large number of judges, in the hope that individual biases will be muted. There are tradeoffs to either approach; our team has recently been relying mostly on in-house validation data, largely because it’s friendly to our development schedule. But what’s more important, in our experience, is that the potential weaknesses of the training/validation data are known, discussed, and documented ahead of time, so that they can be distinguished from problems with the model itself.
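
Both approaches benefit from putting a number on how much judges disagree. The sketch below shows two standard tools, offered as an illustration rather than as our own process: Cohen’s kappa, which measures chance-corrected agreement between two judges, and a simple majority vote for reconciling many judges. The function names are our own for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two judges on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each judge labeled at random with their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)


def majority_label(votes):
    """Most common label among many judges; None if empty or tied."""
    ranked = Counter(votes).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

A low kappa on in-house labels, or a high rate of ties in crowdsourced votes, is a sign that the guidelines are ambiguous, which is exactly the kind of weakness worth documenting before blaming the model.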

No system is perfect, and keeping out bias takes continual work. Although a focus on documentation, consistency, and validation will take you a long way, the very best defense against unintentional bias is a diverse team, who can bring multiple points of view. Want to come work with us?



Sarah Ellinger is the Lead Data Analyst for Klout/Lithium’s Data Science team. She is responsible for overseeing the content of the topic ontology, as well as monitoring the performance of the topic classification system. Sarah attended U.C. Berkeley’s School of Information and has over a decade of experience in taxonomy and web content classification at tech companies large and small. She can be found on Twitter discussing information science and Game of Thrones spoilers @sarahellinger.
