David Anderson, A Brief History of (CPU) Time

Computer scientist at UC Berkeley, building a platform for citizen science. Looking for commonalities, software support that addresses the community's needs, to make it easier for scientists to use volunteer power. Tech is only one piece of the solution.

Build platforms for:

  • Volunteer computing
  • Distributed thinking
  • Education

Computational science:

Simulations are now so vastly complex that they can only be done on computers. Simulations run at various scales, e.g. proteins, ecosystems, Earth, galaxy, universe. Need lots of computing power because models must be fitted to observed data. To predict what's going to happen you need to run thousands or millions of simulations.

A generation of new instruments, e.g. the LHC, LIGO, the SKA, gene sequencers, produces data at unprecedented rates, right at the limit of what computers can handle, and beyond the limit of computers owned by universities or institutions. Science is limited by computing power and storage capacity. What we need is not a faster computer but higher throughput, i.e. a lot of computers.

Consumer digital appliances, e.g. computers, handhelds, set-top boxes, are all converging on similar hardware. The networks that connect them all form the consumer digital infrastructure: 1.5 billion PCs. Graphics processing improved through the desire to watch HD TV and play realistic games, and GPUs are often 100x CPU speed. Put there for games, but good for science.

Storage on consumer devices approaching the terabyte scale, network approaching 1 Gbps.

All this is ideal for science computing!

Compare the consumer digital infrastructure with its institutional counterpart: it's way bigger and way cheaper than institutional computing. Supercomputers are moving towards an ExaFLOP in 5 years, but consumers already have 1000 ExaFLOPS today. Consumers spend $1 trillion per year!

BOINC, free open source software, anyone can create a project.

Utopian ideal: to have a lot of these projects getting computing power by advertising their research to the public and educating the public about their project, so the public supplies resources to the science they want to support.

BOINC projects

  • ~30 projects
  • 300k vols
  • 530k computers
  • 3 PetaFLOPS

Volunteers can do more than run software: they can provide tech support, optimise programs, translate the website, recruit new users. Initially used message boards, then realised they weren't working well for non-technical users, so now have a system based on Skype, so people needing help can find someone who is willing to give that help via Skype.

Volunteers have a spectrum of confidence, and some users are malicious, e.g. they have had people trying to scam other users and trying to get their PayPal IDs.

Motivations study: people interested in doing science, people who want to show the world they have the fastest computer, people who want to be part of a team.

Distributed thinking. Stardust@Home: interstellar dust photos, looking for grains of dust. People can do this better than computers. The interesting thing was needing to quantify the accuracy of results. Created samples where they knew the answers, i.e. either contained noise or had a particle. Every 5th image was a calibration image, so they could keep track of false positives and negatives.

Also used replication: many people look at the same image, and if there is consensus they can check the calibration results to see whether the consensus is likely correct. The project found all the dust particles it could.
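The calibration-plus-replication idea can be sketched roughly as follows. This is a minimal illustration, not Stardust@Home's actual code; the function names, the 80% agreement threshold, and the false-positive cut-off are all assumptions for the sake of the example.

```python
def error_rates(responses):
    """Estimate a volunteer's false-positive and false-negative rates
    from calibration images, where the true answer is known in advance.
    responses: list of (known_truth, volunteer_answer) booleans."""
    fp = fn = pos_total = neg_total = 0
    for truth, answer in responses:
        if truth:
            pos_total += 1
            fn += (not answer)   # missed a real particle
        else:
            neg_total += 1
            fp += answer         # "saw" a particle in pure noise
    return (fp / max(neg_total, 1), fn / max(pos_total, 1))

def consensus(answers, rates, threshold=0.8, max_fp=0.2):
    """Accept a 'particle present' verdict only if enough replicated
    answers agree AND the voters' calibration record is trustworthy.
    answers: replicated boolean votes on one image.
    rates: (fp_rate, fn_rate) per voter, from error_rates()."""
    if sum(answers) / len(answers) < threshold:
        return False
    avg_fp = sum(fp for fp, _ in rates) / len(rates)
    return avg_fp < max_fp  # reject consensus from trigger-happy voters
```

A volunteer who answered two calibration positives and two calibration negatives with one mistake each would get rates of (0.5, 0.5), and their votes would carry less weight in the consensus check.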

Created platform called Bossa. Middleware for distributed thinking, provides scheduling mechanisms, e.g. calibration jobs, replication. Open system with respect to assessment, scheduling policies.

Being used to find fossils.

Also extending Bossa to Bossa Nova, looking at more complex systems for asking people to do things involving creativity, problems for which there is no unique answer. E.g. complex problem solving, use volunteers to decompose problem into sub-problems, propose solutions, evaluate them, evaluate how a group of solutions might work together. Involves different skills. At software level, it uses people optimally, uses them for tasks to which they can contribute most.

Education and citizen science. If we can train people to do more complex tasks we can achieve more. This is very important: if people learn more they may stick with a project longer, recruit more people to help, and you get more computing power.

Training or educating hundreds of thousands of people is a challenging problem, not attacked by traditional education theory. Heterogeneity is problematic: different backgrounds, education levels, locations, languages. What makes this tractable is that there is a constant stream of students arriving, dozens, hundreds, thousands of new users per day, so there are lots of people arriving interested in a course and we can do experiments. If we have two alternative ways of teaching a concept, we can rig up the software to randomly show one lesson or the other and then have everyone take the same assessment.

May learn one lesson is better than another, or one lesson is better for a subset, e.g. based on demographic or other attribute, and can then make an adaptive course where as we learn more about the student we refine how we teach them. Not just individual lessons, but overall structure of course.
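The randomised lesson experiment described above might be sketched like this. This is a hypothetical illustration, not the actual Bolt implementation; the lesson labels and the naive mean comparison are assumptions (a real deployment would use a proper significance test).

```python
import random
from statistics import mean

def assign_lesson(user_id, lessons=("A", "B")):
    """Randomly assign each new arrival to one lesson variant.
    Seeding on the user id keeps the assignment stable, so the same
    volunteer always sees the same variant on repeat visits."""
    rng = random.Random(user_id)
    return rng.choice(lessons)

def compare(scores_a, scores_b):
    """Crude comparison of assessment scores for the two variants:
    positive means variant A's cohort scored higher on average."""
    return mean(scores_a) - mean(scores_b)
```

The same machinery extends to the adaptive case: once `compare` is run per demographic slice, assignment can be biased toward whichever lesson performs better for that slice.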

Bolt: system for tailored education for large streams of volunteers.

Christian Thiemann, Follow the money: what we’ve learned from ‘Where’s George?’

Where’s George? is a project tracking one-dollar bills in the US. Put a stamp on the bill and log it on the website. Each re-logging creates a link between the two places where the bill was logged. Provides a lot of data about how money moves around the US, and therefore about how people move around the US.

Administrative system of US: Divisions contain States contain Counties. Spatially compact hierarchical structure historically evolved with geographic determinants.

Human mobility – are these divisions spatially compact, determined by geography? How much geography is encoded in the network of money movements?

Can make groups of counties and run algorithms to test the strength of borders between divisions/states/counties. Shows the Mississippi River is a strong border, as are the state borders around Colorado and Ohio.

Compare to epidemiology and the SIR model: susceptible, infectious, recovered, i.e. what happens when an infectious person meets a susceptible person, etc. There is no spatial element in this model. When modelled spatially, you see a wave of infection moving through the world.
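A minimal sketch of the SIR model as described, stepped with simple Euler integration. The parameter values are illustrative assumptions, not figures from the talk.

```python
def sir_step(s, i, r, beta=0.3, gamma=0.1, dt=1.0):
    """One Euler step of the classic SIR model, with s, i, r as
    fractions of the population. beta is the infection rate,
    gamma the recovery rate (illustrative values)."""
    new_inf = beta * s * i * dt   # S meets I -> new infections
    new_rec = gamma * i * dt      # I recovers at constant rate
    return s - new_inf, i + new_inf - new_rec, r + new_rec

def run(days=200):
    """Seed a tiny infected fraction and let the epidemic play out."""
    s, i, r = 0.999, 0.001, 0.0
    for _ in range(days):
        s, i, r = sir_step(s, i, r)
    return s, i, r
```

Note that s + i + r stays constant at every step, which is the model's basic sanity check. The spatial wave mentioned above comes from running one such compartment per location and coupling neighbours by mobility.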

But modern transportation, aviation, changes it. Incorporate that, creates new model of spread of disease worldwide, e.g. SARS. Modelling based on country doesn’t provide enough granularity. Using Where’s George? data, can see how people move around the US.

No dataset like WG. Local travel data and aviation data, but doesn’t show full picture.

New model for Swine Flu: combine WG mobility data with a model of spread, and show where possible flu hotspots might be, e.g. the LA, Dallas, Miami, Chicago and NY areas.

Can look at multiscale mobility and local mobility, they show very different spreads of disease.

David Grier, Lessons from the Ancient History of Crowd Sourcing

I’m here at the Citizen Cyberscience Summit for the next two days. Expect quite a few notes (although probably not all sessions!).

Lessons in crowdsourcing.

Nothing new under the sun. Science is shaped by forces that have existed a long time, and we understand them. He works on calculation and making mathematical models work. That has a long history as citizen science, going back 200 years: take a large task, divide it into small exchangeable jobs, and send them out into the world with instructions on how to do them. Charles Babbage wrote extensively about this in the 1830s.

Babbage was thinking about this because of his computing machine: how can you split big tasks up? The activity is largely about starting something; the drive is towards a radical organisation that moves towards convention, where you have people who lead, people who follow, and varying levels of skill.

Babbage’s discussion drew on a prime example: Gaspard de Prony during the French Revolution, when labour was very cheap. De Prony got ~100 people to compute maths tables, required for surveying.

~100 years later, in 1875, post American Civil War, with lots of people out of work, including widowed women, an organised group of computers, i.e. women, put together the Harvard Star Catalogue. The same happened again in 1907 at the US Naval Observatory.

And again in 1938 during the Great Depression – 450 people working at tables with paper and pencils doing calculations for scientists or government. Maths Tables Project.

A few people have adding machines, they are the leaders.

A NYC computing office, Columbia Uni Stats Computing Lab, 1930, had 6 employees.

Most of the Maths Tables computers hadn't been to high school; they were organised by arithmetic operations, e.g. − or +.

Often drew from poor classes at that time: blacks, women, Irish, Jews.

Planners had a doctorate or masters. The project operated 1938–1948; remnants existed until 1964.

Most scientists/engineers didn't have access to computers until the mid-60s when they had timesharing; some not until the 70s. The computers worked from a worksheet that was planned in advance, a bit like programming. Early programmers were called planners, and coding was called planning. Instructions were written so workers didn't need to understand what they were doing.

Built 28 volumes of tables, e.g. powers of integers, exponential functions.

Discovered there were particular skills, and started to look for specific calculations that were good for generating revenue, e.g. OSRD calcs, microwave radar tables, explosion calcs; LORAN navigation tables (precursor of GPS); general science calcs, e.g. Hans Bethe paper on Sun. First test of linear programming.

Labour economics. As you build skills, people want to use their skills and want to be rewarded for those skills. They want to advance. The WPA studied labour and skill as a way of building identity.

Building skill

  • Identity
  • Accomplishment
  • Advancement

Aspirational issues: Everyone wanted to be special.

The Special Computing Group had a room to themselves and had machines; almost all were women, and everyone wanted to move up to the Special Computing Group. They started offering courses at lunch to develop those skills. Once people had completed a course, there were opportunities outside, so lots of places they could apply those skills. The best measure of ability was the skill of the group.

They recognised that losing members of the group, breaking it up, would damage the organisation. They started to have difficulty getting work, so began soliciting work from scientists.

Gertrude Blanch, PhD, chief mathematician, ran the computing group. She had to take everyone who came in the door and had to find ways to enforce discipline, e.g. with people who didn't think about carrying the ten, or who couldn't concentrate. Calculations were done 3–10 times each, but not just duplicated for the sake of it.

“People doing hand calculations computing the same number the same way make the same mistakes” – Babbage

So they did the same calculation in different ways to ensure accuracy.
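The same-answer-by-different-routes check can be illustrated with a toy example of my own (not from the Maths Tables Project): compute a quantity by direct summation and by an independent closed-form identity, and accept the result only when the two routes agree. Unlike naive duplication, the routes share no steps, so they are unlikely to fail in the same way.

```python
def sum_squares_loop(n):
    """Route 1: add the squares one by one."""
    total = 0
    for k in range(1, n + 1):
        total += k * k
    return total

def sum_squares_formula(n):
    """Route 2: the independent closed-form identity n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6

def checked_sum_squares(n):
    """Accept the answer only if both routes agree, as a guard
    against a systematic mistake in either one."""
    a, b = sum_squares_loop(n), sum_squares_formula(n)
    if a != b:
        raise ValueError(f"routes disagree: {a} != {b}")
    return a
```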

Crucial issue: How do citizen scientists relate to professionals? Professionals build walls around themselves.

The National Academy of Sciences said it wanted to be of use to the government and help scientific work. The WPA sponsored a lot of science. The Academy's internal comms are 'embarrassing at best'. The group realised it was in a position it didn't want to be in, and internal memos had one set of reasoning that repeats:

“Scientists are successful people. The poor are not successful, because they are poor. Therefore we can conclude that the poor are not scientific. Ergo, the Maths Tables Project is not scientific; ergo their work is not good.”

The NAS wanted the budget for the maths tables and wanted to do it "right" and "well". But they could never have replicated it with students, as the budget wouldn't have covered it.

Handbook of Mathematical Functions: Largest selling science book in history.

Gertrude Blanch finished her PhD in 1934; being Jewish, she was never going to be employed by anyone, but the Maths Tables Project led her to be employed by the Air Force, where she worked on supersonic air flow and jet nozzles.

The fact that we are building skill amongst the general populace can never be overlooked.

The maths group had dropped from 450 people to 120 as labour costs rose, but claimed 120 was as efficient. By 1946 the group had fallen to 60 people with specific skills, but was still as efficient. People had titles, skills, identifiable expertise.

Lessons:

  • We are creating skill, not just exploring the universe and doing science.
  • Get people who want to be identified with the project, as part of their identity.
  • Build an org with hierarchy and divisions of labour based on skill.
  • Encourage aspiration.

De Prony used Adam Smith as justification: the first two chapters of The Wealth of Nations, division of labour, identifying people with skill. Beyond the forces that shape these orgs and the relationships that will support science, there is also a political economy that shapes them: it builds skill but divides jobs, and creates leaders and followers. Must deal with science self-defining as an enclosed domain.

links for 2010-08-30

  • Kevin: In July, Matt Brian of The Next Web reports about recent successes enjoyed by location-based network Foursquare: "Just days after securing $20 million series B round of capital, the location service has announced another big milestone – 1 million check-ins in one day."
  • Kevin: A look at the competition heating up in the location space with the launch of Facebook Places. "While some of you might think this trend marks the moment when social media jumped the shark, major media outlets and other businesses are looking to cash in on the impulse people have to overshare." I spotted this, and I think that news organisations need to pay attention to this. Geolocation has potential for news organizations, too, as demonstrated earlier this year after the foiled Times Square attack. In the days that followed, there was more than one false alarm, and the Wall Street Journal used a Times Square "check in" on Foursquare to alert others in the area that there was an evacuation.

    Editors can also use geolocation to help confirm eyewitness tips from the scenes of news events. That doesn't mean there won't be hoaxes or that the systems can't be fooled, but it's a step up from the e-mails and SMS tips we have now.

  • Kevin: Jonathan Stray of the Nieman Journalism Lab interviews my friend and former colleague Simon Rogers, the editor of The Guardian's Datablog, on the data journalism efforts at the newspaper. As Jonathan points out, most of the tools The Guardian uses are free, or mostly free. The Guardian uses Google Docs and Google Mail for much of its office work; of course, to use those at a business there is a fee, but it's less than traditional office applications. They also use IBM's Many Eyes and the time visualisation tool Timetric, both of which are free.
  • Kevin: Most location-based services in 2010 focus on retail and restaurants, but Zillow is one of several location applications that focus on real estate. They have iPhone, iPad, Android and Windows Mobile apps. Their CEO says the app generates much better leads for real estate agents because people are actively out looking at homes. "Zillow’s competitors, such as Redfin, ZipRealty, Century 21, Realtor.com, have apps, as well."
  • Kevin: MG Siegler reports: "Future Checkin is an app that allows you to check-in to your favorite Foursquare venues automatically when you’re near them. You don’t have to do a thing besides simply have your phone on you and this app will check you in while running in the background with iOS 4."
  • Kevin: Shopkick uses the API of location-based network Foursquare to reward you for walking into a retail store in the US. The app knows when you walk into the store and gives you a reward, presumably a special offer of some sort. They also have a virtual currency called "kickbucks". They say the check-in method can't be faked. In the future, their app will know not only when you walk into a store but also where you are in the store. I suppose this would be of interest to the store because if the phone was sensitive enough it might know not only that you're in the store but also what department you were in. I think to achieve that they'll have to use Skyhook using hotspots because the GPS signal wouldn't be available in the store, but it's possible.

links for 2010-08-25

  • Suw: Despite a silly headline, this is actually a very good opinion piece by Mike Altendorf, questioning the kneejerk reactions of HR and boardrooms towards social media in the business.
  • Kevin: Yahoo's Barcelona research lab has created a tool that not only puts past articles on a timeline, but also looks at predictions made in those past articles. For instance, Tom Simonite in the MIT Technology Review gives the example of a 2004 opinion piece that predicted North Korea would have some 200 warheads. It's a clever use of semantic technology that extracts dates from articles and delivers more information to the reader. It's a clever riff on the idea of a timeline, and a great discovery tool for a news organisation's archives.
  • Kevin: "Statistics can make or break a story. Used correctly they add weight and conviction, but it’s easy to be seduced by cherry-picked data and meaningless surveys." A talk at the Centre for Investigative Journalism by Nigel Hawkes on how to become savvy about data.
  • Kevin: "Polymaps is a free, open-source JavaScript library for making dynamic, interactive maps. It is the result of a collaboration between Stamen Design and SimpleGeo. … Polymaps provides speedy display of multi-zoom datasets over maps, and supports a variety of visual presentations for tiled vector data, in addition to the usual cartography from OpenStreetMap, CloudMade, Bing, and other providers of image-based web maps."
  • Kevin: Scott Rosenberg has written a very thoughtful post on the risks of trusting Facebook with the future's past. He writes: "In fact, Facebook is relentlessly now-focused. And because it uses its own proprietary software that it regularly changes, there is no way to build your own alternate set of archive links to old posts and pages the way you can on the open Web." I think the issue of memory and archive in the digital age is a really interesting one, and it becomes even more important when we outsource digital memory to closed systems that have their own priorities.
  • Kevin: A good howto post by Tony Hirst of Open University on how to screen scrape data from Wikipedia. Tony has a number of excellent tutorials on his blog on how to do this. One thing to note is that a lot of the data in Wikipedia is now available on DBPedia so you might not have to go through this process.
  • Kevin: A nice brief look at a new visualisation project from the BBC called HowBigReally.com that puts news events in a physical context. For instance, with floods currently covering a fifth of Pakistan, how does that translate on a map of the United States, allowing readers in the US to appreciate the sense of scale. Really good thinking.
  • Kevin: Alan Mutter writes about how major metro newspapers in the US are finding some success in creating niche print products. "(F)oresighted publishers are creating niche products to try to capture readers who historically were unlikely to buy the legacy newspaper – and, of course, the advertisers who covet them as customers." This is smart. As Philip Meyer wrote in 2004 with The Vanishing Newspaper, whenever a new medium has challenged an existing one, it has always pushed the legacy media towards greater specialisation. Some newspapers are focusing on this not only with digital products but also with new targeted print products.
  • Kevin: Caroline McCarthy at CNET has written one of the best pieces on the roll out of Facebook Places. With Facebook Places, there was not just a shift from location as a standalone feature but also in how location was being talked about. "Facebook Places' debut marks a shift in the rhetoric of the location-based services market because of the company's vocal connection of geolocation to permanence and memory, rather than the language of exciting immediacy (see what your friends are doing right now! In real time!) touted by the likes of Foursquare and Gowalla." I wonder if this is simply a rhetorical shift for the purpose of marketing and differentiation or if it actually speaks fundamentally to how location will work on Facebook. The cynic in me thinks it's probably just a bit of marketing.
  • Kevin: Poynter has a great interview with Tiffany Campbell, a six-year veteran of the paper and a lead producer for SeattleTimes.com. She describes how mobile tools such as Twitter and live video service Qik are changing how they report news and interact with audiences. She talks about how they use Twitter not only as a distribution mechanism but also as a content creation platform. "By using Twitter as a mobile platform, we were able to give real-time updates and maintain users' interest in an event." They always see a spike in traffic when they go live with video or tweet a live event. She sees that this is not only changing reporting but also how audiences interact with journalism. People can interact with reporters in real time as they are reporting.