David Anderson, A Brief History of (CPU) Time

Computer scientist at UC Berkeley, building platforms for citizen science. Looking for commonalities across projects and for software support that addresses the community’s needs, to make it easier for scientists to use volunteer power. Technology is only one piece of the solution.

Build platforms for:

  • Volunteer computing
  • Distributed thinking
  • Education

Computational science:

Simulations are now so complex that they can only be done on computers, at scales ranging from proteins and ecosystems to the Earth, the galaxy and the universe. They need lots of computing power because models must be fitted to observed data: to predict what’s going to happen, you need to run thousands or millions of simulations.
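A minimal sketch of why fitting a model to data multiplies the number of runs, assuming a toy "simulation" (noisy exponential growth) in place of a real one. The names simulate and fit_by_sweep, the starting population of 100, and the noise level are all hypothetical. Each candidate parameter is run with many random seeds, and every run is independent of the others, which is what makes this kind of work suited to many computers rather than one fast one.

```python
import random
from statistics import mean


def simulate(growth_rate: float, steps: int = 10, seed: int = 0) -> float:
    """Toy stand-in for one expensive simulation run: noisy exponential growth."""
    rng = random.Random(seed)
    population = 100.0
    for _ in range(steps):
        population *= 1 + growth_rate + rng.gauss(0, 0.01)
    return population


def fit_by_sweep(observed: float, candidates, runs_per_candidate: int = 20) -> float:
    """Run every (parameter, seed) combination and return the candidate whose
    mean output is closest to the observed value. Each simulate() call is
    independent, so each could be a separate job on a different machine."""
    def error(rate: float) -> float:
        runs = [simulate(rate, seed=s) for s in range(runs_per_candidate)]
        return abs(mean(runs) - observed)

    return min(candidates, key=error)
```

For example, fit_by_sweep(100 * 1.05 ** 10, [i / 100 for i in range(11)]) recovers a growth rate of 0.05, but only after 11 × 20 = 220 independent runs. Real models with many parameters need far more.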

A new generation of instruments, e.g. the LHC, LIGO, the SKA and gene sequencers, produces data at unprecedented rates, right at the limit of what computers can handle and beyond the capacity of the computers owned by universities or institutions. Science is limited by computing power and storage capacity. What we need is not a faster computer but higher throughput, i.e. a lot of computers.

Consumer digital appliances, e.g. computers, handhelds and set-top boxes, are all converging on similar hardware, and the networks that connect them make up the consumer digital infrastructure. There are 1.5 billion PCs. Graphics processing has improved because people want to watch HD TV and play realistic games, and a GPU is often 100x the speed of a CPU. GPUs were put there for games, but they are good for science.

Storage on consumer devices is approaching the terabyte scale, and network speeds are approaching 1 Gbps.

All this is ideal for science computing!

Compared with its institutional counterpart, the consumer digital infrastructure is far bigger and far cheaper. Supercomputers are moving towards an ExaFLOPS within 5 years, but consumers already have 1000 ExaFLOPS today. Consumers spend $1 trillion per year!

BOINC is free, open-source software; anyone can use it to create a project.

The utopian ideal is to have many such projects, each getting computing power by advertising its research to the public and educating people about it, so that the public supplies resources to the science they want to support.

BOINC projects:

  • ~30 projects
  • 300k volunteers
  • 530k computers
  • 3 PetaFLOPS

Volunteers can do more than run software. They can provide tech support, optimise programs, translate websites and recruit new users. Help was initially given on message boards, but these didn’t work well for non-technical users, so there is now a Skype-based system where people needing help can find someone willing to give that help via Skype.

Volunteers span a spectrum of confidence, and some are malicious: e.g. there have been people trying to scam other users and get their PayPal IDs.

Motivations study. Volunteers include people interested in doing science, people who want to show the world they have the fastest computer, and people who want to be part of a team.

Distributed thinking. Stardust@Home asked volunteers to look for grains of interstellar dust in photos, something people do better than computers. An interesting aspect was the need to quantify the accuracy of results. The project created samples where the answers were known, i.e. images that contained either only noise or a particle. Every 5th image was a calibration image, so false positives and false negatives could be tracked.
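The calibration scheme described above can be sketched roughly as follows. This is an illustration, not Stardust@Home’s actual code: the names interleave, score and CalibrationStats are hypothetical, and answers are reduced to a yes/no "I see a particle". Only the every-5th-image rule comes from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CALIBRATION_EVERY = 5  # from the talk: every 5th image shown is a calibration image

Shown = Tuple[str, Optional[bool]]  # (image_id, known answer, or None for real images)


@dataclass
class CalibrationStats:
    correct: int = 0
    false_pos: int = 0  # reported a particle on a noise-only image
    false_neg: int = 0  # missed a known particle


def interleave(real_ids: List[str], calibration: List[Tuple[str, bool]]) -> List[Shown]:
    """Insert a calibration image at every 5th position. Assumes enough
    calibration images are supplied (next() raises StopIteration otherwise)."""
    shown: List[Shown] = []
    calib = iter(calibration)
    for image_id in real_ids:
        if len(shown) % CALIBRATION_EVERY == CALIBRATION_EVERY - 1:
            shown.append(next(calib))
        shown.append((image_id, None))
    return shown


def score(shown: List[Shown], said_particle: List[bool]) -> CalibrationStats:
    """Score a volunteer's yes/no answers, using only the calibration images."""
    stats = CalibrationStats()
    for (_, truth), answer in zip(shown, said_particle):
        if truth is None:
            continue  # real image: the right answer is unknown
        if answer == truth:
            stats.correct += 1
        elif answer:
            stats.false_pos += 1
        else:
            stats.false_neg += 1
    return stats
```

Because the volunteer cannot tell calibration images from real ones, their error rates on the known images serve as an estimate of how reliable their answers on the unknown images are.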

Replication was also used: many people look at each image, and where there is consensus, the calibration results show whether the consensus is correct. The project found all the dust particles it could.
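One way to combine replication with calibration is sketched below, assuming a simple majority vote with a hypothetical agreement threshold of 70%. The function names and the threshold are illustrative choices, not the project’s documented method. The consensus is computed the same way for every image, and its accuracy is measured on the calibration images, where the right answer is known.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def consensus(votes: List[Hashable], min_agreement: float = 0.7) -> Optional[Hashable]:
    """Majority answer if at least min_agreement of the votes agree, else None."""
    if not votes:
        return None
    answer, count = Counter(votes).most_common(1)[0]
    return answer if count / len(votes) >= min_agreement else None


def consensus_accuracy(votes_by_image: Dict[str, List[Hashable]],
                       truth_by_image: Dict[str, Hashable],
                       min_agreement: float = 0.7) -> Optional[float]:
    """Fraction of calibration images whose consensus matches the known answer.
    Real images (no known answer) and images without consensus are skipped."""
    hits = total = 0
    for image, votes in votes_by_image.items():
        if image not in truth_by_image:
            continue
        agreed = consensus(votes, min_agreement)
        if agreed is None:
            continue
        total += 1
        hits += agreed == truth_by_image[image]
    return hits / total if total else None
```

If the consensus is right on nearly all calibration images, that gives some confidence in the consensus on the real images scored the same way.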

Created a platform called Bossa, middleware for distributed thinking. It provides scheduling mechanisms, e.g. calibration jobs and replication, and is an open system with respect to assessment and scheduling policies.
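Because Bossa leaves scheduling policy open, a project could plug in its own rule for what to show each volunteer next. The function below is a hypothetical policy of that kind, not Bossa’s actual API: the Job class, next_job, the 3-replica target and the 20% calibration rate are all assumptions for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    job_id: str
    is_calibration: bool = False                     # known-answer job used to assess volunteers
    done_by: Set[str] = field(default_factory=set)   # volunteers who have already answered


def next_job(volunteer: str, jobs: List[Job], replicas: int = 3,
             calib_rate: float = 0.2,
             rng: Optional[random.Random] = None) -> Optional[Job]:
    """Hypothetical policy: with probability calib_rate, hand out a calibration
    job; otherwise hand out the real job with the fewest answers that still
    needs replicas and that this volunteer hasn't seen."""
    if rng is None:
        rng = random.Random()
    calib = [j for j in jobs if j.is_calibration and volunteer not in j.done_by]
    real = [j for j in jobs
            if not j.is_calibration
            and len(j.done_by) < replicas
            and volunteer not in j.done_by]
    if calib and (not real or rng.random() < calib_rate):
        return rng.choice(calib)
    if real:
        return min(real, key=lambda j: len(j.done_by))
    return None
```

Handing out the least-replicated job first spreads volunteers across images so each reaches its replica target, and excluding jobs the volunteer has already done keeps replicas independent.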

Bossa is being used to find fossils.

Also extending Bossa into Bossa Nova, which looks at more complex systems for asking people to do things involving creativity: problems for which there is no unique answer. For example, in complex problem solving, volunteers can decompose a problem into sub-problems, propose solutions, evaluate them, and evaluate how a group of solutions might work together. These tasks involve different skills. At the software level, the system uses people optimally, assigning them to the tasks to which they can contribute most.
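Assigning people to the tasks where they contribute most is an assignment problem. The sketch below uses a simple greedy heuristic, assuming each volunteer has a hypothetical estimated skill score per task type (e.g. derived from past evaluations). It is not Bossa Nova’s method, and greedy assignment is not guaranteed optimal; an exact solution would use something like the Hungarian algorithm.

```python
from typing import Dict


def assign(skills: Dict[str, Dict[str, float]], slots: Dict[str, int]) -> Dict[str, str]:
    """skills: {volunteer: {task_type: estimated skill}}
    slots:  {task_type: number of volunteers still needed}
    Greedy heuristic: repeatedly take the highest-scoring (volunteer, task)
    pair whose volunteer is unassigned and whose task still has open slots."""
    remaining = dict(slots)
    pairs = sorted(((score, vol, task)
                    for vol, scores in skills.items()
                    for task, score in scores.items()),
                   reverse=True)
    result: Dict[str, str] = {}
    for score, vol, task in pairs:
        if vol not in result and remaining.get(task, 0) > 0:
            result[vol] = task
            remaining[task] -= 1
    return result
```

With task types taken from the talk ("decompose", "propose", "evaluate"), a volunteer who is strong at decomposing gets that slot first, and others fall back to the tasks they are next best at.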

Education and citizen science. If we can train people to do more complex tasks, we can achieve more. This is very important: if people learn more, they may stick with a project longer and recruit more people to help, and you get more computing power.

Training or educating hundreds of thousands of people is a challenging problem, not addressed by traditional education theory. Heterogeneity is problematic: volunteers differ in background, education level, location and language. What makes this tractable is the constant stream of students: dozens, hundreds or thousands of new users arrive per day, interested in a course, so we can do experiments. If we have two alternative ways of teaching a concept, we can rig up the software to randomly show one lesson or the other, and then everyone takes the same assessment.
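The randomised experiment described above can be sketched as follows, assuming a hypothetical assess(student, lesson) callback that stands in for a student taking the assigned lesson and then the shared assessment. The function name and the fixed seed are illustrative only.

```python
import random
from statistics import mean
from typing import Callable, Dict, Iterable, Sequence


def run_experiment(students: Iterable, lessons: Sequence[str],
                   assess: Callable[[object, str], float],
                   seed: int = 0) -> Dict[str, float]:
    """Randomly assign each arriving student to one lesson, give everyone the
    same assessment, and return the mean score for each lesson."""
    rng = random.Random(seed)
    scores: Dict[str, list] = {lesson: [] for lesson in lessons}
    for student in students:
        lesson = rng.choice(lessons)  # random assignment avoids selection bias
        scores[lesson].append(assess(student, lesson))
    return {lesson: mean(s) for lesson, s in scores.items() if s}
```

With real assessment scores, a difference in means would still need a significance test (e.g. a t-test) before concluding that one lesson is better; the steady stream of new users is what makes gathering enough samples feasible.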

We may learn that one lesson is better than another, or that one lesson is better for a subset of students, e.g. based on demographics or another attribute. We can then make an adaptive course that refines how we teach each student as we learn more about them. This applies not just to individual lessons but to the overall structure of the course.
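A minimal sketch of an adaptive course, assuming students are grouped by a hypothetical attribute (e.g. language or education level) and that each lesson yields a numeric assessment score. It swaps in a standard epsilon-greedy bandit strategy: most students get the lesson that has scored best for their group so far, while a small fraction still get a random lesson so the estimates keep improving. This is not Bolt’s actual algorithm; the class name and the 10% exploration rate are assumptions.

```python
import random
from collections import defaultdict


class AdaptiveCourse:
    """Epsilon-greedy lesson choice per student group. A hypothetical sketch."""

    def __init__(self, lessons, epsilon=0.1, seed=0):
        self.lessons = list(lessons)
        self.epsilon = epsilon              # fraction of choices spent exploring
        self.rng = random.Random(seed)
        self.totals = defaultdict(float)    # (group, lesson) -> sum of assessment scores
        self.counts = defaultdict(int)      # (group, lesson) -> number of scores

    def mean_score(self, group, lesson):
        n = self.counts[(group, lesson)]
        # Untried lessons score +inf, so exploitation picks them first.
        return self.totals[(group, lesson)] / n if n else float("inf")

    def choose(self, group):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.lessons)  # explore: random lesson
        # Exploit: lesson with the best mean score for this group so far.
        return max(self.lessons, key=lambda lesson: self.mean_score(group, lesson))

    def record(self, group, lesson, score):
        self.totals[(group, lesson)] += score
        self.counts[(group, lesson)] += 1
```

Keeping separate statistics per group is what lets the course discover that, say, lesson B works better for novices while lesson A works better for experienced volunteers.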

Bolt: a system for tailored education of large streams of volunteers.