Key Takeaways from Beijing’s AI Conference

I attended the Beijing Academy of Artificial Intelligence Conference last week and came away with an appreciation of the need for cross-cultural coordination: there is a pressing imperative for humans to take back responsibility in the global governance of AI.

Disclaimer: This summary was produced for the Berggruen Institute China Institute. Find out more about their work.

Beijing set the stage for AI experts from around the world to come together at this year’s Beijing Academy of Artificial Intelligence (BAAI) conference at the beginning of November. The cross-cultural communication channel opened by conferences such as this one was particularly pertinent in the panel on AI ethics. Guest speakers from different cultures shared different ideas, but one commonality emerged: the requirement that humanity take responsibility for the careful design of AI to attain the collective goals of harmony, transparency and diversity, themes repeated in national AI principles produced around the world. The Collingridge dilemma famously introduces a “pacing problem” whereby the rate of technological innovation increasingly outstrips the rate of required regulation. In light of this, the key takeaway from the panel is that the onus is on humans, whether in the capacity of the academic community, governments, companies or the public, to take responsibility now, with the foresight to design a future where artificial intelligence is for the benefit of all, not the few.

Wendell Wallach’s opening address was characterised by his emphasis on cooperation and coordination, seeing our current climate as an inflection point of uncertainty in technological development. Ethics in navigating this uncertainty requires flexibility to overcome the technological determinism inherent to Collingridge’s dilemma. Technological determinism seems more likely when we consider Liu Zhe’s point on the differential treatment of autonomy in an engineer’s definition versus a philosopher’s. In Wallach’s words, engineers, ethicists and legislators must all speak the same language to find a robust set of parameters for decision making and tools for guidance at each level of responsibility. Yet adaptive and agile governance remains an ideal, not a practical implementation. Greater precision is required in laying concrete foundations, what Liu Zhe calls functional morality, for the mechanics of global AI governance in order to close the gap between principles and implementation.

Professor van den Hoven shared his concern about how we can design an ethical system that applies to the 21st-century condition:

21st century condition: How can we collectively assist people unlike us, far away, and in a distant future, by refraining from using certain complex technologies now to benefit them later in accordance with standards unfamiliar to us?

Meeting this condition requires a reinvention of ‘old ethics’, a reconceptualised manual with updated guidelines, which van den Hoven analogises to the different safety rules written for different types of ship: an oil tanker is unfit for tourists and a cruise ship unfit for oil. In constructing our new approach, responsibility, as the cornerstone of all legal and ethical frameworks, must remain central.

Source: van den Hoven

Ethics 1.0                            Ethics 2.0
Actions                               Omissions
Basic and non-mediated                Technologically mediated
Between natural individual persons    Joint/collective
Space-time contiguity                 Future and remote
Causally standard context             Causally wayward chains
Negative duty not to harm             Positive duty to assist

Trust and confidence are often conflated, but speakers on this panel called for their philosophical distinction. Van den Hoven insists that trust is a human system, and that ignorance of human responsibility allows for ‘agency laundering’. Demarcating an artificially intelligent agent as trustworthy abstracts away from who made or designed it. The moral abdication of human responsibility onto the shoulders of AI inadequately distributes risk. In black-box algorithms, responsibility is blurred, and without responsibility there is no imperative for the designers of AI to consider ethical design at all. Instead, to mitigate plausible deniability of blame, designers need to ground machines in human choice and human responsibility, maintaining our knowledge, control and freedom.

Three examples illustrate this transition from epistemic enslavement to epistemic empowerment, where humans avoid becoming slaves to autonomous algorithmic decisions by retaining responsibility over moral risk. The first example is provided by van den Hoven, who criticises the automatic deployment of safety systems. When undeserved full trust is placed in a less-than-perfect algorithmic system, a human operator cannot disengage the system even when they consider its judgement erroneous. By shifting responsibility to the machine, the human must comply or bear an unacceptably high weight of moral risk. Instead, while algorithms can recommend a decision, the human operator should maintain autonomy, and therefore responsibility, over the final outcome. Zeng Yi provides two further examples. Deep neural nets are efficient image classifiers, yet as Zeng’s study shows, changing crucial pixels can confuse an algorithm into mistaking a turtle for a rifle. To prevent this misclassification having real-world consequences, once again a human must take responsibility for moderating the machine’s judgement. Finally, the case against moral abdication, in favour of responsibility retention, is perhaps best exemplified by the development of brain-computer interfaces. If we do not carefully ascribe risk to different actors, would killing at the hand of a robotic arm be the responsibility of the human attached to the arm, the roboticist who designed the technology, or the robot itself? To avoid such situations, ethical and responsible governance is required for a human-AI ‘optimising symbiosis’.

Beyond the specific recommendation of retaining responsibility in the redesign of ethical systems, the panel urged an understanding of cross-cultural attitudes to AI. Yolanda Lannquist sees considerable common ground between the 84 sets of AI ethics principles produced in other countries and the one developed with the help of Zeng Yi here in Beijing. The national strategies share goals such as accessibility, accountability, safety, fairness and transparency. Such alignment of goals provides an a priori positive case for the scope of global coordination in AI governance. Yet a more nuanced understanding emerged from the panel. As Danit Gal summarised, the theoretical similarities of shared AI ethics exist globally, but their application and cultural understanding happen locally. Allowing for different interpretations of these AI principles, and keeping a degree of vagueness in international mandates, retains flexibility for culturally specific technological development. An internationally cooperative AI strategy does not necessarily enforce identical national AI strategies. Three comments on cross-cultural differences were central to the panel: priorities, interpretation and partnership.

Japanese professor Arisa Ema highlights that priorities are likely to differ: a country will focus on the principles that align with its most pressing problems. In the Japanese case, the national strategy focuses on human-machine labour as complements or substitutes, against the country-specific backdrop of a super-aging society and an industry comprising many automatable jobs.

Gal warns of differential interpretation of the same ethical code in different countries. In China, accessibility and privacy are placed in governmental hands, with data assimilated into government repositories where misuse or violations of privacy can be monitored. Data in the West is often stored in data silos owned by tech companies, so the concepts of accessibility and privacy are borne out in an entirely different context. Further, our notion of controllability depends inextricably on the cultural treatment of hierarchy: in South Korea a clear human-machine hierarchy moderates the design of controllable AI, but in Japan the human-machine hierarchy appears flatter in structure, exemplified by the world’s first robot hotel in Tokyo. By inspecting the hotel’s workings, Professor Arisa Ema reveals that beneath the surface it is unclear whether humans or robots are treated as more valuable. The manager admitted that while robots could clean corridors, humans cleaned the rooms. Yet at reception, robots took full charge of pleasant, well-mannered guests, and humans were left to deal with the emotional labour of angry or difficult cases. In this shared structure of responsibility, it is unclear who is in control. Finally, what we consider to be a fair or prejudiced decision depends on societal structure and the experience of diverse populations within that society. A product developed in China may be fair for citizens of Han ethnicity but could display considerable bias in another country. These examples only begin to illustrate the complexity of cross-border coordination in AI strategy.

Finally, Zeng Yi considers how priorities and interpretations direct different levels of partnership between humans and AI. He triangulates these relationships into tools, competitors and partners. In the round-table discussion, in response to an audience question, the speakers considered the root of these different human-machine relationships, specifically whether the media and creative industries play a role in mediating or manipulating public opinion, a relevant consideration given the timely release of the most recent Terminator film. The contrast between the mutually destructive human-machine interaction portrayed in Western films such as this one and the mutually beneficial friendships with robots in Japanese cinema introduces a discrepancy in public expectations of whether future technologies will help or harm. Crucially, Zeng’s trifurcation allows for these dynamic considerations: as artificial general intelligence or artificial superintelligence appears on the horizon, a tool-only relationship becomes less realistic. Understanding the current climate of cross-cultural attitudes to AI can inform our judgement of whether future humans will see AI as antagonising competitors or harmonising partners. This judgement of the future should remain central to designing our present-day governance principles because, as Zeng warns, by building different pathways into AI, we build in different risks.

Source: Zeng Yi

The general consensus arising from this panel is the need for globally inclusive governance of AI. The key takeaways from each speaker share the recommendation of retaining human responsibility in a reinvention of ethical systems, while maintaining the flexibility to apply these ethical principles of AI across borders and cultures. Yet the question remains of how such recommendations are implemented in practice. As Brian Tse observes, the international relations landscape will be substantially altered by unbalanced AI development paths, especially between global superpowers. How can we replace this international competition with a cooperative rivalry? Yolanda Lannquist proposes a multi-agent approach across many stakeholders, where collaboration between academia, nonprofits, government and the public is required to synthesise common norms. Arisa Ema hopes to build greater Chinese involvement at the global discussion table by drawing on the existing Japan-Germany-France initiative, which fosters dialogue between experts in the East and the West. Wendell Wallach proposes delegating this obligation to International Governance Coordination Committees, whose purpose would be to devise ethical principles at an institutional level for the global community and then advise policy locally for national application. Wallach’s 1st International Congress for the Governance of AI (ICGAI), happening in Prague next year, holds promise in producing an agenda for successfully implementing ‘Agile and Comprehensive International Governance of AI’. Implementation challenges remain, and uncertainties about the future cloud precise policy prescriptions, but in the meantime, as this panel demonstrates, conferences like these are a step in the right direction, fostering the diversity of discussion required for inclusive and mutually beneficial AI.


What Exactly is High-Level Machine Intelligence?

High-Level Machine Intelligence (HLMI) has been defined as “achieved when unaided machines can accomplish every task better and more cheaply than human workers” (Grace et al., 2018), but is this definitional approach appropriate?

I don’t think so. The definition requires a threefold change: distinguishing human-level from human-like intelligence, specifying a non-infinite set of tasks, and specifying our judgement of ‘better’.

‘Human-like’ versus ‘Human-level’

At present, Professor Boden (2015) describes artificial intelligence as a “bag of highly specialised tricks”, seeing the advent of true ‘human-level’ intelligence as a distant prospect. Present-day AI is sometimes labelled an ‘idiot savant’: it drastically outperforms humans in specific tasks, but this performance does not extend to the many other tasks humans complete day-to-day with so little cognitive effort that a child, or even an infant, could master them. Diversity is thus key: human-level intelligence requires the ability to learn models and skills and apply them to arbitrary new tasks and goals. For a machine to beat a human at Go requires a neural network trained on hundreds of millions of games. While it can be successful in this domain-specific scenario, consider changing the objective function to losing on purpose, to being the last player to pick up a stone, or even to beating an opponent by only enough not to embarrass them. While a human player could adapt to these new situations quickly, an AI model would require substantial retraining and reconfiguration. A crucial aspect of human intelligence is therefore transfer learning. One change to the question must state that a single AI agent trained on one set of data, however large, can adapt to different objective functions and different constraints. Otherwise AI remains “a bag of highly specialised tricks”, where each separately trained model excels at just one task but not across the board; yet diversity is a key component of human-level intelligence. Humans also have the ability to learn their weaknesses and improve in the future. It may then be required that a machine not only perform a task better than a human worker, but continually improve its own performance, for example by rewriting its own Python scripts, analogous to the human process of self-development.

Yudkowsky (2008) considers the nexus of artificial intelligence and existential risk arising from convergent instrumental goals, avoiding the trap of anthropomorphising AI. Human-level ≠ human-like. Lake et al. (2015) are advocates of ‘building machines that learn and think like people’, considering the incorporation of intuitive physics and psychology. This richer starting point would allow technologies such as neural networks to converge to human performance with fewer training examples, making not only human-level but human-like decisions. Grace et al. (2018) subscribe somewhat to this view by asking experts when a machine will beat the best human Go players with a similar number of training examples, in the tens of thousands, rather than requiring hundreds of millions of games. While using the human brain as our best example of developed intelligence can provide fruitful research ventures, requiring human-level intelligence to be human-like is overly restrictive. If we abide by true human-like learning then, as Crick (1989) famously criticised, the commonly used technique of backpropagation requires information to be transmitted backwards along the axon, an impossible process in the reality of neuronal function. A less puritanical criticism is to ask why machines must think like humans if they achieve the same outcome. While the paper requires an artificial Go player to have the same experience as a human player, it does not specify that an artificial musician must have learned the same number of songs as Taylor Swift to imitate her writing style. Nor do we require a neural network to classify images using a lens, a cornea and an optic nerve. Admittedly, the black box of machine learning algorithms is an area requiring study, but our definition of human-level intelligence must be clear on whether it is required to be human-like, and on whether we want to enforce this stricture.

Despite its sophistication, human intelligence relies on behavioural biases and heuristics which can give rise to irrational or discriminatory actions, raising the philosophical question of what human-level intelligence really means and whether mimicking its imperfections is a desirable development path to take.

Non-infinite Task Set

One can easily come up with a set of tasks in which we do not require AI to perform better. To list a few: do we require an AI to dance better, to go to the toilet better, or to offer companionship to the elderly better? As Kai-Fu Lee, a leading Chinese AI specialist and AI optimist, notes, some tasks, especially those requiring empathy and compassion, are profoundly human and can stay that way. Reaching human-level intelligence need not be limited by developing human-level emotional capacity if such capabilities are not required in the tasks AI must perform. In fact, in the literature on the future of capitalism, advocates of AI hope for a digital socialism in which humans maintain their comparative advantage over machines in a subset of non-automated tasks requiring exactly those aspects of human nature that cannot easily be coded or trained into a machine’s learning process. We thus require a subset of tasks, perhaps 95%, leaving the remainder for human workers.

Towards a Better Definition of ‘Better’

Being ‘better’ at a task is measurable in a number of different ways. AI may reach human-level or even superhuman performance at certain tasks but retain subpar performance on other components. The cost component has been specified here, but vagueness in detail creates vagueness in prediction. If an AI can do what a human can do for 1,000 times the hourly wage, this is clearly suboptimal. However, stating that an AI must be ‘cheaper’ than one human worker is also naive if a machine has a higher than 1:1 replacement ratio. This can be overcome by referring to human workers in the plural. Yet vagueness remains in the term ‘better’, introducing scope for different interpretations of the survey question. Does better mean quicker, more accurate, or making more efficient use of resources? To illustrate, consider the following personal example. After being in a road accident last week and suffering a few broken bones, I have lost the use of my arm, and my capability to type this blog post is severely limited. Instead I have used voice dictation software for speech-to-text recognition. On one hand, this technology is faster, cheaper and less demanding of external resources than dictating to a fellow human. On the other, it cannot offer me grammatical, topical or semantic advice, nor does it recognise less frequently used words such as ‘Bayesian’, ‘a priori’ or ‘Nick Bostrom’. Equally, unlike a human, it does not understand whether I am making a true statement, so it cannot warn me to validate claims or delete certain sentences. In weighing up whether this technology is ‘better’ than human help, on which metrics should we put more weight? Critically, our parameterisation of the definition depends on our primary concern, so it should be treated as domain-specific.
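To make the replacement-ratio point concrete, here is a minimal sketch (all figures are invented for illustration) of why ‘cheaper than one worker’ is the wrong test when one machine displaces several workers:

```python
def machine_is_cheaper(machine_cost_per_hour: float,
                       human_wage_per_hour: float,
                       replacement_ratio: float) -> bool:
    """Compare a machine's running cost against the total wage bill of the
    workers it replaces, rather than against a single worker's wage."""
    return machine_cost_per_hour < human_wage_per_hour * replacement_ratio

# Invented figures: a $40/hour machine versus $25/hour workers.
print(machine_is_cheaper(40, 25, 1))  # False: dearer than any one worker...
print(machine_is_cheaper(40, 25, 3))  # True: ...but cheaper than the three it replaces
```

The naive single-worker comparison rejects the machine, while accounting for the replacement ratio reverses the verdict, which is exactly the ambiguity the plural phrasing resolves.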

Considering all of these points, I would amend the definition accordingly. To better confine interpretations of the requirements, I offer one example of domain bifurcation:


  • Labour market domain: High-level machine intelligence (HLMI) is a machine which can perform all the composite tasks comprising 95 percent of human occupations currently listed on the Census Occupation Tier 2 Classification, at an equivalent speed and accuracy rate to a median worker in that occupation, and at the same cost as the human workers it replaces.


  • Security domain: High-level machine intelligence (HLMI) is a machine which can perform all the composite tasks comprising 5 percent of human occupations in AI research, cybersecurity, government intelligence units and military strategy, at an equivalent speed and accuracy rate to a median worker in that occupation, and at the same cost as the human workers it replaces.


While these alternative definitions do mitigate some problems of vagueness and variability of interpretation, they do not remove it entirely. The unknown nature of undeveloped technologies advancing on an uncertain timeline inevitably renders the question of when AI will reach high-level intelligence definitionally ambiguous to some degree.

Diversity and Dating: Does Online Dating Help or Hinder Diversity of Matches?

On one hand, online dating websites connect users simultaneously to hundreds or thousands of profiles, promising an enormous expansion in partner diversity. On the other, filtering and algorithmic matchmaking introduce the risk that the pool of partners becomes less diverse by ethnicity, by personality, or by any other (potentially irrelevant) input to the black-box models. So which is it? More diversity or more similarity?

The Case for MORE diversity

Matchmaking has existed for millennia, but in the 21st century the search for love has gone online, and for some individuals is now mediated by sophisticated mathematical algorithms. Under traditional forms of matchmaking, third parties – religious leaders, parents and other connections within a closed social network – recommended romantic partners from a narrow pool of individuals (Finkel et al., 2012). Selection from this restricted ‘field of eligibles’ (Kerckhoff, 1964) endorsed endogamy, where partners from the same social group (ethnicity, religion or culture) come together, making exogamy, the act of marrying a diverse partner, a rarity. The field-of-eligibles hypothesis is upheld as an explanation for spousal correlations, in contrast to the possibility of a similarity preference in partner attributes (Berscheid & Reis, 1998). Assortative matching, where potential matches share educational or economic circles (Becker, 1973), occurs even in populations with no individual preference for homogeneity (Burley, 1983).

The advent of online dating has changed the fundamentals of searching for prospective partners, altering both the romantic acquaintance process and the compatibility matching process. The platform is not only penetrating more of society, with over 41% of the world active (Paisley, 2018), but is also more socially accepted (Whitty and Carr, 2006). The legacy of such sites is beginning to emerge, with 20% of current committed romantic relationships beginning online (Xie et al., 2014). In particular, the pervasiveness of online dating has expanded users’ access to potential romantic partners whom they would otherwise be unlikely to encounter. The Internet permits not only a death of distance geographically, with the ability to communicate without face-to-face encounters, but also a death of distance in social network interactions. The pool is no longer defined by community or culture. Applying Feld’s focus theory, online dating is a hyper-focus platform, compared with traditional foci characterised by socially homogenous groups such as religious congregations, workplaces or nightclubs (Feld, 1982; Schmitz, 2014). Online interactions can go beyond an existing network, introducing the potential for social discovery amid high socio-structural heterogeneity. A priori, the case for greater diversity in online dating is clear.

The Case for MORE Similarity

With great choice comes great choice fatigue: self-selection and intuition are unfeasible with such a large array of potential partners. In China, out of 200 million single people, more than a quarter (54 million) used online dating services in 2016. The ability to browse and potentially match with over 50 million people is a daunting prospect. Instead, dating websites offer new choice infrastructures in filtering and recommending, but both restrict the diversity of viewed profiles.

Filter theory, proposed by Kerckhoff and Davis (1962), describes homogamy in partner selection, where people interact with partners filtered by the similarity of socio-demographic factors. Online search gives individuals greater scope to filter over selective, defined criteria, enforcing an unprecedented degree of parametrisation of potential partners. Precise attribute-selection tools allow dating-site users to eliminate huge groups of the population who do not meet specific desires. However, by creating extensive checklists they may be closing their minds to possibilities, especially given the compression of compatibility criteria to modular attributes such as education or income, excluding important offline forms of self-presentation such as facial expressions or humour (Schmitz, 2014). Some studies exist on how this filtering mechanism operationally affects successful matches, with Rudder (2014) considering how the probability of messages or likes is derived from different partner attributes.
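The eliminative character of checklist filtering can be sketched as follows (the profile fields and values here are invented for illustration): a single unmet criterion removes a candidate, no matter how compatible they might be on dimensions the checklist cannot capture, such as humour.

```python
def apply_filters(profiles, criteria):
    """Keep only profiles meeting every hard criterion; one miss eliminates a candidate."""
    def passes(profile):
        return all(profile.get(field) == wanted for field, wanted in criteria.items())
    return [p for p in profiles if passes(p)]

# Invented example pool of candidate profiles.
profiles = [
    {"name": "A", "education": "degree", "income_band": "high", "humour": "dry"},
    {"name": "B", "education": "degree", "income_band": "mid",  "humour": "warm"},
    {"name": "C", "education": "none",   "income_band": "high", "humour": "dry"},
]

# A checklist over modular attributes eliminates B and C outright,
# regardless of offline qualities such as humour.
matches = apply_filters(profiles, {"education": "degree", "income_band": "high"})
print([p["name"] for p in matches])  # ['A']
```

The hard-AND structure is the point: each added checklist item can only shrink the viewed pool, which is how parametrisation narrows the field of eligibles rather than widening it.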

Technology has expanded the inputs to matchmaking: not only hundreds of profile traits but also second-by-second user interaction behaviours can feed the mathematical algorithms that recommend potential partners. These digital traces introduce new complexity into the design of recommender systems. Content-based and collaborative filtering algorithms have wide commercial applications, working under the assumption ‘if you like person x, you will like person y’. However, unlike purchase or movie recommendations, successful matches critically require a ‘double coincidence of wants’ (Hitsch et al., 2010). Reciprocal recommenders such as RECON (Pizzato et al., 2011) offer higher rates of conversion between recommendations, initiations and matches. Distance-scoring systems attempt to minimise the Euclidean distance between partners across attributes (Hu et al., 2019). What these systems have in common is a structural design based on likeness. As Finkel et al. (2012) summarise, “[s]imilarity is a potent principle in online dating.” The recommended set of partners may indeed be even more homogenous than the traditional field of eligibles. Consider a dating website that demands users fill out a personality test measuring attributes such as morality, extraversion or self-confidence. A matching algorithm could purposefully not recommend partners who misalign on these measures, despite their potential compatibility in the real world. In fact, self-expansion theory argues people gain confidence from acceptance by dissimilar partners (Aron & Aron, 1997; Aron et al., 2006). As such, although users believe they can access a wider field, the irony is that they are actually accessing users much more similar to themselves. The potential for recommender systems to diminish diversity has considerable consequences.
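A minimal sketch of the distance-scoring idea (the trait names and scores below are invented, not taken from any real system): candidates are ranked by Euclidean distance from the user’s own test scores, so likeness is built into the design and the most dissimilar, yet potentially compatible, candidate is ranked last.

```python
import math

def distance(user, candidate, traits):
    """Euclidean distance between two profiles across numeric trait scores."""
    return math.sqrt(sum((user[t] - candidate[t]) ** 2 for t in traits))

def recommend(user, candidates, traits, k=2):
    """Rank candidates by similarity: smallest distance comes first."""
    return sorted(candidates, key=lambda c: distance(user, c, traits))[:k]

traits = ["extraversion", "self_confidence", "morality"]  # invented 0-10 test scores
user = {"extraversion": 7, "self_confidence": 6, "morality": 8}
candidates = [
    {"name": "similar",    "extraversion": 6, "self_confidence": 6, "morality": 8},
    {"name": "middling",   "extraversion": 4, "self_confidence": 7, "morality": 6},
    {"name": "dissimilar", "extraversion": 1, "self_confidence": 2, "morality": 3},
]

# Similarity-first ranking: the dissimilar candidate never makes the top-k list.
print([c["name"] for c in recommend(user, candidates, traits)])  # ['similar', 'middling']
```

Because the objective is distance minimisation rather than compatibility, a self-expanding match that misaligns on the measured traits is structurally excluded from the recommended set.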
Across the social science literature, evidence suggests that contact with dissimilar others, and the expansion of information flows beyond a close social network, can broaden perspective and deepen empathy for other racial, religious or socioeconomic groups (Wright et al., 1997). Resnick et al. (2013) consider recommender systems as creators of content bubbles, responsible for the entrenchment of inaccurate or polarised beliefs. If recommender systems are trained on implicit feedback data such as clicks or messages, the training data is pre-biased by the filters already applied by the user, reducing diversity and impeding social discovery yet further.

Which have you experienced?