China-U.S. Competition in Large Language Models: Global Perspectives on Opportunities and Challenges

As large language models (LLMs) are applied in a growing number of sectors, finding a balance in the competition between China and the U.S. and fostering cooperation will be key to development in the coming years.

Since the launch of ChatGPT, LLMs have swiftly become the focus of the global technology race. These models not only demonstrate extraordinary capabilities in dialogue and analogy, data processing, and creative tasks, but are also seen as a crucial step toward artificial general intelligence (AGI), that is, AI with human-like cognitive and reasoning abilities. This technological revolution has sparked a worldwide investment boom and pushed the AI race between China and the U.S. to a new level. According to the “Artificial Intelligence Index Report 2024” published by Stanford University, the U.S. continues to lead other regions by a wide margin in developing foundation models, while China ranks first in both the number of AI patent applications and the number of patents granted (see 【Note 1】). This reflects the different paths and strategies the two countries take toward technological innovation and development. Against this backdrop, an in-depth understanding and comparison of the current state of LLM development in China and the U.S. not only sheds light on the landscape of this competition, but also offers fresh perspectives and new opportunities for future international cooperation.

The China-U.S. LLM competition: A comparison of English and Chinese language environments

The multilingual performance of an LLM is an important measure of its global competitiveness. Although ChatGPT excels in English, its capabilities in other languages still need further verification (see 【Note 2】). Similarly, LLMs developed by Chinese teams or built natively for Chinese perform well in their home language environment (see 【Note 3】), but may still fall short in English-language tests. A comprehensive understanding of how these models perform across language environments is therefore especially important. To this end, in the first half of 2024 we used a systematic evaluation framework to compare the Chinese and English performance of 16 representative LLMs (see 【Note 4】 and 【Note 5】). These models come from major technology companies, top universities, and emerging AI start-ups in China and the U.S.

In the English-language tests, GPT-4 Turbo ranked first thanks to its strong natural language capabilities and subject-matter expertise, with Gemini Pro and Llama 2 in second and third place respectively. China’s ERNIE Bot 4 was the best-performing Chinese model in the English tests, but ranked only fifth overall, slightly ahead of Claude 2 and GPT-3.5 Turbo yet still behind GPT-4 (see 【Note 5】). In the Chinese-language tests, ERNIE Bot 4 surpassed GPT-4 Turbo to rank first, with the best overall performance (see 【Note 4】). Overall, China’s leading models perform strongly in the Chinese language environment, but still have considerable room for improvement in other languages.

New Trend: The rise of multimodal and cross-disciplinary applications

As LLM technology continues to mature, LLMs are rapidly expanding into multimodal models and cross-disciplinary applications, which have become a new blue ocean for AI development. Multimodal capabilities allow models not only to process text but also to understand and generate images, audio, and video content, significantly broadening their range of applications. For example, OpenAI’s latest model, GPT-4o, can process text, audio, and visual input simultaneously, opening up new possibilities for applications such as augmented reality, intelligent surveillance, and autonomous driving.

At the same time, cross-disciplinary applications of LLMs are also accelerating. Microsoft has worked with OpenAI to integrate GPT-4 deeply into its office software, helping users work more efficiently. Baidu’s ERNIE models are used not only in its search engine but also widely in scenarios such as enterprise customer service and smart home systems. In addition, domain-specific (vertical) LLMs continue to emerge. For instance, OpenMEDLab2.0, a family of multimodal medical foundation models jointly launched by the Shanghai Artificial Intelligence Laboratory and Ruijin Hospital, aims to support applications such as intelligent image-based diagnosis, virtual surgery, and intelligent clinical decision-making, laying the groundwork for future “AI hospitals”. These applications not only showcase the diverse potential of LLMs, but also drive strong market demand for high-performance, high-security AI models.

Future directions: China-U.S. differences and opportunities for collaboration

Looking ahead, LLM development will focus on further deepening multimodal capabilities, expanding cross-disciplinary applications, and strengthening security and ethical responsibility. The U.S. currently holds a clear advantage in foundational technology development and innovative applications, and its models often lead at the technological frontier. China’s models, by contrast, place greater emphasis on optimization for the local language environment and adaptability to practical applications. As LLMs are applied in more and more fields, finding a balance in the competition between the two countries and promoting collaboration will be key to development in the coming years.

The competition and cooperation between China and the U.S. not only shape their respective technological ecosystems, but also have a far-reaching impact on the direction of the global AI industry. Against this backdrop, Hong Kong, with its unique international character and its advantages in finance, technology, and location, is well positioned to become an important bridge for global competition and collaboration. By promoting multi-faceted cooperation in technology, talent, and policy, Hong Kong will play a significant role in international research exchange, technology transfer, and industrial collaboration, leading the exploration of the limitless potential of artificial intelligence.

【Note 1】: https://aiindex.stanford.edu/wp-content/uploads/2024/05/HAI_AI-Index-Report-2024.pdf
【Note 2】: http://arxiv.org/abs/2302.04023
【Note 3】: https://cevalbenchmark.com/static/leaderboard_zh.html
【Note 4】: https://www.hkubs.hku.hk/aimodelrankings/report
【Note 5】: https://www.hkubs.hku.hk/aimodelrankings/report/en

Professor Zhenhui Jack Jiang
Professor of Innovation and Information Management, HKU Business School

Jiaxin Li
Ph.D. Student in Innovation and Information Management, HKU Business School

This article was also published on September 18, 2024, on the Financial Times’ Chinese website.

(The Chinese version of this article was also published on September 19, 2024, in the “明德商論” column of FT Chinese.)