HDC 2026 timeline · 7 open components · Ascend-only training · competitor matrix · six-step runbook
On June 30, 2026, Huawei delivered its HDC 2026 promise: openPangu-2.0-Flash weights, inference code, and training operators went live on GitCode. If you evaluate open-source frontier MoE models, Ascend-native stacks, or supply-chain-independent AI, this guide covers the HDC timeline, Pro and Flash specs, seven open components, mHC, Muon, ModAttn, and DSA+SWA architecture, the fact that it is the first frontier model trained entirely without NVIDIA GPUs, a competitor matrix against DeepSeek, Qwen, and Kimi, ModelArts API plus GitCode self-host runbook, HarmonyOS Agent strategy, the openPangu License, and why Mac Mini M4 cloud rental still matters when your Agent stack spans Ascend inference and macOS toolchains.
Richard Yu unveiled openPangu 2.0 at HDC 2026 (June 12, 2026, Dongguan Songshan Lake). Flash weights shipped June 30. Most coverage still treats it as "another open model." These blind spots directly affect procurement and architecture choices.
Equating open source with weights only: Industry norm is weights plus inference. openPangu 2.0 plans to release pre-training code, post-training code, and Ascend training operators—rare at this MoE scale.
Underestimating the no-NVIDIA milestone: DeepSeek, Qwen, Kimi, and Llama all trained on NVIDIA hardware. openPangu 2.0 ran entirely on Ascend 910B—the first frontier-scale model trained and open-sourced without a single A100 or H100.
Dismissing 512K context because general benchmarks lag: DeepSeek V4 Pro still leads code and complex reasoning, but 512K context is openPangu's differentiator—roughly eight full-length novels in one prompt.
Confusing Flash and Pro release cadence: Flash (92B total / 6B active) is live now; Pro (505B / 18B active) weights are planned for July 2026; pre-training and post-training code roll out in H2 2026.
Deploying the model without planning the host layer: Ascend uses torch_npu; HarmonyOS edge uses Embedded builds. If your Agent also runs Xcode, Claude Code, or OpenClaw, inference on Ascend and tooling on macOS is a layered split—same pattern as a multi-model routing gateway.
| Date | Event |
|---|---|
| 2026-06-12 | HDC 2026 keynote: Richard Yu officially launches openPangu 2.0 |
| 2026-06-30 | Flash weights, base inference code, and training operators on GitCode |
| 2026-07 (planned) | Pro model weights and inference code go live |
| H2 2026 (planned) | Pre-training code, post-training code, and additional training operators |
| Metric | openPangu 2.0 Pro | openPangu 2.0 Flash |
|---|---|---|
| Total parameters | 505B | 92B |
| Active parameters | 18B | 6B |
| Sparsity ratio | ~28:1 | ~15:1 (Flash DSA+SWA can reach extreme sparsity) |
| Context window | 512K | 512K |
| Availability | Planned July 2026 | Live since June 30, 2026 |
| Component | Status |
|---|---|
| Model architecture definition | Released |
| Model weights (Flash) | Released 2026-06-30 |
| Technical report | Released with weights |
| Inference code + training operators | Released 2026-06-30 |
| Model weights (Pro) | Planned July 2026 |
| Pre-training code | Planned H2 2026 |
| Post-training code (SFT / RLHF) | Planned H2 2026 |
The first four items match typical open releases. Pre-training code, post-training code, and Ascend training operators at this MoE scale are exceptionally rare—researchers and enterprises can genuinely reproduce frontier training from scratch.
The license is Huawei's openPangu License: commercial use allowed, royalty-free, non-exclusive. Full terms are in the GitCode Ascend Tribe repositories.
openPangu 2.0 is the first frontier LLM fully trained on non-NVIDIA hardware, running end-to-end on Huawei Ascend 910B NPUs with zero A100 or H100 involvement.
import torch_npu to switch backendsBenchmark disclaimer: Independent third-party benchmarks are still in progress. The capability matrix below is based on architecture inference, not measured scores. This article will be updated when official results publish.
| Model | Total params | Active params | Context | Training hardware | Open depth |
|---|---|---|---|---|---|
| openPangu 2.0 Pro | 505B | 18B | 512K | Ascend NPU | Full stack (7 components) |
| openPangu 2.0 Flash | 92B | 6B | 512K | Ascend NPU | Full stack (7 components) |
| DeepSeek V4 Pro | 1.6T | ~200B | 128K | NVIDIA | Weights + inference |
| Qwen 3.7 Max | ~400B+ | varies | 128K | NVIDIA | Weights + inference + partial training |
| Kimi K2.7 | 1T | 32B | 256K | NVIDIA | Weights + inference |
| Llama 4 405B | 405B | — | 128K | NVIDIA | Weights + inference |
| Capability | openPangu 2.0 Pro | DeepSeek V4 Pro | Qwen 3.7 Max | Kimi K2.7 |
|---|---|---|---|---|
| Code generation | Strong | Best-in-class | Very strong | Very strong |
| Complex reasoning | Strong | Best-in-class | Best-in-class | Very strong |
| Tool use / Agent | Very strong | Very strong | Very strong | Best-in-class |
| Ultra-long context | Best-in-class | Strong | Strong | Very strong |
| Inference efficiency | Best-in-class | Moderate | Moderate | Very strong |
| Supply-chain control | Best-in-class | Limited | Limited | Limited |
| Full-stack open source | Best-in-class | Good | Good | Good |
| Scenario | Recommendation | Why |
|---|---|---|
| Code generation / complex reasoning | DeepSeek V4 Pro | ~200B active, leading performance |
| Agent / multi-tool orchestration | Kimi K2.7 | Mature MCP ecosystem |
| Ultra-long documents (>256K tokens) | openPangu 2.0 Pro | 512K context is the clear choice |
| Domestic stack / no NVIDIA dependency | openPangu 2.0 | Only frontier model trained on non-NVIDIA hardware |
| Ascend / Huawei Cloud deployment | openPangu 2.0 | Native optimization, 2x throughput |
| Edge / phone deployment | openPangu Embedded | 30B on-device, Kirin offline |
| Low-cost local inference | openPangu 2.0 Flash | 6B active, ~96GB unified memory |
| Variant | Recommended hardware | Minimum config | Notes |
|---|---|---|---|
| Flash (6B active) | Single Ascend 910B | ~96GB unified memory | Community tests on large-memory systems possible |
| Flash-Int8 | Single Atlas A2 | ~48GB memory | W4A8, <10% accuracy loss |
| Pro (18B active) | 4+ Ascend 910B cards | Multi-card cluster | Validate after July 2026 weight release |
Pick a path: No hardware? Start with ModelArts API (register Huawei Cloud, open AI Gallery, search openPangu 2.0, subscribe Flash or Pro). Have an Ascend cluster? Self-deploy from GitCode.
Clone repos: Visit gitcode.com/org/ascend-tribe and clone openPangu-2.0-Flash, openPangu-2.0-Infer, and openPangu-2.0-Op (operators).
Configure CANN + torch_npu: Install Ascend drivers and CANN; add import torch_npu to switch your PyTorch project to the Ascend backend.
Flash single-card inference: Run the inference script on 910B with --context_length 512000 to validate long context (reduce if memory is tight).
Quantize or distribute: Tight memory? Use openPangu-2.0-Flash-Int8. Pro uses multi-card distributed_inference.py after July weights land.
Domain fine-tuning (optional): LoRA example: finetune.py --method lora --lora_rank 16. After pre-training code opens in H2 2026, second-stage pre-training becomes possible.
curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
-H "Content-Type: application/json" \
-H "X-Auth-Token: ${TOKEN}" \
-d '{
"model": "openpangu-2.0-flash",
"messages": [{"role": "user", "content": "Hello, introduce yourself briefly."}],
"max_tokens": 1024,
"temperature": 0.7
}'
python inference.py \ --model_path ./openPangu-Flash \ --device npu:0 \ --context_length 512000 \ --precision bf16
With U.S. export controls limiting A100 and H100 access, openPangu 2.0 proves a frontier MoE can be trained without NVIDIA—not just a technical milestone, but a direct challenge to the CUDA monopoly narrative. At HDC 2026, Richard Yu stated: "In my dictionary for the rest of my life, there is no second place—only first."
Full-stack open source lets researchers reproduce training, enterprises run vertical second-stage pre-training, and developers lower Ascend compute barriers. HarmonyOS 7 enters the Agent era with openPangu 2.0 as the native AI engine; a 30B edge model runs offline on Kirin phones.
openPangu 2.0 may not beat DeepSeek V4 Pro on general benchmarks, but across 512K context, domestic stack control, Ascend-native throughput, full-stack openness, and edge deployment it has few substitutes. If your stack spans HarmonyOS Agent + iOS / Xcode CI + OpenClaw multi-model routing, running inference on Ascend and tooling on macOS is the practical split—a closed laptop kills overnight jobs; a Linux VPS lacks Metal and Keychain. VpsMesh Mac Mini M4 cloud rental bundles 24/7 uptime and native Apple tooling into monthly OpEx. See Mac Mini M4 rental pricing, setup in the help center, and order at cloud order page.
Disclaimer: Some capability ratings are architecture-inferred estimates. This article will be updated when independent third-party benchmark results publish. Published: July 1, 2026.
Flash (92B / 6B active) went live June 30, 2026, runs on a single 910B, suited for high-concurrency API serving. Pro (505B / 18B active) arrives in July 2026, best for 512K long documents and second-stage pre-training. Weights are on GitCode Ascend Tribe.
Pick DeepSeek V4 Pro (~200B active) for code and complex reasoning. Choose openPangu 2.0 for 512K context, domestic compliance, 2x Ascend throughput, and full-stack training code. Running both? See OpenClaw multi-model routing.
Not for pure Ascend or ModelArts workloads. If your stack includes Xcode, Claude Code, or OpenClaw daemons, Mac Mini M4 monthly rental is more stable. Pricing at Mac Mini M4 rental pricing; order at cloud order page.