What is the difference between openPangu 2.0 Flash and Pro?

Flash is 92B total parameters with 6B active, live on GitCode since June 30, 2026, and runs on a single Ascend 910B for inference. Pro is 505B total with 18B active, 512K context, weights planned for July 2026, suited for ultra-long documents and second-stage pre-training.

How do I choose between openPangu 2.0 and DeepSeek V4 Pro?

Pick DeepSeek V4 Pro (~200B active) for code and complex reasoning. Choose openPangu 2.0 for 512K context, domestic Ascend deployment, 2x Ascend throughput, and full-stack training code openness.

What is the fastest way to try openPangu 2.0?

Register for Huawei Cloud, open ModelArts AI Gallery, subscribe to Flash or Pro, and call the Chat Completions API. For self-hosting, download weights and inference code from GitCode Ascend Tribe.

Huawei openPangu 2.0 Open Source: 505B MoE, 512K Context, Full Ascend Stack Explained

01

Five misconceptions about openPangu 2.0 that break deployment decisions

Richard Yu unveiled openPangu 2.0 at HDC 2026 (June 12, 2026, Dongguan Songshan Lake). Flash weights shipped June 30. Most coverage still treats it as "another open model." These blind spots directly affect procurement and architecture choices.

01
Equating open source with weights only: Industry norm is weights plus inference. openPangu 2.0 plans to release pre-training code, post-training code, and Ascend training operators—rare at this MoE scale.
02
Underestimating the no-NVIDIA milestone: DeepSeek, Qwen, Kimi, and Llama all trained on NVIDIA hardware. openPangu 2.0 ran entirely on Ascend 910B—the first frontier-scale model trained and open-sourced without a single A100 or H100.
03
Dismissing 512K context because general benchmarks lag: DeepSeek V4 Pro still leads code and complex reasoning, but 512K context is openPangu's differentiator—roughly eight full-length novels in one prompt.
04
Confusing Flash and Pro release cadence: Flash (92B total / 6B active) is live now; Pro (505B / 18B active) weights are planned for July 2026; pre-training and post-training code roll out in H2 2026.
05
Deploying the model without planning the host layer: Ascend uses torch_npu; HarmonyOS edge uses Embedded builds. If your Agent also runs Xcode, Claude Code, or OpenClaw, inference on Ascend and tooling on macOS is a layered split—same pattern as a multi-model routing gateway.

02

Timeline, Pro vs Flash specs, and seven open components

Key timeline

Date	Event
2026-06-12	HDC 2026 keynote: Richard Yu officially launches openPangu 2.0
2026-06-30	Flash weights, base inference code, and training operators on GitCode
2026-07 (planned)	Pro model weights and inference code go live
H2 2026 (planned)	Pre-training code, post-training code, and additional training operators

Pro vs Flash core parameters

Metric	openPangu 2.0 Pro	openPangu 2.0 Flash
Total parameters	505B	92B
Active parameters	18B	6B
Sparsity ratio	~28:1	~15:1 (Flash DSA+SWA can reach extreme sparsity)
Context window	512K	512K
Availability	Planned July 2026	Live since June 30, 2026

Seven open components (full-stack release plan)

Component	Status
Model architecture definition	Released
Model weights (Flash)	Released 2026-06-30
Technical report	Released with weights
Inference code + training operators	Released 2026-06-30
Model weights (Pro)	Planned July 2026
Pre-training code	Planned H2 2026
Post-training code (SFT / RLHF)	Planned H2 2026

The first four items match typical open releases. Pre-training code, post-training code, and Ascend training operators at this MoE scale are exceptionally rare—researchers and enterprises can genuinely reproduce frontier training from scratch.

The license is Huawei's openPangu License: commercial use allowed, royalty-free, non-exclusive. Full terms are in the GitCode Ascend Tribe repositories.

03

Architecture deep dive: MoE design, Ascend training, and developer stack

Architecture innovations

mHC (Multi-Head Combinatorial) routing: Improves expert routing efficiency and reduces MoE load imbalance
Muon optimizer: Second-order momentum scheme (Microsoft research) that stabilizes large-scale training
ModAttn (Modular Attention): Modular attention blocks that support 512K ultra-long context
DSA+SWA ultra-sparse attention (Flash only): Extreme sparsity ratio—6B active parameters draw on a 92B knowledge base with compute close to a dense 6B model

Hardware and training breakthroughs

openPangu 2.0 is the first frontier LLM fully trained on non-NVIDIA hardware, running end-to-end on Huawei Ascend 910B NPUs with zero A100 or H100 involvement.

Single-card throughput: Up to 2x mainstream open models on Ascend
Super-node training efficiency: Roughly +30% improvement
512K long-sequence training throughput: Roughly +50% improvement
Train-inference consistency: >99% (a common MoE pain point)
Inference latency: About 1.2x better than comparable models
Edge Embedded variant: 30B on-device model—50% faster inference, 20% less memory, offline on Kirin phones
Flash-Int8 quantized build: W4A8, 40% memory reduction, <10% accuracy loss

Software stack and deployment platforms

CANN (CUDA-class runtime) + torch_npu (PyTorch adapter)—add import torch_npu to switch backends
Cloud: Huawei Cloud ModelArts API
Open source: GitCode Ascend Tribe self-hosted weights
Edge: HarmonyOS native integration; HarmonyOS 7 Agent-era native AI engine

!

Benchmark disclaimer: Independent third-party benchmarks are still in progress. The capability matrix below is based on architecture inference, not measured scores. This article will be updated when official results publish.

04

Competitor comparison: parameters, capability matrix, and scenario picker

Headline parameter comparison

Model	Total params	Active params	Context	Training hardware	Open depth
openPangu 2.0 Pro	505B	18B	512K	Ascend NPU	Full stack (7 components)
openPangu 2.0 Flash	92B	6B	512K	Ascend NPU	Full stack (7 components)
DeepSeek V4 Pro	1.6T	~200B	128K	NVIDIA	Weights + inference
Qwen 3.7 Max	~400B+	varies	128K	NVIDIA	Weights + inference + partial training
Kimi K2.7	1T	32B	256K	NVIDIA	Weights + inference
Llama 4 405B	405B	—	128K	NVIDIA	Weights + inference

Capability matrix (architecture inference, pending benchmark validation)

Capability	openPangu 2.0 Pro	DeepSeek V4 Pro	Qwen 3.7 Max	Kimi K2.7
Code generation	Strong	Best-in-class	Very strong	Very strong
Complex reasoning	Strong	Best-in-class	Best-in-class	Very strong
Tool use / Agent	Very strong	Very strong	Very strong	Best-in-class
Ultra-long context	Best-in-class	Strong	Strong	Very strong
Inference efficiency	Best-in-class	Moderate	Moderate	Very strong
Supply-chain control	Best-in-class	Limited	Limited	Limited
Full-stack open source	Best-in-class	Good	Good	Good

Scenario picker

Scenario	Recommendation	Why
Code generation / complex reasoning	DeepSeek V4 Pro	~200B active, leading performance
Agent / multi-tool orchestration	Kimi K2.7	Mature MCP ecosystem
Ultra-long documents (>256K tokens)	openPangu 2.0 Pro	512K context is the clear choice
Domestic stack / no NVIDIA dependency	openPangu 2.0	Only frontier model trained on non-NVIDIA hardware
Ascend / Huawei Cloud deployment	openPangu 2.0	Native optimization, 2x throughput
Edge / phone deployment	openPangu Embedded	30B on-device, Kirin offline
Low-cost local inference	openPangu 2.0 Flash	6B active, ~96GB unified memory

05

Six-step runbook: ModelArts API and GitCode self-deployment

Hardware requirements reference

Variant	Recommended hardware	Minimum config	Notes
Flash (6B active)	Single Ascend 910B	~96GB unified memory	Community tests on large-memory systems possible
Flash-Int8	Single Atlas A2	~48GB memory	W4A8, <10% accuracy loss
Pro (18B active)	4+ Ascend 910B cards	Multi-card cluster	Validate after July 2026 weight release

Six-step deployment guide

01
Pick a path: No hardware? Start with ModelArts API (register Huawei Cloud, open AI Gallery, search openPangu 2.0, subscribe Flash or Pro). Have an Ascend cluster? Self-deploy from GitCode.
02
Clone repos: Visit gitcode.com/org/ascend-tribe and clone openPangu-2.0-Flash, openPangu-2.0-Infer, and openPangu-2.0-Op (operators).
03
Configure CANN + torch_npu: Install Ascend drivers and CANN; add import torch_npu to switch your PyTorch project to the Ascend backend.
04
Flash single-card inference: Run the inference script on 910B with --context_length 512000 to validate long context (reduce if memory is tight).
05
Quantize or distribute: Tight memory? Use openPangu-2.0-Flash-Int8. Pro uses multi-card distributed_inference.py after July weights land.
06
Domain fine-tuning (optional): LoRA example: finetune.py --method lora --lora_rank 16. After pre-training code opens in H2 2026, second-stage pre-training becomes possible.

API call example (ModelArts)

bash

curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: ${TOKEN}" \
  -d '{
    "model": "openpangu-2.0-flash",
    "messages": [{"role": "user", "content": "Hello, introduce yourself briefly."}],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

Flash single-card inference example

bash

python inference.py \
  --model_path ./openPangu-Flash \
  --device npu:0 \
  --context_length 512000 \
  --precision bf16

Citable hard numbers

Parameter scale: Pro 505B / 18B active, Flash 92B / 6B active, both with 512K context
Ascend throughput: Single-card 2x mainstream open models; 512K training throughput +50%
Train-inference parity: MoE distribution consistency >99%
Quantization gain: Flash-Int8 memory -40%, accuracy loss <10%
HarmonyOS Agent: Agent framework 2.0 complex task success rate >90% (powered by openPangu 2.0)
Context analogy: 512K equals roughly eight full novels or an entire large codebase in one prompt

06

Strategic significance: supply-chain independence, HarmonyOS Agent, and cross-platform hosts

With U.S. export controls limiting A100 and H100 access, openPangu 2.0 proves a frontier MoE can be trained without NVIDIA—not just a technical milestone, but a direct challenge to the CUDA monopoly narrative. At HDC 2026, Richard Yu stated: "In my dictionary for the rest of my life, there is no second place—only first."

Full-stack open source lets researchers reproduce training, enterprises run vertical second-stage pre-training, and developers lower Ascend compute barriers. HarmonyOS 7 enters the Agent era with openPangu 2.0 as the native AI engine; a 30B edge model runs offline on Kirin phones.

openPangu 2.0 may not beat DeepSeek V4 Pro on general benchmarks, but across 512K context, domestic stack control, Ascend-native throughput, full-stack openness, and edge deployment it has few substitutes. If your stack spans HarmonyOS Agent + iOS / Xcode CI + OpenClaw multi-model routing, running inference on Ascend and tooling on macOS is the practical split—a closed laptop kills overnight jobs; a Linux VPS lacks Metal and Keychain. VpsMesh Mac Mini M4 cloud rental bundles 24/7 uptime and native Apple tooling into monthly OpEx. See Mac Mini M4 rental pricing, setup in the help center, and order at cloud order page.

Disclaimer: Some capability ratings are architecture-inferred estimates. This article will be updated when independent third-party benchmark results publish. Published: July 1, 2026.

FAQ

Three questions readers ask most

Flash (92B / 6B active) went live June 30, 2026, runs on a single 910B, suited for high-concurrency API serving. Pro (505B / 18B active) arrives in July 2026, best for 512K long documents and second-stage pre-training. Weights are on GitCode Ascend Tribe.

Pick DeepSeek V4 Pro (~200B active) for code and complex reasoning. Choose openPangu 2.0 for 512K context, domestic compliance, 2x Ascend throughput, and full-stack training code. Running both? See OpenClaw multi-model routing.

Not for pure Ascend or ModelArts workloads. If your stack includes Xcode, Claude Code, or OpenClaw daemons, Mac Mini M4 monthly rental is more stable. Pricing at Mac Mini M4 rental pricing; order at cloud order page.

Huawei openPangu 2.0 Open Source: The First Frontier MoE Trained Without NVIDIA GPUs

Five misconceptions about openPangu 2.0 that break deployment decisions

Timeline, Pro vs Flash specs, and seven open components

Key timeline

Pro vs Flash core parameters

Seven open components (full-stack release plan)

Architecture deep dive: MoE design, Ascend training, and developer stack

Architecture innovations

Hardware and training breakthroughs

Software stack and deployment platforms

Competitor comparison: parameters, capability matrix, and scenario picker

Headline parameter comparison

Capability matrix (architecture inference, pending benchmark validation)

Scenario picker

Six-step runbook: ModelArts API and GitCode self-deployment

Hardware requirements reference

Six-step deployment guide

API call example (ModelArts)

Flash single-card inference example

Citable hard numbers

Strategic significance: supply-chain independence, HarmonyOS Agent, and cross-platform hosts

Three questions readers ask most