Urdu alphabet


The Urdu alphabet is the right-to-left alphabet used for the Urdu language. It is a modification of the Persian alphabet, known as Perso-Arabic, which is itself a derivative of the Arabic alphabet. The Urdu alphabet has up to 40 letters.[1] With 39 basic letters and no distinct letter cases, the Urdu alphabet is typically written in the calligraphic Nastaʿlīq script, whereas Arabic is more commonly written in the Naskh style.


Usually, bare transliterations of Urdu into Roman letters (called Roman Urdu) omit many phonemic elements that have no equivalent in English or other languages commonly written in the Latin script. The National Language Authority of Pakistan has developed a number of systems with specific notations to signify non-English sounds, but these can only be read properly by someone already familiar with the loan letters.



The Urdu language emerged as a distinct register of Hindustani well before the Partition of India. It is distinguished most by its extensive Persian influences (Persian having been the official language of the Mughal government and the most prominent lingua franca of the Indian subcontinent for several centuries before the consolidation of British colonial rule during the 19th century). The standard Urdu script is a modified version of the Perso-Arabic script and has its origins in 13th-century Iran. It is closely related to the development of the Nastaʿlīq style of Perso-Arabic script. Urdu script in its extended form is known as Shahmukhi script and is also used for writing other Indo-Aryan languages of the northern Indian subcontinent, such as Punjabi and Saraiki.


Despite the invention of the Urdu typewriter in 1911, Urdu newspapers continued to publish prints of handwritten scripts by calligraphers known as katibs or khush-navees until the late 1980s. The Pakistani national newspaper Daily Jang was the first Urdu newspaper to use Nastaʿlīq computer-based composition. Efforts are under way to develop more sophisticated and user-friendly Urdu support on computers and the internet. Nowadays, nearly all Urdu newspapers, magazines, journals, and periodicals are composed on computers with Urdu software programs.


Urdu and Hindi, an official federal language of India, are different registers of the same language; they are therefore mutually intelligible, and each can be written in the other’s script. The choice of script generally signals the user’s faith: Muslims generally use the Urdu (Perso-Arabic) script, while Hindus use the Devanagari script. In addition to Pakistan, the Urdu script is official in five states of India with a substantial proportion of Hindustani-speaking Muslims: Bihar, Delhi, Jammu and Kashmir, Telangana, and Uttar Pradesh.


Beyond the Indian subcontinent, the Urdu script is also used by Pakistan’s large diaspora, including in the United Kingdom, the United Arab Emirates, the United States, Canada, Saudi Arabia, and other places.[2]


The Nastaʿlīq calligraphic writing style began as a Persian mixture of the Naskh and Taʿlīq scripts. After the Mughal conquest, Nastaʿlīq became the preferred writing style for Urdu. It is the dominant style in Pakistan, and many Urdu writers elsewhere in the world use it. Nastaʿlīq is more cursive and flowing than its Naskh counterpart.


The Urdu script is an abjad script derived from the Perso-Arabic script, which is itself a derivative of the Arabic script. The Urdu alphabet was standardized in 2004 by the National Language Authority, which is responsible for standardizing Urdu in Pakistan. According to the National Language Authority, Urdu has 58 letters, of which 39 are basic letters while 18 are digraphs that represent aspirated consonants, formed by joining basic consonant letters with a variant of he called do chashmi he.[3][4][1] Tāʼ marbūṭah is also sometimes considered a letter, though it is rarely used except in certain loan words from Arabic.


As an abjad, the Urdu script shows only consonants and long vowels; short vowels can only be inferred from the consonants’ relation to one another. While this type of script is convenient in Semitic languages like Arabic and Hebrew, whose consonant roots are the key of the sentence, Urdu is an Indo-European language, which does not have the same luxury, and so requires more memorization.

Differences from Persian alphabet

Urdu adds further letters to the Persian base to represent sounds not present in Persian, just as Persian adds letters to the Arabic base to represent sounds not present in Arabic. The letters added include: ٹ to represent /ʈ/, ڈ to represent /ɖ/, ڑ to represent /ɽ/, ں to represent /◌̃/, and ے to represent /ɛː/ or /eː/. Furthermore, a separate do-cashmi-he letter, ھ, exists to denote /ʰ/ or /ʱ/. This letter is mainly used as part of the digraphs detailed below.
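The added letters can be inspected programmatically. The following is a minimal Python sketch, assuming the letters are encoded with the code points given as escapes below; the dictionary name `URDU_ADDITIONS` and the helper `describe` are illustrative names, not part of any standard.

```python
import unicodedata

# Letters the text lists as Urdu additions, written as escapes so the
# code points are unambiguous (these code points are an assumption
# about how the letters above are encoded).
URDU_ADDITIONS = {
    "\u0679": "/ʈ/",
    "\u0688": "/ɖ/",
    "\u0691": "/ɽ/",
    "\u06BA": "nasalization",
    "\u06D2": "/ɛː/ or /eː/",
    "\u06BE": "aspiration (do-cashmi-he)",
}


def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for char, sound in URDU_ADDITIONS.items():
    print(f"{describe(char)}  ->  {sound}")
```

Looking up the Unicode name is a quick way to confirm which letter a string actually contains, since several Arabic-script letters look alike in many fonts.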

No. Name[5] ALA-LC[6] Hunterian[7] IPA Isolated glyph
1 الف alif ā, ʾ, – /ɑː, ʔ, ∅/ ا
2 با ba b /b/ ب
3 پا pa p /p/ پ
4 تا ta t /t/ ت
5 ٹا ṭa t /ʈ/ ٹ
6 ثا sa s s /s/ ث
7 جيم jīm j /d͡ʒ/ ج
8 چيم cīm c ch /t͡ʃ/ چ
9 بڑی حا baṛī ḥa h /ɦ/ ح
10 خا kha kh kh /x/ خ
11 دال dāl d /d/ د
12 ڈال ḍāl d /ɖ/ ڈ
13 ذال zāl z z /z/ ذ
14 را ra r /r/ ر
15 ڑا ṛa r /ɽ/ ڑ
16 زاين zain z /z/ ز
17 ژاين zhain zh zh /ʒ/ ژ
18 سین sīn s /s/ س
19 شین shīn sh sh /ʃ/ ش
20 صاد suād s s /s/ ص
21 ضاد zuād z z /z/ ض
22 طو to’e t t /t/ ط
23 ظو zo’e z z /z/ ظ
24 عین ʿain āoe, ʿ, – /ɑː, oː, eː, ʔ, ʕ, ∅/ ع
25 غین ghain gh gh /ɣ/ غ
26 فا fa f /f/ ف
27 قاف qāf q /q/ ق
28 کاف kāf k /k/ ک
29 گاف gāf g /ɡ/ گ
30 لام lām l /l/ ل
31 میم mīm m /m/ م
32 نون nūn n /n, ɲ, ɳ, ŋ/ ن
33 نون غنّه nūn ghunnah n /◌̃/ ں
34 واؤ wāʾo vūoau wūoau /ʋ, uː, oː, ɔː/ و
35 چھوٹی ہا, گول ہا choṭī ha, gol ha h /ɦ/ or /∅/ ہ
36 دو چشمی ها do-cashmī ha h /ʰ/ or /ʱ/ ھ
37 ہمزہ hamzah ʾ, – /ʔ/, /∅/ ء
38 چھوٹی يا choṭī ya yīá /j, iː, ɑː/ ی
39 بڑی يا baṛī ya ai, e /ɛː, eː/ ے

No Urdu word begins with ں, ھ, ڑ or ے. The digraphs of aspirated consonants are as follows.

No. Digraph[6] Transcription[6] IPA Example
1 بھ bh [bʱ] بھاری
2 پھ ph [pʰ] پھول
3 تھ th [tʰ] تھم
4 ٹھ ṭh [ʈʰ] ٹھیس
5 جھ jh [d͡ʒʱ] جھاڑی
6 چھ ch [t͡ʃʰ] چھوکرا
7 دھ dh [dʱ] دھوبی
8 ڈھ ḍh [ɖʱ] ڈھول
9 رھ rh [rʱ] no example?
10 ڑھ ṛh [ɽʱ] کڑھنا
11 کھ kh [kʰ] کھولنا
12 گھ gh [ɡʱ] گھبراہٹ
13 لھ lh [lʱ] no example?
14 مھ mh [mʱ] no example?
15 نھ nh [nʱ] ننھا
16 هھ hh [hʱ] no example?
17 وھ wh [ʋʱ] no example?
18 یھ yh [jʱ] no example?
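Because each aspirated consonant is written as a base consonant followed by do-cashmī he, software that counts or sorts letters may want to treat the pair as one unit. The following Python sketch shows one way to do that, assuming undiacritized input and assuming do-cashmī he is encoded as U+06BE; `split_letters` is a hypothetical helper name.

```python
DO_CHASHMI_HE = "\u06BE"  # assumed encoding of do-cashmi he


def split_letters(word: str) -> list[str]:
    """Split a word into letters, keeping consonant + do-cashmi-he pairs together.

    Assumes no diacritics sit between the consonant and the do-cashmi he.
    """
    units: list[str] = []
    for char in word:
        if char == DO_CHASHMI_HE and units:
            units[-1] += char  # attach to the preceding consonant
        else:
            units.append(char)
    return units


# The example word for "bh" in the table above, spelled with escapes.
bhari = "\u0628\u06BE\u0627\u0631\u06CC"
print(split_letters(bhari))
```

For this word the function yields four units rather than five code points, with the first unit being the digraph.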


Retroflex letters

Old Hindustani used four dots over three Arabic letters to represent retroflex consonants (ٿ, ڐ, ڙ).[8] In handwriting those dots were often written like a small vertical line attached to a small triangle. Subsequently, this shape became identical to a small letter ط.[9] (It is commonly and mistakenly assumed that ṭāʾ itself was used to indicate retroflex consonants because, being an emphatic dental consonant, it was thought by Arabic scribes to approximate the retroflexes.)


The Urdu language has 10 vowels and 10 nasalized vowels. Each vowel has four forms depending on its position: initial, middle, final and isolated. As in its parent Arabic alphabet, Urdu vowels are represented using a combination of digraphs and diacritics. Alif, Waw, Ye, He and their variants are used to represent vowels.

Vowel chart

Urdu does not have standalone vowel letters. Short vowels (a, i, u) are represented by optional diacritics (zabar, zer, pesh) on the preceding consonant, or on a placeholder consonant (alif, ain, or hamzah) if the syllable begins with the vowel. Long vowels are represented by the consonants alif, ain, ye, and wa’o as matres lectionis, with disambiguating diacritics, some of which are optional (zabar, zer, pesh), whereas some are not (madd, hamzah). Urdu does not have short vowels at the end of words. This is a table of Urdu vowels:


Romanization Pronunciation Final Middle Initial
a /ʌ/ N/A ـَ اَ
ā /aː/ ـَا، ـَی، ـَہ ـَا آ
i /ɪ/ N/A ـِ اِ
ī /iː/ ـِى ـِيـ اِی
e /eː/ ـے ـيـ اے
ai /ɛː/ ـَے ـَيـ اَے
u /ʊ/ N/A ـُ اُ
ū /uː/ ـُو اُو
o /oː/ ـو او
au /ɔː/ ـَو اَو
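Since the short-vowel diacritics are optional, the same word may appear with or without them, which matters when comparing strings. The following Python sketch removes zabar, zer and pesh before comparison, assuming they are encoded as the Arabic fatha, kasra and damma marks (U+064E, U+0650, U+064F); `strip_short_vowels` is a hypothetical helper name.

```python
# Optional short-vowel marks, keyed by code point (assumed encodings).
SHORT_VOWEL_MARKS = {
    "\u064E": "zabar",  # ARABIC FATHA
    "\u0650": "zer",    # ARABIC KASRA
    "\u064F": "pesh",   # ARABIC DAMMA
}


def strip_short_vowels(text: str) -> str:
    """Return text with the optional short-vowel diacritics removed."""
    return "".join(ch for ch in text if ch not in SHORT_VOWEL_MARKS)


marked = "\u0627\u064E\u0628"  # alif + zabar + be ("ab" with the vowel written)
plain = "\u0627\u0628"         # the same word as usually written
print(strip_short_vowels(marked) == plain)
```

The sketch deliberately leaves madd and hamzah untouched, since the text describes those as not optional.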


Alif is the first letter of the Urdu alphabet, and it is used exclusively as a vowel. At the beginning of a word, alif can be used to represent any of the short vowels: اب ab, اسم ism, اردو Urdū. For long ā at the beginning of words alif-mad is used: آپ āp, but a plain alif in the middle and at the end: بھاگنا bhāgnā.


Wāʾo is used to render the vowels “ū”, “o”, “u” and “au” ([uː], [oː], [ʊ] and [ɔː] respectively), and it is also used to render the labiodental approximant, [ʋ].


Ye is divided into two variants: choṭī ye (“little ye”) and baṛī ye (“big ye”).

Choṭī ye (ی) is written in all forms exactly as in Persian. It is used for the long vowel “ī” and the consonant “y”.

Baṛī ye (ے) is used to render the vowels “e” and “ai” (/eː/ and /ɛː/ respectively). Baṛī ye is distinguishable in writing from choṭī ye only when it comes at the end of a word/ligature. Additionally, Baṛī ye is never used to begin a word/ligature, unlike choṭī ye.

Letter’s name Final Form Middle Form Initial Form Isolated Form
چھوٹی يے (Choṭī ye) ـی ـیـ یـ ی
بڑی يے (Baṛī ye) ـے ے

The 2 he’s

He is divided into two variants: gol he (“round he”) and do-cashmī he (“two-eyed he”).

Gol he (ہ) is written round and zigzagged. It can only be used as in Persian.

Do-cashmī he (ھ) is written as in Arabic Naskh style (as a loop), in order to create the aspirate consonants and write Arabic words.

Letter’s name Final Form Middle Form Initial Form Isolated Form
گول ہے (Gol he) ـہ ـہـ ہـ ہ
دو چشمی ہے (Do-cashmī he) ـھ ـھـ هـ ھ


Ayn in its initial and final position is silent in pronunciation and is replaced by the sound of its preceding or succeeding vowel.

Nun Ghunnah

Nasalized vowels are represented by Nun Ghunnah written after their non-nasalized versions; for example, ہَے when nasalized becomes ہَیں. In its middle form, Nun Ghunnah is written just like Nun and is differentiated by a diacritic called Maghnoona or Ulta Jazm, a superscript V symbol written above the nun (ن٘).


Form Urdu Transcription
Orthography ں
End Form میں maiṉ
Middle Form کن٘ول kaṉwal
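The positional rule can be expressed in a few lines. This Python sketch assumes the word-final letter is encoded as U+06BA and the middle form as nun (U+0646) followed by the combining mark U+0658; `nasal_marker` is a hypothetical helper name.

```python
import unicodedata

NUN = "\u0646"          # plain nun
NUN_GHUNNAH = "\u06BA"  # dotless nun ghunnah, used at the end of a word
MAGHNOONA = "\u0658"    # combining mark placed on nun in the middle of a word


def nasal_marker(is_word_final: bool) -> str:
    """Return the nasalization spelling for the given position."""
    return NUN_GHUNNAH if is_word_final else NUN + MAGHNOONA


# The two example words from the table, assembled from escapes.
main = "\u0645\u06CC" + nasal_marker(True)
kanwal = "\u06A9" + nasal_marker(False) + "\u0648\u0644"

print(unicodedata.name(NUN_GHUNNAH))
print(unicodedata.name(MAGHNOONA))
```

Note that the middle form is two code points (a letter plus a nonspacing mark), so it counts as two characters in most string APIs.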


In Urdu, Hamza is silent in all its forms except when it is used as Hamza-e-Izafat. The main use of Hamza in Urdu is to indicate a vowel cluster.


Urdu uses the same subset of diacritics used in Arabic, following Persian conventions. Urdu also uses the Persian names of the diacritics instead of the Arabic names. Commonly used diacritics are Zabar (Arabic Fatḥah), Zer (Arabic Kasrah) and Pesh (Arabic Ḍammah), which are used to clarify the pronunciation of vowels; Jazam (Arabic Sukun), which indicates a consonant cluster; and Shad (Arabic Tashdid), which indicates gemination. Other diacritics include Khari Zabar (Arabic dagger alif) and Do Zabar (Arabic Fathatan), which are found in some common Arabic loan words. Other Arabic diacritics are also used, though very rarely, in loan words from Arabic. Zer-e-Izafat and Hamza-e-Izafat are described in the next section.

Other than common diacritics, Urdu also has special diacritics, which are often found only in dictionaries for the clarification of irregular pronunciation. These diacritics include Kasrah-e-Majhool, Fathah-e-Majhool, Dammah-e-Majhool, Maghnoona, Ulta Jazam, Alif-e-Wavi and some other very rare diacritics. Among these, only Maghnoona is used commonly in dictionaries and has a Unicode representation at U+0658. Other diacritics are only rarely written in printed form mainly in some advanced dictionaries.


Iẓāfat is a syntactical construction of two nouns, where the first component is a determined noun, and the second is a determiner. This construction was borrowed from Persian. A short vowel “i” is used to connect these two words. It may be written as zer ( ِ) at the end of the first word, but usually is not written at all. If the first word ends in choṭī he (ه) or ye (ی) then hamzā (ء) is used above the last letter (ۂ or ئ). If the first word ends in a long vowel then baṛī ye (ے) with hamzā on top (ئے) is written.[11]

Forms Example Transliteration Meaning
ــِ شیرِ پنجاب sher-e Panjāb the lion of Punjab
ۂ غزوهٔ ہند ghazwā-ye Hind the Conquest of India
ئ ولئ کامل walī-ye kāmil perfect saint
ئے روئے زمین rū-ye zamīn the surface of the Earth
ئے صدائے بلند sadā-ye buland a high voice
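The spelling rules for iẓāfat can be sketched as a small function. This is a simplified Python illustration under stated assumptions, not a complete orthographic rule: it treats any final alif or wāʾo as a long vowel, assumes choṭī he is encoded as U+06C1 and choṭī ye as U+06CC, and always writes the otherwise optional zer. The name `add_izafat` is hypothetical.

```python
ZER = "\u0650"            # short "i" diacritic
CHOTI_HE = "\u06C1"       # assumed encoding of choti he
CHOTI_YE = "\u06CC"       # assumed encoding of choti ye
HE_WITH_HAMZA = "\u06C2"  # he carrying hamza above
YE_WITH_HAMZA = "\u0626"  # ye carrying hamza above
BARI_YE = "\u06D2"
# Simplification: a final alif or wao is assumed to spell a long vowel.
LONG_VOWEL_ENDINGS = ("\u0627", "\u0648")


def add_izafat(word: str) -> str:
    """Return the first word of an izafat pair with the connector written."""
    if not word:
        raise ValueError("word must not be empty")
    last = word[-1]
    if last == CHOTI_HE:
        return word[:-1] + HE_WITH_HAMZA
    if last == CHOTI_YE:
        return word[:-1] + YE_WITH_HAMZA
    if last in LONG_VOWEL_ENDINGS:
        return word + YE_WITH_HAMZA + BARI_YE
    return word + ZER  # consonant-final: zer, usually omitted in practice


sher = "\u0634\u06CC\u0631"  # first word of the "lion of Punjab" example
sada = "\u0635\u062F\u0627"  # first word of the "high voice" example
print(add_izafat(sher), add_izafat(sada))
```

A real implementation would need a lexicon to decide whether a final wāʾo is a vowel or the consonant, which this sketch does not attempt.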

Computers and the Urdu alphabet

In the early days of computers, Urdu was not properly represented on any code page. One of the earliest code pages to represent Urdu was IBM Code Page 868, which dates back to 1990.[12] Other early code pages that represented the Urdu alphabet were Windows-1256 and the MacArabic encoding, both of which date back to the mid-1990s. In Unicode, Urdu is represented inside the Arabic block. Another code page for Urdu, used in India, is the Perso-Arabic Script Code for Information Interchange. In Pakistan, the 8-bit code page developed by the National Language Authority is called Urdu Zabta Takhti (اردو ضابطہ تختی) (UZT),[13] which represents Urdu in its most complete form, including some of its specialized diacritics, though UZT is not designed to coexist with the Latin alphabet.

Encoding Urdu in Unicode

Like other writing systems derived from the Arabic script, Urdu uses the 0600–06FF Unicode range.[14] Certain glyphs in this range appear visually similar (or identical when presented using particular fonts) even though the underlying encoding is different. This presents problems for information storage and retrieval. For example, the University of Chicago’s electronic copy of John Shakespear’s “A Dictionary, Hindustani, and English”[15] includes the word ‘بهارت’ (India). Searching for the string “بھارت” returns no results, whereas querying with the (identical-looking in many fonts) string “بهارت” returns the correct entry.[16] This is because the Urdu letter do chashmi he (U+06BE), used to form aspirate digraphs in Urdu, is visually identical in its medial form to the Arabic letter hāʾ (U+0647; phonetic value /h/). In Urdu, the /h/ phoneme is represented by the character U+06C1, called gol he (round he) or chhoti he (small he).
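The mismatch described above is easy to reproduce. The following Python sketch builds the two spellings from explicit escapes and reports where they differ; the helper names `describe` and `first_difference` are illustrative.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point and Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


def first_difference(a: str, b: str):
    """Return descriptions of the first differing character pair, or None."""
    for x, y in zip(a, b):
        if x != y:
            return describe(x), describe(y)
    return None


with_do_chashmi_he = "\u0628\u06BE\u0627\u0631\u062A"  # spelled with U+06BE
with_arabic_heh = "\u0628\u0647\u0627\u0631\u062A"     # spelled with U+0647

print(with_do_chashmi_he == with_arabic_heh)
print(first_difference(with_do_chashmi_he, with_arabic_heh))
```

A plain string comparison treats the two spellings as different words, which is why a search for one does not find the other even when they render identically.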

Confusable glyphs in Urdu and Arabic script
Characters in Urdu Characters in Arabic
ہ (U+06C1), ھ (U+06BE) ه (U+0647)
ی (U+06CC) ى (U+0649), ي (U+064A)





In 2003, the Center for Research in Urdu Language Processing (CRULP),[17] a research organisation affiliated with Pakistan’s National University of Computer and Emerging Sciences, produced a proposal for mapping from the 1-byte UZT encoding of Urdu characters to the Unicode standard. This proposal suggests a preferred Unicode glyph for each character in the Urdu alphabet.
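The idea of one preferred code point per letter can be illustrated with a translation table. The mapping below is a hypothetical Python sketch and is not taken from the CRULP proposal: it assumes that the Arabic ye variants U+0649 and U+064A should be rewritten as U+06CC for Urdu text, and it leaves the Arabic hāʾ (U+0647) alone because choosing between gol he and do chashmi he depends on the word.

```python
# Hypothetical mapping from Arabic look-alikes to the code point assumed
# to be preferred for Urdu. Not the CRULP table.
PREFERRED = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> choti ye
    "\u0649": "\u06CC",  # alef maksura -> choti ye
})


def to_urdu_preferred(text: str) -> str:
    """Rewrite known look-alike characters to the assumed Urdu code points."""
    return text.translate(PREFERRED)


typed_with_arabic_yeh = "\u0648\u0644\u064A"
print(to_urdu_preferred(typed_with_arabic_yeh) == "\u0648\u0644\u06CC")
```

Applying such a table to both the stored text and the query before comparison is one way to reduce the retrieval problems described in this section.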

