Arabic Case Insensitive In Database Systems: How To Solve Alef With and Without Hamza Problem

  • Create a custom collation in your database system
  • Add a normalized field to your table
  • Use regular expressions in queries
+----+--------------+
| id | name |
+----+--------------+
| 1 | احمد |
| 2 | أحمد |
| 3 | أسامه |
| 4 | أسامة |
| 5 | اسامه |
| 6 | اسَامه |
+----+--------------+
6 rows in set

1. Create a custom collation

SHOW VARIABLES LIKE 'character_sets_dir';
+--------------------+----------------------------+
| Variable_name | Value |
+--------------------+----------------------------+
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------+----------------------------+
1 row in set (0.00 sec)
<collation name="utf8_arabic_ci" id="1029">
<rules>
<reset>\u0627</reset> <!-- Alef 'ا' -->
<i>\u0623</i> <!-- Alef With Hamza Above 'أ' -->
<i>\u0625</i> <!-- Alef With Hamza Below 'إ' -->
<i>\u0622</i> <!-- Alef With Madda Above 'آ' -->
</rules>
<rules>
<reset>\u0629</reset> <!-- Teh Marbuta 'ة' -->
<i>\u0647</i> <!-- Heh 'ه' -->
</rules>
<rules>
<reset>\u0000</reset> <!-- Unicode value of NULL -->
<i>\u064E</i> <!-- Fatha 'َ' -->
<i>\u064F</i> <!-- Damma 'ُ' -->
<i>\u0650</i> <!-- Kasra 'ِ' -->
<i>\u0651</i> <!-- Shadda 'ّ' -->
<i>\u064F</i> <!-- Sukun 'ْ' -->
<i>\u064B</i> <!-- Fathatan 'ً' -->
<i>\u064C</i> <!-- Dammatan 'ٌ' -->
<i>\u064D</i> <!-- Kasratan 'ٍ' -->
</rules>
</collation>
  • 'أ','إ','آ' is the same character as 'ا' (All the Alef forms are one character)
  • 'ه' is the same character as 'ة' (so "نسمة" is the same as "نسمه")
  • Tashkil characters are ignored completely as if they aren’t there
sudo service mysql restart
ALTER TABLE persons
MODIFY name VARCHAR(50)
CHARACTER SET 'utf8'
COLLATE 'utf8_arabic_ci';
SELECT * FROM persons WHERE name = "اسامة";+----+--------------+
| id | name |
+----+--------------+
| 3 | أسامه |
| 4 | أسامة |
| 5 | اسامه |
| 6 | اسَامه |
+----+--------------+
4 rows in set (0.00 sec)

2. Add a normalized field

function normalize_name($name) {
$patterns = array( "/إ|أ|آ/" ,"/ة/", "/َ|ً|ُ|ِ|ٍ|ٌ|ّ/" );
$replacements = array( "ا" , "ه" , "" );
return preg_replace($patterns, $replacements, $name);
}
normalize_name("أحمد");  // return: احمد
normalize_name("آمنة"); // return: امنه
normalize_name("أسامه"); // return: اسامه
normalize_name("مٌحَمَّد"); // return: محمد
1  احمد   احمد
2 أحمد احمد
3 أسامه اسامه
4 أسامة اسامه
5 اسامه اسامه
6 اسَامه اسامه
SELECT id, name FROM persons WHERE normalized_name = "اسامه";+----+--------------+
| id | name |
+----+--------------+
| 3 | أسامه |
| 4 | أسامة |
| 5 | اسامه |
| 6 | اسَامه |
+----+--------------+
4 rows in set (0.00 sec)

3. Use Regular Expressions in queries

"[ا|أ|إ|آ]سام[ه|ة]"
SELECT id, name FROM persons WHERE name REGEXP "[ا|أ|إ|آ]سام[ه|ة]";+----+------------+
| id | name |
+----+------------+
| 3 | أسامه |
| 4 | أسامة |
| 5 | اسامه |
+----+------------+
3 rows in set (0.01 sec)
function generate_pattern($search_string) {
$patterns = array( "/(ا|إ|أ|آ)/", "/(ه|ة)/" );
$replacements = array( "[ا|إ|أ|آ]", "[ه|ة]" );
return preg_replace($patterns, $replacements, $search_string);
}
generate_pattern("أسامة"); // return '[ا|إ|أ|آ]س[ا|إ|أ|آ]م[ة|ه]'
generate_pattern("أسامه"); // return '[ا|إ|أ|آ]س[ا|إ|أ|آ]م[ة|ه]'
generate_pattern("احمد"); // return '[ا|إ|أ|آ]حمد'

Conclusion

 by the author.

--

--

--

Web developer, Arabic Native, open-source enthusiastic, trying to do things the right way.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

SSR-based Optimization of Double 11 Virtual Venue — A More Complex Rendering Architecture

Prepare GCP Associate Cloud Engineer Certification Exam — 1

Introducing Forge: A Neblio Powered NFT And Blockchain Gaming Platform

Agile in MeetU Engineering

Try Habitat with Container Service

Broadcasting and multicasting millions of clients using TCP

Setting up PostgreSQL in Debian-based Linux

Why Open Source ?

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Ahmed Essam

Ahmed Essam

Web developer, Arabic Native, open-source enthusiastic, trying to do things the right way.

More from Medium

DESIGN PATTERNS IN JAVA

Varsity Feature: Swimming by Sabee D.G. (8–1)

Everything You Need To Know When Migrating to Oracle Cloud — Opkey Blog

DOCKER ARCHITECTURE