Skip to main content

Introduction of Unicode (python)

Introduction of Unicode

Vikas Tomar's DEV Profile
The American Standard Code for Information Interchange, better known by its acronym ASCII, was standardized. ASCII defined numeric codes for various characters, with the numeric values running from 0 to 127

  • Unicode Provide the Unique number for every Character, like as the Uppercase letter 'A' is assigned the '65' Code Value. 
  • Unicode consist of Uppercase ,lowercase letter, emoji , each each every character as code value.
  • Unicode may be 8 bit ,16 bit Or 32 bit..




Unicode has lot of benefit in programming and Unicode doesn't matter what the Platform, what Language , what  Program.

  • Let See Example Of Using ASCII Value:
  •  ASCII Code(65  to 90)

for i in range(65,91):
    print(chr(i),end=" ")
#OUTPUT A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

  • Similar we can use the lowercase alphabets by using (97 to 121) ASCII code value.


UNICODE TABLE :            
                                           

Properties :

UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than UTF-8.) UTF-8 uses the following rules:

If the code point is <128, it’s represented by the corresponding byte value.
If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.
Code points >0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.

UTF-8 has several convenient properties:

It can handle any Unicode code point.
A Unicode string is turned into a string of bytes containing no embedded zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes.

A string of ASCII text is also valid UTF-8 text.
UTF-8 is fairly compact; the majority of code points are turned into two bytes, and values less than 128 occupy only a single byte.

If bytes are corrupted or lost, it’s possible to determine the start of the next UTF-8-encoded code point and resynchronize. It’s also unlikely that random 8-bit data will look like valid UTF-8.

Unicode Properties

The Unicode specification includes a database of information about code points.
import unicodedata
u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)
for i, c in enumerate(u):
    print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
    print(unicodedata.name(c))


  • Explanation : 


%04x is a format specifier, which only makes sense with the % operator. It specifies that the number to be formatted (in this case, ord(c)  should be formatted as a hexadecimal with four digits and zero-padding on the left. So if c is "A" , whose Unicode code point is 65, it will be printed as 0041 .


  • Unicodedata.category(chr)


This function returns the general category assigned to the given character as string. For example, it returns ‘L’ for letter and ‘u’ for uppercase.

  • Example :

import unicodedata 
print unicodedata.category(u'A') 
print unicodedata.category(u'b') 

  • Output : Lu                                            # Letter Lowercase .


                       Ll                                             # Letter Uppercase .







Popular posts from this blog

Introduction to Java Security

Introduction to Java Security The Java security architecture includes a large set of application programming interfaces (APIs), tools, and implementations of commonly-used security algorithms, mechanisms, and protocols. The Java security APIs span a wide range of areas. Cryptographic and public key infrastructure (PKI) interfaces provide the underlying basis for developing secure applications. Interfaces for performing authentication and access control enable applications to guard against unauthorized access to protected resources. The JDK includes a number of providers that implement a core set of security services. It also allows for additional custom providers to be installed. This enables developers to extend the platform with new security mechanisms. The JDK is divided into modules. Modules that contain security APIs include the following:

Module Description java.base Defines the foundational APIs of Java SE;  contained packages include java.securityjavax.cryptojavax.net.ssl,  and…

SQL Injection

Overview A SQL injection attack consists of insertion or "injection" of a SQL query via the input data from the client to the application. A successful SQL injection exploit can read sensitive data from the database, modify database data (Insert/Update/Delete), execute administration operations on the database (such as shutdown the DBMS), recover the content of a given file present on the DBMS file system and in some cases issue commands to the operating system. SQL injection attacks are a type of injection attack, in which SQL commands are injected into data-plane input in order to effect the execution of predefined SQL commands. Threat ModelingSQL injection attacks allow attackers to spoof identity, tamper with existing data, cause repudiation issues such as voiding transactions or changing balances, allow the complete disclosure of all data on the system, destroy the data or make it otherwise unavailable, and become administrators of the database server.SQL Injection is ve…

Insertion Node in the Linkelist.

In this post, methods to insert a new node in linked list are discussed. A node can be added in three ways
1) At the front of the linked list
2) After a given node.
3) At the end of the linked list
public class Linkedlist { Node head; class Node{ int data; Node next; Node(int d){ data =d; next=null; } } // INSERT THE NODE AT THE BEGIN OF LINKEDLIST. public void insertAtfront(int new_data){ // Node temp = head; Node new_node = new Node(new_data); new_node.next = head; head = new_node; }  // INSERT THE NODE AT THE GIVEN POSITION IN LINKEDLIST.
public void insertAtGiven(Node prev_node,int new_data) { if(prev_node == null){ System.out.print("previous node can't be null"); return; } Node new_node = new Node(new_data); new_node.next = prev_node.next; prev_node.next = new_node; } // INSERT THE NODE  AT THE END OF THE LINKEDLIST.   public void insertAtEnd(int new_data){ Node new_node = new Node(new_data); new_node.nex…