Apache Cassandra is a distributed database management system designed to handle very large amounts of data across many servers. It is open source, meaning its source code is freely available for anyone to study, modify and use.
Unlike traditional Relational Database Management Systems (RDBMS), Apache Cassandra does not use Structured Query Language (SQL). Instead, it implements the Cassandra Query Language (CQL), which is often described as SQL-like: its syntax resembles SQL, but it omits features such as joins.
Cassandra is one of the best-known NoSQL systems, alongside MongoDB, DynamoDB, CouchDB and HBase. Among these, Apache Cassandra stands out for its popularity and breadth of use. According to the official site, a Cassandra setup of approximately 300TB is in use.
Companies using Apache Cassandra
Well-known companies whose applications use Cassandra include Facebook, Twitter, Cisco, Netflix, eBay, Comcast, Reddit and Adobe, among others. Most of these are internet companies that handle large amounts of data. Facebook, where Cassandra was originally developed, was the first user.
How Is the Cassandra Database Different from Other Databases?
What sets Apache Cassandra apart is its architecture. It uses a peer-to-peer design, unlike many database management systems, which use a master-slave arrangement. In a master-slave setup, all requests go to the master server. If the master is affected, the slaves that depend on it are affected too, and the performance of the whole system suffers. Scalability is also limited, because every request passes through the master, which becomes a bottleneck as load grows.
In contrast, the peer-to-peer arrangement used by Cassandra lets nodes share data among themselves. This eliminates the single point of failure and improves the availability and performance of the system, because no node depends on a master in order to do useful work.
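The difference between the two layouts can be illustrated with a small Python sketch. It is a toy model, not Cassandra code: the class names `MasterSlaveCluster` and `PeerCluster`, the node names and the `try_request` helper are all made up for illustration, and the master-slave model deliberately has no failover, which real systems often provide.

```python
class MasterSlaveCluster:
    """Toy model: every request must go through the single master."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.down = set()  # names of nodes that are offline

    def handle(self, request):
        if self.master in self.down:
            raise ConnectionError("master is down; request rejected")
        return f"{self.master} handled {request}"


class PeerCluster:
    """Toy model: any live node can accept a request."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()

    def handle(self, request):
        # Use the first node that is still up.
        for node in self.nodes:
            if node not in self.down:
                return f"{node} handled {request}"
        raise ConnectionError("all nodes are down")


def try_request(cluster, request):
    """Send a request and turn a connection failure into a readable message."""
    try:
        return cluster.handle(request)
    except ConnectionError as exc:
        return f"failed: {exc}"


ms = MasterSlaveCluster("node-1", ["node-2", "node-3"])
p2p = PeerCluster(["node-1", "node-2", "node-3"])

# Take the same machine offline in both layouts.
ms.down.add("node-1")
p2p.down.add("node-1")

print(try_request(ms, "read user:42"))   # failed: master is down; request rejected
print(try_request(p2p, "read user:42"))  # node-2 handled read user:42
```

Losing one machine stops the toy master-slave cluster entirely, while the peer cluster simply routes the request to another node.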
Every node in a peer-to-peer cluster is equal and can share or request data at any time. This is a strong feature for scalability, since nodes can be added to a cluster easily. It is the reason Apache Cassandra suits applications that process huge amounts of data or that are expected to grow in the near future.
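One way to see why adding a node is cheap is a toy hash ring, the general technique that Cassandra's data distribution is based on. The sketch below is only an approximation under stated assumptions: the `Ring` class, the `token` function, the use of MD5 and the figure of 8 positions per node are illustrative choices, not Cassandra's actual partitioner or defaults.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (a toy stand-in for a partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy ring of equal peers; any peer can work out which node owns a key."""

    def __init__(self, nodes):
        self._tokens = []  # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str, vnodes: int = 8) -> None:
        # Each node claims several positions so data spreads more evenly.
        for i in range(vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owners[t] = node

    def owner(self, key: str) -> str:
        # Walk clockwise to the first position at or after the key's token,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # scale out by adding one more peer
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

Only the keys that now belong to the new node change owner; everything else stays where it was, which is what keeps growth manageable.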
Advantages of Apache Cassandra
- Scalability. The cluster can be expanded easily by adding nodes to accommodate more data and traffic.
- No single point of failure. Applications keep running even when a node fails.
- Flexibility in data storage. Both structured and unstructured data can be stored.
- Simpler data management. Stored data does not require complex configuration to manage, because it lives in a cluster where all nodes are equal.
- Simpler data distribution. Data can be replicated to whichever nodes or locations are needed.
- Better performance. Cassandra offers high write and read throughput.
- Replication support, which makes recovery from disasters easier.
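The replication and availability points in the list above can be sketched in a few lines of Python. This is a simplified model under assumptions of my own: the `Cluster` class, its replica-placement rule and the replication factor of 3 are illustrative, and real Cassandra adds consistency levels, repair and other machinery that is left out here.

```python
import hashlib


class Cluster:
    """Toy cluster: every node is a peer and each key lives on several of them."""

    def __init__(self, node_names, replication_factor=3):
        self.names = sorted(node_names)
        self.rf = replication_factor
        self.storage = {name: {} for name in self.names}  # per-node key/value data
        self.down = set()                                  # nodes currently offline

    def replicas(self, key):
        # Pick a starting node from the key's hash, then take the next rf nodes.
        start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(self.names)
        return [self.names[(start + i) % len(self.names)] for i in range(self.rf)]

    def write(self, key, value):
        # Store the value on every replica that is reachable.
        live = [n for n in self.replicas(key) if n not in self.down]
        if not live:
            raise RuntimeError("no live replica for " + key)
        for node in live:
            self.storage[node][key] = value

    def read(self, key):
        # Any live replica can answer; there is no master to go through.
        for node in self.replicas(key):
            if node not in self.down and key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)


cluster = Cluster(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
cluster.write("user:42", "Ada")

# Simulate losing two of the three replicas that hold the key.
first, second, _ = cluster.replicas("user:42")
cluster.down.update({first, second})

print(cluster.read("user:42"))  # Ada
```

Because the value was copied to three nodes, the read still succeeds after two of them go offline.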
Disadvantages of Apache Cassandra
- It does not fit well into existing applications. It is best adopted from the initial stage of developing an application.
- It does not handle many-to-many relationships well, because CQL does not support joins.