Java开发网 - Re: My caching approach and source code (Java version, with test JSP)

Author

Re: My caching approach and source code (Java version, with test JSP) [Re: liuaike]

floater

Java Jedi

Chief moderator

Posts: 3233

Posted 2008-07-31 03:32

There are two aspects to this subject:
1. Being distributed: crossing the JVM boundary to form a larger, unified system.
2. Data management: how to manage the data across the JVM boundary.
Each of these is complicated enough on its own. Combining them only makes things worse.

I'll expand on each point below with either a concern or a requirement, to give some background.

The 11 points I posted are not imagined or copied from textbooks. They come from my experience at work, reading on the internet, trying out some APIs, requirements at work, and so on. There are probably more points, but they are beyond my experience and expertise. If you want to go beyond these, you need to consult more experienced, specialized engineers, such as the folks working at Tangosol.

1. Clustered: another term is data coherence. We need to replicate data across JVMs in n copies, where n can be specified by end users; otherwise we can't survive the failure of a single JVM. JBoss and others use JGroups to do this. The tricky part is that while replicating data we need to minimize network traffic, otherwise performance is terrible. Terracotta sends only the delta (the change, whether new or updated data) across the network. I have tested Tangosol before, and its performance is great (x MB/second, where x is confidential).

2. Scalable: adding new machines expands the total memory of the entire cache. None of the open source solutions have this feature. Another term associated with this is data grid. The idea is that with so many cheap machines available, if we can hook them together, we can form a huge, usable, fast memory cache (100 times faster than disk). The hard part of implementing this is deciding which machine stores which value. Normally vendors come up with some kind of algorithm that hashes the key into another key, which decides which machine the value goes to. This is where the cache providers get paid big time. I know a lot of companies are using hundreds of GB, and they are still growing. Terabytes are not in the far future, more likely next year.
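To make the "hash the key to pick a machine" step concrete, here is a minimal sketch using a consistent-hash ring. This is one well-known way to do it, not necessarily what any commercial provider does. The class name HashRing, the node names, and the choice of 100 virtual nodes per machine are all my assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: maps cache keys to machine names with a consistent-hash ring.
class HashRing {
    // Assumed number of ring positions per machine; more positions spread keys more evenly.
    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // The owner is the first ring position at or after the key's hash, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of an MD5 digest as a long; MD5 is used for spreading only, not security.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

The point of the ring over a plain hash(key) % machineCount is that adding a machine only moves the keys that the new machine takes over; everything else stays where it was, so the cache is not reshuffled every time it grows.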

3. It is better to support both UDP and TCP. UDP makes it easier to add new machines, but sometimes UDP is simply not available because of company policies. So it would be better to use TCP for testing and UDP for production.
Commercial solutions always have both. JBoss uses JGroups and thus has both too, but the others haven't bothered to include TCP.

4. Transactions are crucial for data integrity. However, this depends on the data granularity and on how atomic the set/get methods are. Global transactions (such as JDBC + JMS) are another reason to need this.

5. Backup to secondary storage: most cache implementations have this feature. My reason for wanting it is that I need to keep a certain number of days of data because of business requirements, such as regulation or back testing. The backup can be asynchronous and multi-threaded.
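A rough sketch of what asynchronous, multi-threaded backup could look like: the put returns as soon as the value is in memory, and a small thread pool copies it to secondary storage in the background. The class name WriteBehindCache and the BiConsumer sink are hypothetical; in real life the sink would be a database insert or a file append.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical sketch: in-memory cache with asynchronous backup to a secondary store.
class WriteBehindCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final ExecutorService writers;
    private final BiConsumer<K, V> store; // assumed sink, e.g. a DB insert or file append

    WriteBehindCache(int writerThreads, BiConsumer<K, V> store) {
        this.writers = Executors.newFixedThreadPool(writerThreads);
        this.store = store;
    }

    // Updates memory immediately and queues the backup write for a background thread.
    void put(K key, V value) {
        memory.put(key, value);
        writers.execute(() -> store.accept(key, value));
    }

    V get(K key) {
        return memory.get(key);
    }

    // Stops accepting writes and waits for queued backups; returns false on timeout.
    boolean close(long timeout, TimeUnit unit) {
        writers.shutdown();
        try {
            return writers.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

One caveat with more than one writer thread: two updates to the same key can reach the store out of order. If that matters, route each key to a fixed writer thread or use a single writer.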

6. Some cache implementations have ODBC/JDBC drivers and can be treated as a relational cache, meaning you use SQL to search through the value objects. These caches are termed data stores. One simple way to implement this is to write a SQL parser with JavaCC, treat tables/columns as Java objects/attributes, and then run the query across the entire cache. Since everything is in memory, even a trivial implementation is not slow, though it could be faster. Language bindings matter too: memcached has APIs for many different languages, and all commercial solutions have Java/.NET/C interfaces, since a lot of financial applications have a .NET/Excel front end and a Java back end.

7. Near/far cache management: JBoss has the TcpCacheLoader class. My scenario is a memory-hungry process that by itself takes nearly all the JVM memory, so I can't spend extra memory on a cache in the same process. I want to push my results to an external cache, outside my current process. Most implementations take some memory from the current process and join the cache cluster; they have no option to externalize the entire cache. Tangosol has an option to specify how much memory to give to the near cache (the part of the cache held in the current process). This is a vital feature in a distributed environment, because most of the time we go distributed because we are either short of memory or facing long running times (in which case we go to grid computing), and in either case we need to collect the results back to a central location, either a database or a cache.
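Here is a small sketch of the near/far idea, to show what "how much memory to give to the near cache" could mean in code. It is not Tangosol's or JBoss's API. FarCache is a hypothetical interface standing in for the external cache process, and I bound the near cache by entry count instead of bytes to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface standing in for the external (far) cache process.
interface FarCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Hypothetical sketch: a bounded in-process near cache in front of a far cache.
class NearCache<K, V> {
    private final FarCache<K, V> far;
    private final Map<K, V> near;

    // maxNearEntries = 0 keeps nothing locally, i.e. the whole cache is externalized.
    NearCache(FarCache<K, V> far, final int maxNearEntries) {
        this.far = far;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.near = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxNearEntries;
            }
        };
    }

    // Looks in the near cache first, then falls back to the far cache.
    synchronized V get(K key) {
        V value = near.get(key);
        if (value == null) {
            value = far.get(key);
            if (value != null) {
                near.put(key, value);
            }
        }
        return value;
    }

    // Writes through to the far cache so other processes can see the value.
    synchronized void put(K key, V value) {
        far.put(key, value);
        near.put(key, value);
    }

    synchronized int nearSize() {
        return near.size();
    }
}
```

This sketch does nothing about invalidation: if another process updates the far cache, the local copy goes stale. A real product has to solve that, which is part of why this feature is harder than it looks.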

8. Locking is the minimal requirement for concurrent access. In a distributed environment we need much more than a lock, because access is not only concurrent but also distributed. So we have to deal with deadlock, starvation, and so on.
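To illustrate the deadlock part, here is a single-JVM sketch of one classic countermeasure: always acquire per-key locks in a fixed (sorted) order, so two callers that need the same keys can never wait on each other in a cycle. The class name OrderedKeyLocks is made up. A distributed cache would need the same discipline plus a lock service that works across JVMs, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: per-key locks acquired in sorted key order to avoid deadlock.
class OrderedKeyLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs the action while holding the locks for all given keys.
    void withLocks(Runnable action, String... keys) {
        // Sorting gives every caller the same acquisition order; distinct() drops repeats.
        String[] sorted = Arrays.stream(keys).distinct().sorted().toArray(String[]::new);
        int held = 0;
        try {
            for (String key : sorted) {
                // Fair locks hand out the lock in arrival order, which limits starvation.
                locks.computeIfAbsent(key, k -> new ReentrantLock(true)).lock();
                held++;
            }
            action.run();
        } finally {
            // Release in reverse order, and only the locks actually acquired.
            for (int i = held - 1; i >= 0; i--) {
                locks.get(sorted[i]).unlock();
            }
        }
    }
}
```

With this, one thread calling withLocks(work, "a", "b") and another calling withLocks(work, "b", "a") both lock "a" first, so neither can hold one key while waiting for the other.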

9. Distributed event handling through JMS. It is surprising that no implementation except one does this. Four times in the last two years I have run into cases where JMS + cache would have solved a whole lot of problems, which is kind of ironic, because this is not a new idea: Rod Johnson implemented it in the source code of his first book, so you can just copy and paste. This feature is just as powerful as its non-distributed version. It can be added outside the cache as an add-on, but having it built in would be a timesaver.
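For reference, this is roughly what the non-distributed version looks like: a cache that notifies registered listeners on every change. The names CacheListener and ObservableCache are hypothetical, and this is not Rod Johnson's code. In the distributed version, I would expect one of the listeners to publish the event to a JMS topic so that other JVMs can react; that part is left out here because it needs a JMS provider.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface for cache change events.
interface CacheListener<K, V> {
    void onPut(K key, V oldValue, V newValue);
    void onRemove(K key, V oldValue);
}

// Hypothetical sketch: a local cache that notifies listeners on every change.
class ObservableCache<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final List<CacheListener<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(CacheListener<K, V> listener) {
        listeners.add(listener);
    }

    V get(K key) {
        return data.get(key);
    }

    // Stores the value, then tells every listener the old and new values.
    void put(K key, V value) {
        V old = data.put(key, value);
        for (CacheListener<K, V> listener : listeners) {
            listener.onPut(key, old, value);
        }
    }

    // Fires onRemove only if the key was actually present.
    void remove(K key) {
        V old = data.remove(key);
        if (old != null) {
            for (CacheListener<K, V> listener : listeners) {
                listener.onRemove(key, old);
            }
        }
    }
}
```

Listeners run on the caller's thread here, so a slow listener slows down every put. Handing the event off to a queue is the obvious next step, and that is exactly where JMS would come in.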

Tangosol has all but the last one, and its performance is good in my view. Another one is GigaSpaces, but I don't like their salespeople and their talks. I have talked to the sales folks at both companies.

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code
