Java开发网
Re: My caching approach and source code (Java version, with test JSP) [Re: liuaike]
Author: floater (Java Jedi, chief moderator, posts: 3233)
Posted: 2008-07-31 03:32

There are two aspects to this subject:
1. Being distributed: crossing the JVM boundary to form a larger, unified system.
2. Data management: how to manage the data across the JVM boundary.
Each of these is complicated enough on its own. Combining them only makes things worse.

I am trying to expand on each point, with either a concern or a requirement, to provide some background.

The 11 points that I posted are not from my imagination, nor copied from textbooks. They come from my experience at work, reading on the internet, trying out some APIs, requirements from work, etc. I think there are more points, but they are beyond my experience and expertise. If you want to go beyond these points, you need to consult more experienced, specialized engineers, such as the folks working at Tangosol.

1. Clustered: another term is data coherence. We need to duplicate data across JVMs with n copies, where n can be specified by end users. Otherwise we can't survive a single JVM failure. JBoss and others use JGroups to do this. The tricky part is that while we duplicate data, we need to minimize the network traffic, otherwise the performance is terrible. Terracotta sends only the delta (the change, either new or updated data) across the network. I tested Tangosol before, and its performance is great (x MB/second, where x is confidential).
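
To make the delta idea concrete, here is a minimal single-JVM sketch. It is not how Terracotta or JGroups work internally; the class `DeltaTrackingCache` and its methods are hypothetical names made up for illustration. The cache remembers which keys changed since the last replication round, so only those entries would need to cross the network. Removals and the network transport itself are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch: tracks changed keys so only the delta is replicated. */
class DeltaTrackingCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Set<K> dirty = new HashSet<>();

    /** Stores a value and marks the key dirty only if the value actually changed. */
    synchronized void put(K key, V value) {
        V old = entries.put(key, value);
        if (!Objects.equals(old, value)) {
            dirty.add(key);
        }
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    /** Returns the entries changed since the last call and clears the dirty set. */
    synchronized Map<K, V> drainDelta() {
        Map<K, V> delta = new HashMap<>();
        for (K key : dirty) {
            delta.put(key, entries.get(key));
        }
        dirty.clear();
        return delta;
    }

    /** Applies a delta received from another copy, without marking it dirty again. */
    synchronized void applyDelta(Map<K, V> delta) {
        entries.putAll(delta);
    }
}
```

With n copies, the primary would call `drainDelta()` once and send the same small map to each of the n replicas, which call `applyDelta()`.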

2. Scalable: adding new machines expands the total memory of the entire cache. None of the open source solutions has this feature. Another term associated with this is data grid. The idea is that we have so many cheap machines that, if we hook them together, we can form a huge, usable, fast memory cache (100 times faster than disks). The hard part of implementing this is determining which machine stores which values. Normally they come up with some kind of algorithm that hashes the keys to decide which machine to go to. This is where the cache providers get paid big time. I know a lot of companies are using hundreds of GB, and they are still growing. TB is not something in the far future; it is likely to be next year.
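
One well-known way to hash keys to machines is a consistent hash ring. The sketch below is only an illustration of that general technique, not what any particular vendor does; `HashRing`, the node names, and the use of CRC32 as the hash are all my assumptions. The point is that when a machine is added, only the keys that move to the new machine change owner.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical sketch: maps cache keys to node names with a consistent hash ring. */
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** CRC32 is an arbitrary choice here; any well-spread hash would do. */
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Places several virtual points per node on the ring to even out the load. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    /** Removes only the points that belong to the given node. */
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    /** Returns the owner of a key: the first ring point at or after the key's hash. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Long slot = ring.ceilingKey(hash(key));
        return ring.get(slot != null ? slot : ring.firstKey());
    }
}
```

A plain `hash(key) % machineCount` is simpler, but it reshuffles almost every key when the machine count changes, which is exactly what you don't want when growing the cache.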

3. It would be better to support both UDP and TCP, because UDP is easier when adding new machines, but sometimes UDP is just not available due to company policies. So it would be better to use TCP for testing and UDP for production.
Commercial solutions always have both. JBoss uses JGroups and thus has both too, but the others are too lazy to include TCP.

4. Transactions are crucial for data integrity. However, this is related to the data granularity and how atomic the set/get methods are. Global transactions (such as JDBC + JMS) are another reason for this.

5. Backup to secondary storage: most of the cache implementations have this feature. My reason for this is that I need to retain a certain number of days of data because of business requirements. They could be regulatory, backtesting, etc. The backup could be asynchronous and multi-threaded.
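
An asynchronous, multi-threaded backup could look roughly like the write-behind sketch below. `WriteBehindCache` and the `backupStore` callback are hypothetical; in real life the callback would write to a database or a file. Note that with more than one backup thread, two writes to the same key can reach the store out of order, so a real implementation would need per-key ordering or versioning.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical sketch: serves reads from memory and backs up writes in the background. */
class WriteBehindCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final ExecutorService backupPool;
    private final BiConsumer<K, V> backupStore;

    WriteBehindCache(int backupThreads, BiConsumer<K, V> backupStore) {
        this.backupPool = Executors.newFixedThreadPool(backupThreads);
        this.backupStore = backupStore;
    }

    /** Updates memory immediately; the secondary store is written asynchronously. */
    void put(K key, V value) {
        memory.put(key, value);
        backupPool.submit(() -> backupStore.accept(key, value));
    }

    V get(K key) {
        return memory.get(key);
    }

    /** Stops accepting writes and waits for pending backups; false on timeout or interrupt. */
    boolean shutdown(long timeout, TimeUnit unit) {
        backupPool.shutdown();
        try {
            return backupPool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```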

6. Some of the cache implementations have ODBC/JDBC drivers, and they can be treated as a relational cache, meaning SQL can be used to search through the value objects. These caches are termed data stores. One simple way to implement them is to use JavaCC to write a SQL parser, treat tables/columns as Java objects/attributes, and then run this parser across the entire cache. Since everything is in memory, a trivial implementation is not slow, though it could be faster. memcached has a lot of APIs for different languages. All commercial solutions have Java/.NET/C interfaces, since a lot of financial applications have a .NET/Excel interface and a Java backend.
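
The "trivial implementation" can be pictured as a full scan over the value objects. The sketch below skips the SQL parser entirely and takes the WHERE clause as a Java `Predicate`; a parser would simply have to produce such a predicate from the SQL text. `QueryableCache` and the `Trade` value class are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical value object: one "row", with fields playing the role of columns. */
class Trade {
    final String symbol;
    final double price;

    Trade(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }
}

/** Hypothetical sketch: a cache whose values can be searched by a full in-memory scan. */
class QueryableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    /** Roughly "SELECT * FROM cache WHERE <where>", evaluated against every value. */
    List<V> select(Predicate<? super V> where) {
        return entries.values().stream()
                .filter(where)
                .collect(Collectors.toList());
    }
}
```

For example, `cache.select(t -> t.price > 100)` plays the role of `SELECT * FROM trades WHERE price > 100`. The result order is not defined, because the underlying map has no ordering.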

7. Near/far cache management: JBoss has the TcpCacheLoader class. My scenario is that I have a memory-hungry process that by itself takes nearly all the JVM memory, so I can't spend extra on a cache in the same JVM. I want to throw my results into an external cache, external to my current process. Most of the implementations take some memory from the current process and join the cache cluster. They don't have the option to externalize the entire cache. Tangosol has an option to specify how much memory you want to give to the near cache (the part of the cache that lives in the current process). This is a vital feature in a distributed environment, because most of the time we go distributed due to either a shortage of memory or long running times (in which case we go to grid computing), and in either case we need to collect the results back to a central location, either a database or a cache.
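
Here is a rough sketch of a size-limited near cache in front of a far cache. It is not Tangosol's or JBoss's API; `NearCache` is a hypothetical class, the far cache is reduced to a lookup `Function`, and the limit is counted in entries rather than bytes to keep it short. Setting the limit to 0 keeps nothing in the current process, which is the "externalize the entire cache" case.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a small in-process LRU cache in front of an external far cache. */
class NearCache<K, V> {
    private final Map<K, V> local;
    private final Function<K, V> farCache;

    NearCache(int maxLocalEntries, Function<K, V> farCache) {
        this.farCache = farCache;
        // Access-ordered LinkedHashMap: the eldest entry is the least recently used one.
        this.local = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxLocalEntries;
            }
        };
    }

    /** Looks in the near cache first and falls back to the far cache on a miss. */
    synchronized V get(K key) {
        V value = local.get(key);
        if (value == null) {
            value = farCache.apply(key);
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    synchronized int localSize() {
        return local.size();
    }
}
```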

8. A lock is the minimal requirement for concurrent access. In a distributed environment we need much more than a lock, because we have not only concurrent access but also distributed access. So we will have to deal with deadlock, starvation, etc.

9. Distributed event handling through JMS. It's kind of surprising that no implementation except one does this. In the last two years I have run into four cases where JMS + cache would have solved a hell of a lot of problems, which is kind of ironic. This is not a new idea. Rod Johnson implemented it in the source code of his first book; just copy and paste. This feature is just as powerful as its non-distributed version. It can be added outside caches as an add-on, but a built-in would be a timesaver.
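
For reference, the non-distributed version of cache events is just a listener list, sketched below. This is not Rod Johnson's code; `ObservableCache` and `CacheListener` are hypothetical names. In the distributed version, one of the listeners would publish each event to a JMS topic so that other JVMs can subscribe to it; that part is left out here so the sketch runs without a JMS provider.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback for cache changes; oldValue is null for a new key. */
interface CacheListener<K, V> {
    void onPut(K key, V oldValue, V newValue);
}

/** Hypothetical sketch: a cache that notifies registered listeners on every put. */
class ObservableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<CacheListener<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(CacheListener<K, V> listener) {
        listeners.add(listener);
    }

    /** Stores the value, then calls each listener in registration order. */
    V put(K key, V value) {
        V old = entries.put(key, value);
        for (CacheListener<K, V> listener : listeners) {
            listener.onPut(key, old, value);
        }
        return old;
    }

    V get(K key) {
        return entries.get(key);
    }
}
```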

Tangosol has all but the last one, and the performance is good in my view. Another one is GigaSpaces, but I don't like their salespeople or their talks. I have talked to the sales folks at both companies.



"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code

