My caching approach and source code (Java version, with test JSPs)
liuaike (Moderator, 20 posts)
Posted 2008-07-24 13:48
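Claiming support for 100 million PV/day may be a bit of an exaggeration, but if you read this through carefully I don't think you will be disappointed.

If you really want to support me and support Chinese open source projects, please repost this article on your own blog or bookmark it, and remember to include the documentation download address! Thank you.

The system I am describing is an efficient database caching system built on top of Hibernate, and it includes a distributed solution. It is already in use on shedewang.com (舍得网) with no major problems found. I believe it is robust enough to handle applications with several million IPs per day. Some people will doubt this; in practice, how many IPs per day the system can sustain depends less on the system itself than on the people using it.

The code looks simple, but it sums up two years of experience. We ran into many hard problems along the way and solved them one by one, so please respect the work that went into it. The system is compact and easy to use: the main program, BaseManager.java, is under 1,000 lines, yet it covers caching of database objects, caching of lists and counts, caching hashed by field, delayed updates, and automatic clearing of list caches. That is enough to build most application sites, such as forums, blogs, alumni directories, and dating communities.

I ran load tests under ideal conditions. A JSP page with no database operations (the new shedewang.com home page) served more than 2,000 requests per second (under normal conditions perhaps 1 in 1,000 requests triggers a database query and the other 999 are served straight from the cache). The item detail page served more than 3,000 requests per second; for comparison, a purely static HTML page only reached a little over 7,000 requests per second. I load-tested the home page for three hours and completed 24,850,800 requests (about 2,300 per second); Java had no trouble and memory did not grow. At 2,000 requests per second and 15 hours a day, that is 3600*15*2000 = 108 million requests per day. This is of course the ideal case; even at one tenth of that in practice, it is still 10 million PV/day. Keep in mind this is an ordinary server that cost 13,000 yuan: 4 GB of memory, 2 CPUs, Linux AS4, running apache 2.0.63 / resin 2.1.17 / jdk 6.0.

Now to the main topic.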

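Why use a cache? If you are asking, you are probably new to this. Database throughput is limited; 5,000 reads and writes per second is already impressive. Without a cache, if one page performs 100 database operations, the database can serve only 5000/100 = 50 pages per second, so 50 concurrent users are enough to bring it down. That caps the site at 50*3600*15 = 2.7 million PV per day, with the database server worked half to death and liable to fall over at any time. My caching system is more powerful than using memcached alone: it amounts to two more cache levels on top of memcached. Everyone knows memcached is strong, but its throughput is still limited; 20,000 gets and puts per second will not hold up under a very large application. A local HashMap can do over a million puts and gets per second, so the cost at this level is almost negligible. A tip: do not go distributed if you can avoid it, and only consider memcached when you truly need distribution. My caching system already implements both; switching is a configuration change. If you are interested, test it thoroughly!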
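To make the layering concrete, here is a minimal sketch of a read-through lookup that tries a local map first, then a remote cache, then the database. It is not taken from BaseManager.java: `RemoteCache`, `TwoLevelCache` and the loader function are hypothetical names, and `RemoteCache` only stands in for a memcached client rather than reproducing any real client API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a memcached client; not a real client API. */
interface RemoteCache {
    Object get(String key);
    void put(String key, Object value);
}

/** Read-through lookup: local map first, then remote cache, then the database. */
class TwoLevelCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();
    private final RemoteCache remote;          // may be null in single-server mode
    private final Function<String, Object> db; // loads the row on a full miss

    TwoLevelCache(RemoteCache remote, Function<String, Object> db) {
        this.remote = remote;
        this.db = db;
    }

    Object get(String key) {
        Object v = local.get(key);
        if (v != null) return v;               // local hit: no network round trip
        if (remote != null) v = remote.get(key);
        if (v == null) {
            v = db.apply(key);                 // full miss: one database read
            if (v != null && remote != null) remote.put(key, v);
        }
        if (v != null) local.put(key, v);      // populate the local level for next time
        return v;
    }
}
```

Passing `null` for the remote cache gives the non-distributed mode the post recommends as the default; invalidation on writes is left out of this sketch.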

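In my view database caching generally comes in four kinds. First: caching single objects (one object is one database row). A HashMap is enough for this; slightly more elaborate is a HashMap wrapped with an LRU policy; for a distributed setup, memcached will do. None of this is hard. Second: list caching, such as the list of posts in a forum. Third: count caching, such as how many posts a forum board contains, which is needed for pagination. Fourth: more complex group, sum, and count queries, such as a forum's list of hottest posts ranked by views. The first kind is easy to implement; the other three are harder and seem to have no general solution. I will use list caching (the second kind) as the example.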
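As an illustration of the "HashMap wrapped with an LRU policy" option for single-object caching, the JDK's `LinkedHashMap` can do this directly. This is a minimal sketch, not the author's code; the class name `LruObjectCache` is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU object cache (row id -> row object). The class name is hypothetical. */
class LruObjectCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruObjectCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}
```

`LinkedHashMap` is not thread-safe, so in a web application the instance would need external synchronization, for example `Collections.synchronizedMap(new LruObjectCache<>(10000))`.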

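When MySQL and Hibernate do general-purpose list caching at the lower level, they cache the result list by query condition, but any change to the table (insert/delete/update) clears all list caches for it. If a table changes often (the usual case), the list cache is nearly useless because the hit rate is too low.

I came up with a way to improve list caching: when a table's records change, traverse all list caches and delete only those that are affected, instead of clearing everything. For example, when a post is added to forum board id=1, only the list caches for board id=1 need to be cleared; board id=2 is left alone. The advantage is that lists for all kinds of query conditions (equal, greater than, not equal, less than) can be cached. There is a potential performance problem, though: the traversal puts a heavy load on the CPU. With the maximum list cache size set to 10,000, two quad-core CPUs can complete only a little over 300 traversals per second, so if there are more than 300 inserts/updates/deletes per second the system cannot keep up.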
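The traversal-based invalidation can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: the class name `ScannedListCache` is hypothetical, the key format (conditions joined with `#`) is borrowed from the example keys later in the post, and only equality conditions are matched.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** List cache that is scanned on every write; names and key format are assumptions. */
class ScannedListCache {
    // key: query condition string, e.g. "boardId=1#start=0,length=15"; value: matching row ids
    private final Map<String, List<Long>> lists = new ConcurrentHashMap<>();

    void put(String condition, List<Long> ids) { lists.put(condition, ids); }

    List<Long> get(String condition) { return lists.get(condition); }

    /** Walks every cached list and drops only those whose condition contains the given token. */
    int invalidate(String fieldEqualsValue) {
        int before = lists.size();
        // Compare whole '#'-separated tokens so "boardId=1" does not match "boardId=10"
        lists.keySet().removeIf(k -> Arrays.asList(k.split("#")).contains(fieldEqualsValue));
        return before - lists.size();
    }
}
```

Every write costs one pass over all cached keys, which is the CPU load the post describes. A fuller version would also have to evaluate range conditions and drop lists that carry no condition on the changed field.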

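With neither of the two approaches above being satisfactory, my colleagues and I spent several weeks thinking and arrived at a cache hashed on certain fields of the table. This approach needs no large-scale traversal, so the CPU load is very small, and because the list cache is hashed by field, the hit rate is very high. The idea is as follows. Each table has three cache Maps (key=value pairs). The first Map is the object cache A: the key is the database id and the value is the database object (one row). The second Map is the general list cache B, whose maximum size is usually around 1,000: the key is a String assembled from the query condition (e.g. start=0,length=15#active=0#state=0) and the value is a List of all ids matching that query. The third Map is the hashed cache C: the key is a String built from the hashed field (when hashing by userId, one key would be the String userId=109) and the value is a HashMap similar to B. Only B needs to be traversed. In case that is not clear yet, the example below should help. Take a forum's reply table T, with fields id, topicId, postUserId and so on (topicId is the id of the post and postUserId is the id of the author).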
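The three maps A, B and C described above could be laid out roughly like this. It is a sketch of the data structure only, with hypothetical names (`TableCache`, `resolve`); the real BaseManager.java may organize things differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Per-table cache holding the three maps from the description. All names are hypothetical. */
class TableCache<T> {
    // A: object cache, row id -> row object
    final Map<Long, T> objects = new ConcurrentHashMap<>();
    // B: general list cache, condition string -> ids; the only map that is scanned on writes
    final Map<String, List<Long>> generalLists = new ConcurrentHashMap<>();
    // C: hashed cache, bucket key such as "userId=109" -> (condition string -> ids)
    final Map<String, Map<String, List<Long>>> hashedLists = new ConcurrentHashMap<>();

    /** Resolves a cached id list to row objects through A; returns null if the list is not cached. */
    List<T> resolve(String bucketKey, String condition) {
        Map<String, List<Long>> bucket = hashedLists.get(bucketKey);
        List<Long> ids = (bucket == null) ? null : bucket.get(condition);
        if (ids == null) return null;
        List<T> rows = new ArrayList<>();
        for (Long id : ids) {
            T row = objects.get(id);
            if (row != null) rows.add(row); // a real system would reload missing rows from the database
        }
        return rows;
    }
}
```

Storing only ids in B and C means a row that appears in many lists is held once, in A.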

Case 1, the most common: fetching the replies to a post. The SQL looks like
select id from T where topicId=2008 order by createTime desc limit 0,5
select id from T where topicId=2008 order by createTime desc limit 5,5
select id from T where topicId=2008 order by createTime desc limit 10,5
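For lists like these, hashing on topicId is clearly the best choice. The three list caches above (there can be N of them) are all placed in the one Map whose key is topicId=2008. When the post with id 2008 gets a new reply, the system simply clears the hashed Map with key topicId=2008. Because this kind of hashing needs no traversal, the cache can be made very large, say 100,000, so the reply lists for 100,000 posts can all be cached. When one post gets a new reply, the reply lists for the other 99,999 posts are untouched, so the hit rate is very high.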
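A minimal sketch of case 1, assuming the bucket key is the string topicId=2008 and each page of results is stored under its own page key. The class and method names are hypothetical and not taken from the source.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Lists grouped into buckets by a hashed field; names are illustrative only. */
class HashedListCache {
    // bucket key ("topicId=2008") -> (page key ("limit 0,5") -> reply ids)
    private final Map<String, Map<String, List<Long>>> buckets = new ConcurrentHashMap<>();

    void put(String bucketKey, String pageKey, List<Long> ids) {
        buckets.computeIfAbsent(bucketKey, k -> new ConcurrentHashMap<>()).put(pageKey, ids);
    }

    List<Long> get(String bucketKey, String pageKey) {
        Map<String, List<Long>> bucket = buckets.get(bucketKey);
        return bucket == null ? null : bucket.get(pageKey);
    }

    /** On a new reply: drop one bucket by key, with no scan over the other topics. */
    void invalidate(String bucketKey) {
        buckets.remove(bucketKey);
    }
}
```

Invalidation is a single map removal, which is why the bucket count can grow to 100,000 without the traversal cost of the general list cache.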

Case 2: the admin back end needs to show the latest replies. The SQL looks like
select id from T order by createTime desc limit 0,50
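This case needs no hashing, because few people use the back end and there are not many commonly used lists, so it goes straight into the general list cache B.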

Case 3: fetching one user's replies. The SQL looks like
select id from T where userId=2046 order by createTime desc limit 0,15
select id from T where userId=2046 order by createTime desc limit 15,15
select id from T where userId=2046 order by createTime desc limit 30,15
These lists are like case 1; hash on userId.

Case 4: fetching one user's replies to a particular post. The SQL looks like
select id from T where topicId=2008 and userId=2046 order by createTime desc limit 0,15
select id from T where topicId=2008 and userId=2046 order by createTime desc limit 15,15
This case is rare. Generally topicId=2008 takes precedence, so these lists also go into the hashed Map with key topicId=2008.

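Summary: this approach can store lists at large scale with a very high hit rate, so it can carry very large applications. It does require engineers to configure which fields to hash on according to their own business logic. Usually you hash on a table's indexed columns (mind the order: put the field with the most distinct values first). Taking userId as the example, you can store M kinds of lists for N users, and when one user's data changes, the list caches of the other N-1 users do not move. Everything above is about caching lists; caching counts works exactly the same way. A count such as select count( * ) from T where topicId=2008 also goes into the topicId=2008 hashed Map. Combined with MySQL memory tables and memcached, plus an F5 device for distributed load balancing, the system is sufficient for applications on the scale of 10 million IPs per day. Apart from search engines, ordinary application sites do not reach that scale.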
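The four cases differ only in which bucket a query lands in. One way to express that routing, assuming conditions are joined with `#` as in the example key earlier and that the hash fields are configured in priority order, is sketched below. `BucketRouter` and `bucketFor` are hypothetical names, not part of the published API.

```java
import java.util.Arrays;
import java.util.List;

/** Picks the hashed bucket for a query condition; configuration and names are assumptions. */
class BucketRouter {
    // Fields configured for hashing, in priority order
    private final List<String> hashFields;

    BucketRouter(String... hashFields) {
        this.hashFields = Arrays.asList(hashFields);
    }

    /** Returns e.g. "topicId=2008", or null when the query belongs in the general list cache B. */
    String bucketFor(String condition) {
        List<String> parts = Arrays.asList(condition.split("#"));
        for (String field : hashFields) {          // the first configured field wins
            for (String part : parts) {
                if (part.startsWith(field + "=")) return part;
            }
        }
        return null;                               // no hashed field in the condition
    }
}
```

With the fields configured as topicId then userId, a query on both fields is routed to the topicId bucket (case 4), a query on userId alone to the userId bucket (case 3), and an unconditioned "latest replies" query falls through to B (case 2).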

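To repeat: whether the system is powerful depends not on the system itself but on the people using it!

This caching system sums up several years of experience from my colleagues and me. It looks simple, but it is not as simple as it looks. I am open-sourcing it for the following reasons. First, I sincerely hope many people will use it. Second, I hope more people will refine and improve it. Third, I hope we can come together and contribute to a general, efficient database caching architecture; after all, database access is the most common operation in all kinds of applications and the place where performance bottlenecks most easily arise.

The zip package includes configuration instructions and test JSPs. Configure it as a web application and you can quickly debug it and see what the cache can do. The documentation and download address is http://shedewang.com/akaladocs/api/com/akala/dbcache/core/BaseManager.html

Configuration is described in docs/开始配置.txt.

One last word: if you really want to support me and support Chinese open source projects, please repost this article on your own blog or bookmark it, and remember to include the documentation download address! Thank you and good luck.

You can also read the attached Word document directly; it has tables and is easier to follow.

QQ group: 24561583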

缓存思路.doc (61.5k)


liuaike edited on 2008-07-24 13:50


It's 2021 already; is the plan to post just once a year? Let's start a WeChat group and get some meetups going!
Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
ababcc (1 post)
Posted 2008-07-25 12:16
Could you point to some real-world examples of where this is used?



Re: My caching approach and source code (Java version, with test JSPs) [Re: ababcc]
liuaike (Moderator, 20 posts)
Posted 2008-07-26 11:06
ababcc wrote:
Could you point to some real-world examples of where this is used?

Download the source code; once it is configured there are JSP examples. You can also look at http://shedewang.com: the home page of a site like this is a fully dynamic JSP, yet it performs no database operations. There is a diagram that helps explain the caching logic.

(Thumbnail; click the image link to see the original.)




Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
Breeze (143 posts)
Posted 2008-07-26 11:56
I'd like to know whether the shedewang.com front end uses any framework, or whether JSPs/servlets fetch Spring beans directly. Could you describe the technical architecture? Thanks.

Breeze edited on 2008-07-26 11:59


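Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
bluepure (pureblue, 509 posts)
Posted 2008-07-26 16:23
I saw the thread the original poster started on CSDN and read some of the replies there.
I would rather not reply over there, so a few words here.

I skimmed the original poster's code, and the approach is sound. Many very capable systems are not built by dropping in some impressive framework or component; they rely on an architect tuning and customizing for the actual workload.
Most Internet web applications are read-heavy and write-light, and caching is one of the simplest and most effective ways to improve performance.
With caching, system performance can therefore improve by a factor of hundreds or even thousands.

On multi-core CPU + linux 2.6 + jdk 1.6, a well-optimized web application on a virtual machine with 2 GB of memory can handle roughly 5 million PV per day in a real environment.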


bluepure edited on 2008-07-26 16:27


Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
jameszhang (CJSDN Senior Member, 1594 posts)
Posted 2008-07-27 16:40
How do you handle transaction rollback?


"First they ignore u, then they laugh at u, then they fight u, then u will win

Mahatma Gandhi"


Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
floater (Java Jedi, Chief Moderator, 3233 posts)
Posted 2008-07-29 02:17
No intention to start a flame war. Just a discussion:
The central point of this post is to use a composite key (user id + range) to increase the hit rate. I don't see any discussion of the distributed solution (分布式解决方案).

If I understand the distributed cache correctly, the following are the key points on a distributed cache solution:
1. clustered: all different JVMs carry the same content, replicated.
2. fault tolerant: >1 copies in the cache. Normally this feature and #1 are implemented as one, i.e., users can specify how many copies they want in the entire cache.
3. scalable: meaning, increase machines/JVMs to expand memory, not copies. Whether this is transparent to the users, i.e., whether users need to change code for new cache servers.
4. network protocols: UDP/TCP
5. transactions
6. backup to files/databases asynchronously.
7. API for other languages, JDBC/ODBC drivers
8. SQL language manipulations.
9. near cache/far cache memory management.
10. locking
11. distributed event handling, such as JMS. This is essential for intersystem updates.

Another term is data grid.

Another thought is that a cache component should be a reusable component. On the user API side, there should be only a handful of methods, such as get/set, getAllKeys, lock, unlock, etc. On the config side: network, backup copies, memory size, etc.

From users' perspective, a cache should be just a large memory chunk, doesn't matter where it sits.

The composite key idea has been around for more than 15 years. The credit card number has 19 digits internally, they have to use this technique to decompose the huge table to many small tables so that they can quickly locate your record in millis. It's working well.

In my experience, the best solution so far, based on the above conditions, is still Tangosol, of course, this doesn't mean it has all the listed features, but it's close enough, some of the features can be added from outside. I've been using >30 fields composite key to locate data. From what I heard, this has been the case in most big financial firms since the data is huge - size does matter and it gets ugly very quickly.

Just some experience to share.



"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code

Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
liuaike (Moderator, 20 posts)
Posted 2008-07-29 11:50
I can't read the English. The front end is built with plain JSP.



Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
wes109 (以梦为马, CJSDN Senior Member, 857 posts)
Posted 2008-07-29 17:27
1. I agree with floater: this is simply not a distributed caching solution.
Distribution is a complex topic. floater listed 11 items, and a system that meets all of them at once does not seem to exist yet.

memcached is not a distributed cache in the full sense either, because, as you know, memcached server nodes do not replicate data to one another. If one server fails you can use another, but the data has to be fetched again from the database or elsewhere.

2. Data can be lost
The original poster uses a Runtime hook to save data that has not yet been synchronized to the database, which is very dangerous. If the server exits abnormally the data is lost, and that is unacceptable in an important system.
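For readers unfamiliar with the mechanism being criticized, here is a rough sketch of delayed updates that are flushed from a JVM shutdown hook. It only illustrates the general pattern and its weakness; `DelayedUpdateBuffer` and the batch writer are hypothetical, and the actual code in the download may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Queues updates in memory and writes them in batches; names are illustrative only. */
class DelayedUpdateBuffer {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<List<String>> writer; // hypothetical: writes one batch to the database

    DelayedUpdateBuffer(Consumer<List<String>> writer) {
        this.writer = writer;
        // Runs on normal exit or SIGTERM; it does NOT run on kill -9, a JVM crash, or power loss
        Runtime.getRuntime().addShutdownHook(new Thread(this::flush));
    }

    void enqueue(String update) {
        pending.add(update);
    }

    /** Drains everything queued so far and hands it to the writer as one batch. */
    synchronized void flush() {
        List<String> batch = new ArrayList<>();
        for (String u; (u = pending.poll()) != null; ) batch.add(u);
        if (!batch.isEmpty()) writer.accept(batch);
    }
}
```

Anything still in `pending` when the process dies without running the hook is lost, which is exactly the risk raised here. Flushing on a short timer narrows that window but does not close it.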

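3. Transactions. Same question as little james.

4. The original poster should keep a lower profile.
The original poster has clearly put real effort into how to cache and how to update the cache, and that deserves recognition. Be excited, but don't get carried away. Lines such as "whether the system is powerful depends not on the system itself but on the people using it!!!" or "support Chinese open source projects" are better left unsaid. It is not only the source that needs to be open...

That said, I still support the original poster; the work is quite useful in certain scenarios.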





Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
liuaike (Moderator, 20 posts)
Posted 2008-07-29 18:57
The poster above is right. I gladly accept suggestions from people like wes109 and floater who actually looked at the system. Indeed, the system is not adequate for applications with high safety requirements. Generally such applications have little need for caching anyway: a banking system has almost no public display area and few public lists, so there is not much to cache, and using database transactions directly is safer.

I went over floater's 11 points again carefully. What puzzles me is why someone who can read my article replies in English.

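First: clustering. The system supports this to a degree. Within one JVM each database record has exactly one instance; multiple JVMs are synchronized through UDP notifications, and memcached serves as the central cache.

Second: not supported. Within one JVM a record has only one instance; things only get complicated with multiple instances. If users need multiple instances they can use Java's Cloneable.

Third: scalability, i.e. moving to another cache system. I have not made this very elaborate yet; if you must use another cache system, changing two methods is enough.

Fourth: network protocols. I do not quite see how this relates to distribution. My cache synchronization uses UDP notifications, and Hibernate's distributed caching seems to do the same.

Fifth: transactions. Not supported for now. Caching is not recommended for systems with high safety requirements.

Sixth: asynchronous backup. I do not quite see how this relates to the database layer.

Seventh: other drivers. This is supported; just change the Hibernate configuration file.

Eighth: SQL manipulation. Not supported; otherwise the database would get out of sync with the cache.

Ninth: cache memory management. I did not quite understand this one.

Tenth: locking. Not supported.

Eleventh: distributed event handling. Not supported.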
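The first and fourth points mention synchronizing JVMs through UDP notifications. A bare-bones sketch of sending and receiving one invalidation message with the JDK's `DatagramSocket` follows. The message format (`reply:topicId=2008`) and the class name are assumptions made for illustration; the post does not describe the actual wire format.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** One-shot UDP notification helpers; message format and names are hypothetical. */
class UdpInvalidation {
    /** Sends a message such as "reply:topicId=2008" to one peer JVM. */
    static void notifyPeer(InetAddress host, int port, String message) throws Exception {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, host, port));
        }
    }

    /** Blocks until one message arrives (or the timeout expires) and returns its text. */
    static String receiveOne(DatagramSocket socket, int timeoutMillis) throws Exception {
        byte[] buf = new byte[512];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.setSoTimeout(timeoutMillis);
        socket.receive(packet);
        return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
    }
}
```

A receiving JVM would treat the message as a key to drop from its local cache. UDP gives no delivery guarantee, so a lost packet leaves that JVM serving stale data until the entry is evicted or refreshed some other way.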

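Overall: no system can meet all the requirements Mr. floater lists. Hibernate cannot, and EJB probably cannot either. It describes a perfect solution, and a system that solved all of these problems would probably lose its performance. That is why mysql 3.23, mysql 4.1 and mysql 5.0 have become progressively slower (tested on the same hardware).

After saying all this, probably nobody will dare use my system, heh. In fact the system is not weak: it is sufficient for forum, SNS, and blog-type sites. What I am offering is really just a caching approach, and when I have spare time it may yet evolve to support transactions.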




Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
jameszhang (CJSDN Senior Member, 1594 posts)
Posted 2008-07-29 21:07
floater really is the professional here, heh. Only once these 11 items are sorted out can one say:
"an efficient database caching system that includes a distributed solution"

Also, if EJB couldn't do it, Sun would be out of business, heh.

This kind of thing does have real practical value, but it is not simple. I hope the original poster digs into it deeply.


jameszhang edited on 2008-07-29 21:14

"First they ignore u, then they laugh at u, then they fight u, then u will win

Mahatma Gandhi"


Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
liuaike (Moderator, 20 posts)
Posted 2008-07-30 09:30
There are many ways to solve a problem; the question is which is simplest. Too many people like to complicate simple things, and many others treat complicated things as too simple, whether as employees, bosses, or business owners. If EJB were that great, there would be no need to consider any other solution.

I will dig into it ^_^




Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
wes109 (以梦为马, CJSDN Senior Member, 857 posts)
Posted 2008-07-30 10:11
Actually, I would not advise the original poster to dig into any general-purpose distributed caching system.

Every technology or solution has its own scope and niche. You only need to focus on the area you care about and do it as well as you can.

So I suggest that before presenting your "baby", you first describe the problem domain: what kind of system is it for? What problems does it solve? What problems remain? And so on, so that people do not keep bringing up elephant-sized solutions, haha.





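Re: My caching approach and source code (Java version, with test JSPs) [Re: wes109]
jameszhang (CJSDN Senior Member, 1594 posts)
Posted 2008-07-30 19:12
wes109 wrote:
Actually, I would not advise the original poster to dig into any general-purpose distributed caching system.

Every technology or solution has its own scope and niche. You only need to focus on the area you care about and do it as well as you can.

So I suggest that before presenting your "baby", you first describe the problem domain: what kind of system is it for? What problems does it solve? What problems remain? And so on, so that people do not keep bringing up elephant-sized solutions, haha.

You also saw what he wrote:
"The system I am describing is an efficient database caching system built on top of Hibernate, and it includes a distributed solution. It is already in use on shedewang.com ......."


Otherwise floater would not have written so much for him! And English is not easy to type either, heh.
Also, a "database caching system" that does not even support transactions? Better to call it a "data cache".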


jameszhang edited on 2008-07-30 19:16

"First they ignore u, then they laugh at u, then they fight u, then u will win

Mahatma Gandhi"


Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
floater (Java Jedi, Chief Moderator, 3233 posts)
Posted 2008-07-31 02:28
I was kind of in a rush at work, so I just threw some highlights out there.

Let me try to expand a little.
Problem: to improve performance, reduce the bottleneck around databases.
Open source solutions:
JCS, Terracotta, OSCache, EHCache, WhirlyCache, JCache, SwarmCache, JBoss Cache, memcached, etc.

Commercial solutions: Tangosol, Gigaspaces, etc
Other candidates: in-memory databases, Sybase RAP, but these are not distributed.



"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code

Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
floater (Java Jedi, Chief Moderator, 3233 posts)
Posted 2008-07-31 03:32
There are two aspects to this subject:
1. being distributed: across the JVM boundary to form a larger, unified system.
2. data management: how to manage the data across the JVM boundary.
Each of these is already complicated enough. When you combine them, it can only get worse.

I am trying to expand more, with either a concern or a requirement, to provide some background explanation.

The 11 points that I posted are not my imagination, or copied from textbooks. But they are from my experience at work, reading on the internet, tryout of some APIs, requirements from work, etc. I think there are more points, but they are beyond my experience and expertise. If you want to go beyond these points, you need to consult more experienced, specialized engineers, such as folks working at Tangosol.

1. clustered: another term is data coherence. We need to duplicate data across JVMs with n copies, where n can be specified by end users. Otherwise we can't survive a single JVM failure. JBoss, etc. use JGroups to do this. The tricky part is that while we duplicate data, we need to minimize the network traffic, otherwise the performance is terrible. Terracotta sends deltas (the changes, either new or updated) across the network. I tested Tangosol before; its performance is great (x MB/second, x is confidential).

2. scalable: adding new machines to expand the total memory of the entire cache. None of the open source solutions have this feature. Another term associated with this is data grid. The idea is that we have so many cheap machines, and if we can hook them together we can form a huge, usable, fast memory cache (100 times faster than disks). The hard part of implementing this is deciding which machine stores which values. Normally they come up with some kind of algorithm that hashes the key to decide which machine to go to. This is where the cache providers get paid big time. I know a lot of companies are using hundreds of GB, and they are still growing. TB is not something in the far future; it is likely to be next year.

3. It would be better to support both UDP and TCP, because UDP is easier when adding new machines, but sometimes UDP is just not available due to company policies. So it would be better to use TCP for testing and UDP for production.
Commercial solutions always have both. JBoss uses JGroups and thus has both too, but others are too lazy to include TCP.

4. Transactions are very crucial for data integrity. However, this is related to the data granularity and how atomic the set/get methods are. Global transactions (such as JDBC + JMS) are another reason for this.

5. backup to secondary storage: most cache implementations have this feature. My reason for it is that I need to retain a certain number of days of data because of business requirements. They could be regulatory, back testing, etc. Backup could be asynchronous and multi-threaded.

6. Some cache implementations have ODBC/JDBC drivers and can be treated as a relational cache, meaning you use SQL to search through the value objects. These caches are termed data stores. One simple way to implement them is to use javacc to write a SQL parser, treat tables/columns as Java objects/attributes, and then run this parser across the entire cache. Since everything is in memory, a trivial implementation is not slow, though it could be faster. memcached has a lot of APIs for different languages. All commercial solutions have Java/.NET/C interfaces, since a lot of financial applications have a .NET/Excel interface and a Java backend.

7. near/far cache management: JBoss has the TcpCacheLoader class. My scenario is that I have some memory-hungry process that by itself takes nearly all the JVM memory, so I can't spend extra on a cache in the same thread. I want to throw my results to an external cache, external to my current process. Most implementations take some memory from the current process and join the cache cluster. They don't have the option to externalize the entire cache. Tangosol has an option to specify how much memory you want to give to the near cache (the cache setting in the current process). This is a vital feature in a distributed environment, because most of the time we go distributed due to either a shortage of memory or long running times (in which case we go to grid computing), and in either case we need to collect the results back to a central location, either a database or a cache.

8. A lock is the minimal requirement for concurrent access. In a distributed environment we need more than a lock, much more, because we have not only concurrent access but also distributed access. So we will have deadlock, starvation, etc.

9. distributed event handling through JMS. It's kind of a surprise that no implementation except one does this. I have run into four cases in the last two years where JMS + cache would have solved a hell of a lot of problems, which is kind of ironic. This is not a new idea. Rod Johnson implemented it in the source code of his first book; just copy and paste. This feature is just as powerful as its non-distributed version. It can be added outside caches as an add-on, but a built-in would be a timesaver.

Tangosol has all but the last one, and the performance is good in my view. Another one is Gigaspaces, but I don't like their salespeople and talks. I have talked to the sales folks at both companies before.
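Point 2 mentions hashing the key to decide which machine stores a value. The simplest form of that idea is hash-modulo placement, sketched below. This is a generic illustration, not how Tangosol or any other product listed here does it; `NodeSelector` and the node addresses are made up.

```java
import java.util.Arrays;
import java.util.List;

/** Chooses a cache node for a key by hash modulo; names and addresses are hypothetical. */
class NodeSelector {
    private final List<String> nodes; // cache server addresses

    NodeSelector(String... nodes) {
        this.nodes = Arrays.asList(nodes);
    }

    /** Maps a key to one node; the same key always lands on the same node. */
    String nodeFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

With plain modulo, adding or removing a node remaps most keys at once, so the cache is effectively cold after every resize. Consistent hashing is the usual refinement when nodes come and go.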



"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code

Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
floater (Java Jedi, Chief Moderator, 3233 posts)
Posted 2008-07-31 03:53
I can't type in Chinese; I have been asked this thousands of times already. I can read Chinese, both simplified and traditional, but I constantly pronounce it wrong, and my wife laughs at me (and my daughter laughs at my English too, what am I doing? :) ).

EJB and Hibernate at most use a cache; they have nothing to do with cache implementation.

I happen to have worked in a distributed environment for the last 3 years and have accumulated some experience. But the internet is still a greater place to learn. There are so many nice folks, including folks in here, who are willing to share their knowledge and experience. Did you cuil today? :)

Keep it simple, but not simpler - Albert Einstein



"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code

Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
liuaike (Moderator, 20 posts)
Posted 2008-07-31 10:04
Thanks to Mr. floater for the excellent commentary. When I have time I will improve my system, for example transactions and fault tolerance; it is only version 1.0, after all ^_^.



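Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
haibo (CJSDN Senior Member, 322 posts)
Posted 2008-08-01 10:00
Very nice. The site I maintain now gets over 2 to 3 million PV a day and uses no caching framework at all, just the crudest method: load the data into a HashMap. There are 3 front-end machines, one HP 6850 and two Dells. One step requires a multi-table join; in the end an expert at our company extended the Java HashMap implementation so that the joined data is written to a file, which is read every time. What gives me a headache now is that the load average on the two Dell front-end machines will not come down: it stays above 1, while CPU usage is no higher than 90%. The HP machine is somewhat better. Each page view involves 3 socket operations, though, and I wonder whether that is the cause. Has anyone here tracked down a load average that will not come down on a Java/JSP machine? Please share.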




Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
floater (Java Jedi, Chief Moderator, 3233 posts)
Posted 2008-08-03 02:58
maybe slow network?

maybe hashcode not balanced?



"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code

Re: My caching approach and source code (Java version, with test JSPs) [Re: floater]
haibo (CJSDN Senior Member, 322 posts)
Posted 2008-08-04 10:43
floater wrote:
maybe slow network?

maybe hashcode not balanced?


This is what is giving me a headache. Below is the output of top after pressing Shift+H, on Debian Linux.
It should not be a network problem, because I see very few Apache requests with latency over 10 seconds.

top - 10:25:10 up 310 days, 14:39, 1 user, load average: 1.61, 2.01, 1.96
Tasks: 851 total, 3 running, 847 sleeping, 0 stopped, 1 zombie
Cpu(s): 11.4%us, 0.6%sy, 0.0%ni, 87.9%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 16433220k total, 12911932k used, 3521288k free, 158728k buffers
Swap: 1951888k total, 48k used, 1951840k free, 4562588k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18996 sms 18 0 3735m 2.0g 46m S 53 12.5 8383:25 java
3144 sms 15 0 130m 13m 796 S 7 0.1 0:02.21 httpd
28960 sms 15 0 245m 33m 812 S 6 0.2 1:16.81 httpd
30492 sms 15 0 1023m 165m 1396 S 5 1.0 9:19.39 httpd
16878 sms 15 0 476m 73m 1392 S 5 0.5 3:42.12 httpd
23049 sms 15 0 344m 50m 1396 R 5 0.3 2:20.75 httpd
1843 sms 15 0 152m 17m 812 S 5 0.1 0:16.52 httpd
22690 sms 15 0 353m 52m 1392 S 5 0.3 2:25.94 httpd
23261 sms 15 0 336m 49m 1396 S 5 0.3 2:15.12 httpd
2564 sms 15 0 138m 15m 800 S 4 0.1 0:07.50 httpd
18610 sms 15 0 437m 66m 1396 S 4 0.4 3:17.87 httpd
22686 sms 15 0 350m 52m 1396 S 4 0.3 2:23.86 httpd
25431 sms 15 0 307m 44m 1396 S 4 0.3 1:57.44 httpd
26398 sms 15 0 286m 40m 812 S 4 0.3 1:43.56 httpd
28723 sms 15 0 1063m 172m 1396 S 4 1.1 9:45.60 httpd
3354 sms 18 0 127m 13m 768 S 4 0.1 0:00.12 httpd
12019 sms 15 0 669m 105m 1396 S 4 0.7 5:37.15 httpd
25364 sms 15 0 300m 43m 1392 S 4 0.3 1:51.99 httpd
31076 sms 15 0 203m 27m 1392 S 4 0.2 0:49.51 httpd
3205 sms 15 0 3735m 2.0g 46m S 3 12.5 0:00.18 java
3256 sms 15 0 0 0 0 Z 3 0.0 0:00.84 httpd <defunct>
18519 sms 15 0 430m 65m 1396 S 3 0.4 3:13.08 httpd
18805 sms 15 0 420m 63m 1396 S 3 0.4 3:07.44 httpd
20240 sms 15 0 404m 61m 1392 S 3 0.4 2:58.74 httpd
2806 sms 15 0 134m 14m 808 S 2 0.1 0:04.72 httpd
2662 sms 15 0 3735m 2.0g 46m S 2 12.5 0:01.04 java
3222 sms 15 0 3735m 2.0g 46m S 2 12.5 0:00.19 java
15854 sms 16 0 521m 80m 1396 S 2 0.5 4:08.46 httpd
18518 sms 15 0 440m 67m 1396 S 2 0.4 3:19.58 httpd
22687 sms 15 0 346m 51m 1396 S 2 0.3 2:21.96 httpd
25679 sms 15 0 289m 41m 812 S 2 0.3 1:45.03 httpd
1700 sms 15 0 152m 18m 1392 S 2 0.1 0:16.16 httpd
2619 sms 15 0 137m 15m 800 S 2 0.1 0:06.47 httpd
1512 sms 15 0 3735m 2.0g 46m S 2 12.5 0:02.07 java
2692 sms 15 0 3735m 2.0g 46m S 2 12.5 0:00.88 java
26200 sms 15 0 287m 41m 1396 R 2 0.3 1:44.63 httpd
2929 sms 16 0 3735m 2.0g 46m S 1 12.5 0:00.52 java
3000 sms 15 0 3735m 2.0g 46m S 1 12.5 0:00.41 java
3219 sms 15 0 3735m 2.0g 46m S 1 12.5 0:00.08 java
3266 sms 15 0 3735m 2.0g 46m S 1 12.5 0:00.10 java
3198 sms 15 0 11256 1936 952 R 1 0.0 0:01.11 top
2296 sms 15 0 3735m 2.0g 46m S 1 12.5 0:01.47 java
2322 sms 15 0 3735m 2.0g 46m S 1 12.5 0:01.33 java
2334 sms 15 0 3735m 2.0g 46m S 1 12.5 0:01.19 java





Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
liuaike (Moderator, 20 posts)
Posted 2008-08-07 17:24
The new version of shedewang.com is live, running the revised caching system. With locking and count caching added in particular, it is really fast. MySQL's load average is also low, around 0.2 on average.



Re: My caching approach and source code (Java version, with test JSPs) [Re: liuaike]
airport (5 posts)
Posted 2009-03-04 15:02
There are so many caching options; each project needs to be analyzed on its own circumstances.


Water with no fish is perfectly clear
